Philip Howes
Occasional Advisor

Disks in cluster erroneously reported as full by VMS

Hi,
We've experienced this problem on 2 different systems. Our 2-node VMS cluster (8.3 running on an Itanium rx4640) reports that several of the shadowed disks are full, i.e. a 'show dev d' says that DSA103, DSA105 and DSA107 have no free blocks. Each node has an MSA1000 attached, and the disks are shadowed between them. However, we know they are not full: when we count up the sizes of the files on the individual disks, only small amounts of the disks are used. A reboot resolves the issue. Luckily, this has only happened on our test and post-test systems, not live (as yet). Any thoughts?
Thanks.
Karl Rohwedder
Honored Contributor

Re: Disks in cluster erroneously reported as full by VMS

Had some nodes crashed, or were some disks improperly dismounted? VMS marks a certain amount of disk space as occupied and puts these blocks in its extent cache. If a volume is not dismounted properly, VMS has no chance to mark these blocks as free. A 'set volume/rebuild' should fix this.
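
For example, on one of the affected shadow sets (device name taken from the question; run this from a suitably privileged account):

$ SET VOLUME/REBUILD DSA103:
$ SHOW DEVICE DSA103:    ! free-block count should now be back in line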

regards Kalle
Andy Bustamante
Honored Contributor

Re: Disks in cluster erroneously reported as full by VMS

Just to clarify >>>Each node has an MSA1000 attached, and the disks are shadowed between them.

Does this mean each system is only connected to one MSA1000, or do both systems connect to each MSA?

In other words, are you mixing shadowing and MSCP disk serving? That isn't a supported configuration.

Andy
If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
John Gillings
Honored Contributor

Re: Disks in cluster erroneously reported as full by VMS

Philip,

The number displayed by SHOW DEVICE as the free blocks on the disk is just a number, and it can drift. It plays no part in allocation, so if you attempt to allocate space, and there is space available, the allocation will work regardless of what SHOW DEVICE says.

$ SET VOLUME/REBUILD

should bring it back into line. If that fails, try
$ SET VOLUME/REBUILD=FORCE

If that still doesn't work, you may have "lost" files, which can be recovered with ANALYZE/DISK/REPAIR.
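
For example (ANALYZE/DISK is an abbreviation of ANALYZE/DISK_STRUCTURE; recovered lost files are placed in the volume's [SYSLOST] directory):

$ ANALYZE/DISK_STRUCTURE/REPAIR DSA103: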

Note that DIRECTORY doesn't necessarily give an accurate result for disk consumption (try it on a system disk!). It can over- or underestimate, depending on how you phrase the command. If you are counting up files, make sure you use the allocated size: DIRECTORY/SIZE=ALL or the F$FILE_ATTRIBUTES item ALQ.
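
For example, to total the allocated blocks, or to check one file (the file name here is purely hypothetical):

$ DIRECTORY/SIZE=ALL/GRAND_TOTAL DSA103:[000000...]
$! Allocated blocks for a single file
$ WRITE SYS$OUTPUT F$FILE_ATTRIBUTES("DSA103:[DATA]SCRATCH.TMP","ALQ")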

Also note that the concept of "number of free blocks on a disk", is not as simple as it might first appear. There are legitimate reasons why multiple nodes in a cluster might correctly report different values for free space, and, in a multi user environment, the values can vary wildly from instant to instant. You should therefore avoid code with logic like:

IF sufficient-disk-space THEN
    do something
ELSE
    handle error
ENDIF

because it will suffer from both false positives and false negatives due to the finite time between sampling and acting. Instead it should be coded as:

do something
IF error THEN
    handle error
ENDIF
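
In DCL terms, a minimal sketch of the two shapes might look like this (device, file name and threshold are invented for illustration):

$! Fragile: free space is sampled first, acted on later
$ IF F$GETDVI("DSA103:","FREEBLOCKS") .GT. 10000
$ THEN
$     COPY BIG.TMP DSA103:[WORK]    ! can still fail if space vanishes in between
$ ELSE
$     WRITE SYS$OUTPUT "Not enough space"
$ ENDIF
$!
$! Robust: just attempt the operation and handle any failure
$ SET NOON
$ COPY BIG.TMP DSA103:[WORK]
$ IF .NOT. $STATUS THEN WRITE SYS$OUTPUT "Copy failed: ", F$MESSAGE($STATUS)
$ SET ON
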
A crucible of informative mistakes
Hein van den Heuvel
Honored Contributor

Re: Disks in cluster erroneously reported as full by VMS

>> Our 2 node VMS cluster (8.3 running on an Itanium rx4640) reports that several of the shadowed disks are full.

Sounds like it is just a 'visual' problem.
Annoying, but just a number in a report.
Is there anything actually going wrong?
Failures to create or extend files?


>> A reboot resolves the issue.

Now there's a whopping big hammer!
PLEASE consider a more fine-grained approach should such problems come back.
PLEASE try a simple dismount and (re)mount sequence to see if that fixes it.
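
For example (shadow set members and volume label here are illustrative, not from Philip's actual configuration):

$ DISMOUNT/CLUSTER DSA103:
$ MOUNT/CLUSTER DSA103:/SHADOW=($1$DGA103:,$2$DGA103:) DATA103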

Andy suggests that you might have created an unsupported setup. That may be the case.

John has a fine reply, as always. Read it carefully!

>>> Any thoughts?

1) Sounds like a bug, but could be an unsupported configuration. Check with the fine folks at OpenVMS support?

2) VERIFY whether there is a real problem or just very annoying, misleading information.

3) NEVER reboot an OpenVMS server for such a 'problem'.

Cheers,
Hein.


Philip Howes
Occasional Advisor

Re: Disks in cluster erroneously reported as full by VMS

Thanks for all your replies - there is some interesting stuff to investigate, and some good advice. We noticed the problem when our application failed because it couldn't write to a disk, so the problem 'appeared' to be real rather than a display issue. But I'll look into that.
I took a load of logs and screen dumps before we rebooted, but I had to see whether the problem would continue after the reboot - which is why I did that.
I've raised a call with HP, so I've got the usual weeks of log exchanges to look forward to. Will let you know how we get on.
Philip Howes
Occasional Advisor

Re: Disks in cluster erroneously reported as full by VMS

I've checked our MSCP_LOAD value and this is set to '1'. We also use Volume Shadowing between the MSA boxes (each node has a connection to each MSA). Is this wrong, as suggested in an earlier post?
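
For reference, the parameter can be displayed with SYSGEN:

$ MCR SYSGEN SHOW MSCP_LOAD    ! current, default, min and max values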

Thanks,
Phil
Zeni B. Schleter
Regular Advisor

Re: Disks in cluster erroneously reported as full by VMS

Just curious. Before the reboot, you did do a /Siz=all when you totaled up the file sizes, right?

Philip Howes
Occasional Advisor

Re: Disks in cluster erroneously reported as full by VMS

From memory we used, e.g.:
'dir dsa103:[000000...]/size/grand'
Zeni B. Schleter
Regular Advisor

Re: Disks in cluster erroneously reported as full by VMS

We have had very large scratch files created that used LOTS of space, but until the scratch files are closed you don't see what they are using with /SIZ alone. You need /SIZ=ALL to see what is allocated.

We had two batch jobs that occasionally walked on each other. I created a batch job to look at new files with /SIZ=ALL so that I finally got an idea of what kind of space was being used in a very dynamic fashion. If the jobs in your case are detached and don't exit till shutdown, they may clean up behind themselves, and you will not see what was there after the reboot.

Also, when I was checking, I think I used the modified date so that a file that already existed but was still growing would be reported too.
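
A sketch of that kind of check (device name and time selection are just examples):

$! Files modified today, showing allocated rather than used size
$ DIRECTORY/SIZE=ALL/MODIFIED/SINCE=TODAY DSA103:[000000...]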