Operating System - OpenVMS

Re: Disks in cluster erroneously reported as full by VMS

 
Philip Howes
Occasional Advisor

Disks in cluster erroneously reported as full by VMS

Hi,
We've experienced this problem on 2 different systems. Our 2-node VMS cluster (8.3 running on Itanium rx4640s) reports that several of the shadowed disks are full, i.e. a 'show dev d' says that DSA103, DSA105 and DSA107 have no free blocks. Each node has an MSA1000 attached, and the disks are shadowed between them. However, we know they are not full: when we count up the sizes of the files on the individual disks, only a small amount of each disk is used. A reboot resolves the issue. Luckily, this has only happened on our test and post-test systems, not live (as yet). Any thoughts?
Thanks.
16 REPLIES
Karl Rohwedder
Honored Contributor

Re: Disks in cluster erroneously reported as full by VMS

Had some nodes crashed, or were some disks improperly dismounted? VMS marks a specific amount of disk space as occupied and puts these blocks in its extent cache. If a volume is not dismounted properly, VMS has no chance to mark these blocks as free again. A 'SET VOLUME/REBUILD' should fix this.

regards Kalle
Andy Bustamante
Honored Contributor

Re: Disks in cluster erroneously reported as full by VMS

Just to clarify >>>Each node has an MSA1000 attached, and the disks are shadowed between them.

Does this mean each system is only connected to 1 MSA1000, or do both systems connect to each MSA?

In other words are you mixing shadowing and MSCP disk serving? That isn't a supported configuration.

Andy
If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
John Gillings
Honored Contributor

Re: Disks in cluster erroneously reported as full by VMS

Philip,

The number displayed by SHOW DEVICE as the free blocks on the disk is just a number, and it can drift. It plays no part in allocation, so if you attempt to allocate space, and there is space available, the allocation will work regardless of what SHOW DEVICE says.

$ SET VOLUME/REBUILD

should bring it back into line, if that fails, try
$ SET VOLUME/REBUILD=FORCE

If that still doesn't work, you may have "lost" files, which can be recovered with ANALYZE/DISK/REPAIR.

Note that DIRECTORY doesn't necessarily give an accurate result for disk consumption (try it on a system disk!). It can over- or under-estimate, depending on how you phrase the command. If you are counting up files, make sure you use the allocated size (DIRECTORY/SIZE=ALL or the F$FILE_ATTRIBUTES item ALQ).

Also note that the concept of "number of free blocks on a disk" is not as simple as it might first appear. There are legitimate reasons why multiple nodes in a cluster might correctly report different values for free space, and, in a multi-user environment, the values can vary wildly from instant to instant. You should therefore avoid code with logic like:

IF sufficient-disk-space
THEN
    do something
ELSE
    handle error
ENDIF

because it will suffer from both false positives and false negatives due to the finite time between sampling and acting. Instead it should be coded as:

do something
IF error
THEN
    handle error
ENDIF
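
[Editor's note] John's advice translates directly outside DCL as well. A minimal Python sketch of the two patterns (the function and file names here are illustrative, not from the thread): checking free space first and acting later is a race, while attempting the operation and handling the failure it reports is not.

```python
import errno

def write_check_first(path, data, free_bytes):
    # Fragile "look before you leap": free_bytes was sampled some time
    # ago, so other users may have consumed (or freed) space since then.
    if free_bytes >= len(data):
        with open(path, "wb") as f:
            f.write(data)
    else:
        raise RuntimeError("not enough space")

def write_and_handle(path, data):
    # Robust "just do it": attempt the write and handle the error the
    # operation itself reports at the moment it actually happens.
    try:
        with open(path, "wb") as f:
            f.write(data)
        return True
    except OSError as e:
        if e.errno == errno.ENOSPC:   # the disk really was full when we wrote
            return False
        raise
```

The second form cannot suffer a false positive or false negative, because there is no gap between the check and the action.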
A crucible of informative mistakes
Hein van den Heuvel
Honored Contributor

Re: Disks in cluster erroneously reported as full by VMS

>> Our 2 node VMS cluster (8.3 running on an Itanium rx4640) reports that several of the shadowed disks are full.

Sounds like it is just a 'visual' problem.
Annoying, but just a number in a report.
Is there anything actually going wrong?
Failures to create or extend files?


>> A reboot resolves the issue.

Now there's a whopping big hammer!
PLEASE consider a more fine grained approach should such problems come back.
PLEASE try a simple dismount and (re)mount sequence to see if that fixes it.

Andy suggests that you might have created an unsupported setup. That may be the case.

John has a fine reply, as always. Read it carefully!

>>> Any thoughts?

1) Sounds like a bug, but could be an unsupported configuration. Check with the fine folks at OpenVMS support?

2) VERIFY whether there is a real problem or just very annoying, misleading information.

3) NEVER reboot an OpenVMS server for such a 'problem'.

Cheers,
Hein.


Philip Howes
Occasional Advisor

Re: Disks in cluster erroneously reported as full by VMS

Thanks for all your replies - there is some interesting stuff to investigate, and some good advice. We noticed the problem when our application failed because it couldn't write to a disk, so the problem 'appeared' to be real rather than a display issue. But I'll look into that.
I took a load of logs and screen dumps before we rebooted, but I had to see whether the problem would continue after the reboot, which is why I did that.
I've raised a call with HP, so I've got the usual weeks of log exchanges to look forward to. Will let you know how we get on.
Philip Howes
Occasional Advisor

Re: Disks in cluster erroneously reported as full by VMS

I've checked our MSCP_LOAD value and this is set to '1'. We also use Volume Shadowing between the MSA boxes (each node has a connection to each MSA). Is this wrong, as suggested in an earlier post?

Thanks,
Phil
Zeni B. Schleter
Regular Advisor

Re: Disks in cluster erroneously reported as full by VMS

Just curious. Before the reboot, you did do a /Siz=all when you totaled up the file sizes, right?

Philip Howes
Occasional Advisor

Re: Disks in cluster erroneously reported as full by VMS

From memory we used something like:
'dir dsa103:[000000...]/size/grand'
Zeni B. Schleter
Regular Advisor

Re: Disks in cluster erroneously reported as full by VMS

We have had very large scratch files created that used LOTS of space, but until the scratch files are closed you don't see what they are using with /SIZ. You need /SIZ=ALL to see what is allocated.

We had two batch jobs that occasionally walked on each other. I created a batch job to look at new files with /SIZ=ALL so that I finally got an idea of what kind of space was being used in a very dynamic fashion. If the jobs in your case are detached and don't exit till shutdown, they may clean up behind themselves and you will not see what was there after the reboot.

Also, when I was checking, I think I used the modified date so that if a file existed but was growing, it would be reported too.
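
[Editor's note] The used-versus-allocated distinction Zeni describes has a rough Unix analogue, sketched here in Python (the helper name is made up): os.stat reports both the end-of-file size (st_size) and the space actually allocated (st_blocks, in 512-byte units), and the two can differ, for example for sparse files.

```python
import os
import tempfile

def size_report(path):
    """Return (eof_bytes, allocated_bytes) for a file, loosely akin to
    DIRECTORY/SIZE (used) versus DIRECTORY/SIZE=ALL (allocated)."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512   # st_blocks counts 512-byte units

# A sparse file: large EOF, potentially little allocated (filesystem permitting).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(10_000_000)      # extend EOF without writing any data
    sparse_path = f.name

eof, alloc = size_report(sparse_path)
os.unlink(sparse_path)
```

On most Linux filesystems `alloc` will be far smaller than `eof` for this file; totalling only EOF sizes, like counting with plain /SIZ, can badly misstate what a disk is really holding.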
Jan van den Ende
Honored Contributor

Re: Disks in cluster erroneously reported as full by VMS

@Zeni,

>>>
Also, when I was checking, I think I used the modified date so that if a file existed but was growing, it would be reported too.
<<<

Unlikely, as the Modified Date only gets updated when the file is closed.

Permanently open files will never update it.

Proost.

Have one on me.

jpe
Don't rust yours pelled jacker to fine doll missed aches.
Zeni B. Schleter
Regular Advisor

Re: Disks in cluster erroneously reported as full by VMS

@Jan

Going from memory in my suggestion. After the fact, I knew it was open scratch files with huge allocations from sorting. Before finding the problem, I was looking for any possible growth. I have seen uses where empty files are copied, appended to and deleted. The original date is the template's creation, and the modification date reflects its update.
Robert Brooks_1
Honored Contributor

Re: Disks in cluster erroneously reported as full by VMS

I've checked our MSCP_LOAD value and this is set to '1'. We also use Volume Shadowing between the MSA boxes (each node has a connection to each MSA). Is this wrong, as suggested in an earlier post?

--

That looks fine to me; I have no idea what the previous poster who implied there might be a problem was thinking.

Mixing Shadowing and MSCP-serving is certainly supported! One can obviously shadow a served member unit.

What does not work is MSCP-serving the virtual unit (the "DSA" device). The notion that it's not supported is technically true, but it flat out will not work, so it's absolutely impossible to get oneself into that type of unsupported configuration.

Perhaps the poster with the concern was thinking of a different problem?

-- Rob (who has spent a fair amount of time inside both SHDRIVER and DUDRIVER).
Jon Pinkley
Honored Contributor

Re: Disks in cluster erroneously reported as full by VMS

>>>I've checked our MSCP_LOAD value and this is set to '1'. We also use Volume Shadowing between the MSA boxes (each node has a connection to each MSA). Is this wrong, as suggested in an earlier post?<<<<

That is fine. I am not sure what Andy Bustamante was referring to when he stated "In other words are you mixing shadowing and MSCP disk serving? That isn't a supported configuration."

You cannot MSCP serve the DSA virtual units, but creating DSA virtual units from MSCP-served members is supported. In your case, it will create a backup path in case all direct fibre paths to the node fail, as in the following:

$ sho dev/ful dga6902



I/O paths to device              3
Path PGA0.5000-1FE1-500B-89BD (SIGMA), primary path, current path.
  Error count                    0    Operations completed           624
Path PGA0.5000-1FE1-500B-89B9 (SIGMA).
  Error count                    0    Operations completed           352
Path MSCP (OMEGA).
  Error count                    0    Operations completed             0

$

Jon
it depends
Andy Bustamante
Honored Contributor

Re: Disks in cluster erroneously reported as full by VMS


Robert Brooks is correct and I was in error. I had been making an assumption concerning shadowing and MSCP serving. One of the reasons I'm here is to continue learning.


Andy Bustamante
If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
Wim Van den Wyngaert
Honored Contributor

Re: Disks in cluster erroneously reported as full by VMS

Something like this done by your application?

WSYS01/MGRWVW>sh dev sd

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
WSYS01$DKA0:            Mounted              0  WSYS01_SYST     600472   502   1
WSYS01/MGRWVW>copy nl: sd:[000000]wim.lis/alloc=10000
WSYS01/MGRWVW>sh dev sd

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
WSYS01$DKA0:            Mounted              0  WSYS01_SYST     590552   497   1
WSYS01/MGRWVW>open/read/shared x sd:[000000]wim.lis
WSYS01/MGRWVW>del sd:[000000]wim.lis;
DELETE SD:[000000]WIM.LIS;1 ? [N]: y
%DELETE-I-FILDEL, SD:[000000]WIM.LIS;1 deleted (10000 blocks)
WSYS01/MGRWVW>sh dev sd

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
WSYS01$DKA0:            Mounted              0  WSYS01_SYST     590552   500   1
WSYS01/MGRWVW>close x
WSYS01/MGRWVW>sh dev sd

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
WSYS01$DKA0:            Mounted              0  WSYS01_SYST     600552   499   1
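
[Editor's note] Wim's demonstration, that a file deleted while still open keeps its space until the channel is closed, has a direct Unix analogue. A minimal Python sketch (the file name is made up to echo his example):

```python
import os
import tempfile

# Create a file, keep a channel open on it, then delete its directory entry.
d = tempfile.mkdtemp()
path = os.path.join(d, "wim.lis")
with open(path, "w") as creator:
    creator.write("x" * 10_000)

reader = open(path, "r")          # keep a channel open on the file
os.remove(path)                   # 'delete' it: the name is gone...

name_gone = not os.path.exists(path)
still_readable = reader.read(5)   # ...but the open channel still works, and
                                  # the blocks stay allocated until it closes
reader.close()                    # only now is the space actually freed
```

This is exactly why a DIRECTORY (or `ls`) total can come up far short of what SHOW DEVICE (or `df`) reports: deleted-but-open files occupy space without appearing under any name.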

Wim
Wim Van den Wyngaert
Honored Contributor

Re: Disks in cluster erroneously reported as full by VMS

What I wanted to say is that your application may have files in use that are already deleted and thus do not show up in DIR.

Wim