Purge/log disk1:[000000...] hangs.

 
SOLVED
tobyjug
Advisor

I have an ES45 running OpenVMS v7.3-2 (cannot be upgraded due to the installed application) with SAN-attached disks. After the nightly backup a cleanup job runs. It now hangs on the above command. There have been no application or OpenVMS changes recently. There are no HW errors logged and nothing gets written to the log files.

Purge/log disk1:[*...] run interactively also hangs.  Dir disk1:[*...] gets to a particular sub-directory and also hangs.

If I set def to the sub-directory and do a dir, it works. The directory has .TXT files (used by the app), and I can TYPE the last file listed by the dir disk1:[*...]. I can also TYPE the next files that should have been listed.

The application has a lot of files open on this disk. If I have to run an **bleep**/disk, do I need to stop the app so no files are open?

What are the chances of losing data files if I do a /repair?

Note: The application has not been affected; at least, the users have not reported a problem.

John Gillings
Honored Contributor

Re: Purge/log disk1:[000000...] hangs.

Sounds like a broken directory. DUMP/DIRECTORY might reveal something.

 

Are the hung processes looping or waiting? How do you get out of the hang?

 

ANALYZE/DISK/REPAIR is very unlikely to lose data. Remember, directories are just files which point to other files. You can delete them without deleting the contained files. The files become "lost" and can be recovered with ANALYZE/DISK/REPAIR.

 

A few options... 

 

Try RENAMEing files out of the bad directory into another one. Most likely *.* will hang just like DIR and PURGE, but you may be able to take chunks like A*.*, B*.*, etc...

 

If that moves everything, delete the old directory and RENAME the new one back to the old name.

 

If you can't RENAME anything, or end up with files that can't be moved, DELETE the bad directory. You'll need to set the protection to allow D (delete) access and SET FILE/NODIRECTORY for the DELETE to work. You can then do an ANALYZE/DISK/REPAIR to recover the lost files into [SYSLOST], RENAME [000000]SYSLOST.DIR to the name of the bad directory, then SET DIRECTORY/OWNER to the correct owner.
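A rough DCL sketch of that last-resort sequence might look like the following. The directory name DISK1:[TOPDIR]BADDIR.DIR and the owner [APPOWNER] are hypothetical placeholders, and SET SECURITY is used here for the protection and ownership steps; substitute the real path and owner, and check each command against HELP on your own system before running it.

```dcl
$ ! Hypothetical names: DISK1:[TOPDIR]BADDIR.DIR is the broken directory,
$ ! [APPOWNER] is whatever UIC or identifier owned it originally.
$ DUMP/DIRECTORY DISK1:[TOPDIR]BADDIR.DIR               ! look at the directory entries first
$ SET SECURITY/PROTECTION=(OWNER:RWED) DISK1:[TOPDIR]BADDIR.DIR;1
$ SET FILE/NODIRECTORY DISK1:[TOPDIR]BADDIR.DIR;1       ! clear the directory attribute
$ DELETE DISK1:[TOPDIR]BADDIR.DIR;1                     ! files it pointed to are now "lost"
$ ANALYZE/DISK_STRUCTURE/REPAIR DISK1:                  ! lost files are entered in [SYSLOST]
$ RENAME DISK1:[000000]SYSLOST.DIR DISK1:[TOPDIR]BADDIR.DIR
$ SET SECURITY/OWNER=[APPOWNER] DISK1:[TOPDIR]BADDIR.DIR
```

Bear in mind that [SYSLOST] collects every lost file the repair pass finds on the volume, so it is worth checking its contents before renaming it into place.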

 

(Note that the prudish HP "naughty word" filter doesn't like the ANALYZE command to be contracted. Lucky they don't host a proctology forum.)

A crucible of informative mistakes
John Gillings
Honored Contributor

Re: Purge/log disk1:[000000...] hangs.

Oh, and also note that you can safely RENAME open files. To do this live, RENAME the bad directory and immediately create the new directory with the old name. There will be a few timing windows, but unless the file turnover is very high, the application probably won't notice.

A crucible of informative mistakes
Andy Bustamante
Honored Contributor

Re: Purge/log disk1:[000000...] hangs.

In addition to "what John said" above. 

 

You state that the directory command executes when you SET DEFAULT to the directory that hangs on purge. What happens if you execute the purge in just that directory? How many files are in this directory? OpenVMS may have slow performance in deleting files out of a very large directory, especially if an application is creating files at the same time. If you execute DIR/SIZ=ALL hanging_directory.dir, what is the result?

If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
tobyjug
Advisor

Re: Purge/log disk1:[000000...] hangs.

Thanks John.

Sorry about the analyze thing. I wasn't thinking.

The hung processes are looping. Ctrl/Y or deleting the batch job will kill them without a problem.

 

I'll give your recommendations a go and get back to you. I'll have to arrange an outage for the application.

tobyjug
Advisor

Re: Purge/log disk1:[000000...] hangs.

Hi Andy,

 

60 files, 19438 blocks in the dodgy directory. No files in this directory require purging.

dir/siz=all comes back with:

"dodgydirectory.dir  4/19
Total of 1 file, 4/19 blocks."

Bob Blunt
Respected Contributor

Re: Purge/log disk1:[000000...] hangs.

Even though it is no longer being created within HP, I'm surprised that no one has suggested DFU to help figure out what might be causing your problem (and possibly give you a tool to fix it).

 

http://www.digiater.nl/dfu.html

 

Can't really add anything to the recommendations that you've already seen with the exception of the above.  If your process is looping then I'd suspect that it was something directory-oriented as well.  DFU can figure that out for you.  If the directory file is overly large (not an uncommon problem, especially with application-generated text or logfiles) then that can cause similar-looking delays but the directory would have to be REALLY big, in excess of 1200 VMS blocks unless it really got scrozzled.

 

bob

Hoff
Honored Contributor

Re: Purge/log disk1:[000000...] hangs.

I've seen directory contention trigger this, I've seen poor application designs trigger this, and disk errors, and wacky disk firmware bugs, controller-level errors, and various other triggers.  All odd stuff, and all usually rare, but stuff does happen.

 

Is this directory active from other applications? References to batch jobs imply that's the case, and one of those processes holding a directory lock at a low priority can slam access. (If so, shut that stuff off and see if the PURGE goes through.)

 

Is this a directory filled with log files?

 

Are there any I/O or disk errors being logged?

 

Recreate the directory.

 

How recent and how complete are your backups?  If you're not sure of that, BACKUP /IMAGE the disk out to a saveset, and back in.

 

The other detail to look at is the locking activity when the hang arises; AMDS or ANALYZE /SYSTEM locking-related commands.
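For anyone unfamiliar with the SDA side of that, a minimal sketch of what to look at while the DIRECTORY or PURGE is wedged is shown here. The process ID 2040011C is a made-up placeholder for the PID of the hung process, and the session needs enough privilege to examine the running system.

```dcl
$ ! 2040011C is a hypothetical PID; use the PID of the hung process.
$ SHOW PROCESS/CONTINUOUS/ID=2040011C     ! is it burning CPU (looping) or sitting in a wait state?
$ ANALYZE/SYSTEM
SDA> SET PROCESS/ID=2040011C              ! make the hung process the current SDA context
SDA> SHOW PROCESS/LOCKS                   ! locks that process owns or is waiting on
SDA> SHOW PROCESS/CHANNEL                 ! channels it has open, including files on the disk
SDA> EXIT
```

This only shows where to start looking; interpreting the resource names in the lock display is a separate exercise.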

 

The performance on big directories is vastly better with recent OpenVMS releases, but still trailing competitive implementations.  But the classic directory size problems should not hit a directory with that few blocks allocated.

tobyjug
Advisor

Re: Purge/log disk1:[000000...] hangs.

Found out the files in this directory are "control files" for want of a better term. They are only updated if we change reference codes or user access levels, or add or remove users from the app.

 

I made sure none of the files were open, then:

Created newdir.dir and copied all files from origdir.dir to newdir.dir.

Renamed origdir.dir to olddir.dir.

Renamed newdir.dir to origdir.dir.

Deleted all files out of olddir.dir.

Deleted olddir.dir.

No errors or issues with any of the above. Response time was normal.
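In DCL terms the steps above would be roughly as follows. This is a reconstruction for readers, not a transcript of the commands actually typed, and it assumes the directories live under DISK1:[TOPDIR] as described further down.

```dcl
$ ! Reconstruction only: directory names follow the post, the exact commands are assumed.
$ SHOW DEVICE/FILES DISK1:                                   ! confirm none of the control files are open
$ CREATE/DIRECTORY DISK1:[TOPDIR.NEWDIR]
$ COPY DISK1:[TOPDIR.ORIGDIR]*.*;* DISK1:[TOPDIR.NEWDIR]*.*;*
$ RENAME DISK1:[TOPDIR]ORIGDIR.DIR DISK1:[TOPDIR]OLDDIR.DIR
$ RENAME DISK1:[TOPDIR]NEWDIR.DIR DISK1:[TOPDIR]ORIGDIR.DIR
$ DELETE DISK1:[TOPDIR.OLDDIR]*.*;*
$ SET SECURITY/PROTECTION=(OWNER:RWED) DISK1:[TOPDIR]OLDDIR.DIR;1   ! directories usually lack D access
$ DELETE DISK1:[TOPDIR]OLDDIR.DIR;1
```

Note that COPY creates new files with new file IDs, whereas the RENAME approach suggested earlier in the thread keeps the original files and only changes their directory entries.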

 

Did a dir disk1:[*...] and the process hung in the same place. The directory listing shows 56 of the 60 files in origdir.dir.

The path is only disk1:[topdir.origdir] so it is not like it is buried 30 levels down.

Hoff
Honored Contributor

Re: Purge/log disk1:[000000...] hangs.

Check the lock ownership, when things wedge up.  (DEC AMDS or Availability Manager are the easiest there.)

 

Check the next directory in the search order, too.  See if that's the trigger.

 

Check for errors.