Operating System - OpenVMS

Purge/log disk1:[000000...] hangs.

 
SOLVED
tobyjug
Advisor

I have an ES45 running OpenVMS v7.3-2 (cannot be upgraded due to the installed application) with SAN attached disks. After the nightly backup a cleanup job runs. It now hangs on the above command. There have been no application or OpenVMS changes recently. There are no HW errors logged & nothing gets written to the log files.

Purge/log disk1:[*...] run interactively also hangs.  Dir disk1:[*...] gets to a particular sub-directory and also hangs.

If I set def to the sub-directory & do a dir, it works. The directory has .TXT files (used by the app) & I can type the last file listed by the dir disk1:[*...]. I can also type the next files that should have been listed.

The application has a lot of files open on this disk. If I have to run an **bleep**/disk, do I need to stop the app so no files are open?

What are the chances of losing data files if I do a /repair?

Note: The application has not been affected, at least the users have not reported any problems.

13 REPLIES
John Gillings
Honored Contributor

Re: Purge/log disk1:[000000...] hangs.

Sounds like a broken directory. DUMP/DIRECTORY might reveal something.

 

Are the hung processes looping or waiting? How do you get out of the hang?
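
As a rough sketch of how those two checks might be run (the directory name and process ID below are placeholders, not values from the original system):

```dcl
$! Hypothetical names: DISK1:[TOPDIR]BADDIR.DIR and PID 2040011C are placeholders.
$! Format the blocks of the directory file as directory records.
$ DUMP/DIRECTORY DISK1:[TOPDIR]BADDIR.DIR
$! From a second session, watch the hung process. Steadily climbing CPU time
$! suggests a loop; an unchanging wait state with no CPU growth suggests a wait.
$ SHOW PROCESS/CONTINUOUS/IDENTIFICATION=2040011C
```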

 

ANALYZE/DISK/REPAIR is very unlikely to lose data. Remember directories are just files which point to other files. You can delete them without deleting the contained files. They'll become "lost" and can be recovered with ANALYZE/DISK.

 

A few options... 

 

Try RENAMEing files out of the bad directory into another one. Most likely *.* will hang just like DIR and PURGE, but you may be able to take chunks like A*.*, B*.*, etc...

 

If that moves everything, delete the old directory and RENAME the new one back to the old name.

 

If you can't RENAME anything, or end up with files that can't be moved, DELETE the bad directory. You'll need to set protection to D and SET FILE/NODIRECTORY for the DELETE to work. You can then do an ANALYZE/DISK/REPAIR to recover the lost files into [SYSLOST] and RENAME [000000]SYSLOST.DIR to the name of the bad directory, then SET DIRECTORY/OWNER to the correct owner.
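
The two options above could be sketched roughly as follows. This is an illustration only: BADDIR, NEWDIR, TOPDIR and [GROUP,MEMBER] are hypothetical placeholders, and the commands have not been run against the system in question.

```dcl
$! Sketch only. DISK1:[TOPDIR]BADDIR.DIR, NEWDIR and [GROUP,MEMBER] are
$! hypothetical placeholders; substitute the real names and owner UIC.
$!
$! Option 1: move the files out in chunks, then swap the directories.
$ CREATE/DIRECTORY DISK1:[TOPDIR.NEWDIR]
$ RENAME/LOG DISK1:[TOPDIR.BADDIR]A*.*;* DISK1:[TOPDIR.NEWDIR]
$ RENAME/LOG DISK1:[TOPDIR.BADDIR]B*.*;* DISK1:[TOPDIR.NEWDIR]
$! ...repeat for the remaining name ranges...
$ SET FILE/PROTECTION=(OWNER:RWED) DISK1:[TOPDIR]BADDIR.DIR;1
$ DELETE/LOG DISK1:[TOPDIR]BADDIR.DIR;1
$ RENAME/LOG DISK1:[TOPDIR]NEWDIR.DIR;1 DISK1:[TOPDIR]BADDIR.DIR;1
$!
$! Option 2: the files cannot be moved, so drop the directory and recover them.
$ SET FILE/PROTECTION=(OWNER:RWED) DISK1:[TOPDIR]BADDIR.DIR;1
$ SET FILE/NODIRECTORY DISK1:[TOPDIR]BADDIR.DIR;1
$ DELETE/LOG DISK1:[TOPDIR]BADDIR.DIR;1
$ ANALYZE/DISK_STRUCTURE/REPAIR DISK1:
$ RENAME/LOG DISK1:[000000]SYSLOST.DIR;1 DISK1:[TOPDIR]BADDIR.DIR;1
$ SET DIRECTORY/OWNER_UIC=[GROUP,MEMBER] DISK1:[TOPDIR.BADDIR]
```

Only one of the two options would be used. In option 2, [SYSLOST] may also pick up lost files from elsewhere on the volume, so its contents are worth checking before the RENAME.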

 

(Note that the prudish HP "naughty word" filter doesn't like the ANALYZE command to be contracted - lucky they don't host a proctology forum.)

A crucible of informative mistakes
John Gillings
Honored Contributor

Re: Purge/log disk1:[000000...] hangs.

Oh, and also note that you can safely RENAME open files. To do this live, RENAME the bad directory and immediately create the new directory with the old name. There will be a few timing windows, but unless the file turnover is very high, the application probably won't notice.

A crucible of informative mistakes
Andy Bustamante
Honored Contributor

Re: Purge/log disk1:[000000...] hangs.

In addition to "what John said" above. 

 

You state that the directory command executes when you SET DEFAULT to the directory that hangs on purge.  What happens if you execute the purge in just that directory?  How many files are in this directory?  OpenVMS may have slow performance in deleting files out of a very large directory, especially if an application is creating files at the same time. If you execute DIR/SIZ=ALL hanging_directory.dir, what is the result?

If you don't have time to do it right, when will you have time to do it over? Reach me at first_name + "." + last_name at sysmanager net
tobyjug
Advisor

Re: Purge/log disk1:[000000...] hangs.

Thanks John.

Sorry about the analyze thing. I wasn't thinking.

The hung processes are looping. <CTRL> Y or deleting the batch job will kill them without a problem.

 

I'll give your recommendations a go & get back to you. I'll have to arrange an outage for the application.

tobyjug
Advisor

Re: Purge/log disk1:[000000...] hangs.

Hi Andy,

 

60 files, 19438 blocks in the dodgy directory. No files in this directory require purging.

dir/siz=all comes back with "dodgydirectory.dir  4/19"

"Total of 1 file, 4/19 blocks."

Bob Blunt
Respected Contributor

Re: Purge/log disk1:[000000...] hangs.

Even though it is no longer being created within HP, I'm surprised that no one has suggested DFU to help figure out what might be causing your problem (and possibly give you a tool to fix it).

 

http://www.digiater.nl/dfu.html

 

Can't really add anything to the recommendations that you've already seen with the exception of the above.  If your process is looping then I'd suspect that it was something directory-oriented as well.  DFU can figure that out for you.  If the directory file is overly large (not an uncommon problem, especially with application-generated text or logfiles) then that can cause similar-looking delays but the directory would have to be REALLY big, in excess of 1200 VMS blocks unless it really got scrozzled.

 

bob

Hoff
Honored Contributor

Re: Purge/log disk1:[000000...] hangs.

I've seen directory contention trigger this, I've seen poor application designs trigger this, and disk errors, and wacky disk firmware bugs, controller-level errors, and various other triggers.  All odd stuff, and all usually rare, but stuff does happen.

 

Is this directory active from other applications?  References to batch jobs imply that's the case, and one of those processes holding a directory lock at a low priority can slam access.  (If so, shut that stuff off and see if the PURGE goes through.)  

 

Is this a directory filled with log files?

 

Are there any I/O or disk errors being logged?

 

Recreate the directory.

 

How recent and how complete are your backups?  If you're not sure of that, BACKUP /IMAGE the disk out to a saveset, and back in.

 

The other detail to look at is the locking activity when the hang arises; AMDS or ANALYZE /SYSTEM locking-related commands.

 

The performance on big directories is vastly better with recent OpenVMS releases, but still trailing competitive implementations.  But the classic directory size problems should not hit a directory with that few blocks allocated.

tobyjug
Advisor

Re: Purge/log disk1:[000000...] hangs.

Found out the files in this directory are "control files" for want of a better term. They are only updated if we change reference codes or user access levels, or add or remove users from the app.

 

I made sure none of the files were open. Created newdir.dir - copied all files from origdir.dir to newdir.dir

Renamed origdir.dir to olddir.dir -

Renamed newdir.dir to origdir.dir

Deleted all files out of olddir.dir

Deleted olddir.dir.

No errors or issues with any of the above. Response time was normal.

 

Did a dir disk1:[*...] and the process hung in the same place. The directory listing shows 56 of the 60 files in origdir.dir.

The path is only disk1:[topdir.origdir] so it is not like it is buried 30 levels down.

Hoff
Honored Contributor

Re: Purge/log disk1:[000000...] hangs.

Check the lock ownership, when things wedge up.  (DEC AMDS or Availability Manager are the easiest there.)

 

Check the next directory in the search order, too.  See if that's the trigger.

 

Check for errors.

tobyjug
Advisor

Re: Purge/log disk1:[000000...] hangs.

The next directory in the list may be the problem.

I have 2 identically named directories, with different file IDs. Same owner & created at the same time.

 

dir/siz=all renver.dir

 

Directory $1$DGA102:[HNA_300_CCLUSER]

RENVER.DIR;1 1/19

RENVER.DIR;1 1/19

Total of 2 files, 2/38 blocks.

 

dir/siz=all renver.dir/ful

 

RENVER.DIR;1 File ID: (3293,55524,0)

Size: 1/19 Owner: [CERNER,P30INS]

Created: 30-JAN-2012 08:08:18.24

Revised: 7-FEB-2012 09:08:45.49 (4)

Expires: <None specified>

Backup: <No backup recorded>

Effective: <None specified>

Recording: <None specified>

Accessed: <None specified>

Attributes: <None specified>

Modified: <None specified>

Linkcount: 1

File organization: Sequential

Shelved state: Online

Caching attribute: Writethrough

File attributes: Allocation: 19, Extend: 0, Global buffer count: 0

Default version limit: 32767, Contiguous, Directory file

Record format: Variable length, maximum 512 bytes, longest 512 bytes

Record attributes: No carriage control, Non-spanned

RMS attributes: None

Journaling enabled: None

File protection: System:RWE, Owner:RWE, Group:RWE, World:RWE

Access Cntrl List: None

Client attributes: None

 

RENVER.DIR;1 File ID: (13148,19558,0)

Size: 1/19 Owner: [CERNER,P30INS]

Created: 30-JAN-2012 08:08:18.24

Revised: 7-FEB-2012 09:08:45.49 (4)

Expires: <None specified>

Backup: <No backup recorded>

Effective: <None specified>

Recording: <None specified>

Accessed: <None specified>

Attributes: <None specified>

Modified: <None specified>

Linkcount: 1

File organization: Sequential

Shelved state: Online

Caching attribute: Writethrough

File attributes: Allocation: 19, Extend: 0, Global buffer count: 0

Default version limit: 32767, Contiguous, Directory file

Record format: Variable length, maximum 512 bytes, longest 512 bytes

Record attributes: No carriage control, Non-spanned

RMS attributes: None

Journaling enabled: None

File protection: System:RWE, Owner:RWE, Group:RWE, World:RWE

Access Cntrl List: None

Client attributes: None

Total of 2 files, 2/38 blocks.

John Gillings
Honored Contributor
Solution

Re: Purge/log disk1:[000000...] hangs.

Two identically named files might indicate a problem in the parent directory, rather than the directories themselves.

 

So what does ANALYZE/DISK say? What about DFU?

 

Try RENAMEing the errant name with /CONFIRM, just rename ONE of them and see what you get.

 

Note that you're better off RENAMEing than COPYing. You don't need to worry about the files being open, don't need to clean up any mess and won't change creation dates, ownership or other file attributes.
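
A minimal sketch of the RENAME/CONFIRM test, using the directory shown in the listing earlier in the thread. RENVER_DUP.DIR is a hypothetical new name, and whether both entries are actually offered will depend on how the damaged parent directory is read:

```dcl
$! Sketch only: RENVER_DUP.DIR is a hypothetical new name.
$! /CONFIRM prompts before each rename, so answer Y to one entry only.
$ RENAME/CONFIRM/LOG $1$DGA102:[HNA_300_CCLUSER]RENVER.DIR;* -
        $1$DGA102:[HNA_300_CCLUSER]RENVER_DUP.DIR;*
$! Check which file ID ended up under which name.
$ DIRECTORY/FILE_ID $1$DGA102:[HNA_300_CCLUSER]RENVER*.DIR
```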

A crucible of informative mistakes
tobyjug
Advisor

Re: Purge/log disk1:[000000...] hangs.

Still to download DFU.

Analyze shows a number of %ANALDISK-W-BAD_NAMEORDER errors for the parent directory, but renver is not one of them.

 



Hoff
Honored Contributor

Re: Purge/log disk1:[000000...] hangs.

There's the likely bug.  Recreate the parent directory.

 

I'd also look for XQP, RMS, I/O and update patches.