09-10-2008 12:35 AM
Re: System disk errors when running "analyze/disk"
E.g. DFU search dev: /fid=17665
Don't know why we don't get the file name. Must be old software and HP prefers to implement new stuff.
Wim
09-10-2008 01:35 AM
I also ran DFU verify/fix, but it was not able to fix the errors either.
09-10-2008 03:14 AM
Oswald
09-10-2008 03:58 AM
MULTALLOC means that you have a serious problem with the integrity of the disk structure: two or more files have the same logical blocks allocated to them. So you have to delete at least one of them, but you can't be sure which one. I would be very careful with this, and preferably reinstall VMS on a new volume. Or do a manual recovery, but then you REALLY have to know what you are doing.
Jur.
09-10-2008 05:53 AM
I'm using DFU v3.2. As suggested above, I tried using it to search for other files based on file id.
Jur,
Thanks to the early responses, I have already deleted most of the errant files, which has significantly reduced the disk errors. Now, rather than quickly jumping into a re-installation of the OS, I need your 'expert' thoughts on how to delete or otherwise deal with the remaining errant files.
09-10-2008 09:35 AM
What event led you to run analyze disk in the first place?
We know you have a shadow set with errors, but we don't know how many members are in the shadow set or how the members are connected to the system. Do any of the members have direct connections to multiple systems? By that I mean, are any of the shadow set members connected to a shared storage bus (shared SCSI bus, Fibre Channel, CI or DSSI)? I see your last analyze output was for a FC DG device. What type of Fibre Channel controller do you have (MSA, EVA, XP, HSG80, something else)? Was at least one of the shadow set members ever presented to more than one system where those systems were not part of the same cluster?
Can you please give us a bit of information about your hardware/software configuration?
Can you do the following?
$ define/job DFU$NOSMG T ! disable the pesky SMG interface
$ dfu report $1$DGA4995:
Cut and paste the output into notepad, save as a .txt file and attach to the comment. This output will provide the info we need to be able to have you dump out the blocks of indexf.sys that contain the file headers of these files. It is possible that something has overwritten parts of the indexf.sys file.
If you go back to your original posting, you will see that the first errors report files marked for delete, but then there is a block of 14 contiguous file headers (they may not be contiguous on disk, but probably are) that have %ANALDISK-W-BADHEADER. These are the headers for file numbers 17711 to 17724.
I will reproduce the first and last one, see the first message for the rest.
%ANALDISK-W-BADHEADER, file (17711,24624,0)
invalid file header
-ANALDISK-I-FIDNUM_ZERO, file number zero but not a valid deleted header
-ANALDISK-I-INVHEADER_BUSY, invalid file header marked "busy"
in index file bitmap
...
%ANALDISK-W-BADHEADER, file (17724,0,0)
invalid file header
-ANALDISK-I-IDLEHEADER_BUSY, idle file header marked "busy"
no user action necessary
This is the portion of the indexf.sys file that I think had a good chance of being overwritten. We can't determine what to dump without knowing more about the disk, and that is reported in the output of dfu report. (The line "First header VBN" is the most important, but unless there is something sensitive in the output, please provide the complete report.)
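As a rough illustration of how the affected header range can be pulled out of a saved analyze/disk log, here is a minimal Python sketch. The regular expression and the helper name `bad_header_file_numbers` are assumptions made for this example and are not part of any OpenVMS tool; the sample text contains only the first and last BADHEADER messages quoted above.

```python
import re

# Matches lines such as: %ANALDISK-W-BADHEADER, file (17711,24624,0)
# The three numbers are the file number, sequence number and volume number.
BADHEADER_RE = re.compile(r"%ANALDISK-W-BADHEADER, file \((\d+),(\d+),(\d+)\)")


def bad_header_file_numbers(log_text):
    """Return the sorted file numbers of all BADHEADER messages in the log."""
    return sorted(int(m.group(1)) for m in BADHEADER_RE.finditer(log_text))


# Only the first and last messages from the thread; the rest are elided.
sample = """%ANALDISK-W-BADHEADER, file (17711,24624,0)
invalid file header
...
%ANALDISK-W-BADHEADER, file (17724,0,0)
invalid file header
"""

numbers = bad_header_file_numbers(sample)
# Number of header slots between the lowest and highest file number, inclusive.
span = numbers[-1] - numbers[0] + 1
print(numbers, span)
```

For the two messages shown, the span from 17711 to 17724 covers 14 header slots, which matches the count given in the post.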
Jon
09-10-2008 06:19 PM
*** What event led you to run analyze disk in the first place?
I ran analyze/disk on the system when I encountered the following errors upon accessing the audit journal.
%AUDSRV-W-BADRECORD, invalid data in record 184669
%RMS-F-IRC, illegal record encountered; VBN or record number = 169176
*** Do any of the members have direct connections to multiple systems? By that I mean, are any of the shadow set members connected to a shared storage bus (shared SCSI bus, Fibre Channel, CI or DSSI)? Was at least one of the shadow set members ever presented to more than one system where those systems were not part of the same cluster?
We are using EMC storage on a two-node ES40 cluster that has a common system disk. Most of the shadow sets (including the system disk) have two members. The system is now up with only one volume ($1$DGA899). I'm running DFU and analyze/disk/repair on the second member, $1$DGA4995, which I mounted privately.
Most of the errant files shown in my original posting were created at the time when HP tried to configure the second node in the cluster. I recall that some problems were encountered at that time, and I believe the second node connected to the common system disk as a separate node, which may have caused these errors.
*** Can you do the following?
$ define/job DFU$NOSMG T ! disable the pesky SMG interface
$ dfu report $1$DGA4995:
Please refer to the attached DFU report.
Many thanks for the help.
09-10-2008 08:15 PM
I would think twice before inviting them back for system upgrades.
I would not trust anything on the disk. You did see some of the effects when you analyzed the audit file. There are probably other errors you are not yet aware of, and these errors are being copied to your system backups. Hopefully you still have a backup from prior to the addition of the second node. If you do have backups from prior to the event, write protect them and keep them safe. You may need them to restore data files from.
You even have multiply allocated blocks in the same file (17670,16,0)
%ANALDISK-W-MULTALLOC, file (17670,16,0) 0├Г┬в??├Г┬в?? ├Г ├В┬╕├Г┬в?├В┬м0├Г┬в?├В ├Г┬в?? O├Г┬в?├В┬м├Г ├Г ├В┬┐x ├Г┬в?├В┬мP y o├Г┬в?├В┬м
multiply allocated blocks
VBN 110161 to 110192
LBN 7931120 to 7931151, RVN 1
%ANALDISK-W-MULTALLOC, file (17670,16,0) 0├Г┬в??├Г┬в?? ├Г ├В┬╕├Г┬в?├В┬м0├Г┬в?├В ├Г┬в?? O├Г┬в?├В┬м├Г ├Г ├В┬┐x ├Г┬в?├В┬мP y o├Г┬в?├В┬м
multiply allocated blocks
VBN 110977 to 111008
LBN 7931120 to 7931151, RVN 1
Here LBN 7931120 to 7931151 are mapped to VBN 110161 to 110192 and mapped again to VBN 110977 to 111008, both in the same file (17670,16,0).
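To make the idea of "the same LBNs mapped twice" concrete, here is a small Python sketch that checks a list of retrieval pointers for overlapping LBN ranges. The `Extent` type and the `overlapping_extents` helper are hypothetical names invented for this illustration; the two extents are the ones from the MULTALLOC messages above (32 blocks each).

```python
from typing import NamedTuple


class Extent(NamedTuple):
    """One retrieval pointer: `count` blocks mapping VBNs onto LBNs."""
    vbn_start: int
    lbn_start: int
    count: int


def overlapping_extents(extents):
    """Return pairs of extents whose LBN ranges intersect."""
    ordered = sorted(extents, key=lambda e: e.lbn_start)
    overlaps = []
    for i, a in enumerate(ordered):
        a_end = a.lbn_start + a.count  # first LBN past the end of extent a
        for b in ordered[i + 1:]:
            if b.lbn_start >= a_end:
                break  # sorted by LBN, so no later extent can overlap a
            overlaps.append((a, b))
    return overlaps


# VBN 110161-110192 and VBN 110977-111008 both map to LBN 7931120-7931151.
extents = [
    Extent(vbn_start=110161, lbn_start=7931120, count=32),
    Extent(vbn_start=110977, lbn_start=7931120, count=32),
]

pairs = overlapping_extents(extents)
for a, b in pairs:
    print(f"VBN {a.vbn_start} and VBN {b.vbn_start} share LBNs "
          f"starting at {max(a.lbn_start, b.lbn_start)}")
```

A healthy file would produce an empty list; here the single reported pair corresponds to the doubly mapped range in file (17670,16,0).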
If you run the DFU report and look for "First header VBN", take that number, subtract one (the first file number is 1, not 0), then add the file number of the file whose header you want to dump (for example, 17670 for the file with file ID (17670,16,0)). This gives the VBN of [000000]INDEXF.SYS that holds the file header for that file.
Since on your $1$DGA4995:, the First header VBN is 827, to dump the file headers, add 826 to the file number, and dump that block of [000000]indexf.sys
For example to dump the header of the file that has the same LBNs mapped twice, (17670,16,0), dump VBN 18496 (17670+826) of [000000]indexf.sys
$ dump/file_header/block=(start:18496,count:1) $1$DGA4995:[000000]indexf.sys
or to see it in hex/ascii format
$ dump/block=(start:18496,count:1) $1$DGA4995:[000000]indexf.sys
Look at the retrieval pointers and you will see that the same LBNs are mapped more than once.
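The arithmetic described above can be written as a one-line formula. The following Python sketch uses a hypothetical helper name, `header_vbn`, and simply restates the rule from this post: header VBN = (First header VBN - 1) + file number. It uses the two "First header VBN" values quoted in this thread (827 and 998).

```python
def header_vbn(first_header_vbn: int, file_number: int) -> int:
    """VBN within INDEXF.SYS of the header for the given file number.

    File numbers start at 1, so file number 1 lives at first_header_vbn.
    """
    if first_header_vbn < 1 or file_number < 1:
        raise ValueError("VBNs and file numbers start at 1")
    return (first_header_vbn - 1) + file_number


# $1$DGA4995: First header VBN 827, file (17670,16,0)
print(header_vbn(827, 17670))    # 18496

# disk$user1: First header VBN 998, file (340977,27,0)
print(header_vbn(998, 340977))   # 341974
```

The resulting number is what goes into the start: value of the dump/block qualifier.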
Example: (much left out for brevity), see attachment for full details.
$ dfu report disk$user1
...
First header VBN : 998
...
$ dir login.com;/file
Directory ROOT$USERS:[JON]
LOGIN.COM;241 (340977,27,0)
Total of 1 file.
$ vbn = (998-1)+340977
$ sho sym vbn
VBN = 341974 Hex = 000537D6 Octal = 00001233726
$ dump/file/block=(start:341974,count:1) disk$user1:[000000]indexf.sys
Dump of file DSA1200:[000000]INDEXF.SYS;1 on 10-SEP-2008 21:08:12.46
File ID (1,1,0) End of file block 444376 / Allocated 801008
Virtual block number 341974 (000537D6), 512 (0200) bytes
Header area
...
File identification: (340977,27,0)
...
Identification area
File name: LOGIN.COM;241
...
$
If you use dump without /file_header, it will dump the contents of the file header in a hex dump instead of formatting it as a file header.
$ dump/block=(start:341974,count:1) disk$user1:[000000]indexf.sys/width=80
You may be able to see some clues in the ascii text if some other file got mapped over that portion of indexf.sys.
Good luck (you will need it),
Jon
09-10-2008 10:18 PM
Right, so there has probably been a partitioned cluster. Someone clearly did not know enough about clusters.
I would for sure toss this disk and either rebuild it or restore a backup from before the event, as you don't know which blocks are corrupt. You may very well run into subtle (or not so subtle) issues later on. And as Jon said, don't let these people touch your system ever again.
Jur.
09-10-2008 11:20 PM