-RMS-F-RER, file read error

Hi all,

We have OpenVMS V8.3-1H1 running on IA64. One physical 137 GB SCSI disk drive, mounted as one logical device, no partitioning.
Everything was fine so far, but recently I’ve started noticing errors while copying EXE images. For example, I have a folder with 100 EXE images (each ~50 MB in size). While copying the folder’s contents to another one I get one or two errors like this:

SYSTEM$ copy [.srv1]*.exe [.srv2]
%COPY-E-READERR, error reading SYS$SYSDEVICE:[USER.SRV1]CR080929.EXE;1
-RMS-F-RER, file read error
-SYSTEM-F-PARITY, parity error
%COPY-W-NOTCMPLT, SYS$SYSDEVICE:[USER.SRV2]CR080929.EXE;1 not completely copied

which results in a partially copied EXE. Afterwards, though, I can copy these ‘failed’ EXE files one by one, and in most cases that works fine.
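
With 100 images in a directory it is easy to miss a partial copy in the scrollback. A minimal sketch, assuming the COPY output has been captured to a text file and moved to a machine with Python; the function name `failed_copies` is hypothetical, and the patterns are based only on the messages shown above:

```python
import re

# Patterns taken from the COPY messages quoted in this post (an assumption:
# other OpenVMS versions may word or wrap these lines differently).
READERR = re.compile(r"^%COPY-E-READERR, error reading (?P<spec>\S+)")
NOTCMPLT = re.compile(r"^%COPY-W-NOTCMPLT, (?P<spec>\S+) not completely copied")


def failed_copies(log_text):
    """Return (source, target) file specs for every incomplete copy in the log."""
    failures = []
    source = None
    for line in log_text.splitlines():
        line = line.strip()
        match = READERR.match(line)
        if match:
            # Remember which input file produced the read error.
            source = match.group("spec")
            continue
        match = NOTCMPLT.match(line)
        if match:
            # Pair the partial output file with the last failing input file.
            failures.append((source, match.group("spec")))
            source = None
    return failures


sample = """\
%COPY-E-READERR, error reading SYS$SYSDEVICE:[USER.SRV1]CR080929.EXE;1
-RMS-F-RER, file read error
-SYSTEM-F-PARITY, parity error
%COPY-W-NOTCMPLT, SYS$SYSDEVICE:[USER.SRV2]CR080929.EXE;1 not completely copied
"""

for src, dst in failed_copies(sample):
    print(f"retry: {src} -> {dst}")
```

The partial target files it lists should be deleted before the copy is retried.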

As far as I understand, this is a hardware-related error, which leads to the question: 'Is there any way to detect the failed hardware or diagnose the system/HDD to reveal the root cause?'

Any input is appreciated.

Best regards,
Dmitry Sinelnikov
Volker Halle
Honored Contributor

Re: -RMS-F-RER, file read error

Dmitry,

the disk should have logged an error due to the parity error on read. You need to analyse SYS$ERRORLOG:ERRLOG.SYS to find out about the error and the affected LBA (logical block address).

You will most likely need to run SEA (part of WEBES) or even DECevent V3.4 (only available on OpenVMS Alpha) to decode/translate the error log entry and find this piece of information.

Volker.

Re: -RMS-F-RER, file read error

Unfortunately, there is only one log file in SYS$ERRORLOG, created two months ago:

$ show time
8-SEP-2009 13:57:53

$ dir /col=1 /date
Directory SYS$SYSROOT:[SYSERR]
ERRFMT_IPMI_SEL.DAT;1 17-JUL-2009 04:04:45.66
ERRLOG.SYS;1 17-JUL-2009 03:51:27.42

Total of 2 files.
Volker Halle
Honored Contributor
Solution

Re: -RMS-F-RER, file read error

Dmitry,

there it is: ERRLOG.SYS. All hardware-related errors are written/appended to that BINARY file.

You need a tool to decode/translate the error information in that file.

OpenVMS (since V7.3-2) comes with ANAL/ERR/ELV, but this tool does not decode the details in most types of errlog entries.

You need SEA (System Event Analyzer), which comes as part of the WEBES tool suite. There is also a version of WEBES for Windows, if you don't want to install WEBES on your OpenVMS I64 system. But I'm not sure whether SEA will decode enough of the details of those disk errors to allow you to obtain the LBA numbers.

Only DECevent V3.4 will do this, but you need an OpenVMS Alpha system to install and run DECevent.

Volker.
Volker Halle
Honored Contributor

Re: -RMS-F-RER, file read error

Dmitry,

note that you could also run $ ANAL/DISK/READ on this disk; this should report all read errors that occur on blocks allocated to any file on the disk.

Nevertheless, you would probably want to replace this disk before the errors spread or increase. Make sure you keep a good backup of the data, but watch the backup operation log files, as BACKUP might also have problems reading those blocks.

Volker.

Re: -RMS-F-RER, file read error

Thank you, Volker.

I see numerous errors in this file. It seems to be an HDD-related problem, since anal/disk/read returns numerous parity errors like this:
%ANALDISK-W-READFILE, file (9166,1,0) ACCOUNTNG.DAT;1
error reading VBN 3450886
-SYSTEM-F-PARITY, parity error

Thus I have a question regarding the backup procedure. Is it possible to run it while the system is running? I have some difficulties booting from the IA64 DVD (after executing fsn:\efi\boot\bootia64.efi the system displays two warnings and freezes with the cursor blinking, so I cannot enter DCL commands - another issue to figure out...), so I have to run the BACKUP utility on a live system.
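
With 100-150 such messages, a per-file summary makes it easier to see which files are affected and how badly. A minimal sketch, assuming the ANAL/DISK/READ output has been saved to a text file and that every message follows the three-line pattern shown above; `bad_blocks_by_file` is a hypothetical helper, not part of any OpenVMS tool:

```python
import re
from collections import defaultdict

# Patterns follow the ANALDISK message quoted above (an assumption: the
# "error reading VBN" part may appear on the same line or on the next one).
READFILE = re.compile(
    r"^%ANALDISK-W-READFILE, file \((?P<fid>[\d,]+)\) (?P<name>\S+)"
)
VBN = re.compile(r"error reading VBN (?P<vbn>\d+)")


def bad_blocks_by_file(report_text):
    """Map (file ID, file name) to the list of VBNs that failed to read."""
    bad = defaultdict(list)
    current = None
    for line in report_text.splitlines():
        line = line.strip()
        match = READFILE.match(line)
        if match:
            current = (match.group("fid"), match.group("name"))
        # Search the same line too, in case the message was not wrapped.
        match = VBN.search(line)
        if match and current is not None:
            bad[current].append(int(match.group("vbn")))
    return dict(bad)


sample = """\
%ANALDISK-W-READFILE, file (9166,1,0) ACCOUNTNG.DAT;1
error reading VBN 3450886
-SYSTEM-F-PARITY, parity error
"""

for (fid, name), vbns in bad_blocks_by_file(sample).items():
    print(f"{name} ({fid}): {len(vbns)} unreadable block(s), first VBN {vbns[0]}")
```

The resulting file list is also a checklist of what to verify or restore from another source once the data is on a new disk.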

Hein van den Heuvel
Honored Contributor

Re: -RMS-F-RER, file read error


>> -SYSTEM-F-PARITY, parity error

That means trouble indeed.

But be aware of another error you may get in the future after backups and restores involving an IO problem:

RER, file read error
FORCEDERROR, forced error flagged in last sector read

That would be the original error carried forward as a reminder that the data is not to be trusted. Cleared by writing the block (file).

>> Thus I have a question regarding backup procedure. Is it possible to run it while system is running ?

Yes, with only minor caveats in this case.
Obviously, activity could happen behind BACKUP's back. That may lead to stale data or inconsistencies. But in this situation you know not to expect many, if any, changes. You can, and should, accept that risk in this case by specifying /IGNORE=INTERLOCK.

>> I have some difficulties booting from IA64 DVD (after executing fsn:\efi\boot\bootia64.efi system displays two warnings and freezes with cursor blinking, thus I can not enter DCL commands - another issue to figure out...) so I have to run BACKUP utility on a live system.

That suggests to me that there may be a bigger IO problem, but it could also be the same one. If you can get to (touch!) the box (I guess you can, to stick in the DVD), then I would power down and re-seat everything:
- memory
- disk drive
- drive cable
- pci interface
- pci cage (rx26[02]0?)

Good luck!
Hein

Volker Halle
Honored Contributor

Re: -RMS-F-RER, file read error

Dmitry,


(after executing fsn:\efi\boot\bootia64.efi system displays two warnings and freezes with cursor blinking, thus I can not enter DCL commands - another issue to figure out...)


Would you care to show us those warnings ? Maybe we can draw conclusions from seeing the real messages and further help you along...

Volker.
Willem Grooters
Honored Contributor

Re: -RMS-F-RER, file read error

On the backup issue:

BACKUP/IGNORE=INTERLOCK may cause corruption (read: data loss) when files are being written to during the backup: indexed or relative files, databases...
To minimise the risk, stop every process that may update a file, and stop databases during the backup. If it is possible to dismount the disk from the system and mount it locally, you can safely back it up.
Keep in mind, though, that files that are corrupted on disk will be backed up in that (corrupted) state.

If need be, consider rebooting the machine with a minimal startup to do your backup.

Do so ASAP. The disk seems to be broken. This is an error you would not want to see.
Willem Grooters
OpenVMS Developer & System Manager

Re: -RMS-F-RER, file read error

The interesting thing about these 'parity errors' is that I can still easily copy the files one by one, as I wrote before. Am I right in my understanding that parity errors don't necessarily mean physical disk block corruption?

Some more details regarding boot from DVD:
Regular OpenVMS boot displays the same warnings:
LOADER-W-Conout device path cannot be set to multiple devices
LOADER-I-Select Console Device Paths from the Boot Manager Menu.
Then it freezes for about 5-10 minutes, after which I see the system being loaded correctly.
After mounting the DVD in the EFI shell and executing 'fsn:\efi\boot\bootia64.efi' I see the same two warnings as in a regular OpenVMS boot, with the only difference that it freezes for a longer time. I waited almost half an hour with no result.

p.s. ANAL/DISK/READ has finished running and shows ~100-150 parity errors (all the data on the disk is about 40 GB).
Volker Halle
Honored Contributor

Re: -RMS-F-RER, file read error

Dmitry,

please see the OpenVMS I64 V8.3-1H1 Server Upgrade and Installation manual chapter A.2 for how to set up your consoles.

ftp://ftp.hp.com/pub/openvms/doc/BA322-90077.PDF

Volker.
Volker Halle
Honored Contributor

Re: -RMS-F-RER, file read error

Dmitry,

you need to take the caches into account!

Once you've copied the files and received the parity errors, the disk block data may still be cached somewhere - most likely in the OpenVMS XFC cache. When you then COPY the files again, all the blocks will be read from the cache and no real disk access may happen!

This may explain what you're seeing.

Volker.
Hein van den Heuvel
Honored Contributor

Re: -RMS-F-RER, file read error

Volker wrote>> Once you've copied the files and received the parity errors, the disk block data may still be cached somewhere - most likely in the OpenVMS XFC cache. When you then COPY the files again, all the blocks will be read from the cache and no real disk access may happen!


That would seem wrong and NOT the OpenVMS way of doing things. That would imply that a second application could silently get bad data and build on top of that, write it back, and in the process of the write 'fix' the disk through hardware bad block re-vectoring, and all evidence would be gone.

Yikes!

If no one else replies here with more insights, then that is something I'll have to try some day. We could use the LDdriver to inject a parity error, and then test with that.

Hein.
Volker Halle
Honored Contributor

Re: -RMS-F-RER, file read error

Hein,

you may be right. The cache (XFC) theory was just a good explanation for what Dmitry is seeing. Maybe there are other caches (on disk?) involved as well ? Or the PARITY error is somewhere else and only shows up under certain IO load patterns ?!

Volker.

Re: -RMS-F-RER, file read error

Thanks again, will try to set up console.

As for the cache being involved - I'm not completely sure, but it seems the cache is not the cause:
1) I'm copying a 100 MB file; after the parity error I see only an 80 MB chunk of the original EXE image in the target directory - the COPY operation terminated somewhere around the 80th megabyte.
2) I invoke the copy operation again and it copies all 100 MB correctly. If the file was cached during the first copy operation, only the 80 MB read before the 'parity error' occurred could have been cached, am I right?

Volker Halle
Honored Contributor

Re: -RMS-F-RER, file read error

Dmitry,

with the more detailed description you have given now, I agree that the 'cache theory' is wrong.

You need to decode the errlog entries to better understand what's going wrong.

Volker.

Re: -RMS-F-RER, file read error

Finally had some time to play with the 'boot from DVD' configuration... But I'm not sure if I can write about it here. Should I start another thread for this issue?
Volker Halle
Honored Contributor

Re: -RMS-F-RER, file read error

Dmitry,

please start another topic for the 'DVD booting' problem. If necessary, you can still refer to this topic by including a link.

Volker.

Hoff
Honored Contributor

Re: -RMS-F-RER, file read error

This...

LOADER-W-Conout device path cannot be set to multiple devices
LOADER-I-Select Console Device Paths from the Boot Manager Menu.
...

can be resolved after reading:

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1338643

and then the documents referenced there...

If you have hardware service for this box, call your provider now.

Re: -RMS-F-RER, file read error

Finally, I discovered a freeware utility which allows booting from DVD and performing a disk-to-disk copy operation. I attached two SCSI drives (one of them our 'parity error' drive, the other a new one) to an x86 platform and ran this utility to copy the corrupted drive sector by sector.
The utility copied about 5-6 GB of data and then got stuck on numerous 'read sector error' messages.

I guess this proves that the OpenVMS system drive is non-recoverable. I also assume a BACKUP operation would fail on it too. Sad but true.
Hoff
Honored Contributor

Re: -RMS-F-RER, file read error

Freeware? The OpenVMS distro DVD disk lets you do this sort of bulk copy stuff.

But regardless, a stuffed-up disk is a stuffed-up disk.

That's what host-based volume shadowing (HBVS) is for; disks can and do fail.

And disk failure rates and failure patterns might not be as expected:

http://labs.hoffmanlabs.com/node/93