John Pendergrass
New Member

DISK QUESTIONS

History: I received 4 disk errors (noted below). Although the operation completed successfully, it would be helpful to know the following.

1) What is a SYMBOL ECC ERROR, and what causes it?

2) Is there a method to map a logical block to a file name?
Mohamed  K Ahmed
Trusted Contributor

Re: DISK QUESTIONS

What are the errors exactly?
Willem Grooters
Honored Contributor

Re: DISK QUESTIONS

John,
1) IIRC: ECC = Error Correction Code, a method for correcting minor errors in data. When the disk reads data, it recalculates the code and compares it with the ECC stored alongside the data. In case of a mismatch, the disk firmware can usually correct the data, based on the algorithm.
2) Yes.
If you want to see the blocks that hold the contents of a file, this is the command (followed by the file specification):

$ DUMP/HEADER/BLOCKS=COUNT=0

You'll find the retrieval pointers at the end of the output. Each one gives the logical block where a fragment starts and the number of blocks that the fragment spans.

Doing it the other way requires a scan of INDEXF.SYS. The DFU utility has the facility to do this. If you know the Logical Block Number (LBN), SEARCH /LBN= will give you the file(s) that map onto this block.

Willem
Willem Grooters
OpenVMS Developer & System Manager
Uwe Zessin
Honored Contributor

Re: DISK QUESTIONS

In this context, a 'SYMBOL' means a group of bits. Sounds like the disk drive was able to correct the error, so it delivered the data and the operation completed successfully.
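
To make "correctable" concrete, here is a toy Python sketch using a Hamming(7,4) code, which can repair a single flipped bit in a 7-bit word. This is a stand-in chosen for brevity: real disk drives use much stronger codes that work on multi-bit symbols, and the function names here are made up for illustration and reflect no particular drive's firmware.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    # Codeword positions 1..7 are: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Return (data_bits, error_position); position 0 means no error seen."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity over positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity over positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity over positions 4, 5, 6, 7
    pos = s1 + 2 * s2 + 4 * s3       # syndrome = position of the bad bit
    if pos:
        c[pos - 1] ^= 1              # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], pos


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                         # simulate one bit read back wrong
data, pos = hamming74_decode(word)
print(data, pos)
```

The read still returns the original data bits, and the non-zero position plays the role of the "recovered error" report: the data was delivered correctly, but the drive noticed it had to fix something.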
.
labadie_1
Honored Contributor

Re: DISK QUESTIONS

You should worry when you see "uncorrectable ECC error". Of course, other errors require your attention too, but I think this one is the "worst".
Guillou_2
Frequent Advisor

Re: DISK QUESTIONS

Hi,

If you have the freeware DFU, you can do:
dfu>search/lbn=

regards

Steph


John Eerenberg
Valued Contributor

Re: DISK QUESTIONS

If you have DECevent (or similar), you can look for "Recovered Error." If you keep getting this recovered error often and the LBAs are fairly close together, then it is time to replace the disk.

You have 4 errors so far; if you get more, think about replacing the disk. At least that is what we do to be proactive (maybe a little too proactive). The frequency of the error is something you can discuss with your HP service rep.
It is better to STQ then LDQ
Bojan Nemec
Honored Contributor

Re: DISK QUESTIONS

Hi,

There is a small command procedure to find the file containing the logical block. In fact two command procedures:
lbn.com

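$! Clear any leftover flag from a previous run
$ if f$trnlnm("file_found","lnm$job").nes."" then deassign/job file_found
$e:
$! On a warning (e.g. an open file), carry on with the next file
$ on warning then goto e
$l:
$ f = f$search(p1)
$ if f.eqs."" then goto end
$! Skip empty files
$ s='f$file(f,"EOF")'
$ if s.gt.0
$ then
$! Pass the retrieval pointer lines of the file header to lbn1.com
$ pipe dump/head/block=end=0 'f' | -
search/exact/match=and sys$pipe "Count:","LBN:" | -
@lbn1 'p2'
$ endif
$! lbn1.com defines the job logical file_found on a match
$ if f$trnlnm("file_found","lnm$job").eqs."" then goto l
$ deassign/job file_found
$ write sys$output f
$end: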

and
lbn1.com

$! p1 = the LBN to look for
$ sea = 'p1'
$l:
$ read sys$pipe l/end=end/error=end
$ l=f$edit (l,"compress,trim")
$! After compression the line reads: Count: n LBN: m
$ blocks = f$element(1," ",l)
$ lbn = f$element(3," ",l)
$! elbn = first block past the end of this fragment
$ elbn = 'lbn' + 'blocks'
$ if sea.ge.lbn.and.sea.lt.elbn
$ then
$ write sys$output "Found ''sea' between ''lbn' and ''elbn'"
$ define/job file_found 1
$ endif
$ goto l
$end:

Run the lbn.com command procedure with a wildcard file specification as the first parameter and the logical block number as the second parameter.

To search the whole system disk type:
$ @lbn sys$sysdevice:[*...]*.*;* 123456

This will (very slowly) search all the files for that LBN. Open files will not be searched! But you will receive a message like this
%SYSTEM-W-ACCONFLICT, file access conflict
\SYS$SYSROOT:[SYSEXE]ACME$SERVER_CONFIG.TMP;1\

for them.
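
For readers who find the DCL hard to follow, the matching logic in lbn1.com can be sketched in Python as below. It assumes, as the procedure does, that each retrieval-pointer line in the DUMP/HEADER output carries a "Count:" value followed by an "LBN:" value. The function name and the sample lines are invented for illustration.

```python
import re

# Assumed line shape, taken from what lbn1.com expects: "Count: <n> LBN: <m>"
POINTER = re.compile(r"Count:\s*(\d+)\s+LBN:\s*(\d+)")


def find_lbn(lines, target):
    """Return (start_lbn, end_lbn) of the first fragment holding target, else None."""
    for line in lines:
        m = POINTER.search(line)
        if not m:
            continue                 # not a retrieval-pointer line
        count, start = int(m.group(1)), int(m.group(2))
        end = start + count          # first block past the fragment
        if start <= target < end:
            return (start, end)
    return None


# Hypothetical retrieval-pointer lines for one file.
sample = [
    "    Count:         16        LBN:       1000",
    "    Count:          4        LBN:       5000",
]
print(find_lbn(sample, 5002))
```

As in the DCL version, the upper bound is exclusive: a fragment of 16 blocks starting at LBN 1000 covers LBNs 1000 through 1015.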

Bojan
Wim Van den Wyngaert
Honored Contributor

Re: DISK QUESTIONS

John,

I'm not 100% sure of this answer!

I have the impression that the disk errors are reported by the disk controller. If it says recovered, it means that the layer reporting the error has recovered it and there is no problem (e.g., by disk mirroring or RAID).

If it was not recovered, it is possible that the shadowing software corrected it.

Without shadowing, the error was passed with a "presumably correct version of the data" to the program. If the program didn't test the IO status, it is possible that it continued. It is also possible that the data was incorrect ...

To find corrupt files: anal/disk/read disk.

To repair: delete/erase or purge/erase after you have recovered the contents. I found that /noerase didn't always repair the error.

Wim
Wim
Wim Van den Wyngaert
Honored Contributor

Re: DISK QUESTIONS

John,

I also had disks that gave 200 errors in a few weeks, and after that all went fine. So, 4 is no problem.

Wim
Wim
Uwe Zessin
Honored Contributor

Re: DISK QUESTIONS

$ ANALYZE /DISK_STRUCTURE /READ_CHECK

will not necessarily find 'corrupt files'. It simply reads all allocated blocks twice and compares the data (according to HELP). That only ensures that the media can be read.

If the data has been 'corrupted' by some other error, it cannot report this, because it does not understand the logical structure of a file.
.
John Pendergrass
New Member

Re: DISK QUESTIONS

Thanks guys for all the responses. Don't know why it took so long for it to post, as I submitted the question a couple of months back. I'll try DFU next time it occurs.
Wim Van den Wyngaert
Honored Contributor

Re: DISK QUESTIONS

John, Uwe,

The anal/disk/read will read all blocks and check for parity errors. That's the only error I have gotten until now. It may be that it reads twice, but in my opinion modern disks should return the same contents twice (but again, not 100% sure).

For shadow sets this could result in disk errors, but those will most probably be corrected.
Wim
Jan van den Ende
Honored Contributor

Re: DISK QUESTIONS

Wim,



To repair: delete/erase or purge/erase after you have recovered the contents. I found that /noerase didn't always repair the error.



I guess this is expected behaviour:

On a DELETE (which PURGE also is) with /NOERASE (which may be implied), you just mark the disk blocks as available, clean up the header, and do everything that goes with keeping the structure consistent. You do NOT do anything to the contents of the disk blocks themselves, so any pattern on the disk reported as bad stays there. If you apply /ERASE, the disk blocks ARE written to. If that can be done without errors, the error is gone. And if not, that block is relocated by the drive, and although PHYSICALLY the error is still there, LOGICALLY it will be gone. The INTERNAL disk functionality will have to move the head to the bad block replacement area to address this block, but to the world outside the drive it is PRESENTED as error-free.

Jan
Don't rust yours pelled jacker to fine doll missed aches.
Wim Van den Wyngaert
Honored Contributor

Re: DISK QUESTIONS

Jan,

But what if the "bad block flag" is set for that file? Is /erase still necessary?
I wonder which utilities set the flag when encountering a bad block ...

Wim
Wim
Keith Parris
Trusted Contributor

Re: DISK QUESTIONS

If there is a Forced-Error Flag set for the sector, it will be reset when the sector is overwritten. Overwriting may take place either when you include the /ERASE qualifier on a DELETE or PURGE, or some time later when the LBN is allocated to another file and the contents are overwritten with new data at that time.
Wim Van den Wyngaert
Honored Contributor

Re: DISK QUESTIONS

Keith,

Do you mean that during delete/erase, the block is not moved to the bad block list of the disk itself nor the one of VMS ?

Wim
Wim
Keith Parris
Trusted Contributor

Re: DISK QUESTIONS

I don't think the VMS bad-block mechanism (BADBLK.SYS, etc.) gets used much these days. (Maybe on floppy disks?) Modern disks tend to have their own internal bad-block revectoring mechanisms. Generally, the bad block is revectored by the drive under control of VMS when an error that is uncorrectable (or correctable, but beyond a specified error severity threshold) is first detected; the data (as best it can be reconstructed -- if it can't be fixed completely, a Forced-Error Flag is included on the sector) is moved to a new sector at the time of the error. So when you rewrite it (with the erase pattern using /ERASE, or by overwriting it as you populate a new file later), you're writing to the revectored location provided by the drive.
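
The sequence described above (revector on error, flag the sector if the data could not be fully reconstructed, clear the flag on the next write) can be pictured with a toy Python model. Everything here is a simplification for illustration: the class, its methods, and the spare sector number are hypothetical and do not correspond to any real drive or VMS interface.

```python
class ToyDrive:
    """Toy model of drive-internal bad-block revectoring (illustration only)."""

    def __init__(self, spares):
        self.data = {}              # physical sector -> contents
        self.remap = {}             # logical block -> spare physical sector
        self.forced_error = set()   # logical blocks whose data is suspect
        self.spares = list(spares)  # unused replacement sectors

    def _physical(self, lbn):
        return self.remap.get(lbn, lbn)

    def revector(self, lbn, recovered_ok):
        """Move lbn to a spare sector; flag it if the data is not trustworthy."""
        spare = self.spares.pop(0)
        self.data[spare] = self.data.get(self._physical(lbn))
        self.remap[lbn] = spare
        if not recovered_ok:
            self.forced_error.add(lbn)

    def write(self, lbn, contents):
        """Writes land on the revectored sector and clear the flag."""
        self.data[self._physical(lbn)] = contents
        self.forced_error.discard(lbn)

    def read(self, lbn):
        if lbn in self.forced_error:
            raise OSError(f"forced error on LBN {lbn}")
        return self.data.get(self._physical(lbn))


drive = ToyDrive(spares=[900000])
drive.write(1234, b"old")
drive.revector(1234, recovered_ok=False)   # error detected, data not fully rebuilt
drive.write(1234, b"\x00\x00\x00")         # e.g. an erase pattern
print(drive.read(1234))
```

In this model the logical block number never changes from the outside; only the drive's internal mapping does, which is why the rewrite ends up on the replacement sector.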