Disk error ?

Slayer Slayer
Frequent Advisor

Hello All,

I was checking the /var/adm/syslog.dated files and found the following errors:

Nov 5 19:13:23 dasscws1 vmunix: AdvFS I/O error:
Nov 5 19:13:23 dasscws1 vmunix: Domain#Fileset: poa_db#db_prod
Nov 5 19:13:23 dasscws1 vmunix: Mounted on: /db_prod
Nov 5 19:13:23 dasscws1 vmunix: Volume: /dev/rz48a
Nov 5 19:13:23 dasscws1 vmunix: Tag: 0x000072a7.8002
Nov 5 19:13:23 dasscws1 vmunix: Page: 721
Nov 5 19:13:23 dasscws1 vmunix: Block: 7323728
Nov 5 19:13:23 dasscws1 vmunix: Block count: 16
Nov 5 19:13:23 dasscws1 vmunix: Type of operation: Read
Nov 5 19:13:23 dasscws1 vmunix: Error: 5
Nov 5 19:13:23 dasscws1 vmunix: To obtain the name of the file on which
Nov 5 19:13:23 dasscws1 vmunix: the error occurred, type the command:
Nov 5 19:13:23 dasscws1 vmunix: /sbin/advfs/tag2name /db_prod/.tags/29351
Nov 5 19:13:28 dasscws1 vmunix: AdvFS I/O error:
Nov 5 19:13:28 dasscws1 vmunix: Domain#Fileset: poa_db#db_prod
Nov 5 19:13:28 dasscws1 vmunix: Mounted on: /db_prod
Nov 5 19:13:28 dasscws1 vmunix: Volume: /dev/rz48a
Nov 5 19:13:28 dasscws1 vmunix: Tag: 0x000072a7.8002
Nov 5 19:13:28 dasscws1 vmunix: Page: 721
Nov 5 19:13:28 dasscws1 vmunix: Block: 7323728
Nov 5 19:13:28 dasscws1 vmunix: Block count: 16
Nov 5 19:13:28 dasscws1 vmunix: Type of operation: Read
Nov 5 19:13:28 dasscws1 vmunix: Error: 5
Nov 5 19:13:28 dasscws1 vmunix: To obtain the name of the file on which
Nov 5 19:13:28 dasscws1 vmunix: the error occurred, type the command:
Nov 5 19:13:28 dasscws1 vmunix: /sbin/advfs/tag2name /db_prod/.tags/29351

I also ran "uerf -f binary.errlog" to look for anything else, and I found some disk errors.

********************************* ENTRY 1557. *********************************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 1956.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Fri Nov 5 19:36:22 2004
OCCURRED ON SYSTEM dasscws1
SYSTEM ID x000B0022
SYSTYPE x00000000

----- UNIT INFORMATION -----

CLASS x0000 DISK
SUBSYSTEM x0000 DISK
BUS # x0006
x0180 LUN x0
TARGET x0

Does anyone know why these errors happened?
And is there a way to solve it?
Thanks a lot

Regards

Brun
4 REPLIES
Venkatesh BL
Honored Contributor

Re: Disk error ?

Did you find the filename using the 'tag2name' command?
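
For what it's worth, the number in the suggested tag2name path appears to be just the first part of the logged tag converted from hex to decimal (0x72a7 = 29351). A minimal Python sketch of that conversion follows; the function names are made up for illustration, and reading the part after the dot as a sequence number is my assumption, not something the log states.

```python
from typing import Tuple


def parse_advfs_tag(tag: str) -> Tuple[int, int]:
    """Split a logged tag such as '0x000072a7.8002' into two integers.

    The first value is the tag number; the second is assumed here to be
    a sequence number. Both parts are read as hexadecimal.
    """
    number_hex, _, sequence_hex = tag.partition(".")
    return int(number_hex, 16), int(sequence_hex, 16)


def tag2name_command(mount_point: str, tag: str) -> str:
    """Build the tag2name command line in the form the kernel message shows."""
    number, _sequence = parse_advfs_tag(tag)
    return f"/sbin/advfs/tag2name {mount_point}/.tags/{number}"


if __name__ == "__main__":
    # Values copied from the syslog entry in the original post.
    print(tag2name_command("/db_prod", "0x000072a7.8002"))
```

Run on the tag from the post, this reproduces the command that the kernel message printed, so the logged tag and the suggested path refer to the same file.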
Hein van den Heuvel
Honored Contributor

Re: Disk error ?


>> Does anyone know why these errors happened?

All connections and all disks fail eventually.
Often that takes a very long time, and you may never see a problem, but sometimes it hurts.
I suspect you will also find an incident in an application error log (the Oracle alert log, for example) that corresponds to the I/O error the system detected.

Sometimes you can correlate it with an event and then ignore it: was there a power problem in the I/O path? Was someone playing with the wires at that time?

>> And is there a way to solve it?

No, not really.

Assuming it was a real disk error, replace the disk involved. You may need a disk controller event entry to identify the actual disk. You may find that the controller itself had a problem.

Sometimes you want to 'risk' it and just treat it as an early warning. If it happens again (on the same disk or controller), then initiate a hardware replacement.

>> "uerf -f binary.errlog" to look for anything else, and I found some disk errors.

If they are all pretty close in time, you may choose to ignore it as a 'glitch'. You should use the multiple entries as an opportunity to correlate: same disk? same area on disk? same controller? same HBA?...
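
One rough way to do that correlation for the syslog side is to count the AdvFS entries per volume and block. The Python sketch below assumes the entries look exactly like the ones posted above; the function name and the regular expression are illustrative only, and it does not read the binary error log at all.

```python
import re
from collections import Counter

# Fields of interest in each "AdvFS I/O error" entry. "Block count" is not
# matched because the pattern requires ": " right after the field name.
FIELD = re.compile(r"vmunix: (Volume|Block|Type of operation|Error): (.+)$")


def summarize_advfs_errors(lines):
    """Count AdvFS I/O errors per (volume, block, operation, error number)."""
    counts = Counter()
    current = None
    for line in lines:
        if "AdvFS I/O error" in line:
            current = {}  # start collecting fields for a new entry
            continue
        match = FIELD.search(line.strip())
        if current is None or not match:
            continue
        current[match.group(1)] = match.group(2).strip()
        if len(current) == 4:  # all four fields seen, entry is complete
            key = (
                current["Volume"],
                int(current["Block"]),
                current["Type of operation"],
                int(current["Error"]),  # 5 is EIO, a generic I/O error
            )
            counts[key] += 1
            current = None
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < syslog-file
    for key, count in summarize_advfs_errors(sys.stdin).items():
        print(count, key)
```

If every entry lands on the same volume and block, as the two entries in the original post do, that points at one bad spot on one disk; errors scattered across several volumes behind the same controller would suggest looking at the controller or cabling instead.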


fwiw,
Hein.
Slayer Slayer
Frequent Advisor

Re: Disk error ?

Thanks for the response, and sorry for the delay.

I found some messages in the database log files, and after checking the errors I concluded that it could be a hardware problem.

The machine is OK now, but I am worried because the databases crashed as a result of those errors.

I'll let you know what really happened.
I am still investigating.

Thanks a lot

Bru
Han Pilmeyer
Esteemed Contributor

Re: Disk error ?

The error log report you posted doesn't really help to explain why you got the read errors. A more detailed report would help.

It looks like you are using uerf to show the error. If you add "-o full" you get a more detailed report of the error. However, on modern systems, we strongly recommend using Compaq Analyze (CA) or DECevent (dia) depending on your hardware type.