Re: SCSI Disk Errors

 
SOLVED
MikeL_4
Super Advisor

SCSI Disk Errors

A week ago we started receiving read errors on one of our SCSI disk drives, and they went unnoticed:

May 14 15:39:08 ignite1 vmunix: msgcnt 1 vxfs: mesg 038: vx_dataioerr - /dev/vgign1/ignitelv file system file data read error in block 7399426

May 19 10:42:40 ignite1 vmunix: msgcnt 13 vxfs: mesg 008: vx_direrr: vx_readdir2_1 - /var/opt/ignite file system dir inode 103811 block 6422535 dirent inode 0 error 5


When we ran our weekly Ignite make recoveries this past weekend, they all failed because they could not write to the volume.

Is there a way to trap these errors in the Event Monitor so that they show up in our Event Monitor emails?
4 REPLIES
Steven E. Protter
Exalted Contributor
Solution

Re: SCSI Disk Errors

Shalom,

You should be able to configure EMS with SAM to trap the issues and notify via email.

You have a bad sector on your disk and probably have block relocation set to enabled.

I'd get a good backup, identify the disk with cstm, xstm or mstm, and replace the disk.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Ajitkumar Rane
Trusted Contributor

Re: SCSI Disk Errors

Mike,

Try /etc/opt/resmon/lbin/monconfig to configure delivery of EMS alerts.


rgds,

Ajit
Amidst difficulties lie opportunities
MikeL_4
Super Advisor

Re: SCSI Disk Errors

.
Andrew Merritt_2
Honored Contributor

Re: SCSI Disk Errors

The EMS HW Monitors are enabled by default; you should not need to run monconfig to turn this on. Run 'monconfig' and select the Check option to see that the disks affected are listed.

You do NOT configure the EMS HW monitors using SAM (SEP please note! I think you've stated this before; it's not true.).

The disk monitor, disk_em, does not monitor virtual disk arrays (XP, VA, etc.), so if that's what you have, that would explain why no EMS event was generated.

If these are directly attached SCSI disks, then you need to check you have a current version of the OnlineDiags (running 'cstm' will show the version you have installed, including the patch level).

Note also that the monitors do not generate an EMS event for every single low-level failure; there are thresholds set so that the events are only generated when the number of failures indicates a likely real problem. Hardware is resilient to a certain degree, and one or two recoverable errors are acceptable.
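
To illustrate the threshold idea only (this is not how disk_em is implemented, and it is no substitute for EMS), here is a minimal Python sketch that counts vxfs error messages per file system in syslog lines like the two quoted at the top of the thread, and reports only those that reach a limit. The function name, the regular expression and the default threshold of 3 are all assumptions made for the example, not values taken from EMS or OnlineDiags.

```python
import re
from collections import Counter

# Pattern written against the two vxfs messages quoted in this thread, e.g.
#   "... vmunix: msgcnt 1 vxfs: mesg 038: vx_dataioerr - /dev/vgign1/ignitelv file system ..."
# It is an assumption that other vxfs messages follow the same layout.
VXFS_RE = re.compile(
    r"vmunix: msgcnt \d+ vxfs: mesg (?P<mesg>\d+): "
    r"(?P<tag>vx_\w+)(?:: \w+)? - (?P<fs>\S+) file system"
)


def filesystems_over_threshold(lines, threshold=3):
    """Return {file system name: error count} for names seen at least
    `threshold` times. The threshold default is arbitrary (hypothetical)."""
    counts = Counter()
    for line in lines:
        match = VXFS_RE.search(line)
        if match:
            counts[match.group("fs")] += 1
    return {fs: n for fs, n in counts.items() if n >= threshold}
```

Note that the count is keyed on the name exactly as it appears in the message, so a device path and a mount point are counted separately even if they refer to the same file system.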

I notice you've closed the thread; did you find the answer?

Andrew