Operating System - Linux

robs58
Occasional Advisor

Need Help Interpreting I/O Error Messages

Environment:

Linux rsnperf 2.6.18-92.1.1.el5 #1 SMP Thu May 22 09:01:47 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

Red Hat Enterprise Linux Server release 5.2 (Tikanga)

*********************************************

Problem:

This server seems to be throwing SCSI errors. It is connected to a SAN through a SAN switch, and the switch firmware was recently updated. A review of the server logs doesn't show any I/O errors *BEFORE* the firmware update. The following log messages occurred *AFTER* the switch firmware upgrade:

SCSI error: return code = 0x00010000
end_request: I/O error, dev sdg, sector 524287992
sd 1:0:1:3: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdg, sector 0
sd 1:0:1:4: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdh, sector 0
sd 1:0:1:4: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdh, sector 0
sd 1:0:1:4: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdh, sector 524287992
sd 1:0:1:4: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdh, sector 524287992
sd 1:0:1:4: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdh, sector 0
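
To help interpret these lines: the kernel packs four one-byte fields into the SCSI "return code". The Python sketch below is only a decoding aid; it assumes the layout and host-byte names from the 2.6-era include/scsi/scsi.h (driver byte, host byte, message byte, status byte, from most to least significant). For 0x00010000 only the host byte is set, to 0x01 (DID_NO_CONNECT), which suggests the HBA driver could not reach the target at all, rather than the LUN answering with a SCSI error status.

```python
# Decoding aid (assumed layout from include/scsi/scsi.h):
# bits 31-24 driver byte | 23-16 host byte | 15-8 message byte | 7-0 status byte.

HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}


def decode_scsi_result(result: int) -> dict:
    """Split a SCSI result word into its driver, host, message and status bytes."""
    host = (result >> 16) & 0xFF
    return {
        "driver_byte": (result >> 24) & 0xFF,
        "host_byte": host,
        "host_name": HOST_BYTE_NAMES.get(host, "unknown"),
        "msg_byte": (result >> 8) & 0xFF,
        "status_byte": result & 0xFF,  # raw low byte, as reported by the target
    }


if __name__ == "__main__":
    print(decode_scsi_result(0x00010000))
```

The host byte is filled in by the HBA (low-level) driver, so a non-zero host byte with a zero status byte points at the transport path rather than at the disk itself.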

Also, the following partial vgdisplay -v output seems to confirm the error messages from the logs above:

# vgdisplay -v
Finding all volume groups
/dev/sda: read failed after 0 of 4096 at 0: Input/output error
/dev/sdb: read failed after 0 of 4096 at 0: Input/output error
/dev/sdc: read failed after 0 of 4096 at 0: Input/output error
/dev/sdd: read failed after 0 of 4096 at 0: Input/output error
/dev/sde: read failed after 0 of 4096 at 0: Input/output error
/dev/sdf: read failed after 0 of 4096 at 0: Input/output error
/dev/sdg: read failed after 0 of 4096 at 0: Input/output error
/dev/sdh: read failed after 0 of 4096 at 0: Input/output error
Found duplicate PV xjjRbOG92EcZA4tIWdCgCTg1hfYPl2w1: using /dev/sdm not /dev/sdi
Found duplicate PV 0MKA80XY7DWOKcmIgohS6l8IENebjUgT: using /dev/sdn not /dev/sdj
Found duplicate PV u2VZKuhxRDOVCXunN0czzOvaF4liqmZg: using /dev/sdo not /dev/sdk
Found duplicate PV LEYBZCSyOltHJhpojFE7vaZruMDpXBio: using /dev/sdp not /dev/sdl
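
LVM prints "Found duplicate PV" when it sees the same PV UUID on more than one block device, which is what a LUN visible through two FC paths looks like when nothing (such as multipath) combines the paths. A minimal Python sketch, assuming the vgdisplay -v output is available as text (the sample is a subset of the output above), that groups the devices per UUID and lists the unreadable ones:

```python
import re
from collections import defaultdict

# "Found duplicate PV <uuid>: using <dev> not <dev>"
DUP_RE = re.compile(r"Found duplicate PV (\S+): using (\S+) not (\S+)")
# "<dev>: read failed after 0 of 4096 at 0: Input/output error"
FAIL_RE = re.compile(r"^(\S+): read failed", re.MULTILINE)


def paths_per_pv(output: str) -> dict:
    """Group every block device LVM reported for the same PV UUID."""
    paths = defaultdict(set)
    for uuid, used, skipped in DUP_RE.findall(output):
        paths[uuid].update((used, skipped))
    return {uuid: sorted(devs) for uuid, devs in paths.items()}


def failed_devices(output: str) -> list:
    """Block devices on which LVM's read failed."""
    return FAIL_RE.findall(output)


SAMPLE = """\
/dev/sda: read failed after 0 of 4096 at 0: Input/output error
/dev/sdh: read failed after 0 of 4096 at 0: Input/output error
Found duplicate PV xjjRbOG92EcZA4tIWdCgCTg1hfYPl2w1: using /dev/sdm not /dev/sdi
Found duplicate PV 0MKA80XY7DWOKcmIgohS6l8IENebjUgT: using /dev/sdn not /dev/sdj
"""

if __name__ == "__main__":
    print("unreadable:", failed_devices(SAMPLE))
    for uuid, devs in paths_per_pv(SAMPLE).items():
        print(uuid, "seen on", devs)
```

Applied to the full output, the unreadable devices (sda through sdh) and the duplicated ones (sdi through sdp) do not overlap, which is consistent with one set of paths disappearing while a second set to the same storage stayed up.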

Partial output of dmesg:

end_request: I/O error, dev sdf, sector 0
sd 1:0:1:2: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdf, sector 0
sd 1:0:1:2: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdf, sector 524287992
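
Tallying the end_request lines per device makes the pattern easier to see. A minimal Python sketch, assuming the kernel log is available as a string (the sample is the dmesg excerpt above):

```python
import re
from collections import defaultdict

# Matches kernel lines such as "end_request: I/O error, dev sdh, sector 0".
END_REQUEST_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def summarize_io_errors(log_text: str) -> dict:
    """Map each device name to the sorted list of distinct sectors that failed."""
    sectors = defaultdict(set)
    for match in END_REQUEST_RE.finditer(log_text):
        sectors[match.group(1)].add(int(match.group(2)))
    return {dev: sorted(secs) for dev, secs in sectors.items()}


SAMPLE = """\
end_request: I/O error, dev sdf, sector 0
sd 1:0:1:2: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdf, sector 0
sd 1:0:1:2: SCSI error: return code = 0x00010000
end_request: I/O error, dev sdf, sector 524287992
"""

if __name__ == "__main__":
    for dev, secs in summarize_io_errors(SAMPLE).items():
        print(dev, secs)
```

On this excerpt it reports sdf with sectors 0 and 524287992. The same two sectors recur on sdg and sdh in the earlier log, which looks more like whole LUNs becoming unreachable than like scattered media errors.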

Has anyone seen this before or can someone point me in the right direction to begin troubleshooting this? Thanks in advance!

robs
Ivan Ferreira
Honored Contributor

Re: Need Help Interpreting I/O Error Messages

It seems that your Linux server is not correctly configured for multipath, since you are seeing "duplicate PV" warnings. The firmware upgrade may have changed the behaviour of the storage. Make sure you have multipath software installed and configured.

Also, without multipath configured, you will lose access to your disks during firmware upgrades and controller reboots.
Why do it the hard way, when you can do it the easy way?
robs58
Occasional Advisor

Re: Need Help Interpreting I/O Error Messages

Ivan,

Thanks for your earlier reply!

There was an EVA firmware upgrade at the time of the incident. Several other servers encountered the event but were able to recover. I don't know the specifics of how, or whether, multipath was configured, but it appears that multipath is either not configured or not configured properly on this server.

Among other things:

the multipath daemon is not configured to start at boot and is not running:
log $ /sbin/chkconfig --list multipathd
multipathd 0:off 1:off 2:off 3:off 4:off 5:off 6:off

# /sbin/service multipathd status
multipathd is stopped

the /etc/multipath.conf file does not appear to have been modified and blacklists all devices:
etc $ more multipath.conf
# This is a basic configuration file with some examples, for device mapper
. . .
blacklist {
devnode "*"
}
. . .

not sure, but "multipath -ll" showing no output also suggests that multipath is not configured.
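
The first two checks above can be scripted. A minimal Python sketch, assuming the chkconfig --list multipathd line and the contents of /etc/multipath.conf are available as strings (the samples are copied from this post, with the elided parts of the file left out). The blacklist parser is deliberately simple and hypothetical: it only collects devnode entries inside blacklist { } sections and treats # as the start of a comment.

```python
import re


def enabled_runlevels(chkconfig_line: str) -> list:
    """Runlevels marked 'on' in one line of `chkconfig --list` output."""
    return [int(level) for level, state in re.findall(r"(\d):(on|off)", chkconfig_line)
            if state == "on"]


def blacklisted_devnodes(conf_text: str) -> list:
    """devnode patterns inside 'blacklist { }' sections (simplified parser).

    '#' starts a comment; 'blacklist_exceptions' sections are ignored.
    """
    patterns = []
    in_blacklist = False
    depth = 0  # brace depth inside the current blacklist section
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if not in_blacklist:
            if re.match(r"blacklist\s*\{", line):
                in_blacklist, depth = True, 1
            continue
        match = re.match(r'devnode\s+"([^"]*)"', line)
        if match:
            patterns.append(match.group(1))
        depth += line.count("{") - line.count("}")
        if depth <= 0:
            in_blacklist = False
    return patterns


CHKCONFIG = "multipathd 0:off 1:off 2:off 3:off 4:off 5:off 6:off"

CONF = """\
# This is a basic configuration file with some examples, for device mapper
blacklist {
        devnode "*"
}
"""

if __name__ == "__main__":
    print("multipathd enabled in runlevels:", enabled_runlevels(CHKCONFIG) or "none")
    if "*" in blacklisted_devnodes(CONF):
        print('blacklist contains devnode "*": every device is excluded from multipath')
```

With the stock file, removing (or narrowing) that devnode "*" entry and enabling multipathd are the usual first steps before multipath -ll can show any maps.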

Is it possible that a reboot of the system would re-acquire the LUN connections?
Ivan Ferreira
Honored Contributor
Solution

Re: Need Help Interpreting I/O Error Messages

Yes, it's possible, but the real solution is to configure multipath; otherwise this event will occur again.
JM asks
Advisor

Re: Need Help Interpreting I/O Error Messages

Messages that did appear AT the time of the firmware update:

kernel: rport-1:0-7: blocked FC remote port time out: saving binding
kernel: rport-1:0-6: blocked FC remote port time out: saving binding
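
The remote port names use the form rport-H:C-N (as under /sys/class/fc_remote_ports), where H is the SCSI host (HBA) number, the same H that starts the H:C:T:L address in the "sd 1:0:1:4" errors. A minimal Python sketch, assuming both excerpts are available as text, that checks whether the timed-out remote ports and the failing LUNs sit on the same host:

```python
import re

# "rport-H:C-N": H is the SCSI host (HBA) number, C the channel, N the port index.
RPORT_RE = re.compile(r"rport-(\d+):(\d+)-(\d+): blocked FC remote port time out")
# "sd H:C:T:L: SCSI error": the disk's host, channel, target and LUN.
SD_RE = re.compile(r"sd (\d+):(\d+):(\d+):(\d+): SCSI error")


def timed_out_hosts(log_text: str) -> list:
    """SCSI host numbers whose FC remote ports timed out."""
    return sorted({int(m.group(1)) for m in RPORT_RE.finditer(log_text)})


def erroring_hosts(log_text: str) -> list:
    """SCSI host numbers of the disks that logged SCSI errors."""
    return sorted({int(m.group(1)) for m in SD_RE.finditer(log_text)})


RPORT_LOG = """\
kernel: rport-1:0-7: blocked FC remote port time out: saving binding
kernel: rport-1:0-6: blocked FC remote port time out: saving binding
"""

SD_LOG = """\
sd 1:0:1:3: SCSI error: return code = 0x00010000
sd 1:0:1:4: SCSI error: return code = 0x00010000
sd 1:0:1:2: SCSI error: return code = 0x00010000
"""

if __name__ == "__main__":
    print("hosts with timed-out remote ports:", timed_out_hosts(RPORT_LOG))
    print("hosts with failing disks:", erroring_hosts(SD_LOG))
```

Both lists come out as host 1 here, which fits the picture of the paths behind that HBA dropping during the upgrade while no multipath layer was present to fail over.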


robs58
Occasional Advisor

Re: Need Help Interpreting I/O Error Messages

Points have been assigned. Thanks for your assistance!