
a lot of "Buffer I/O error" and "end request, I/O error sector 0" messages on all Linux hosts

 
SOLVED
itai weisman
Super Advisor

Hello,
I found the following error messages on three Linux hosts in my environment:

Nov 23 18:13:21 prod-clal1 kernel: end_request: I/O error, dev sdef, sector 0
Nov 23 18:13:21 prod-clal1 kernel: Buffer I/O error on device sdef, logical block 0

These errors occur constantly, regardless of system boot time or any activity on the backend array side. They are not specific to any LUN, LUN path, HBA, switch port, array port, or time of day. The only things these hosts have in common (apart from their identical configuration) are the storage array and the SAN switch they are connected to. I checked the switches and the array with EMC and found no problem. Other systems connected to this array (running different OS types) show no errors.
As far as I can tell, there is no impact on these hosts besides the mass of messages in the messages file.
Configuration (fully supported by HP and EMC):
All hosts run Enterprise Linux AS release 4 (Nahant Update 5) with kernel version 2.6.9-55.EL (SMP), built with gcc 3.4.6, on Integrity rx6600 servers.
Each host has an HP FC2143 4Gb PCI-X 2.0 HBA (Emulex) with driver version 8.0.16.27.
The hosts have EMC PowerPath 5.3 SP1 installed.
The storage array is an EMC Symmetrix V-Max with microcode 5874.
Any ideas?
Itai
Matti_Kurkela
Honored Contributor
Solution

Re: a lot of "Buffer I/O error" and "end request, I/O error sector 0" messages on all Linux hosts

Perhaps something on the system is probing a dead FC path? Please run "powermt check" and allow it to remove dead paths.

If the messages are about a LUN that has been unpresented, you could use "echo 1 > /sys/block//device/delete" to tell the kernel that the LUN/path is gone and won't be coming back. A reboot would also work in this case.

Example: the log messages you posted are about device "sdef". So, if you know /dev/sdef used to refer to a LUN that has been unpresented recently, first run "powermt check", then run:

echo 1 > /sys/block/sdef/device/delete
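If you need to do this for several stale devices, a small wrapper can guard against typos in the device name. The following is only a minimal sketch: the function name remove_stale_path and the SYSFS_ROOT variable are my own illustrative choices (SYSFS_ROOT exists only so the function can be tried against a scratch directory), and it assumes you have already run "powermt check" and confirmed that the LUN really has been unpresented. It must be run as root to affect the real /sys tree.

```shell
#!/bin/sh
# Sketch: tell the kernel that a SCSI disk/path is gone for good.
# remove_stale_path and SYSFS_ROOT are hypothetical names used for illustration.
remove_stale_path() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: remove_stale_path <device, e.g. sdef>" >&2
        return 2
    fi

    # Defaults to the real sysfs; override SYSFS_ROOT only for a dry run.
    sysfs_root="${SYSFS_ROOT:-/sys}"
    delete_file="$sysfs_root/block/$dev/device/delete"

    # Refuse to write anywhere if the device is not known to the kernel.
    if [ ! -e "$delete_file" ]; then
        echo "no such SCSI device: $dev ($delete_file not found)" >&2
        return 1
    fi

    # Same effect as: echo 1 > /sys/block/<dev>/device/delete
    echo 1 > "$delete_file"
}

# Example (as root, after "powermt check"):
#   remove_stale_path sdef
```

The existence check is the only added value here: a mistyped name produces an error message instead of a confusing shell redirect failure.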

Is the failing device a real LUN, or a Symmetrix gatekeeper device or similar? If you have some hardware health monitoring program that attempts to probe all disks, the gatekeeper devices might not respond the same as real storage LUNs.

You might want to exclude the gatekeepers (and other Symmetrix special LUNs) from monitoring if that is the case: how to do that depends on which monitoring program you're using.
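To decide whether the errors come from a handful of special devices (such as gatekeepers) or from ordinary LUN paths, it may help to first count which devices actually appear in the log. The sketch below assumes the messages look exactly like the ones quoted in the original post and that the log lives in /var/log/messages; the function name summarize_io_errors is just an illustrative choice.

```shell
#!/bin/sh
# Sketch: count "end_request: I/O error" messages per device and sector.
# summarize_io_errors is a hypothetical helper name; the log path is an assumption.
summarize_io_errors() {
    log="${1:-/var/log/messages}"

    # Extract "<device> <sector>" from lines such as:
    #   kernel: end_request: I/O error, dev sdef, sector 0
    sed -n 's/.*end_request: I\/O error, dev \([a-z0-9]*\), sector \([0-9]*\).*/\1 \2/p' "$log" |
        sort |        # group identical device/sector pairs together
        uniq -c |     # prefix each pair with its number of occurrences
        sort -rn      # most frequent first
}

# Example:
#   summarize_io_errors /var/log/messages
```

Each output line has the form "count device sector". If only a few device names show up, and always at sector 0, you can then compare those names against the PowerPath device listing to see whether they are gatekeepers or dead paths.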

MK