
Write I/O error on log. drive

 
fawrell


Hello.

I have a strange problem with one disk on a DL385 with an Adaptec 2200S SCSI controller. The BIOS, driver and firmware for the controller are the newest available. Everything started when I was copying some data over FTP from another server. The server suddenly froze, and we couldn't even log on to it. After a while it was OK again, but in /var/log/messages we saw tons of messages like this:

kernel: SCSI error : <0 0 3 0> return code = 0x8000002
kernel: Info fld=0x0, Current sdd: sense key Hardware Error
kernel: Additional sense: Internal target failure
kernel: end_request: I/O error, dev sdd, sector 163749039
kernel: Buffer I/O error on device sdd1, logical block 20468622
kernel: lost page write due to I/O error on sdd1

Because the Adaptec diagnostic utility marked both the logical drive (which was not redundant; it contained only one physical disk) and the physical drive as failed, we replaced the failed drive with an identical new one (an HP 146GB drive; both have the same firmware). Deleting the old logical drive and creating a new one completed without any error messages, and we did it without restarting the system. Then we created a partition on the logical drive with fdisk. All went fine, apart from a few error messages in the logs:

kernel: SCSI error : <0 0 3 0> return code = 0x8000002
kernel: Info fld=0x0, Current sdd: sense key Hardware Error
kernel: Additional sense: Internal target failure
kernel: end_request: I/O error, dev sdd, sector 286749480
kernel: Buffer I/O error on device sdd, logical block 35843685

All of them are identical (same sector and logical block). Because the disk got the same device name as the previous one, and because the partition was created successfully, we think the kernel still held some old data and that this triggered the error messages. When we accessed the disk later with fdisk, no error messages were logged. Then we created a file system on the partition without any problems, except again for some messages in the logs like this:

kernel: Buffer I/O error on device sdd1, logical block 143371968
kernel: lost page write due to I/O error on sdd1

But the file system was created successfully and a forced fs check didn't show any errors. We copied 1GB of data onto it without problems, so we thought it was all OK. But at the end of copying 70GB of data the system froze again. We had to hard-restart it, and this was in the logs:

kernel: Buffer I/O error on device sdd1, logical block 35834863
kernel: lost page write due to I/O error on sdd1

There were lots of them, with different logical blocks. But neither the physical nor the logical drive is marked as failed. No other error messages are logged (just file system ones). So why can it write almost 70GB of data without any problems, yet fail at the last file? There are 3 other logical drives on the same controller and they have no problems. It fails only on this one. The physical drive is new, so I don't think it is broken. What might be the cause of the problem?
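
As a side note on reading the numbers: where a log excerpt shows both a sector and a logical block, the two values agree with each other if one assumes 512-byte sectors, 4096-byte blocks, and a first partition starting at sector 63 (the classic fdisk default). None of these three values appear in the logs, so they are assumptions, and the helper name is made up for illustration. The block size the kernel uses in these messages can also differ from one message to the next, so this rough Python sketch only checks the two pairs that log both values:

```python
SECTOR_SIZE = 512                                # assumed bytes per sector
BLOCK_SIZE = 4096                                # assumed bytes per buffer block
SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE    # 8 under the assumptions above


def sector_to_block(sector, partition_start=0):
    """Map an absolute disk sector to a logical block number.

    partition_start is the partition's first sector (0 for the whole disk).
    """
    return (sector - partition_start) // SECTORS_PER_BLOCK


# Whole-disk message (device sdd), seen after the drive was replaced.
print(sector_to_block(286749480))                      # 35843685

# Partition message (device sdd1), seen before the drive was replaced.
# The start sector 63 is an assumption, not something the logs state.
print(sector_to_block(163749039, partition_start=63))  # 20468622
```

Both results match the logical block numbers logged next to those sectors, which suggests the "Buffer I/O error" lines and the "end_request" lines describe the same failed requests.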

Sorry for the long post. Thanks for any answers!
fawrell

Re: Write I/O error on log. drive

Hmm, I did a bit more investigation and found that this failing logical drive has write cache enabled, while the other 3 logical drives on the same controller do not. Could this be caused by a problem with the cache? Or could it be due to a kernel bug? We are using Red Hat Enterprise Linux 4, kernel 2.6.9-34.ELsmp x64.
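
To tell whether a burst of errors keeps hitting one block (as after the drive swap) or is spread over many blocks (as after the 70GB copy), the messages can be grouped per device. The following is only a rough sketch: the function name is hypothetical, and the pattern assumes the exact "Buffer I/O error" wording quoted in this thread.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   kernel: Buffer I/O error on device sdd1, logical block 35834863
BUFFER_ERR = re.compile(r"Buffer I/O error on device (\w+), logical block (\d+)")


def summarize_buffer_errors(lines):
    """Count buffer I/O errors and distinct logical blocks per device."""
    blocks = defaultdict(list)
    for line in lines:
        match = BUFFER_ERR.search(line)
        if match:
            blocks[match.group(1)].append(int(match.group(2)))
    return {
        dev: {"errors": len(seen), "distinct_blocks": len(set(seen))}
        for dev, seen in blocks.items()
    }


# Sample lines copied from the excerpts in this thread.
sample = [
    "kernel: Buffer I/O error on device sdd, logical block 35843685",
    "kernel: Buffer I/O error on device sdd, logical block 35843685",
    "kernel: Buffer I/O error on device sdd1, logical block 143371968",
    "kernel: lost page write due to I/O error on sdd1",
    "kernel: Buffer I/O error on device sdd1, logical block 35834863",
]

for dev, stats in sorted(summarize_buffer_errors(sample).items()):
    print(dev, stats)
```

On a real system the lines would come from /var/log/messages instead of the sample list. Many errors on one distinct block point at a single bad location; many distinct blocks point at something broader, such as the controller, cache or cabling.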