ProLiant Servers (ML,DL,SL)

DL180 G5 showing hard drive error messages

Matt Hymowitz
Occasional Contributor


We have two DL180 G5s on which we are trying to install Open-E, a Linux-based SAN. One machine installs with no problem. The second machine shows hard drive errors when it tries to replicate with the first machine. The errors are shown below, and the ADU report is attached. The Active Chat folks said this is not a controller error, based on the ADU report. I swapped out the drives and still see errors on bay 1 and bay 3, and the SAN refuses to replicate with the other machine. What would be my next troubleshooting step?
Thanks to anyone who can help with this.

lost page write due to I/O error on dm-15
Buffer I/O error on device dm-15, logical block 498654
lost page write due to I/O error on dm-15
Buffer I/O error on device dm-15, logical block 498655
lost page write due to I/O error on dm-15
Buffer I/O error on device dm-15, logical block 498656
lost page write due to I/O error on dm-15
Buffer I/O error on device dm-15, logical block 498657
lost page write due to I/O error on dm-15
Buffer I/O error on device dm-15, logical block 498658
lost page write due to I/O error on dm-15
Buffer I/O error on device dm-15, logical block 498659
lost page write due to I/O error on dm-15
cciss: cmd ffff88007f88ea00 has CHECK CONDITION sense key = 0x3
cciss: cmd ffff88007f88f8a0 has CHECK CONDITION sense key = 0x3
end_request: I/O error, dev cciss/c0d1, sector 3794155552
cciss: cmd ffff88007f890260 has CHECK CONDITION sense key = 0x3
end_request: I/O error, dev cciss/c0d1, sector 3794156304
cciss: cmd ffff88007f890740 has CHECK CONDITION sense key = 0x3
end_request: I/O error, dev cciss/c0d1, sector 3794156560
cciss: cmd ffff88007f890c20 has CHECK CONDITION sense key = 0x3
end_request: I/O error, dev cciss/c0d1, sector 3794157056
cciss: cmd ffff88007f891370 has CHECK CONDITION sense key = 0x3
end_request: I/O error, dev cciss/c0d1, sector 3794157560
cciss: cmd ffff88007f8915e0 has CHECK CONDITION sense key = 0x3
end_request: I/O error, dev cciss/c0d1, sector 3794157808
cciss: cmd ffff88007f891fa0 has CHECK CONDITION sense key = 0x3
end_request: I/O error, dev cciss/c0d1, sector 3794158560
cciss: cmd ffff88007f892480 has CHECK CONDITION sense key = 0x3
end_request: I/O error, dev cciss/c0d1, sector 3794158816
cciss: cmd ffff88007f8926f0 has CHECK CONDITION sense key = 0x3
end_request: I/O error, dev cciss/c0d1, sector 3794159064
cciss: cmd ffff88007f892960 has CHECK CONDITION sense key = 0x3
end_request: I/O error, dev cciss/c0d1, sector 3794159312
attempt to access beyond end of device
dm-15: rw=1, want=6522496192, limit=8388608
attempt to access beyond end of device
dm-15: rw=1, want=29518927328, limit=8388608
attempt to access beyond end of device
dm-15: rw=1, want=27051679792, limit=8388608
attempt to access beyond end of device
dm-15: rw=1, want=7660311216, limit=8388608
__ratelimit: 21 callbacks suppressed
Buffer I/O error on device dm-15, logical block 957538901
lost page write due to I/O error on dm-15
Aborting journal on device dm-15.
ext3_abort called.
EXT3-fs error (device dm-15): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
block drbd1: we had at least one MD IO ERROR during bitmap IO
block drbd1: disk( Inconsistent -> Failed )
block drbd1: Local IO failed in bm_rw.Detaching...
block drbd1: 1800 GB (471859200 bits) marked out-of-sync by on disk bit-map.
block drbd1: conn( WFReportParams -> Disconnecting )
block drbd1: error receiving ReportState, l: 4!
block drbd1: disk( Failed -> Diskless )
block drbd1: Notified peer that my disk is broken.
2011/03/23 15:21:59|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 3.
2011/03/23 15:22:29|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 0.
2011/03/23 15:23:29|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 0.
2011/03/23 15:24:34|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 0.
2011/03/23 15:25:04|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 3.
2011/03/23 15:26:04|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 0.
2011/03/23 15:27:34|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 0.
2011/03/23 15:28:40|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 3.
2011/03/23 15:28:40|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 3.
2011/03/23 15:30:10|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 3.
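For reference, here is a quick sketch that tallies the fatal-drive-error messages above per bay, to see where the failures cluster. It assumes the Open-E log format shown in the post; it is just a counting convenience, not a diagnostic tool.

```python
import re
from collections import Counter

# The fatal-error lines from the Open-E log above, verbatim.
LOG = """\
2011/03/23 15:21:59|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 3.
2011/03/23 15:22:29|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 0.
2011/03/23 15:23:29|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 0.
2011/03/23 15:24:34|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 0.
2011/03/23 15:25:04|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 3.
2011/03/23 15:26:04|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 0.
2011/03/23 15:27:34|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 0.
2011/03/23 15:28:40|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 3.
2011/03/23 15:28:40|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 3.
2011/03/23 15:30:10|CCISS controler /dev/cciss/c0d0 reported: Fatal drive error, Port: 1I Box: 1 Bay: 3.
"""

# Match the port/box/bay at the end of each fatal-drive-error line.
pattern = re.compile(r"Fatal drive error, Port: (\S+) Box: (\d+) Bay: (\d+)")

def count_errors_per_bay(log_text):
    """Return a Counter mapping (port, box, bay) -> number of fatal errors."""
    return Counter(m.groups() for m in pattern.finditer(log_text))

if __name__ == "__main__":
    for (port, box, bay), n in sorted(count_errors_per_bay(LOG).items()):
        print(f"Port {port} Box {box} Bay {bay}: {n} fatal errors")
```

On this log the errors split evenly between bay 0 and bay 3, both behind port 1I.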
6 REPLIES
Michael A. McKenney
Respected Contributor

Re: DL180 G5 showing hard drive error messages

Upgrade the firmware on the controller and the drives. If it still shows failures, replace the drive. I have seen firmware cause these kinds of errors along with disk failure messages. HP will recommend the latest drive firmware first.
Matt Hymowitz
Occasional Contributor

Re: DL180 G5 showing hard drive error messages

Trying the firmware upgrade this afternoon. Thank you for the response.
Terry Hutchings
Honored Contributor

Re: DL180 G5 showing hard drive error messages

How much of a load is being placed on the drives and controllers? It is possible that these errors are due to the solution being overworked.
The truth is out there, but I forgot the URL..
Matt Hymowitz
Occasional Contributor

Re: DL180 G5 showing hard drive error messages

The drives are doing a replication over 1 Gb Ethernet from a second DL180 G5, which works fine. I don't see how another identical machine could generate enough traffic (through a GigE connection) to overload the RAID controller. Not sure, though. What do you think?
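Rough numbers back this up: a fully saturated 1 GbE link tops out around 125 MB/s of payload, which even a modest RAID set should absorb. A quick back-of-the-envelope sketch (the per-drive write figure and drive count here are assumed ballparks for illustration, not measured values from these servers):

```python
# Rough capacity check: can the array keep up with a saturated 1 GbE link?
GIGE_MBPS = 1000 / 8             # ~125 MB/s theoretical payload ceiling for 1 GbE
DRIVE_SEQ_WRITE_MBPS = 60        # assumed conservative per-drive sequential write rate
DRIVES_IN_SET = 4                # assumed RAID set size, for illustration only

# Ignoring parity/mirroring overhead, the array's aggregate write bandwidth
# still comfortably exceeds what the network can deliver.
array_write_mbps = DRIVE_SEQ_WRITE_MBPS * DRIVES_IN_SET
print(f"1 GbE ceiling: {GIGE_MBPS:.0f} MB/s, array estimate: {array_write_mbps} MB/s")
```

Even with pessimistic drive numbers, replication traffic over GigE should not overload a healthy array, which points back at the drives or firmware rather than load.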

Matt Hymowitz
Occasional Contributor

Re: DL180 G5 showing hard drive error messages

I upgraded the firmware today, and we are still seeing fatal drive errors on bay 0 and bay 3. I upgraded both the controller firmware and the drive firmware.
Michael A. McKenney
Respected Contributor

Re: DL180 G5 showing hard drive error messages

I have never seen a drive be overworked. I would say the firmware on the drives needs to be upgraded. The RAID array should have identical drives with identical firmware. Make sure the server and controller firmware are on the same revision.
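One way to check for mismatched drive firmware is to compare the revisions that hpacucli reports for each physical drive (e.g. from `hpacucli ctrl slot=0 pd all show detail`). The sample output below is illustrative only, with a made-up model number, since the exact layout varies by hpacucli version; the parsing is a sketch, not a supported API.

```python
# Illustrative sample of hpacucli physical-drive detail output (layout and
# model number are assumptions, not captured from these servers).
SAMPLE = """\
physicaldrive 1I:1:0
   Model: HP GB0750C4414
   Firmware Revision: HPG1
physicaldrive 1I:1:1
   Model: HP GB0750C4414
   Firmware Revision: HPG2
physicaldrive 1I:1:3
   Model: HP GB0750C4414
   Firmware Revision: HPG1
"""

def firmware_by_drive(text):
    """Map each physicaldrive ID to its reported firmware revision."""
    drives = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("physicaldrive"):
            current = line.split()[1]
        elif line.startswith("Firmware Revision:") and current:
            drives[current] = line.split(":", 1)[1].strip()
    return drives

drives = firmware_by_drive(SAMPLE)
if len(set(drives.values())) > 1:
    print("Mixed firmware revisions found:", drives)
```

If the revisions come back mixed like the sample, flashing all drives to the same (latest) revision is the first thing HP support will ask for.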