Hpsa problems

skyice
Occasional Visitor

Hpsa problems

Hello,

 

My system worked perfectly for a few months, but for the past 1-2 weeks I have had a big issue. Sometimes I get messages like these in /var/log/messages:

 

hpsa 0000:04:00.0: Abort request on C0:B0:T0:L0
hpsa 0000:04:00.0: Abort request on C0:B0:T0:L0
hpsa 0000:04:00.0: Abort request on C0:B0:T0:L0
hpsa 0000:04:00.0: Abort request on C0:B0:T0:L0
hpsa 0000:04:00.0: Abort request on C0:B0:T0:L0
hpsa 0000:04:00.0: Abort request on C0:B0:T0:L0
hpsa 0000:04:00.0: cp ffff880075976000 is reported invalid (probably means target device no longer present)
hpsa 0000:04:00.0: cp ffff880075976000 is reported invalid (probably means target device no longer present)
hpsa 0000:04:00.0: FAILED abort on device C0:B0:T0:L0
hpsa 0000:04:00.0: resetting device 0:0:0:0
hpsa 0000:04:00.0: device is ready.
hpsa 0000:04:00.0: Abort request on C0:B0:T0:L0
hpsa 0000:04:00.0: Abort request on C0:B0:T0:L0
hpsa 0000:04:00.0: Abort request on C0:B0:T0:L0
hpsa 0000:04:00.0: cp ffff880057314000 is reported invalid (probably means target device no longer present)
hpsa 0000:04:00.0: cp ffff880057314000 is reported invalid (probably means target device no longer present)
hpsa 0000:04:00.0: FAILED abort on device C0:B0:T0:L0
hpsa 0000:04:00.0: resetting device 0:0:0:0
hpsa 0000:04:00.0: device is ready.

 

This causes a high load average (80+) while the problem is occurring.

 

I run Debian Wheezy (stable) with the 3.13-0.bpo.1-amd64 kernel.

 

Any ideas?

 

Thanks

6 REPLIES
JosePa
Occasional Visitor

Re: Hpsa problems

Hello.

I'm having the same problem on one of our HP ProLiant DL385p G8 servers.

Has anyone replied to this post or found what is causing this particular message?

Thanks

patrick_schaaf
Occasional Visitor

Re: Hpsa problems

I had exactly these symptoms on one box two months ago. The first time, it went away after power cycling and having somebody reseat all drives, but it recurred after a week. The second time, after half a day of these issues, the Smart Array controller finally recognized a drive as faulty. I got that drive replaced, and everything has been fine since then.

Unfortunately, these kernel messages give no hint as to which drive exactly has the problem, and no other monitoring (smartctl, iLO) shows any hint either...

Half an hour ago, a different server started making exactly the same noises, twice, with Linux (qemu processes here) getting "stuck" in CPU "wait" for minutes at a time. I/O rates were abysmal during those periods. After an hour the controller finally flagged one of the drives as "Predictive Failure". But that didn't fail the drive completely: I/O rates stayed bad until I had the drive replaced.
Context: the servers are all DL380 Gen9 with the usual P440ar controller and 300 GB 10k SAS drives in RAID 5. The kernel is a self-built vanilla 3.14.67 at the moment; it was a slightly earlier 3.14.x two months ago when the issue first hit.

BTW: does anybody know how I can use hpssacli to throw a disk out of an array as failed, without having a hot spare? Without a hot spare, "hpssacli remove" only ever tells me something about needing a license key...

abelliot
Occasional Visitor

Re: Hpsa problems

Hi,

Maybe the same problem here with an HPE DL380 Gen9 and a P440ar controller.

vSphere 6 Update 1, latest HPE ISO

firmware: 3.56

drivers:

The server was deployed 2 weeks ago. There were no problems when we installed ESXi and deployed the VMs.

But since this week we have had backup problems, and users say everything is slow. The VMs respond slowly, and when I look at disk performance in the vSphere Client I see about 40 ms of read latency. The latency is only on the SAS disks (4 x 600 GB SAS 6G 10k, RAID 5); there is no latency on the SSD drives (RAID 1).

No hardware problems are reported by iLO.

CrystalDiskMark results are in the attachment.

Maybe a new HP firmware bug!

Rom1kz
Occasional Visitor

Re: Hpsa problems

Same problem here with a DL380 G8 server:

[26485294.814356] hpsa 0000:24:00.0: Abort request on C3:B0:T0:L1
[26485294.814514] hpsa 0000:24:00.0: invalid command: LUN:0100004000000000 CDB:00000000600100000000000000000000
[26485294.814518] hpsa 0000:24:00.0: probably means device no longer present
[26485294.814599] hpsa 0000:24:00.0: invalid command: LUN:0100004000000000 CDB:00000000000001600000000000000000
[26485294.814603] hpsa 0000:24:00.0: probably means device no longer present
[26485294.814606] hpsa 0000:24:00.0: FAILED abort on device C3:B0:T0:L1
[26485294.814664] hpsa 0000:24:00.0: resetting device 3:0:0:1
[26485310.470969] hpsa 0000:24:00.0: device is ready.
[26485385.993390] hpsa 0000:24:00.0: Abort request on C3:B0:T0:L1
[26485385.993580] hpsa 0000:24:00.0: invalid command: LUN:0100004000000000 CDB:00000000d02a00000000000000000000
[26485385.993584] hpsa 0000:24:00.0: probably means device no longer present
[26485385.993666] hpsa 0000:24:00.0: invalid command: LUN:0100004000000000 CDB:0000000000002ad00000000000000000
[26485385.993669] hpsa 0000:24:00.0: probably means device no longer present
[26485385.993672] hpsa 0000:24:00.0: FAILED abort on device C3:B0:T0:L1
[26485385.993733] hpsa 0000:24:00.0: resetting device 3:0:0:1
[26485398.801924] hpsa 0000:24:00.0: device is ready.

 

When the problem occurs, no disk operations are possible.

The server is completely stuck.

 

Is there any workaround, or any way to find out which disk is broken?

Rom1kz
Occasional Visitor

Re: Hpsa problems

I've resolved the problem.

Using smartctl, I found the disk that had a lot of errors.

Then I just replaced it, and the problem is gone.
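
The exact smartctl invocation was not posted. As a rough sketch only: smartmontools can address physical drives behind a Smart Array controller with its "-d cciss,N" device type, where N is the drive index. The helper below (smart_cmds is a made-up name, and the device node and drive count are assumptions you would adjust for your box) only prints the commands, so nothing touches the controller until you pipe the output to sh.

```shell
#!/bin/sh
# Print one smartctl command per physical drive index behind the controller.
# $1: device node that belongs to the controller (assumption, e.g. /dev/sda or /dev/sg0)
# $2: number of drive indexes to probe (assumption, adjust to your drive count)
smart_cmds() {
    dev="$1"
    count="$2"
    i=0
    while [ "$i" -lt "$count" ]; do
        # -d cciss,N selects physical drive N behind the RAID controller
        echo "smartctl -a -d cciss,$i $dev"
        i=$((i + 1))
    done
}
```

For example, "smart_cmds /dev/sda 4" prints four commands; review them, then run "smart_cmds /dev/sda 4 | sh" as root and compare the error counters between drives.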

meteozond
Occasional Visitor

Re: Hpsa problems

Sorry for necroposting, but we've hit this problem too, with a retired P812. Digging into the hpsa source showed us that the odd C1:B1:T0:L3 notation is just the usual SCSI device address with letters added, i.e. 1:1:0:3. You can look that address up with lsscsi:

> lsscsi

[0:0:0:0]    storage HP       P410i            6.64  -        
[0:1:0:0]    disk    HP       LOGICAL VOLUME   6.64  /dev/sda
[0:1:0:1]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdb
[0:1:0:2]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdc
[0:1:0:3]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdd
[0:1:0:4]    disk    HP       LOGICAL VOLUME   6.64  /dev/sde
[0:1:0:5]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdf
[1:0:0:0]    storage HP       P812             6.64  -        
[1:1:0:0]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdg
[1:1:0:1]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdh
[1:1:0:2]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdi
[1:1:0:3]    disk    HP       LOGICAL VOLUME   6.64  /dev/sdj

or

ls /sys/class/scsi_device/1:1:0:3/device/block/

sdj

The physical device behind the RAID can then be easily identified by the "Disk Name:" value in hpacucli or hpssacli output.
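
The lookup above can be wrapped in two small shell functions. This is only a sketch: hpsa_to_hctl and hpsa_to_blockdev are made-up names, and SYSFS_ROOT is a hypothetical override (defaulting to /sys) that exists just so the lookup can be tried against a fake directory tree.

```shell
#!/bin/sh
# Turn the hpsa log notation (e.g. C1:B1:T0:L3) into the plain SCSI
# address used by lsscsi and sysfs (e.g. 1:1:0:3) by dropping the letters.
hpsa_to_hctl() {
    printf '%s\n' "$1" |
        sed 's/^C\([0-9]*\):B\([0-9]*\):T\([0-9]*\):L\([0-9]*\)$/\1:\2:\3:\4/'
}

# Print the block device name for an hpsa address, if sysfs has an entry for it.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
hpsa_to_blockdev() {
    dir="${SYSFS_ROOT:-/sys}/class/scsi_device/$(hpsa_to_hctl "$1")/device/block"
    [ -d "$dir" ] && ls "$dir"
}
```

With the lsscsi listing above, "hpsa_to_blockdev C1:B1:T0:L3" would print sdj. Keep in mind this gives you the logical volume, not the failing physical drive.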