ProLiant Servers (ML,DL,SL)
cancel
Showing results for 
Search instead for 
Did you mean: 

HP DL380 G5 - Smart Array P400 - Linux hang with high load randomly

spectotech
Occasional Visitor

HP DL380 G5 - Smart Array P400 - Linux hang with high load randomly

Since 2-3 weeks now, my main server is hanging for no apparent reasons. Was working without issues for more than 4 months in a row before that. Everytime, a simple reboot fix the problem.

Current setup :

  • HP DL380 G5, 2 x Xeon 4C 3GHz, 16GB memory, 6 x 146GB in RAID 0+1
  • Slackware 14.0

I leave the server with PuTTy openned and top running, when it hang (about 1 to 3 times a day), I see a high load, about more than 60, all web services (HTTP, DNS, SMTP, IMAP, POP3, etc) are unresponsive. When connecting with PuTTy, I'm able to log, but the prompt never appears, same thing on local prompt (keyboard + screen). Also, I've seen that green LEDs on drives are flashing simultaneously at a frequency of about 0.5Hz - 1Hz (normally they flash way faster and in random order).

I first suspected DDoS attacks, etc, added many fail2ban validations, external firewall TCP requests limitations, etc. After, I verified firmwares versions (including P400), upgraded all to latest versions, the problem still occurs. I've also synched root to another DL380 G5 (same hardware except 4 x 450GB drives) to replace the server, same issue again.

Now I'm wondering if it could be the CCISS driver that could have an issue in the version I'm using?

Here are some informations that may be useful :

hpapucli

=> ctrl all show status

Smart Array P400 in Slot 1
Controller Status: OK
Cache Status: OK
Battery/Capacitor Status: OK

=> ctrl all show detail

Smart Array P400 in Slot 1
Bus Interface: PCI
Slot: 1
Serial Number: P61620G9SVM38V
Cache Serial Number: PA2270H9SVI198
RAID 6 (ADG) Status: Enabled
Controller Status: OK
Hardware Revision: D
Firmware Version: 6.86
Rebuild Priority: Medium
Expand Priority: Medium
Surface Scan Delay: 15 secs
Surface Scan Mode: Idle
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 0 secs
Cache Board Present: True
Cache Status: OK
Cache Ratio: 25% Read / 75% Write
Drive Write Cache: Disabled
Total Cache Size: 512 MB
Total Cache Memory Available: 464 MB
No-Battery Write Cache: Disabled
Cache Backup Power Source: Batteries
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
SATA NCQ Supported: True

=> ctrl all show config

Smart Array P400 in Slot 1 (sn: P61620G9SVM38V)

array A (SAS, Unused Space: 0 MB)


logicaldrive 1 (838.3 GB, RAID 1+0, OK)

physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SAS, 450 GB, OK)
physicaldrive 2I:1:2 (port 2I:box 1:bay 2, SAS, 450 GB, OK)
physicaldrive 2I:1:3 (port 2I:box 1:bay 3, SAS, 450 GB, OK)
physicaldrive 2I:1:4 (port 2I:box 1:bay 4, SAS, 450 GB, OK)

root@hyperion:~# modinfo cciss
filename: /lib/modules/3.2.29/kernel/drivers/block/cciss.ko
license: GPL
version: 3.6.26
description: Driver for HP Smart Array Controllers
author: Hewlett-Packard Company
srcversion: D553A90CDE37829B37A9C27
alias: pci:v0000103Cd00003230sv0000103Csd0000323Dbc*sc*i*
alias: pci:v0000103Cd00003230sv0000103Csd00003237bc*sc*i*
alias: pci:v0000103Cd00003238sv0000103Csd00003215bc*sc*i*
alias: pci:v0000103Cd00003238sv0000103Csd00003214bc*sc*i*
alias: pci:v0000103Cd00003238sv0000103Csd00003213bc*sc*i*
alias: pci:v0000103Cd00003238sv0000103Csd00003212bc*sc*i*
alias: pci:v0000103Cd00003238sv0000103Csd00003211bc*sc*i*
alias: pci:v0000103Cd00003230sv0000103Csd00003235bc*sc*i*
alias: pci:v0000103Cd00003230sv0000103Csd00003234bc*sc*i*
alias: pci:v0000103Cd00003230sv0000103Csd00003223bc*sc*i*
alias: pci:v0000103Cd00003220sv0000103Csd00003225bc*sc*i*
alias: pci:v00000E11d00000046sv00000E11sd0000409Dbc*sc*i*
alias: pci:v00000E11d00000046sv00000E11sd0000409Cbc*sc*i*
alias: pci:v00000E11d00000046sv00000E11sd0000409Bbc*sc*i*
alias: pci:v00000E11d00000046sv00000E11sd0000409Abc*sc*i*
alias: pci:v00000E11d00000046sv00000E11sd00004091bc*sc*i*
alias: pci:v00000E11d0000B178sv00000E11sd00004083bc*sc*i*
alias: pci:v00000E11d0000B178sv00000E11sd00004082bc*sc*i*
alias: pci:v00000E11d0000B178sv00000E11sd00004080bc*sc*i*
alias: pci:v00000E11d0000B060sv00000E11sd00004070bc*sc*i*
depends:
intree: Y
vermagic: 3.2.29 SMP mod_unload
parm: cciss_tape_cmds:number of commands to allocate for tape devices (default: 6) (int)
parm: cciss_simple_mode:Use 'simple mode' rather than 'performant mode' (int)