ProLiant Servers (ML,DL,SL)

DekPlen
Advisor

hpsa driver reset messages

Evening all.

Our group has 1 x DL380p Gen8 and 2 x DL380e servers with a mixture of Samsung and Crucial SSD drives.

All 3 systems are running Proxmox 7.4.1, which is based on Debian 11.

If a VM is performing very high I/O, such as restoring a PostgreSQL database from a gzip file or dd'ing an LVM volume to or from itself, I will occasionally see reset errors such as:

root@hlvbp011:/etc/pki/tls/certs# grep hpsa /var/log/messages
Feb 28 11:43:45 hlvbp011 kernel: [9133678.782746] hpsa 0000:0a:00.0: scsi 0:1:0:1: resetting logical Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
Feb 28 11:43:54 hlvbp011 kernel: [9133687.991815] hpsa 0000:0a:00.0: device is ready.
Feb 28 11:43:54 hlvbp011 kernel: [9133687.991832] hpsa 0000:0a:00.0: scsi 0:1:0:1: reset logical completed successfully Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
Feb 28 12:07:29 hlvbp011 kernel: [9135103.286633] hpsa 0000:0a:00.0: scsi 0:1:0:1: resetting logical Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
Feb 28 12:07:31 hlvbp011 kernel: [9135105.318671] hpsa 0000:0a:00.0: device is ready.
Feb 28 12:07:31 hlvbp011 kernel: [9135105.318687] hpsa 0000:0a:00.0: scsi 0:1:0:1: reset logical completed successfully Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
Feb 28 12:08:40 hlvbp011 kernel: [9135174.163108] hpsa 0000:0a:00.0: scsi 0:1:0:1: resetting logical Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
Feb 28 12:08:50 hlvbp011 kernel: [9135184.393227] hpsa 0000:0a:00.0: device is ready.
Feb 28 12:08:50 hlvbp011 kernel: [9135184.393245] hpsa 0000:0a:00.0: scsi 0:1:0:1: reset logical completed successfully Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
Feb 28 12:17:09 hlvbp011 kernel: [9135682.660967] hpsa 0000:0a:00.0: scsi 0:1:0:1: resetting logical Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
Feb 28 12:17:15 hlvbp011 kernel: [9135688.793320] hpsa 0000:0a:00.0: device is ready.
Feb 28 12:17:15 hlvbp011 kernel: [9135688.793336] hpsa 0000:0a:00.0: scsi 0:1:0:1: reset logical completed successfully Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
Feb 28 12:51:37 hlvbp011 kernel: [9137750.956423] hpsa 0000:0a:00.0: scsi 0:1:0:1: resetting logical Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
Feb 28 12:51:39 hlvbp011 kernel: [9137752.988591] hpsa 0000:0a:00.0: device is ready.
Feb 28 12:51:39 hlvbp011 kernel: [9137752.988606] hpsa 0000:0a:00.0: scsi 0:1:0:1: reset logical completed successfully Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1
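
To see how long each reset stalls the volume, the paired "resetting logical" and "reset logical completed successfully" messages can be matched up using the kernel timestamps in square brackets. The following is only a minimal sketch: the regular expression is inferred from the log lines quoted above, and the function name `reset_durations` and the `SAMPLE` list are illustrative, not part of any HPE or kernel tooling.

```python
import re

# Pattern inferred from the hpsa messages quoted in this post (an assumption,
# not an official log format): kernel timestamp, SCSI address, event, and the
# SSD Smart Path "capable" / "enabled" flags.
RESET_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] hpsa \S+ scsi (?P<addr>\d+:\d+:\d+:\d+): "
    r"(?P<event>resetting logical|reset logical completed successfully)"
    r".*SSDSmartPathCap(?P<cap>[+-]) En(?P<en>[+-])"
)


def reset_durations(lines):
    """Yield (scsi_addr, smart_path_enabled, seconds) for each completed reset."""
    started = {}  # SCSI address -> timestamp of the pending "resetting" message
    for line in lines:
        m = RESET_RE.search(line)
        if not m:
            continue
        ts = float(m["ts"])
        addr = m["addr"]
        if m["event"] == "resetting logical":
            started[addr] = ts
        elif addr in started:
            yield addr, m["en"] == "+", round(ts - started.pop(addr), 3)


# Two lines copied from the log excerpt above.
SAMPLE = [
    "Feb 28 11:43:45 hlvbp011 kernel: [9133678.782746] hpsa 0000:0a:00.0: scsi 0:1:0:1: "
    "resetting logical Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap- En- Exp=1",
    "Feb 28 11:43:54 hlvbp011 kernel: [9133687.991832] hpsa 0000:0a:00.0: scsi 0:1:0:1: "
    "reset logical completed successfully Direct-Access HP LOGICAL VOLUME RAID-5 "
    "SSDSmartPathCap- En- Exp=1",
]

for addr, smart_path, seconds in reset_durations(SAMPLE):
    print(f"scsi {addr}: reset took {seconds} s (SSD Smart Path enabled: {smart_path})")
```

Feeding it the full output of the grep above would give one duration per reset, which makes it easier to compare stall times between volumes or hosts.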

The logical volumes are:


root@hlvbp011:/etc/pki/tls/certs# lsscsi
[0:0:0:0] storage HP P822 8.00 -
[0:1:0:0] disk HP LOGICAL VOLUME 8.00 /dev/sdb
[0:1:0:1] disk HP LOGICAL VOLUME 8.00 /dev/sdc
[0:1:0:2] disk HP LOGICAL VOLUME 8.00 /dev/sdd
[0:1:0:3] disk HP LOGICAL VOLUME 8.00 /dev/sda

 


And the LVM physical volumes (pvs):

 

root@hlvbp011:/etc/pki/tls/certs# pvs
PV         VG         Fmt  Attr PSize    PFree
/dev/sda3  pve        lvm2 a--  <930.48g  <16.00g
/dev/sdb2  VolGroup00 lvm2 a--  <930.99g <894.49g
/dev/sdc1  VolGroup01 lvm2 a--    <5.46t  364.86g
/dev/sdd3  pve1       lvm2 a--  <930.48g  <17.29g

 

root@hlvbp011:/etc/pki/tls/certs# hpssacli ctrl slot=2 pd all show

Smart Array P822 in Slot 2

Array A

physicaldrive 5I:1:13 (port 5I:box 1:bay 13, SATA HDD, 1 TB, OK)
physicaldrive 5I:1:14 (port 5I:box 1:bay 14, SATA HDD, 1 TB, OK)

Array B

physicaldrive 5I:1:1 (port 5I:box 1:bay 1, SATA SSD, 1 TB, OK)
physicaldrive 5I:1:2 (port 5I:box 1:bay 2, SATA SSD, 1 TB, OK)
physicaldrive 5I:1:3 (port 5I:box 1:bay 3, SATA SSD, 1 TB, OK)
physicaldrive 5I:1:4 (port 5I:box 1:bay 4, SATA SSD, 1 TB, OK)
physicaldrive 5I:1:5 (port 5I:box 1:bay 5, SATA SSD, 1 TB, OK)
physicaldrive 5I:1:6 (port 5I:box 1:bay 6, SATA SSD, 1 TB, OK)
physicaldrive 5I:1:9 (port 5I:box 1:bay 9, SATA SSD, 1 TB, OK)

Array C

physicaldrive 5I:1:11 (port 5I:box 1:bay 11, SATA SSD, 1 TB, OK)
physicaldrive 5I:1:12 (port 5I:box 1:bay 12, SATA SSD, 1 TB, OK)

Array D

physicaldrive 5I:1:7 (port 5I:box 1:bay 7, SATA SSD, 1 TB, OK)
physicaldrive 5I:1:8 (port 5I:box 1:bay 8, SATA SSD, 1 TB, OK)

Controller firmware:

root@hlvbp011:/etc/pki/tls/certs# hpssacli ctrl slot=2 show detail

Smart Array P822 in Slot 2
Bus Interface: PCI
Slot: 2
RAID 6 (ADG) Status: Enabled
Controller Status: OK
Hardware Revision: B
Firmware Version: 8.32-0
Rebuild Priority: Low
Expand Priority: Medium
Surface Scan Delay: 15 secs
Surface Scan Mode: Idle
Parallel Surface Scan Supported: No
Queue Depth: Automatic
Monitor and Performance Delay: 60 min
Elevator Sort: Enabled
Degraded Performance Optimization: Disabled
Inconsistency Repair Policy: Disabled
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 0 secs
Cache Board Present: True
Cache Status: OK
Cache Ratio: 10% Read / 90% Write
Drive Write Cache: Disabled
Total Cache Size: 2.0
Total Cache Memory Available: 1.8
No-Battery Write Cache: Disabled
SSD Caching RAID5 WriteBack Enabled: False
SSD Caching Version: 1
Cache Backup Power Source: Capacitors
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
SATA NCQ Supported: True
Spare Activation Mode: Activate on physical drive failure (default)
Controller Temperature (C): 69
Cache Module Temperature (C): 30
Capacitor Temperature (C): 21
Number of Ports: 6 (2 Internal / 4 External )
Encryption: Not Set
Driver Name: hpsa
Driver Version: 3.4.20
Driver Supports SSD Smart Path: True
PCI Address (Domain:Bus:Device.Function): 0000:0A:00.0
Port Max Phy Rate Limiting Supported: False
Host Serial Number: XXXXXXXXXXX
Sanitize Erase Supported: False
Primary Boot Volume: logicaldrive 4 (600508B1001C3A7DAAB1BE2D0CA5DED7)
Secondary Boot Volume: None

hpsa driver and version:


root@hlvbp011:/etc/pki/tls/certs# lsmod | grep -i hpsa
hpsa 118784 4
scsi_transport_sas 45056 1 hpsa


root@hlvbp011:/etc/pki/tls/certs# modinfo hpsa
filename: /lib/modules/5.15.108-1-pve/kernel/drivers/scsi/hpsa.ko
alias: cciss
license: GPL
version: 3.4.20-200
description: Driver for HP Smart Array Controller version 3.4.20-200
author: Hewlett-Packard Company
srcversion: 62AA170584409D6A0D9EBD7
alias: pci:v00000E11d*sv*sd*bc01sc04i*
alias: pci:v0000103Cd*sv*sd*bc01sc04i*
alias: pci:v0000103Cd0000333Fsv0000103Csd0000333Fbc*sc*i*
alias: pci:v00001590d00000075sv00001590sd00000088bc*sc*i*
alias: pci:v00001590d00000075sv00001590sd0000007Dbc*sc*i*
alias: pci:v00001590d00000075sv00001590sd00000087bc*sc*i*
alias: pci:v00001590d00000075sv00001590sd00000076bc*sc*i*
alias: pci:v00009005d00000290sv00009005sd00000585bc*sc*i*
alias: pci:v00009005d00000290sv00009005sd00000584bc*sc*i*
alias: pci:v00009005d00000290sv00009005sd00000583bc*sc*i*
alias: pci:v00009005d00000290sv00009005sd00000582bc*sc*i*
alias: pci:v00009005d00000290sv00009005sd00000581bc*sc*i*
alias: pci:v00009005d00000290sv00009005sd00000580bc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021CEbc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021CDbc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021CCbc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021CBbc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021CAbc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021C9bc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021C8bc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021C7bc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021C6bc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021C5bc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021C4bc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021C3bc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021C2bc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021C1bc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021C0bc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021BFbc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021BEbc*sc*i*
alias: pci:v0000103Cd00003239sv0000103Csd000021BDbc*sc*i*
alias: pci:v0000103Cd0000323Csv0000103Csd00001929bc*sc*i*
alias: pci:v0000103Cd0000323Csv0000103Csd00001928bc*sc*i*
alias: pci:v0000103Cd0000323Csv0000103Csd00001926bc*sc*i*
alias: pci:v0000103Cd0000323Csv0000103Csd00001925bc*sc*i*
alias: pci:v0000103Cd0000323Csv0000103Csd00001924bc*sc*i*
alias: pci:v0000103Cd0000323Csv0000103Csd00001923bc*sc*i*
alias: pci:v0000103Cd0000323Csv0000103Csd00001922bc*sc*i*
alias: pci:v0000103Cd0000323Csv0000103Csd00001921bc*sc*i*
alias: pci:v0000103Cd0000323Csv0000103Csd00001920bc*sc*i*
alias: pci:v0000103Cd0000323Bsv0000103Csd00003356bc*sc*i*
alias: pci:v0000103Cd0000323Bsv0000103Csd00003355bc*sc*i*
alias: pci:v0000103Cd0000323Bsv0000103Csd00003354bc*sc*i*
alias: pci:v0000103Cd0000323Bsv0000103Csd00003353bc*sc*i*
alias: pci:v0000103Cd0000323Bsv0000103Csd00003352bc*sc*i*
alias: pci:v0000103Cd0000323Bsv0000103Csd00003351bc*sc*i*
alias: pci:v0000103Cd0000323Bsv0000103Csd00003350bc*sc*i*
alias: pci:v0000103Cd0000323Asv0000103Csd00003233bc*sc*i*
alias: pci:v0000103Cd0000323Asv0000103Csd0000324Bbc*sc*i*
alias: pci:v0000103Cd0000323Asv0000103Csd0000324Abc*sc*i*
alias: pci:v0000103Cd0000323Asv0000103Csd00003249bc*sc*i*
alias: pci:v0000103Cd0000323Asv0000103Csd00003247bc*sc*i*
alias: pci:v0000103Cd0000323Asv0000103Csd00003245bc*sc*i*
alias: pci:v0000103Cd0000323Asv0000103Csd00003243bc*sc*i*
alias: pci:v0000103Cd0000323Asv0000103Csd00003241bc*sc*i*
depends: scsi_transport_sas
retpoline: Y
intree: Y
name: hpsa
vermagic: 5.15.108-1-pve SMP mod_unload modversions
parm: hpsa_simple_mode:Use 'simple mode' rather than 'performant mode' (int)

 

I have the SMART utilities installed, but self-tests and disk stats do not point to any disk as having an issue; one or two are running slightly warmer, but that's it.

 

Has anyone else encountered these errors before?

I saw something similar on CentOS 7 last year, and the problem took 3 months to resolve: I swapped one disk at a time out of a 12 TB, 12-disk array, allowed it to rebuild, and waited for the error to occur again. On that occasion, the time between the reset being initiated and completing could be up to a minute for that RAID 5 array, and this caused hung-task messages on both the guest and the KVM server until the reset completed.
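
For context, the Linux SCSI layer applies a per-device command timeout (30 seconds by default, exposed under /sys/block/<dev>/device/timeout), and a device reset is one of the recovery steps its error handler can take when commands time out. Whether that is what triggers the resets here is an assumption, but recording the current settings for each logical volume is a cheap first step. A minimal sketch follows; the function name and the `sysfs_root` parameter are illustrative.

```python
from pathlib import Path


def scsi_disk_settings(sysfs_root="/sys/block"):
    """Return {device: {attribute: value or None}} for each sd* block device."""
    attrs = {
        "timeout": "device/timeout",          # SCSI command timeout in seconds
        "queue_depth": "device/queue_depth",  # outstanding commands allowed
        "scheduler": "queue/scheduler",       # active scheduler shown in [brackets]
    }
    result = {}
    for dev in sorted(Path(sysfs_root).glob("sd*")):
        values = {}
        for name, rel in attrs.items():
            try:
                values[name] = (dev / rel).read_text().strip()
            except OSError:
                values[name] = None  # attribute missing or unreadable
        result[dev.name] = values
    return result


for dev, values in scsi_disk_settings().items():
    print(dev, values)
```

Comparing these values between the RAID 5 volume that resets and the volumes that do not would show whether they are all running with the same timeout and queue depth.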

Does anyone have any pointers on how to investigate this? Is this a disk, driver or controller issue, or perhaps a VM configuration or throughput issue?

 

Thanks for your time and any possible help / suggestions.

 

Dek

 

9 REPLIES
DekPlen
Advisor

Re: hpsa driver reset messages

I have now created 3 x 3-disk RAID 0 volumes and am running the jobs that provoke the error on the 3 volumes one at a time to try to eliminate a bad disk.

But any hints would be helpful.

 

Thanks

 

Dek

DekPlen
Advisor

Re: hpsa driver reset messages

Having run numerous tests to provoke the hpsa resets, I could not trigger them with RAID 0 volumes.

I decided to create a 6-disk RAID 5 again:

hpssacli ctrl slot=2 pd all show

Smart Array P822 in Slot 2

Array A

physicaldrive 5I:1:1 (port 5I:box 1:bay 1, SATA SSD, 1 TB, OK)
physicaldrive 5I:1:2 (port 5I:box 1:bay 2, SATA SSD, 1 TB, OK)

Array B

physicaldrive 5I:1:4 (port 5I:box 1:bay 4, SATA SSD, 1 TB, OK)
physicaldrive 5I:1:5 (port 5I:box 1:bay 5, SATA SSD, 1 TB, OK)
physicaldrive 5I:1:6 (port 5I:box 1:bay 6, SATA SSD, 1 TB, OK)

Array C

physicaldrive 5I:1:7 (port 5I:box 1:bay 7, SATA SSD, 1 TB, OK)
physicaldrive 5I:1:8 (port 5I:box 1:bay 8, SATA SSD, 1 TB, OK)
physicaldrive 5I:1:9 (port 5I:box 1:bay 9, SATA SSD, 1 TB, OK)
physicaldrive 5I:1:10 (port 5I:box 1:bay 10, SATA SSD, 1 TB, OK)
physicaldrive 5I:1:11 (port 5I:box 1:bay 11, SATA SSD, 1 TB, OK)
physicaldrive 5I:1:12 (port 5I:box 1:bay 12, SATA SSD, 1 TB, OK)

root@hlvbp012:~# hpssacli ctrl slot=2 ld all show

Smart Array P822 in Slot 2

Array A

logicaldrive 1 (931.48 GB, RAID 1, OK)

Array B

logicaldrive 2 (2.73 TB, RAID 0, OK)

Array C

logicaldrive 3 (4.55 TB, RAID 5, OK)

 

On the first attempt to provoke the error, within minutes I saw:

Mar 1 23:38:13 hlvbp012 kernel: [29319.435030] hpsa 0000:0a:00.0: scsi 0:1:0:2: resetting logical Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap+ En+ Exp=1
Mar 1 23:38:16 hlvbp012 kernel: [29321.489518] hpsa 0000:0a:00.0: device is ready.
Mar 1 23:38:16 hlvbp012 kernel: [29321.489531] hpsa 0000:0a:00.0: scsi 0:1:0:2: reset logical completed successfully Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap+ En+ Exp=1
Mar 1 23:46:43 hlvbp012 kernel: [29828.967911] hpsa 0000:0a:00.0: scsi 0:1:0:2: resetting logical Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap+ En+ Exp=1
Mar 1 23:46:45 hlvbp012 kernel: [29831.012037] hpsa 0000:0a:00.0: device is ready.
Mar 1 23:46:45 hlvbp012 kernel: [29831.012052] hpsa 0000:0a:00.0: scsi 0:1:0:2: reset logical completed successfully Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap+ En+ Exp=1
Mar 2 00:00:48 hlvbp012 kernel: [30674.157714] hpsa 0000:0a:00.0: scsi 0:1:0:2: resetting logical Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap+ En+ Exp=1
Mar 2 00:00:50 hlvbp012 kernel: [30676.194459] hpsa 0000:0a:00.0: device is ready.
Mar 2 00:00:50 hlvbp012 kernel: [30676.194469] hpsa 0000:0a:00.0: scsi 0:1:0:2: reset logical completed successfully Direct-Access HP LOGICAL VOLUME RAID-5 SSDSmartPathCap+ En+ Exp=1

 

I may try a RAID 1+0 for completeness, but on the face of it, I can reproduce the errors when the disks are arranged in a RAID 5 configuration.

 

 

DekPlen
Advisor

Re: hpsa driver reset messages

I created a RAID 1+0 logical drive and cannot reproduce the hpsa reset issue, despite loading the KVM server and having a few VMs perform intensive disk operations.

So there seems to be something specific to RAID 5.

Does anyone have any clues or insight as to what could be the issue?

Confused

 

Dek

Tam92
HPE Pro

Re: hpsa driver reset messages

Hello,

This could be because RAID 5 uses parity and RAID 10 does not.
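
As a rough illustration of that parity overhead, the commonly quoted "write penalty" figures can be turned into a quick back-of-the-envelope calculation. These are generic textbook numbers for small random writes (a RAID 5 write typically reads old data and old parity, then writes new data and new parity), not measurements from the P822, and controller cache or full-stripe writes can reduce them in practice. The names below are illustrative.

```python
# Textbook back-end I/Os generated per small random host write (assumed
# generic figures, not measured on this controller).
WRITE_PENALTY = {"RAID 0": 1, "RAID 1+0": 2, "RAID 5": 4, "RAID 6": 6}


def backend_ios(host_writes, raid_level):
    """Estimate the back-end disk I/Os needed for a number of small host writes."""
    return host_writes * WRITE_PENALTY[raid_level]


for level in WRITE_PENALTY:
    print(f"{level}: {backend_ios(1000, level)} back-end I/Os per 1000 host writes")
```

On these figures, the same write-heavy job asks roughly twice as much of the drives on RAID 5 as on RAID 1+0, which fits with heavy restores and dd runs being the trigger.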

It looks like the issue is with the controller firmware or driver. Please make sure both are updated to the latest version and that the installed OS is supported.

https://techlibrary.hpe.com/us/en/enterprise/servers/supportmatrix/

Thanks,
TAM



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
DekPlen
Advisor

Re: hpsa driver reset messages

Hi TAM,

 

The P822 controller firmware is 8.32-0 and the hpsa driver is 3.4.20-200.

Are you aware of any later revisions of either, please?

Thanks and Regards

Derek

Tam92
HPE Pro

Re: hpsa driver reset messages

Hello,

The firmware is at the latest version.

We do not have drivers for Debian. You may need to get in touch with the OS vendor's support to check whether this is a driver issue.

The drivers below are available for this server:

https://support.hpe.com/connect/s/product?language=en_US&cep=on&kmpmoid=5177957&tab=driversAndSoftware&driversAndSoftwareFilter=8000113&driversAndSoftwareSubtype=9000214&environmentType=2200022

Thanks,
TAM



DekPlen
Advisor

Re: hpsa driver reset messages

Thanks TAM,

I will take a look and see if there is any driver source available that I could compile.

Debian (11 or 12, depending on the release) is the underlying OS for Proxmox, with an Ubuntu-derived kernel.

I may have to look at some other method of resilience, perhaps with the controller put into HBA mode.

 

Thanks again, I will investigate and report back

 

Tam92
HPE Pro

Re: hpsa driver reset messages

Noted



DekPlen
Advisor

Re: hpsa driver reset messages

 

Whilst I have the machine to myself, I decided to create a 4-disk RAID 5 with the Samsung 1 TB SSDs and a 3-disk RAID 5 with the magnetic disks.

I cannot produce the same reset issues on the magnetic-disk RAID 5 without hours of repeated testing, so this issue seems specific to:

- the hpsa driver
- SSDs in RAID 1, 5, 6 configurations, and not RAID 0 (so far)

In fact, in a web search for just the reset messages (without hpsa or HP in the search terms), ALL returned results involve hpsa and SSDSmartPathCap.

I will continue trying to resolve this or find an alternative solution.

Thanks, Dek