Microserver Gen8 with P222 + debian testing. (5.15 kernel)

BunnyPon
Regular Advisor

For those of you who have one of these and try to install Debian testing: something the installer does crashes the P222 quite reliably. So be careful when upgrading, as you might brick yourself.

Kernel 5.10 works (Debian bullseye, 11.3).

Kernel 5.15 kills the RAID card until you reset the machine.
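If you want a quick way to tell whether a box is running one of the kernels reported here as breaking the P222, here is a minimal Python sketch. It only compares the major.minor version against 5.15. That threshold is an assumption taken from this report (5.10 good, 5.15 bad); the first bad release has not been pinned down, and later kernels are flagged too because nothing here says they are fixed.

```python
import platform
import re

# Assumed threshold from this report: 5.10 worked, 5.15 killed the P222.
# The first bad release was never bisected, so treat this as a rough flag only.
SUSPECT_FROM = (5, 15)


def kernel_version(release: str) -> tuple[int, int]:
    """Return (major, minor) from a release string such as '5.15.0-3-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_suspect(release: str) -> bool:
    """True for 5.15 and anything newer; nothing in this thread says later kernels are fixed."""
    return kernel_version(release) >= SUSPECT_FROM


def running_kernel_is_suspect() -> bool:
    """Check the running kernel; platform.release() gives the same string as `uname -r` on Linux."""
    return is_suspect(platform.release())
```

For example, `is_suspect("5.10.0-13-amd64")` is False, while a 5.15 Debian testing kernel comes back True.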

About the only hints I have are the dmesg output from 5.10 below. The crash, when I saw it, was definitely related to DMAR and "DMA PTE for vPFN".

This appears to be someone with the same problem: 

https://forum.proxmox.com/threads/kernel-5-15-30-2-break-hpe-smart-array-p222.109298/

[ 0.054966] DMAR: Host address width 39
[ 0.054967] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
[ 0.054970] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap c9008020660262 ecap f010da
[ 0.054971] DMAR: RMRR base: 0x000000f1ffd000 end: 0x000000f1ffffff
[ 0.054972] DMAR: RMRR base: 0x000000f1ff6000 end: 0x000000f1ffcfff
[ 0.054973] DMAR: RMRR base: 0x000000f1f93000 end: 0x000000f1f94fff
[ 0.054973] DMAR: RMRR base: 0x000000f1f8f000 end: 0x000000f1f92fff
[ 0.054974] DMAR: RMRR base: 0x000000f1f7f000 end: 0x000000f1f8efff
[ 0.054974] DMAR: RMRR base: 0x000000f1f7e000 end: 0x000000f1f7efff
[ 0.054975] DMAR: RMRR base: 0x000000000f4000 end: 0x000000000f4fff
[ 0.054975] DMAR: RMRR base: 0x000000000e8000 end: 0x000000000e8fff
[ 0.054976] DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000000e8000-0x00000000000e8fff], contact BIOS vendor for fixes
[ 0.055041] DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000000e8000-0x00000000000e8fff]
BIOS vendor: HP; Ver: J06; Product Version:
[ 0.055042] DMAR: RMRR base: 0x000000f1dee000 end: 0x000000f1deefff
[ 0.055044] DMAR-IR: IOAPIC id 8 under DRHD base 0xfed90000 IOMMU 0
[ 0.055044] DMAR-IR: HPET id 0 under DRHD base 0xfed90000
[ 0.055045] DMAR-IR: x2apic is disabled because BIOS sets x2apic opt out bit.
[ 0.055045] DMAR-IR: Use 'intremap=no_x2apic_optout' to override the BIOS setting.
[ 0.055262] DMAR-IR: Enabled IRQ remapping in xapic mode
[ 0.055263] x2apic: IRQ remapping doesn't support X2APIC mode

[ 0.076966] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)

[ 0.872471] ACPI Warning: SystemIO range 0x0000000000000928-0x000000000000092F conflicts with OpRegion 0x0000000000000920-0x000000000000092F (\SGPE) (20200925/utaddress-204)

[ 0.873304] HP HPSA Driver (v 3.4.20-200)
[ 0.873312] hpsa 0000:07:00.0: can't disable ASPM; OS doesn't have ASPM control
[ 0.874176] hpsa 0000:07:00.0: Logical aborts not supported
[ 0.874178] hpsa 0000:07:00.0: HP SSD Smart Path aborts not supported

[ 0.935095] scsi host0: hpsa
[ 0.935294] hpsa can't handle SMP requests

[ 0.960542] hpsa 0000:07:00.0: scsi 0:0:0:0: added RAID HP P222 controller SSDSmartPathCap- En- Exp=1
[ 0.960545] hpsa 0000:07:00.0: scsi 0:0:1:0: masked Direct-Access ATA HP SSD S700 500G PHYS DRV SSDSmartPathCap- En- Exp=0
[ 0.960547] hpsa 0000:07:00.0: scsi 0:0:2:0: masked Direct-Access ATA HP SSD S700 500G PHYS DRV SSDSmartPathCap- En- Exp=0
[ 0.960550] hpsa 0000:07:00.0: scsi 0:0:3:0: masked Direct-Access ATA MB0500GCEHE PHYS DRV SSDSmartPathCap- En- Exp=0
[ 0.960552] hpsa 0000:07:00.0: scsi 0:0:4:0: masked Direct-Access ATA MB0500GCEHE PHYS DRV SSDSmartPathCap- En- Exp=0
[ 0.960554] hpsa 0000:07:00.0: scsi 0:0:5:0: masked Enclosure PMCSIERA SRCv8x6G enclosure SSDSmartPathCap- En- Exp=0
[ 0.960556] hpsa 0000:07:00.0: scsi 0:1:0:0: added Direct-Access HP LOGICAL VOLUME RAID-1(+0) SSDSmartPathCap+ En+ Exp=1
[ 0.960558] hpsa 0000:07:00.0: scsi 0:1:0:1: added Direct-Access HP LOGICAL VOLUME RAID-1(+0) SSDSmartPathCap- En- Exp=1
[ 0.960616] hpsa can't handle SMP requests
[ 0.960754] scsi 0:0:0:0: RAID HP P222 8.00 PQ: 0 ANSI: 5
[ 0.960927] scsi 0:1:0:0: Direct-Access HP LOGICAL VOLUME 8.00 PQ: 0 ANSI: 5
[ 0.961091] scsi 0:1:0:1: Direct-Access HP LOGICAL VOLUME 8.00 PQ: 0 ANSI: 5

[ 3.939754] power_meter ACPI000D:00: Found ACPI power meter.
[ 3.939785] power_meter ACPI000D:00: Ignoring unsafe software power cap!
[ 3.939789] power_meter ACPI000D:00: hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().

[ 4.297353] ACPI Error: AE_NOT_EXIST, Returned by Handler for [IPMI] (20200925/evregion-293)
[ 4.297357] ACPI Error: Region IPMI (ID=7) has no handler (20200925/exfldio-261)
[ 4.297361] ACPI Error: Aborting method \_SB.PMI0._PMM due to previous error (AE_NOT_EXIST) (20200925/psparse-529)
[ 4.297369] ACPI Error: AE_NOT_EXIST, Evaluating _PMM (20200925/power_meter-325)

[ 4.575012] ipmi_si IPI0001:00: The BMC does not support clearing the recv irq bit, compensating, but the BMC needs to be fixed.
[ 4.818471] ipmi_si IPI0001:00: IPMI message handler: Found new BMC (man_id: 0x00000b, prod_id: 0x2000, dev_id: 0x13)
[ 4.904921] ipmi_si IPI0001:00: IPMI kcs interface initialized
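The kernel lists every RMRR (Reserved Memory Region Reporting structure) the BIOS hands it, and separately names the ones it considers broken. Here is a minimal Python sketch that pulls both out of a saved dmesg capture so the bad ranges stand out. The regular expressions assume the exact message formats in the log above and may need adjusting for other kernel versions.

```python
import re

# Patterns match the message formats in the 5.10 log above (an assumption for other kernels).
RMRR_RE = re.compile(r"DMAR: RMRR base: 0x([0-9a-f]+) end: 0x([0-9a-f]+)")
BAD_RE = re.compile(r"bad RMRR \[0x([0-9a-f]+)-0x([0-9a-f]+)\]")


def rmrr_report(dmesg: str) -> list:
    """Return (base, end, flagged_bad) for each RMRR, in the order the kernel logged them."""
    bad = {(int(b, 16), int(e, 16)) for b, e in BAD_RE.findall(dmesg)}
    report = []
    for b, e in RMRR_RE.findall(dmesg):
        base, end = int(b, 16), int(e, 16)
        report.append((base, end, (base, end) in bad))
    return report


def format_report(dmesg: str) -> str:
    """One line per RMRR: address range, size in bytes, and a marker for ranges flagged as bad."""
    lines = []
    for base, end, is_bad in rmrr_report(dmesg):
        size = end - base + 1
        marker = "  <- flagged as bad RMRR" if is_bad else ""
        lines.append(f"{base:#012x}-{end:#012x} {size:>8} bytes{marker}")
    return "\n".join(lines)
```

Usage: `print(format_report(open("dmesg-5.10.txt").read()))`, where the file name is hypothetical and the capture was taken with `dmesg` (you may need root for that).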

It would be really nice if HP would fix their BIOS.
I can't Cat Today.
support_s
System Recommended
BunnyPon
Regular Advisor

Re: Query: Microserver Gen8 with P222 + debian testing. (5.15 kernel)

Greetings, robot.

I think the former advisory needs to be updated. As for the latter, this machine does not have the power to run VMs; it's just a toy file server, and perfect for that. So "intel_iommu=off" will likely fix things...
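On Debian, a kernel parameter such as `intel_iommu=off` is normally added to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `update-grub` and a reboot. I have not tested this on the MicroServer, but afterwards `/proc/cmdline` should show whether the setting actually took effect. Here is a small Python sketch that checks it. It assumes only the simple space-separated `key=value` layout of the kernel command line; quoted values containing spaces are not handled.

```python
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {key: value}; bare flags map to None.

    If a key appears more than once, this sketch keeps only the last value.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def iommu_forced_off(cmdline: str) -> bool:
    """True if intel_iommu is given with 'off' among its comma-separated options."""
    value = parse_cmdline(cmdline).get("intel_iommu")
    return value is not None and "off" in value.split(",")
```

Usage: `iommu_forced_off(open("/proc/cmdline").read())` on the running machine.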

Of the two, the former talks about Linux kernel 2.6.18.

This is likely some change in a Linux 5.x kernel that breaks hpsa, so I am hoping this post acts as a heads-up when people find their machine dies.

Perhaps P222 firmware 8.32 will help, but I doubt it.
BunnyPon
Regular Advisor

Re: Microserver Gen8 with P222 + debian testing. (5.15 kernel)

That was with P222 firmware 8.00. Did anything change with firmware 8.32? A quick check of dmesg, compared and contrasted with the other MicroServer too, which has an i3-3240 as opposed to a Xeon v2 ("i3" and "xe" below):

xe: [ 0.005330] ACPI: Reserving DMAR table memory at [mem 0xf1de4a80-0xf1de4e33]

i3: [ 0.005727] ACPI: Reserving FFFF table memory at [mem 0xf1de4a80-0xf1de6393]

It seems the i3 has no concept of DMAR at all: no DMAR table and no DMAR output whatsoever.
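Every boot has different timestamps, so comparing two captures line by line only works once the `[ seconds.micros]` prefix is stripped. Here is a minimal Python sketch of the comparison done by hand in this post. It keeps lines containing a keyword (DMAR by default) that occur in one capture but not in the other. The function names are only illustrative.

```python
import re

# Matches the "[ 0.054966] " style prefix that dmesg puts on each line.
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def strip_timestamps(dmesg: str) -> list:
    """Drop timestamps and blank lines so logs from different boots can be compared."""
    return [TIMESTAMP_RE.sub("", line) for line in dmesg.splitlines() if line.strip()]


def only_in_first(first: str, second: str, keyword: str = "DMAR") -> list:
    """Lines mentioning `keyword` that appear in `first` but nowhere in `second`."""
    seen = set(strip_timestamps(second))
    return [line for line in strip_timestamps(first) if keyword in line and line not in seen]
```

Calling `only_in_first(xeon_log, i3_log)` on the two captures should list the Xeon-only DMAR lines quoted below. Calling it the other way round should return nothing, given that the i3 logs no DMAR at all.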

i3:

[ 0.057747] APIC: Switch to symmetric I/O mode setup
[ 0.057750] Switched APIC routing to physical flat.

xe:

[ 0.054902] APIC: Switch to symmetric I/O mode setup
[ 0.054903] DMAR: Host address width 39
[ 0.054904] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
[ 0.054907] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap c9008020660262 ecap f010da
[ 0.054908] DMAR: RMRR base: 0x000000f1ffd000 end: 0x000000f1ffffff
[ 0.054909] DMAR: RMRR base: 0x000000f1ff6000 end: 0x000000f1ffcfff
[ 0.054910] DMAR: RMRR base: 0x000000f1f93000 end: 0x000000f1f94fff
[ 0.054910] DMAR: RMRR base: 0x000000f1f8f000 end: 0x000000f1f92fff
[ 0.054911] DMAR: RMRR base: 0x000000f1f7f000 end: 0x000000f1f8efff
[ 0.054911] DMAR: RMRR base: 0x000000f1f7e000 end: 0x000000f1f7efff
[ 0.054912] DMAR: RMRR base: 0x000000000f4000 end: 0x000000000f4fff
[ 0.054912] DMAR: RMRR base: 0x000000000e8000 end: 0x000000000e8fff
[ 0.054913] DMAR: [Firmware Bug]: No firmware reserved region can cover this RMRR [0x00000000000e8000-0x00000000000e8fff], contact BIOS vendor for fixes
[ 0.054976] DMAR: [Firmware Bug]: Your BIOS is broken; bad RMRR [0x00000000000e8000-0x00000000000e8fff]
BIOS vendor: HP; Ver: J06; Product Version:
[ 0.054977] DMAR: RMRR base: 0x000000f1dee000 end: 0x000000f1deefff
[ 0.054978] DMAR-IR: IOAPIC id 8 under DRHD base 0xfed90000 IOMMU 0
[ 0.054979] DMAR-IR: HPET id 0 under DRHD base 0xfed90000
[ 0.054980] DMAR-IR: x2apic is disabled because BIOS sets x2apic opt out bit.
[ 0.054980] DMAR-IR: Use 'intremap=no_x2apic_optout' to override the BIOS setting.
[ 0.055200] DMAR-IR: Enabled IRQ remapping in xapic mode
[ 0.055201] x2apic: IRQ remapping doesn't support X2APIC mode
[ 0.055204] Switched APIC routing to physical flat.

common:

[ 0.076909] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)

A bit further down, now about 0.6 seconds in, there is another DMAR difference, again because the i3 has no concept of DMAR:

i3

[ 1.088670] Freeing initrd memory: 33328K
[ 1.088679] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)

xe:

[ 0.643638] Freeing initrd memory: 27504K
[ 0.643679] DMAR: No ATSR found
[ 0.643842] DMAR: dmar0: Using Queued invalidation
[ 0.643888] pci 0000:00:00.0: Adding to iommu group 0
[ 0.643899] pci 0000:00:01.0: Adding to iommu group 1
[ 0.643907] pci 0000:00:06.0: Adding to iommu group 2
[ 0.643913] pci 0000:00:1a.0: Adding to iommu group 3
[ 0.643919] pci 0000:00:1c.0: Adding to iommu group 4
[ 0.643926] pci 0000:00:1c.4: Adding to iommu group 5
[ 0.643932] pci 0000:00:1c.6: Adding to iommu group 6
[ 0.643939] pci 0000:00:1c.7: Adding to iommu group 7
[ 0.643945] pci 0000:00:1d.0: Adding to iommu group 8
[ 0.643951] pci 0000:00:1e.0: Adding to iommu group 9
[ 0.643961] pci 0000:00:1f.0: Adding to iommu group 10
[ 0.643968] pci 0000:00:1f.2: Adding to iommu group 10
[ 0.643971] pci 0000:07:00.0: Adding to iommu group 1
[ 0.643981] pci 0000:03:00.0: Adding to iommu group 11
[ 0.643987] pci 0000:03:00.1: Adding to iommu group 11
[ 0.643993] pci 0000:04:00.0: Adding to iommu group 12
[ 0.644008] pci 0000:01:00.0: Adding to iommu group 13
[ 0.644014] pci 0000:01:00.1: Adding to iommu group 13
[ 0.644020] pci 0000:01:00.2: Adding to iommu group 13
[ 0.644026] pci 0000:01:00.4: Adding to iommu group 13
[ 0.648376] DMAR: Intel(R) Virtualization Technology for Directed I/O
[ 0.648378] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
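The `Adding to iommu group` lines are easier to read when grouped. In the Xeon log, the P222 (07:00.0, per the hpsa lines) is placed in group 1 together with 00:01.0, presumably the PCIe root port it sits behind. Here is a minimal Python sketch that rebuilds the group table from a dmesg capture. On a running system, the same information is also exposed under `/sys/kernel/iommu_groups/`.

```python
import re
from collections import defaultdict

# Matches lines such as "pci 0000:07:00.0: Adding to iommu group 1".
GROUP_RE = re.compile(r"pci (\S+): Adding to iommu group (\d+)")


def iommu_groups(dmesg: str) -> dict:
    """Map IOMMU group number -> PCI addresses, in the order the kernel logged them."""
    groups = defaultdict(list)
    for address, group in GROUP_RE.findall(dmesg):
        groups[int(group)].append(address)
    return dict(groups)


def group_of(dmesg: str, address: str):
    """Return the group containing `address`, or None if it was never added to one."""
    for group, members in iommu_groups(dmesg).items():
        if address in members:
            return group
    return None
```

Usage: `group_of(xeon_log, "0000:07:00.0")` should give 1 for the log above.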

An HPSA warning common to both machines:

[ 1.406609] HP HPSA Driver (v 3.4.20-200)
[ 1.406621] hpsa 0000:07:00.0: can't disable ASPM; OS doesn't have ASPM control
[ 1.407538] hpsa 0000:07:00.0: Logical aborts not supported
[ 1.407540] hpsa 0000:07:00.0: HP SSD Smart Path aborts not supported

A mysterious warning on the Xeon:

[ 1.137164] DMAR: DRHD: handling fault status reg 2
[ 1.137230] DMAR: [INTR-REMAP] Request device [01:00.0] fault index 55 [fault reason 38] Blocked an interrupt request due to source-id verification failure
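The INTR-REMAP fault names the requesting device as bus:device.function. Here is a minimal Python sketch that splits such a line into its fields. Running `lspci -s 01:00.0` would then identify the device. On ProLiant Gen8 the 01:00.x functions are probably iLO-related, but that is an assumption worth checking. The fault index and reason are kept as strings because this sketch does not assume whether the kernel prints them in decimal or hex.

```python
import re
from typing import NamedTuple, Optional

FAULT_RE = re.compile(
    r"\[INTR-REMAP\] Request device \[([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])\] "
    r"fault index (\w+) \[fault reason (\w+)\] (.*)"
)


class IntrRemapFault(NamedTuple):
    bus: int
    device: int
    function: int
    index: str  # kept as text: decimal vs hex is not assumed
    reason: str
    message: str


def parse_fault(line: str) -> Optional[IntrRemapFault]:
    """Extract the requester and fault details, or None if the line is not such a fault."""
    match = FAULT_RE.search(line)
    if match is None:
        return None
    bus, device, function, index, reason, message = match.groups()
    return IntrRemapFault(int(bus, 16), int(device, 16), int(function), index, reason, message)
```

Feeding it both copies of this fault (the one at boot and the one logged days later) should show they come from the same requester with the same index and reason.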

 

No other differences, regardless of the HPSA version.

I think I should get another MicroServer Gen8 for testing.
ksram
HPE Pro

Re: Microserver Gen8 with P222 + debian testing. (5.15 kernel)

Hi,

Thank you for the Post.

Could you please let us know whether you have tried enabling SR-IOV, disabling the IOMMU, and disabling "HP Shared Memory Features"? Check whether that helps.
Also confirm whether you have updated the firmware or driver of any component, such as the BIOS, network adapter, or controller.

If you have updated them, try downgrading; otherwise, ignore this.


We are checking for a BIOS update, but we don't see any new firmware at the moment.


Thank you
RamKS


I work for HPE.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]


BunnyPon
Regular Advisor

Re: Microserver Gen8 with P222 + debian testing. (5.15 kernel)

@ksram 

Useful link: 5.15 intel iommu enabled by default 
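Whether a given kernel turns the Intel IOMMU on by default is a build-time option, `CONFIG_INTEL_IOMMU_DEFAULT_ON`. Debian ships each kernel's build configuration as `/boot/config-<release>`, so you can check without rebuilding anything. Here is a minimal Python sketch, assuming the usual Kconfig file format: `OPTION=value` lines, and `# OPTION is not set` for disabled options.

```python
from typing import Optional


def config_value(config_text: str, option: str) -> Optional[str]:
    """Return the option's value ('y', 'm', a string), 'n' if explicitly unset, or None if absent."""
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
        if line == f"# {option} is not set":
            return "n"
    return None
```

Usage: `config_value(open(f"/boot/config-{platform.release()}").read(), "CONFIG_INTEL_IOMMU_DEFAULT_ON")`. Running it against both the 5.10 and the 5.15 config files should show whether the default changed between them.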

Firmware:

HP Ethernet 1Gb 2-port 332i Adapter: 17.4.41
Embedded iLO: 2.80 Jan 25 2022, System Board
Intelligent Platform Abstraction Data: 0.00, System Board
Intelligent Provisioning: 1.63.192, System Board
Redundant System ROM: J06 11/02/2015, System Board
Server Platform Services (SPS) Firmware: 2.2.0.31.2, System Board
Smart Array P222 Controller: 8.32, Slot 1
System Programmable Logic Device: Version 0x06, System Board
System ROM: J06 04/04/2019, System Board
System ROM Bootblock: 02/04/2012, System Board

Spurious error:

[241206.883873] DMAR: DRHD: handling fault status reg 2
[241206.883892] DMAR: [INTR-REMAP] Request device [01:00.0] fault index 55 [fault reason 38] Blocked an interrupt request due to source-id verification failure

Looking for SR-IOV and friends:

SR-IOV: no such animal on the MicroServer Gen8.

I did not find an "HP Shared Memory Features" option in the BIOS either, only this:

https://community.hpe.com/t5/ProLiant-Servers-Netservers/Disabling-RMRDS-RMRR-HP-Shared-Memory-features-on-Microserver/td-p/7105623

Since this MicroServer Gen8 has a job to do, I will stick with kernel 5.10 for now. This is already way beyond my level of expertise. Some googling suggests that DMAR on Xeons has always been a problem, even back in Linux 2.6, and most things point at the BIOS.

I am sure that if you have an old Xeon MicroServer Gen8 with a P222 lying around, you can trivially reproduce these problems.
ksram
HPE Pro

Re: Microserver Gen8 with P222 + debian testing. (5.15 kernel)

Hi @BunnyPon 

Thank you for the Post.

We couldn't find much information on this.

Please log a case with HPE when possible so that we can check for any other options.

Thank you
RamKS