ProLiant Servers (ML,DL,SL)

ML350p Gen8 & nVidia Tesla K40m

RiDDO
Occasional Visitor

Last year I was given an HP ML350p Gen8 server after it was retired from our company. I have been using it mainly for SETI@home crunching. It has always run MS Windows Server, as we have several old/unused versions kicking around.

Spec is: 2 x Xeon E5-2697 v2, 64GB RAM per processor (128GB total), P830 12G array controller with 4GB cache, 2 x SFF (Small Form Factor) 8-bay drive cages, 3 x 600GB 12G 15K SAS drives configured as RAID1+ADM plus a 4th drive as a hot spare, and 2 x 1,200 W PSUs, currently configured with PSU1 'live' and PSU2 as a redundant spare. The machine draws about 450 W in total with all 24 cores (48 threads) running 48 work units in parallel. I am (was) hoping to reduce the processing times with the K40 installed. I am familiar with the iLO interface and all temperatures are within tolerances; the machine is adequately cooled and sits on a dedicated shelf in the server cabinet in my office.

I recently bought a 'genuine' (HP part numbers) nVidia Tesla K40m and the HP Graphics Card Adaptor Kit. I installed it last week under Windows Server 2016. The drivers (downloaded from the nVidia website) install, but after a reboot the K40 reports a Code 12 error (not enough free resources to run the card). I've also tried Windows 10 (nVidia website driver) and Server 2012 R2 (nVidia driver from the HPE support website), and I get the same Code 12 error regardless of the installed OS.

The card is installed in PCIe slot 6 as instructed in the manual, and the K40 was one of only 4 cards certified for use in the ML350p Gen8. In total I've spent over 70 hours researching the problem and trying to find a fix, without success. I am convinced it's a BIOS/IRQ/PCIe setting, but no amount of fiddling or tweaking removes the Code 12 error. I'm running the most up-to-date BIOS (2018.05.21) as both the main and the backup image, and I've reflashed both just to make sure. The P830 lives in PCIe slot 3; I have removed it and used the embedded P420i disk controller instead, which had no effect on the Code 12 problem. I've also disabled both SATA controllers in the BIOS, and again the Code 12 problem persists.

Can anyone shed any light on why the K40 appears to install, but doesn't function under any version of Windows?

Thanks, RiDDO

--RiDDO
Paul_J_K
HPE Pro

Re: ML350p Gen8 & nVidia Tesla K40m

Here are the things I would try in this case. Ensure that the GPU is running the latest supported VBIOS, and install the HPE-published driver if applicable. Check whether the card has a PCIe power connector, which needs to be connected to the riser/board. Most importantly, please make sure that the server's power profile is set to Maximum Performance in RBSU.

I work for HPE
RiDDO
Occasional Visitor

Re: ML350p Gen8 & nVidia Tesla K40m

Thanks Paul_J_K,

I appreciate you taking the time to reply. I've set the power profile to Maximum Performance in the BIOS; sadly, this has made no difference.

I'm unable to update the K40 BIOS, as the update software appears to install but will not run once the machine reboots. Device Manager initially shows the card as a video card. I then install the drivers (I have used both the nVidia and the HP Support driver packages), and once the installation completes the card moves from 'unknown device' to Display Adapter with a Code 14 (requires a reboot to finish configuration). I duly reboot as requested and then get the yellow triangle/exclamation mark of doom on the K40 and the Code 12 error.

Just to confirm: the K40 is inserted in PCIe slot 6 and is connected to the power riser using the cable from a genuine HP Graphics Card Accessory Kit.

Does the message 'ACPI: SPCR: Unexpected SPCR Access Width. Defaulting to byte size' mean anything? I've been playing about with Manjaro and Ubuntu today (I know very little about Linux/Unix) and see this message every time I try to boot either one into a 'live' environment. Neither actually boots :-( although openSUSE did boot last week, albeit with the same message.

I appear to have a proper conundrum...

--RiDDO