ProLiant Servers (ML,DL,SL)

ML350p Gen8 & nVidia Tesla K40m

Visitor


Last year I was given an HP ML350p Gen8 server after it was retired from our company. I have been using it mainly for SETI@home crunching. It has always run MS Windows Server, as we have several old/unused versions kicking around.

Spec is: 2 x Xeon E5-2697 v2, 64GB RAM per processor (128GB total), P830 12G array controller with 4GB cache, 2 x SFF (Small Form Factor) 8-bay drive cages, 3 x 600GB 12G 15K SAS drives configured as RAID 1 + ADM plus a 4th drive as a hot spare, and 2 x 1,200 W PSUs, currently configured with PSU1 'live' and PSU2 as a 'redundant spare'. The machine draws about 450 W in total with all 24 cores (48 threads) running 48 work-unit tasks in parallel. I am (was) hoping to reduce the processing times with the K40 installed. I am familiar with the iLO interface: all temperatures are within tolerances, the server is adequately cooled, and it lives on a dedicated shelf in the server cabinet in my office.

I recently bought a 'genuine' (HP part numbers) nVidia Tesla K40m and the HP Graphics Card Adaptor Kit. I installed the card last week under Windows Server 2016. The drivers (downloaded from the nVidia website) install, but after a reboot the K40 reports a Code 12 error (not enough resources to run the card). I've also tried Windows 10 (nVidia website driver) and Server 2012 R2 (the nVidia driver from the HPE support website), and I get the same Code 12 error regardless of the installed OS. The card is installed in PCIe slot 6, as instructed in the manual, and the K40 was one of only 4 cards certified for use in the ML350p Gen8.

In total I've spent over 70 hours researching the problem and trying to find a fix, without success. I am convinced it's a BIOS/IRQ/PCIe setting, but no amount of fiddling and tweaking removes the Code 12 error. I'm running the most up-to-date BIOS (2018.05.21) as both the main and the backup image, and I've refreshed both just to make sure. I have removed the P830 (which lives in PCIe slot 3) and used the embedded P420i disk controller instead, which had no effect on the Code 12 problem. I've also disabled both SATA controllers in the BIOS, and again the Code 12 problem persists.

Can anyone shed any light on why the K40 appears to install, but doesn't function under any version of Windows?

Thanks, RiDDO

--RiDDO
HPE Pro

Re: ML350p Gen8 & nVidia Tesla K40m

Here are the things I'd try in this case. Ensure that the GPU is running the latest supported VBIOS, and install the HPE-published driver if applicable. Check whether the card has a PCIe power connector that needs to be connected to the riser/board. Most importantly, make sure the server's power profile is set to Maximum Performance in RBSU.

I am an HPE employee
Visitor

Re: ML350p Gen8 & nVidia Tesla K40m

Thanks, Paul_J_K,

I appreciate you taking the time to reply. I adjusted the BIOS power profile to Maximum Performance. Sadly, this has made no difference.

I'm unable to update the K40 BIOS, as the software appears to install but will not run once the machine reboots. Device Manager initially shows the K40 as a video card. I then install the drivers (I have tried both the nVidia and the HP Support driver packages); once the installation completes, the card moves from 'unknown device' to 'Display adapters' with a Code 14 (requires a reboot to finish configuration). I duly reboot as requested and then get the yellow triangle/exclamation mark of doom on the K40, and the Code 12 error.

Just to confirm: the K40 is inserted in PCIe slot 6 and is connected to the power riser using the cable from a genuine HP Graphics Card Accessory Kit.

Does the message 'ACPI: SPCR: Unexpected SPCR Access Width. Defaulting to byte size' mean anything? I've been playing about with Manjaro and Ubuntu today (I know very little about Linux/Unix) and see this message every time I try to boot either of them into a 'live' environment. Neither actually boots :-(, although openSUSE did boot last week, with the same message.

I appear to have a proper conundrum...

--RiDDO
Frequent Visitor

Re: ML350p Gen8 & nVidia Tesla K40m

Hey, just checking whether you had any luck solving your problem, as I'm experiencing the same issue.

Visitor

Re: ML350p Gen8 & nVidia Tesla K40m

AirVision,

Yes! I'm assuming you're using an ML350p Gen8. If you are, follow the instructions below:

1. Boot the server and press F9 to enter the BIOS/RBSU.

2. Once the BIOS screen loads, press CTRL+A. You will notice an extra menu called 'SERVICE OPTIONS' at the bottom of the main BIOS screen.

3. Scroll down to 'SERVICE OPTIONS' and select it, then scroll to 'PCI Express 64-BIT BAR Support' and select that. Highlight 'Enable' and press Enter, then escape out of the BIOS, making sure you save your settings, and reboot the server.

4. Your K40 will now work!

Don't play with any of the other settings in the hidden 'SERVICE OPTIONS' menu. I don't know what they're all for, and I assume you can break the server with them; they're obviously hidden for a reason!

ML350s like their K40s to be installed in PCIe slot 6, and they need the HP Graphics Card power cable to work correctly (but you already know that, I'm sure!). Good luck.
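One plausible reason the hidden setting matters (this is not stated in the thread, so treat it as an assumption): without 64-bit BAR support the BIOS has to fit every PCIe memory BAR below the 4 GB boundary, and a BAR too large for that space cannot be assigned, which Windows would then report as Code 12. The Python sketch below only does that arithmetic. The K40 BAR sizes, the 1 GiB window, and the helper names `fits_below_4g` and `needs_64bit_bar` are all hypothetical, chosen for illustration.

```python
# Illustrative arithmetic only: the BAR sizes below are ASSUMED values for a
# Tesla K40, not figures reported anywhere in this thread.
MiB = 1 << 20
GiB = 1 << 30
FOUR_GIB = 4 * GiB  # top of the 32-bit address space

# Hypothetical K40 BAR sizes: BAR0 (registers), BAR1 (large aperture), BAR3.
K40_BARS = {"BAR0": 16 * MiB, "BAR1": 16 * GiB, "BAR3": 32 * MiB}


def fits_below_4g(bar_sizes, window_bytes):
    """Rough check: can every BAR fit in a 32-bit MMIO window of window_bytes?

    Ignores alignment and other devices sharing the window, so True is
    optimistic; False means the BARs cannot all be placed in that window.
    """
    if window_bytes > FOUR_GIB:
        raise ValueError("a 32-bit MMIO window cannot exceed 4 GiB")
    return sum(bar_sizes) <= window_bytes


def needs_64bit_bar(bar_sizes):
    """True if any single BAR is at least 4 GiB, so it cannot live below 4 GiB."""
    return any(size >= FOUR_GIB for size in bar_sizes)


if __name__ == "__main__":
    sizes = list(K40_BARS.values())
    print(fits_below_4g(sizes, 1 * GiB))  # False
    print(needs_64bit_bar(sizes))         # True
```

Under these assumptions a single 16 GiB BAR can never fit below 4 GiB, whatever size the 32-bit window is, which would be consistent with enabling 'PCI Express 64-BIT BAR Support' being the fix.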


Ian

--RiDDO
Frequent Visitor

Re: ML350p Gen8 & nVidia Tesla K40m

Hi Ian

Thank you very much for responding, I really appreciate it.

Yes, I have an ML350p Gen8 with the card installed in slot 6. However, I'm using an aftermarket power cable, listed in an eBay auction as being for the Tesla K40 (with a 6-pin and an 8-pin connector), rather than an HP-branded graphics card power cable.

Your instructions for enabling PCI Express 64-bit BAR support have got me further: the card now registers in Device Manager, so thank you. However, after a short, random period of time (I'm guessing it depends on GPU load), Device Manager shows the Code 12 error again (not enough resources to support the card). I'm hoping there's a simple settings fix for that too. I will buy the HP graphics card power cable if you know that it is the reason for the Code 12.

Thanks very much in advance.

Regards, Mark


Visitor

Re: ML350p Gen8

Hi Mark,

I installed my card, having bought it off eBay back in January.

I thought long and hard about buying a copy of the Graphics Power Lead, but in the end bought a genuine HP one from my IT supplier in the UK. It was £250.
I installed the K40 and power lead but was stuck for several months with the Code 12 message. I changed slots and removed my P830 RAID card, but the Code 12 stayed.
I installed the HP drivers, then the nVidia drivers; still nothing worked.
I gave up and removed the card, assuming it was faulty.
I’d spent hours and hours reading about 64-bit BAR support, but there is no visible option in the BIOS to enable it. There was lots of information for Dell/Fujitsu/IBM, but nothing for HP. Several weeks later I was reading a support document on HPE’s website that was completely unrelated to the K40 and my problem. It mentioned accessing the ML350p Gen8 ‘advanced’ BIOS features. I followed the instructions, found the 64-bit BAR support option and enabled it. My ML350/K40 has worked fine ever since, crunching work units for SETI@Home!

I cannot specifically say the genuine HP graphics power lead will fix the problem, but it is the only difference between our configurations.

Let me know how you get on, good luck.

Thanks, Ian

On 14 Sep 2019, at 6:58 am, Hewlett Packard Enterprise Community > wrote:
AirVision (Occasional Visitor) posted a new reply in ProLiant Servers (ML,DL,SL) on 09-13-2019 10:57 PM :
--RiDDO
Frequent Visitor

Re: ML350p Gen8

A huge thanks to you, Ian.

I followed your instructions and am glad to report that this worked: I now have the Tesla K40 working in my server.

Kudos to you and your effort in finding the critical piece of information about accessing the advanced options in the BIOS. This is one of those cases where you wish that piece of information were in plain sight in the QuickSpecs or the server manual.

Regards


Mar