yamoo
Occasional Contributor

HPC - sl250 GPU node

The SL250 Gen8 can take a maximum of three GPU modules, and I presume those GPUs are only accessible to the CPUs on the node where the GPU modules are installed, i.e. there is no DMA mechanism by which the GPUs could be shared with other nodes in the cluster. I am new to HPC concepts, so I would appreciate it if anyone could confirm this or point me to other possibilities.

 

Thanks.

2 REPLIES
Johan Guldmyr
Honored Contributor

Re: HPC - sl250 GPU node

There is RDMA over InfiniBand; see, for example, NVIDIA GPUDirect: https://developer.nvidia.com/gpudirect
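
With a CUDA-aware MPI library built with GPUDirect RDMA support (Open MPI or MVAPICH2-GDR are the usual examples), you can hand a device pointer straight to the MPI calls and let the library move the data over InfiniBand. A minimal sketch, assuming such an MPI build and one GPU per rank; the buffer size and rank numbers are just placeholders:

/* Minimal sketch: sending a GPU buffer between two ranks with a
 * CUDA-aware MPI library (one built with GPUDirect RDMA support).
 * Build with something like: mpicc gdr_ping.c -lcudart -o gdr_ping
 * Run with: mpirun -np 2 ./gdr_ping
 */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

#define N (1 << 20)   /* 1M floats, placeholder size */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float *d_buf;
    cudaMalloc((void **)&d_buf, N * sizeof(float));   /* buffer lives in GPU memory */

    if (rank == 0) {
        cudaMemset(d_buf, 0, N * sizeof(float));
        /* Device pointer goes straight into MPI_Send; a CUDA-aware MPI
         * moves it over the fabric without a manual cudaMemcpy to host. */
        MPI_Send(d_buf, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d floats directly into GPU memory\n", N);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}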
Casper42
Respected Contributor

Re: HPC - sl250 GPU node

Usually in HPC applications, you have a management node that takes a workload and splits it up into a bunch of individual jobs.
Each job is then sent to a different server to be crunched, and the results are sent back and collected into a single result set.
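
That split/crunch/collect pattern is what MPI's scatter and gather collectives do in miniature. A rough sketch, where the workload and the per-element "crunching" (squaring numbers) are made up purely for illustration:

/* Rough sketch of the split/crunch/collect pattern with plain MPI:
 * rank 0 plays the "management node", scattering chunks of work to
 * the ranks and gathering the results back into one array.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 4   /* elements handled by each rank, placeholder size */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double *work = NULL, *results = NULL;
    double local[CHUNK], local_out[CHUNK];

    if (rank == 0) {
        /* Management node builds the full workload. */
        work = malloc(nprocs * CHUNK * sizeof(double));
        results = malloc(nprocs * CHUNK * sizeof(double));
        for (int i = 0; i < nprocs * CHUNK; i++)
            work[i] = (double)i;
    }

    /* Split the workload: each rank receives its own chunk. */
    MPI_Scatter(work, CHUNK, MPI_DOUBLE, local, CHUNK, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* "Crunch" the chunk locally (placeholder computation). */
    for (int i = 0; i < CHUNK; i++)
        local_out[i] = local[i] * local[i];

    /* Collect everything back into a single result set on rank 0. */
    MPI_Gather(local_out, CHUNK, MPI_DOUBLE, results, CHUNK, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("collected %d results on the management node\n", nprocs * CHUNK);
        free(work);
        free(results);
    }

    MPI_Finalize();
    return 0;
}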


Now if you have an app that does NOT handle this kind of job creation, you need something like what Johan posted above, or this other tool I ran across a while ago:
http://www.mosix.org/txt_vcl.html/
I read about it in a blog post about password cracking: a security researcher used 25 AMD GPUs spread across four or five machines and used the software above to make all the GPUs appear "local" to a single machine.


PS: There is also an SL270 Gen8 that can accommodate up to 8 GPUs in a single machine. It might be easier to start there if you cannot split the job.
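
If you do go the single-box route, a quick sanity check of how many GPUs the node actually exposes locally is just a couple of standard CUDA runtime calls (nothing SL270-specific here):

/* List the GPUs visible on this node.
 * Build with something like: nvcc gpu_count.cu -o gpu_count
 */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("GPUs visible on this node: %d\n", count);

    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("  device %d: %s, %.1f GB\n",
               dev, prop.name,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}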