08-08-2025 08:10 PM - last edited on 08-08-2025 08:13 PM by support_s
NVIDIA AI Enterprise Implementation on VMware
About:
NVAIE implementation on VMware: This guide covers an end-to-end installation of NVIDIA AI Enterprise (NVAIE) on VMware, from server BIOS setup to running a use case with the NGC library. It includes BIOS configuration, installation of ESXi and vCenter, and installation of the NVAIE host and guest drivers. Once the environment is configured, a data scientist or developer can use it to develop AI- and ML-related workloads with maximum GPU efficiency.
Contents
- Server Details – MindSparks LAB
- Single Root I/O Virtualization (SR-IOV) – Enabled
- Power Setting or System Profile - High Performance
- vCenter implementation – Stage 2
- Set up CPU power management policy
- Install NVIDIA AI Enterprise Host Software
- Preparing the VIB file for installation
- Installing the VIB on the ESXi host
- Changing the Default Graphics Type in VMware vSphere
- Change the GPU type to Shared Direct
- Create an Ubuntu-based Virtual Machine
- Configure MMIO settings for the VM
- Enable GPU and parameters on VMs
- Add a GPU to the guest VM
- Change the PCI personality from vCenter
- Disable Nouveau on the guest Ubuntu machines
- Install the NVAIE driver on guest machines
- Installing Docker and the Docker Utility Engine for NVIDIA GPUs
- Installing the NVIDIA Container Toolkit
- Test the GPU function with a container
- Install and set up NGC on the guest VM
- Install the NGC CLI on the Ubuntu guest VM
Server Details – MindSparks LAB

Host details:
- Host IP address: 10.25.41.12
- iLO IP address: 10.25.42.12

Hypervisor details:
- ESXi Host IP: 10.25.41.12
- OS: ESXi 8.0

vCenter Appliance details:
- vCenter IP:
- vCenter Appliance version: vcsa 8.0

Guest VM-1:
- VM name: nvdia-vm-1
- IP: 10.25.41.15
- OS: Ubuntu 22
Server Setup: BIOS Setup
- Log in to the iLO of the server and make sure you are on the right host.
- Follow the prerequisites in the official documentation: https://docs.nvidia.com/ai-enterprise/deployment-guide-vmware/0.1.0/prereqs.html
- If using NVIDIA A100, the following BIOS settings are required:
- Single Root I/O Virtualization (SR-IOV) – Enabled.
- VT-d/IOMMU – Enabled.
- Hyperthreading – Enabled.
- Power Setting or System Profile - High Performance.
- CPU Performance (if applicable) - Enterprise or High Throughput (Optional).
- Memory Mapped I/O above 4-GB - Enabled (if applicable) (Optional).
- Reboot the host and enter the BIOS.
Single Root I/O Virtualization (SR-IOV) – Enabled
System utilities > System Configuration > BIOS/Platform Configuration (RBSU) > Virtualization Options > SR-IOV = Enabled > F10: Save
VT-d/IOMMU – Enabled
System utilities > System Configuration > BIOS/Platform Configuration (RBSU) > Virtualization Options > VT-d = Enabled > F10: Save
Hyperthreading – Enabled
System Utilities > BIOS/Platform Configuration (RBSU) > Processor Options > Intel (R) Hyperthreading Options = Enabled
Power Setting or System Profile - High Performance.
System Utilities > System Configuration > BIOS/Platform Configuration (RBSU) Workload Profile = Virtualization - Max performance
Note: This option may vary based on the ROM version and server model.
- Reboot the host and make sure all the above options are updated in BIOS.
Install ESXi
- Access the iLO of the GPU server.
- Mount the ESXi ISO image from the iLO.
- From the BIOS, open the one-time boot menu and select the iLO virtual media.
- Boot the server from the virtual media to start the ESXi installer.
- Press Enter to continue when you see the ESXi installer.
- Accept the end user license agreement (EULA).
- Select a disk for the ESXi installation.
- Select the language.
- Enter the root password for ESXi.
- Confirm and initiate the ESXi installation.
- Unmount the image and boot the server normally.
- Validate that ESXi is installed correctly (see the check below).
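A quick way to validate the installation, as a sketch assuming SSH or the ESXi Shell is enabled on the host (10.25.41.12 in this lab), is to check the version and management network from the command line:
# Confirm the installed ESXi version and build
esxcli system version get
# Confirm the management IP address configuration
esxcli network ip interface ipv4 get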
Install vCenter Appliance.
- Download the compatible VCSA image onto a Windows machine.
- Mount the image by right-clicking it.
- Go to the location below and double-click the installer.
D:\vcsa-ui-installer\win32
- Click Install.
- Enter the ESXi IP address, username, and password.
- Set up vCenter Server VM.
- Select the deployment Size.
- Select the datastore.
- Update Network Settings.
- Verify the details and initiate the installation of vCenter.
- Access the vCenter appliance with the configured IP address.
vCenter implementation – Stage 2
- Introduction.
- vCenter server Configuration.
- SSO configuration.
- Configure CEIP.
- Ready to complete > Finish.
- Completion of the Appliance installation.
- Access the vCenter using the above-mentioned IP after the stage 2 installation.
Set up the CPU power management policy
This power policy setting must be changed on every host in vCenter.
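The policy can be changed per host in the vSphere Client, or, as a sketch assuming SSH access to each ESXi host, through the /Power/CpuPolicy advanced option from the command line:
# Show the current CPU power policy
esxcli system settings advanced list --option=/Power/CpuPolicy
# Set the policy to High Performance (repeat on every host)
esxcli system settings advanced set --option=/Power/CpuPolicy --string-value="High Performance"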
Install NVIDIA AI Enterprise Host Software: Preparing the VIB file for installation
Before you begin, download the archive containing the VIB file and extract its contents to a folder. The file ending in .vib is the file that you must upload to the host datastore for installation. For demonstration purposes, these steps use the VMware vSphere web interface to upload the VIB to the server host.
Note: Download the drivers from the NVIDIA Application Hub.
- Extract the software package: NVIDIA-AI-Enterprise-vSphere-8.0-550.54.16-550.54.15-551.78.
- Locate the NVD-AIE_ESXi_8.0.0_Driver_550.54.16-1OEM.800.1.0.20613240.vib file under the Host_Drivers folder.
- Upload the VIB using the vSphere Client.
- Upload the .vib file to the datastore.
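As an alternative to uploading through the vSphere Client, the file can be copied to the datastore over SSH; this is a sketch that assumes SSH is enabled on the host and reuses the datastore path from the install command below.
# Copy the VIB from the workstation to the host datastore
scp NVD-AIE_ESXi_8.0.0_Driver_550.54.16-1OEM.800.1.0.20613240.vib root@10.25.41.12:/vmfs/volumes/datastore-5tb/vib/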
Installing the VIB on the ESXi host:
- Place the host into Maintenance mode.
- Use the esxcli command to install the NVIDIA AI Enterprise Host Software package:
- Navigate to the datastore folder where the .vib file is saved.
esxcli software vib install -v /vmfs/volumes/datastore-5tb/vib/NVD-AIE_ESXi_8.0.0_Driver_550.54.16-1OEM.800.1.0.20613240.vib
- Exit Maintenance Mode.
- Reboot the ESXi host.
- Verifying the Installation of the VIB.
vmkload_mod -l | grep nvidia
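The VIB can also be confirmed from the installed package list; the grep pattern below is an assumption based on the package name used above.
# List installed VIBs and filter for the NVIDIA AI Enterprise driver
esxcli software vib list | grep -i nvd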
- Verify that the NVIDIA kernel driver can successfully communicate with the NVIDIA physical GPUs in your system by running the nvidia-smi command.
nvidia-smi
Changing the Default Graphics Type in VMware vSphere: Change the GPU type to Shared Direct
- Log in to vCenter Server by using the vSphere Web Client.
- In the navigation tree, select your ESXi host and click the Configure tab.
- From the menu, choose Graphics and then click the Host Graphics tab.
- On the Host Graphics tab, click Edit.
- In the Edit Host Graphics Settings dialog box that opens, select Shared Direct and click OK.
After you click OK, the default graphics type changes to Shared Direct.
- Either restart the ESXi host, or stop and restart the Xorg service and nv-hostengine on the ESXi host. To stop and restart the Xorg service and nv-hostengine, perform these steps:
Stop the Xorg service.
[root@esxi:~] /etc/init.d/xorg stop
Stop nv-hostengine.
[root@esxi:~] nv-hostengine -t
- Wait one second to allow nv-hostengine to stop.
Start nv-hostengine.
[root@esxi:~] nv-hostengine -d
Start the Xorg service.
[root@esxi:~] /etc/init.d/xorg start
- Check the status of the graphics card, as shown below.
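One way to check the card from the host shell, as a sketch assuming SSH access to the ESXi host, is to query the host graphics configuration and the GPU itself:
# Confirm the default graphics type is now Shared Direct
esxcli graphics host get
# Confirm the NVIDIA driver still sees the physical GPU
nvidia-smi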
Create an Ubuntu-based Virtual Machine
- From vCenter, create a virtual machine using the standard process.
VM requirements:
- CPU, RAM, and HDD specifications (minimum requirements).
Configure MMIO settings for the VM.
- Adjust the Memory Mapped I/O (MMIO) settings for the VM.
- Click Add Configuration Params and add the parameters below, replacing the size value with the MMIO space required for your GPU model.
pciPassthru.64bitMMIOSizeGB = 128
pciPassthru.use64bitMMIO = TRUE
Enable GPU & parameters on VMs. Add a GPU to the gust VM
- Settings of the Gust VM.
- Virtual Hardware.
- Add a new device.
- Under Other device > Select PCI device.
- Select the vGPU and add.
- Check the status.
Change the PCI personality from vCenter
- Right-click on your VM and select "Edit Settings."
- Click on the "VM Options" tab.
- Select "Edit Configuration" from the "Advanced" drop-down list.
- Click "Add Row."
- Name: pciPassthru0.cfg.enable_uvm
- Value: 1
- Click "OK" to save.
- Click "Add Row" again.
- Name: pciPassthru1.cfg.enable_uvm
- Value: 1
- Click "OK" to save.
pciPassthru0.cfg.enable_uvm - 1
pciPassthru1.cfg.enable_uvm - 1
- Summary > Power on the VM
Disable Nouveau on the guest Ubuntu machines
Nouveau is an open-source graphics device driver for NVIDIA video cards and the Tegra family.
- Run the command below to verify whether Nouveau is loaded.
lsmod | grep nouveau
- If the command returns any output, Nouveau is loaded; follow the steps below to blacklist and disable it.
cat <<EOF | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
- Regenerate the kernel initramfs.
sudo update-initramfs -u
- Reboot the host.
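After the reboot, re-running the earlier check should print nothing if Nouveau was disabled successfully:
# No output means the Nouveau module is no longer loaded
lsmod | grep nouveau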
Install the NVAIE driver on guest machines
- Log in to the VM and check for updates.
sudo apt-get update
- Install the gcc compiler and the make tool in the terminal.
sudo apt-get install build-essential
- Download the NVIDIA AI Enterprise Software Driver and place it in the Guest VMs.
- Navigate to the directory containing the NVIDIA driver .run file, then add the executable permission to it using the chmod command.
cd vgpu_guest_driver_2_1:510.73.08
sudo chmod +x NVIDIA-Linux-x86_64-510.73.08-grid.run
- Run the driver installer as the root user and accept defaults.
sudo sh ./NVIDIA-Linux-x86_64-510.73.08-grid.run
- Reboot the guest VM.
- Check the GPU status using the below command on the Guest VM.
nvidia-smi
- From the physical host, you can view the status of the GPU MIG slices.
nvidia-smi mig -lgip
Guest VM Licensing.
To use an NVIDIA vGPU software licensed product, each client system to which a physical or virtual GPU is assigned must be able to obtain a license from the NVIDIA License System. A client system can be a VM that is configured with NVIDIA vGPU, a VM that is configured for GPU pass through, or a physical host to which a physical GPU is assigned in a bare-metal deployment.
- Generating a Client Configuration Token
- Configuring a Licensed Client on Linux
Reference: https://docs.nvidia.com/ai-enterprise/deployment-guide-vmware/0.1.0/first-vm.html#licensing-the-vm
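As a rough sketch of the Linux licensing steps described in the referenced guide (exact paths and service names may differ by driver version), the client configuration token generated from the NVIDIA Licensing Portal is copied to the guest and the licensing service restarted:
# Copy the client configuration token into the expected directory
sudo cp client_configuration_token_*.tok /etc/nvidia/ClientConfigToken/
# Restart the licensing daemon so it picks up the token
sudo systemctl restart nvidia-gridd
# Check the license status reported by the driver
nvidia-smi -q | grep -i license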
Installing Docker and the Docker Utility Engine for NVIDIA GPUs
- Install Docker.
- Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
- Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
- Install the Docker packages and check the status.
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
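A simple status check after installing the packages, as a sketch:
# Verify the Docker service is running
sudo systemctl status docker --no-pager
# Optional smoke test with the hello-world container
sudo docker run --rm hello-world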
Installing the NVIDIA Container Toolkit
- Configure the production repository.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
- Optionally, configure the repository to use experimental packages.
- Update the packages list from the repository.
sudo apt-get update
- Install the NVIDIA Container Toolkit packages:
sudo apt-get install -y nvidia-container-toolkit
Configuring Docker
- Configure the container runtime by using the nvidia-ctk command.
Note: The nvidia-ctk command modifies the /etc/docker/daemon.json file on the host. The file is updated so that Docker can use the NVIDIA Container Runtime.
sudo nvidia-ctk runtime configure --runtime=docker
- Restart the Docker daemon:
sudo systemctl restart docker
Rootless mode
Note: To configure the container runtime for Docker running in Rootless mode, follow these steps:
Configure the container runtime by using the nvidia-ctk command:
nvidia-ctk runtime configure --runtime=docker --config=$HOME/.config/docker/daemon.json
- Restart the Rootless Docker daemon:
systemctl --user restart docker
- Configure /etc/nvidia-container-runtime/config.toml by using the sudo nvidia-ctk command:
sudo nvidia-ctk config --set nvidia-container-cli.no-cgroups --in-place
Test the GPU function with a container
- Execute the command below to test the GPU function on the guest VM.
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Install and set up NGC on the guest VM: Install the NGC CLI on the Ubuntu guest VM
- Enter the NVIDIA NGC website as a guest user.
- In the top right corner, click Welcome Guest and then select Setup from the menu.
- Click Downloads under Install NGC CLI from the Setup page.
- From the CLI Install page, click the Windows, Linux, or MacOS tab, according to the platform from which you will be running NGC Catalog CLI.
- Execute the below command on the guest VMs.
wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/ngc-apps/ngc_cli/versions/3.44.0/files/ngccli_linux.zip -O ngccli_linux.zip && unzip ngccli_linux.zip
- Make the NGC CLI binary executable and add your current directory to the PATH.
- Execute the file from the same directory.
- Check the NGC version.
chmod u+x ngc-cli/ngc
echo "export PATH=\"\$PATH:$/root/praveen/ngc_cli/ngc-cli/ngc-cli\"" >> ~/.bash_profile && source ~/.bash_profile
ngc --version
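To finish the setup, the CLI is typically configured with the NGC API key from your account; ngc config set prompts interactively for the key, org, and team (shown here as a sketch).
# Configure the NGC CLI with your API key (interactive prompts)
ngc config set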
Sample use case execution.
Note: An example container pull using the NVIDIA RAPIDS production branch is provided below. RAPIDS, part of NVIDIA CUDA-X, is an open-source suite of GPU-accelerated data science and AI libraries with APIs that match the most popular open-source data tools, accelerating performance by orders of magnitude at scale across data pipelines.
- Login to NVIDIA container repository.
docker login nvcr.io
Username: $oauthtoken
Password: <my-api-key-from-your-ngc-account>
- Log in to NGC and pull a container for the sample use case execution.
- Access the Jupyter notebook at the exposed local IP and port (8888 in the command below).
docker run --rm -it --pull always --gpus all --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -e EXTRA_CONDA_PACKAGES="jq" -e EXTRA_PIP_PACKAGES="beautifulsoup4" -p 8888:8888 rapidsai/notebooks:24.04-cuda11.8-py3.11
Conclusion
In conclusion, the comprehensive guide to NVAIE implementation on VMware offers a thorough, step-by-step approach to setting up a robust AI and ML environment. By meticulously detailing the process from server BIOS configuration to leveraging the NGC library, this guide ensures that data scientists and developers can efficiently utilize GPU resources. This seamless integration of NVAIE on VMware not only enhances the performance of AI and ML workloads but also empowers users to harness the full potential of their hardware, driving innovation and productivity in their respective fields.
I am an HPE Employee