HPE Ezmeral Software platform

heterogeneous GPU support within one Ezmeral compute cluster

 
Chezan
Occasional Visitor


Hi, one question about Ezmeral has been confusing me a lot recently.

Could anyone tell me how to assign different GPUs to different KDApps? For example, I want this notebook to run on the 'rtx-4000' card. I first labelled the RTX 4000 node (labelling is the usual approach on K8s) and then edited my YAML file on Ezmeral like this, but it failed in the end.
 
As for the KDApp, since it is packaged by App Workbench, there is no place to set detailed GPU information, and from the GUI I can only set the number of GPUs.
 
Screenshot:
 
 
HPE-Ezmeral-Dev
Occasional Visitor

Re: heterogeneous GPU support within one Ezmeral compute cluster

KDApps support adding a nodeAffinity section to the KDApp cluster YAML to specify one or more GPU models. The nodeSelector section is not supported.
See the HPE Ezmeral docs topic "Using nodeAffinity to Select By GPU Type".

Here is an example KubeDirector cluster YAML that specifies a nodeAffinity:

---
apiVersion: "kubedirector.hpe.com/v1beta1"
kind: "KubeDirectorCluster"
metadata: 
  name: "tf-k80-gpu"
  namespace: "gpu-test-none"
  labels: 
    description: ""
spec: 
  app: "tensorflow-gpu-jupyter"
  namingScheme: "CrNameRole"
  appCatalog: "local"
  connections: 
    secrets: []
  roles: 
    - 
      id: "controller"
      members: 1
      resources: 
        requests: 
          cpu: "2"
          memory: "4Gi"
          nvidia.com/gpu: "1"
        limits: 
          cpu: "2"
          memory: "4Gi"
          nvidia.com/gpu: "1"
      #Note: "if the application is based on hadoop3 e.g. using StreamCapabilities interface, then change the below dtap label to 'hadoop3', otherwise for most applications use the default 'hadoop2'"
      #podLabels: 
        #hpecp.hpe.com/dtap: "hadoop2"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: nvidia.com/gpu.product
                    operator: In
                    values:
                      - Tesla-K80
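
Since the affinity section can name more than one GPU model, here is a minimal sketch of the same block listing two models. It replaces the affinity section of the role above. The second value, Quadro-RTX-4000, is an assumption for illustration only; the exact string should be copied from the nvidia.com/gpu.product label on your own nodes (for example with kubectl get nodes -L nvidia.com/gpu.product).

```yaml
      # Fragment only: goes under a role entry in spec.roles, in place of the
      # affinity section shown above.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: nvidia.com/gpu.product
                    operator: In
                    values:
                      # The pod may be scheduled on a node carrying either value.
                      - Tesla-K80
                      # Assumed label value; verify it against your node labels.
                      - Quadro-RTX-4000
```

With the In operator, a node matches if its label value equals any entry in the list, so the pod can land on either GPU type.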

The node label nvidia.com/gpu.product is created by default in Ezmeral Runtime release 5.3.5 and later.
You can also create your own node label and use it as the key in matchExpressions.
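
For the custom-label route, a sketch of what the affinity section might look like is below. The label key gpu-model, its value rtx-4000, and the node name placeholder are hypothetical names chosen for this example, not something Ezmeral defines.

```yaml
# Assumed prerequisite, run once against the GPU node (hypothetical key/value):
#   kubectl label nodes <node-name> gpu-model=rtx-4000
#
# Fragment only: goes under a role entry in spec.roles of the KubeDirectorCluster.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  # Custom label key applied with kubectl label above.
                  - key: gpu-model
                    operator: In
                    values:
                      - rtx-4000
```

The role still needs its nvidia.com/gpu request and limit as in the full example; the affinity only restricts which nodes are eligible.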