Using Autoscaler with GPU

Container-level autoscaling

Horizontal Pod Autoscaler (HPA) automatically updates workload resources (such as Deployments or StatefulSets) to scale them according to application demand. In practice, when a Kubernetes application workload increases, HPA deploys more pods to meet resource requirements. If the load decreases and the number of pods is above the configured minimum, HPA reduces the workload resource (Deployment, StatefulSet, or similar), meaning it reduces the number of pods. GPU HPA uses DCGM custom metrics to monitor and scale pods up or down based on the GPU application workload.

To configure HPA for a GPU application, refer to the following configuration:

apiVersion: autoscaling/v2beta2

kind: HorizontalPodAutoscaler

metadata:

 name: my-gpu-app

spec:

 maxReplicas: 3  # Update this accordingly

 minReplicas: 1

 scaleTargetRef:

   apiVersion: apps/v1beta1

   kind: Deployment

   name: my-gpu-app # Add label from Deployment we need to autoscale

 metrics:

 - type: Pods  # scale pod based on gpu

   pods:

     metric:

       name: DCGM_FI_PROF_GR_ENGINE_ACTIVE # Add the DCGM metric here accordingly

     target:

       type: AverageValue

       averageValue: 0.8 # Set the threshold value as per the requirement

For further reference, see the NVIDIA documentation on DCGM metrics.

To check that the HPA has been initialized for the GPU application, run:

Node-level autoscaling

Like standard Cluster Autoscaler, the Kubernetes cluster automatically scales worker nodes up or down within a worker group based on GPU usage demand. It automatically adds new workers to a worker group when the application running in that group cannot be served with sufficient GPU resources from the existing worker nodes. Pending pods caused by insufficient node resources will be served by new worker nodes after scaling up. Cluster Autoscaler also automatically removes nodes whose utilization is below the threshold (default 50%).

Worker group node count configuration is defined on the FPTCloud Portal as shown below:

Container-level autoscaling​

Node-level autoscaling​

Container-level autoscaling

Node-level autoscaling