Configuring Autoscaling with Custom GPU Metrics

When integrated with Prometheus, Kubernetes can scale workloads automatically based on custom metrics such as GPU metrics. This guide explains how to configure autoscaling for GPU-based applications running on the FPT Kubernetes Engine platform.

Prerequisites

A Kubernetes cluster with GPUs attached
A running GPU application
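To confirm the first prerequisite, one option is to list the nodes that advertise allocatable GPUs. The sketch below assumes the NVIDIA device plugin (for example, as installed by the GPU Operator) exposes GPUs as the nvidia.com/gpu resource; gpu_nodes is a hypothetical helper, not part of any FPT tooling.

```shell
# Hypothetical helper: read "NAME GPU" rows (no header) from stdin and print
# the names of nodes whose allocatable nvidia.com/gpu count is above zero.
# kubectl prints "<none>" for nodes that do not report the resource.
gpu_nodes() {
  awk '$2 != "<none>" && $2 + 0 > 0 { print $1 }'
}

# Example usage (requires cluster access; assumes the NVIDIA device plugin):
#   kubectl get nodes --no-headers \
#     -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
#     | gpu_nodes
```

An empty result suggests that no node currently exposes GPUs to the scheduler.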

Configuration steps

Step 1: Install kube-prometheus-stack and prometheus-adapter

Using FPT App Catalog:

Create an App Catalog with the FPT App Catalog service, then select "Connect Cluster" to connect it to your GPU cluster.
In the App Catalogs menu, select the fptcloud-catalogs repository, search for prometheus, and install the kube-prometheus-stack package by entering a release name and a namespace.

Using the Helm chart:

# Add the FKE chart repository
helm repo add xplat-fke https://registry.fke.fptcloud.com/chartrepo/xplat-fke && helm repo update

# Install kube-prometheus-stack into the prometheus namespace
helm install --wait --generate-name \
    -n prometheus --create-namespace \
    xplat-fke/kube-prometheus-stack

# Look up the Prometheus service created by kube-prometheus-stack (one name per line)
prometheus_service=$(kubectl get svc -n prometheus -l app=kube-prometheus-stack-prometheus \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')

# Install prometheus-adapter and point it at that service
helm install --wait --generate-name \
    -n prometheus --create-namespace \
    xplat-fke/prometheus-adapter \
    --set prometheus.url="http://${prometheus_service}.prometheus.svc.cluster.local"

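After kube-prometheus-stack is deployed, deploy prometheus-adapter. Set its prometheus.url value so that it points to the correct Prometheus service, in the form http://<service-name>.<namespace>.svc.cluster.local. For example, if kube-prometheus-stack was installed with the release name prometheus in the prometheus namespace, the value is:

http://prometheus-kube-prometheus-prometheus.prometheus.svc.cluster.local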
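Because the kubectl lookup in the Helm commands prints one service name per line, the resulting URL breaks silently if the label selector matches no service or more than one. The guard below is a minimal sketch, assuming the adapter should target exactly one service; prometheus_url is a hypothetical helper that builds the value in the format described above and refuses to continue otherwise.

```shell
# Hypothetical helper: given the newline-separated service names returned by
# the kubectl lookup and a namespace, print the in-cluster Prometheus URL.
# Returns non-zero unless exactly one service name was found.
prometheus_url() {
  local services="$1" namespace="$2" count
  # Count non-empty lines; an empty lookup result yields 0.
  count=$(printf '%s\n' "$services" | grep -c .)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one Prometheus service, found $count" >&2
    return 1
  fi
  printf 'http://%s.%s.svc.cluster.local\n' "$services" "$namespace"
}

# Example usage (requires cluster access):
#   url=$(prometheus_url "$prometheus_service" prometheus) || exit 1
#   helm install --wait --generate-name -n prometheus \
#     xplat-fke/prometheus-adapter --set prometheus.url="$url"
```

The URL carries no port. In the upstream prometheus-adapter chart the port is a separate value, prometheus.port (default 9090); this sketch assumes the FKE chart keeps that behavior.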

Then check the status of both packages to confirm that they are deployed and their Pods are running.
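One way to do this from the command line is to list the Helm releases in the prometheus namespace and confirm that every Pod there is Running and fully ready. The awk filter below is a hypothetical helper, not part of either chart. It assumes the default kubectl get pods column layout (NAME READY STATUS RESTARTS AGE) with headers suppressed.

```shell
# Hypothetical helper: read "kubectl get pods --no-headers" output on stdin,
# print pods that are not Running with all containers ready, and return
# non-zero if any were found. Completed (job) pods are ignored.
not_ready_pods() {
  awk '{
    split($2, ready, "/")   # READY column, e.g. "1/2"
    if ($3 != "Completed" && ($3 != "Running" || ready[1] != ready[2])) {
      print $1
      bad = 1
    }
  } END { exit bad + 0 }'
}

# Example usage (requires cluster access):
#   helm list -n prometheus
#   kubectl get pods -n prometheus --no-headers | not_ready_pods && echo "all pods ready"
#   kubectl get apiservice v1beta1.custom.metrics.k8s.io
```

The last command checks the APIService that prometheus-adapter registers. It should report Available as True once the adapter is serving custom metrics.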

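Step 2: Configure a Horizontal Pod Autoscaler for the GPU application

A Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods so that the conditions specified in its configuration are met. Once prometheus-adapter is configured, the DCGM metrics collected by Prometheus are exposed through the Kubernetes custom metrics API, and an HPA can use them to scale GPU workloads.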
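Before creating the HPA, it can help to confirm that prometheus-adapter actually serves the DCGM metric. The custom metrics API can be queried with kubectl get --raw. The helper below only assembles the request path for a per-Pod metric in one namespace. The function name, the default namespace, and the metric in the usage line are illustrative assumptions. Which metrics appear depends on the adapter's discovery rules.

```shell
# Hypothetical helper: build the custom metrics API path that returns a
# per-Pod metric for every Pod in a namespace.
custom_metric_path() {
  local namespace="$1" metric="$2"
  printf '/apis/custom.metrics.k8s.io/v1beta1/namespaces/%s/pods/*/%s\n' \
    "$namespace" "$metric"
}

# Example usage (requires cluster access):
#   kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1   # list served metrics
#   kubectl get --raw "$(custom_metric_path default DCGM_FI_PROF_GR_ENGINE_ACTIVE)"
```

If the metric is missing from the listing, the HPA cannot use it, regardless of what the manifest specifies.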

Example HPA manifest:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-gpu-app
spec:
  maxReplicas: 3  # Adjust to your workload
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1  # apps/v1beta1 Deployments were removed in Kubernetes 1.16
    kind: Deployment
    name: my-gpu-app  # Name of the Deployment to autoscale
  metrics:
  - type: Pods  # Scale on a per-Pod GPU metric
    pods:
      metric:
        name: DCGM_FI_PROF_GR_ENGINE_ACTIVE  # Replace with the DCGM metric to scale on
      target:
        type: AverageValue
        averageValue: "0.8"  # Target average across Pods (ratio of time the graphics engine is active)

For details on DCGM metrics, see the NVIDIA documentation.
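The sketch below shows how the averageValue target turns into a replica count. It reproduces the core of the HPA calculation described in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetValue). The HPA skips scaling when that ratio is within the default 0.1 tolerance of 1.0, then clamps the result to minReplicas and maxReplicas. The sketch ignores stabilization windows, scaling policies, and not-ready Pods, so treat it as an approximation. desired_replicas is a hypothetical helper.

```shell
# Hypothetical helper approximating the HPA formula for a Pods/AverageValue metric:
#   desired = ceil(current_replicas * current_avg / target_avg)
# Arguments: current_replicas current_avg target_avg min_replicas max_replicas
desired_replicas() {
  awk -v cur="$1" -v val="$2" -v tgt="$3" -v min="$4" -v max="$5" 'BEGIN {
    ratio = val / tgt
    if (ratio >= 0.9 && ratio <= 1.1) {
      d = cur                            # within the default 0.1 tolerance: no change
    } else {
      d = cur * ratio
      if (d != int(d)) d = int(d) + 1    # awk has no ceil(), so round up by hand
    }
    if (d < min) d = min                 # clamp to minReplicas
    if (d > max) d = max                 # clamp to maxReplicas
    print d
  }'
}

# Example: 2 Pods averaging 0.95 against the 0.8 target, minReplicas 1, maxReplicas 3:
#   desired_replicas 2 0.95 0.8 1 3   # prints 3
```

In the example, the ratio 0.95 / 0.8 = 1.1875 exceeds the tolerance, so the HPA asks for ceil(2 × 1.1875) = 3 replicas, which maxReplicas still allows.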

After creating the HPA, verify it with the following command:

kubectl get hpa -A
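The TARGETS column of kubectl get hpa prints metric values as Kubernetes resource quantities, so the 0.8 target appears as 800m. A current value of <unknown> means the HPA cannot read the metric yet, and kubectl describe hpa my-gpu-app shows the related events. The helper below is a hypothetical convenience for reading milli-suffixed values. It handles only plain numbers and the m suffix, not other quantity suffixes.

```shell
# Hypothetical helper: convert a quantity such as "450m" (milli-units) to a
# decimal number. Only plain numbers and the "m" suffix are handled.
quantity_to_decimal() {
  case "$1" in
    *m) awk -v v="${1%m}" 'BEGIN { print v / 1000 }' ;;
    *)  printf '%s\n' "$1" ;;
  esac
}

# Example: a TARGETS value of "450m/800m" means current 0.45 vs. target 0.8.
#   quantity_to_decimal 450m   # prints 0.45
```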
