Deploy GPU Workloads to a Managed GPU Cluster

📄️ Single GPU example: serving an LLM with vLLM

This guide explains how to deploy and serve the Gemma 3 large language model (LLM) using a single GPU on FPT Kubernetes Engine (FKE GPU) with the vLLM framework.

📄️ Multi-GPU example: serving an LLM with vLLM

This guide explains how to deploy and serve the Gemma 3 large language model (LLM) using multiple GPUs on FPT Kubernetes Engine (FKE GPU) with the vLLM framework.

📄️ Multi-node example: vLLM and multi-host serving

This guide explains how to deploy and serve the Gemma 3 large language model (LLM) across multiple nodes on FPT Kubernetes Engine (FKE GPU) using the vLLM framework.

📄️ Fine-tuning an LLM with Unsloth on Kubernetes

This guide explains how to fine-tune an LLM on Kubernetes using Unsloth and a GPU.

📄️ Try Example Workload

* Initial Setup