📄️ Single GPU example: serving an LLM with vLLM
This guide explains how to deploy and serve the Gemma 3 large language model (LLM) using a single GPU on FPT Kubernetes Engine (FKE GPU) with the vLLM framework.
📄️ Multi-GPU example: serving an LLM with vLLM
This guide explains how to deploy and serve the Gemma 3 large language model (LLM) using multiple GPUs on FPT Kubernetes Engine (FKE GPU) with the vLLM framework.
📄️ Multi-node example: vLLM and multi-host serving
This guide explains how to deploy and serve the Gemma 3 large language model (LLM) across multiple nodes on FPT Kubernetes Engine (FKE GPU) using the vLLM framework.
📄️ Fine-tuning an LLM with Unsloth on Kubernetes
This guide explains how to fine-tune an LLM on Kubernetes using Unsloth and a GPU.
📄️ Try Example Workload
* Initial Setup