Introduction to NVIDIA L4 GPUs on CloudPe
NVIDIA L4 GPUs on CloudPe are designed to accelerate AI inference, machine learning workloads, computer vision, and video processing applications. These GPUs provide a balance of high performance, energy efficiency, and cost-effectiveness, making them ideal for production-grade AI workloads as well as development and testing environments.
CloudPe offers L4 GPUs as part of its GPU-enabled virtual machine plans, which can be launched directly from the CloudPe Dashboard. Users can select an L4 GPU plan while creating a VM, choose a supported operating system, and deploy workloads without needing to manage underlying hardware.
Once deployed, the GPU is automatically attached to the VM and can be accessed using standard NVIDIA drivers and CUDA libraries, enabling seamless integration with popular AI frameworks such as TensorFlow, PyTorch, and OpenCV.
The NVIDIA L4 is a cloud GPU with 24 GB GDDR6 VRAM, designed for:
- AI inference (running trained models)
- Lightweight machine learning training
- Video processing & streaming
- Computer vision applications
It delivers much faster performance than a CPU for AI tasks, while also being power-efficient and affordable.
Common Use Cases
| Task Type | Example | Benefit with L4 |
|---|---|---|
| AI Inference | Chatbots, image recognition | Very fast response time |
| ML Training | Small/medium models | 3–10× faster vs CPU |
| Video Transcoding | 1080p → 4K conversion, streaming | Real-time performance |
Recommended Guest Operating Systems
| OS | Version |
|---|---|
| Ubuntu | 20.04 / 22.04 / 24.04 |
| AlmaLinux | 8 / 9 |
| Rocky Linux | 8 / 9 |
| Windows Server | 2019 / 2022 |
GPU Architecture Basics
GPU = many small cores → great at doing the same task many times.
CPU = fewer powerful cores → great at logic, program control, OS operations.
➡ For AI:
- Training and inference run on the GPU
- Application and control logic run on the CPU
Think of:
CPU = Brain 🧠
GPU = Muscle 💪
They work together.
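The pattern GPUs exploit is data parallelism: the same small operation applied to many independent elements. A minimal sketch in plain Python (expressed serially here; a GPU would run thousands of these element-wise operations at once, while the CPU handles control flow and orchestration):

```python
# Data-parallel pattern: apply the SAME operation to many elements.
# A GPU runs thousands of these element-wise operations in parallel;
# plain Python expresses the pattern serially, one element at a time.

def scale_pixel(value, factor=2):
    """One small, uniform task -- the kind a single GPU core handles."""
    return value * factor

# An "image" of a million pixels: the same op repeated a million times.
pixels = range(1_000_000)
scaled = [scale_pixel(p) for p in pixels]

# Control flow, I/O, and program logic stay on the CPU side.
print(scaled[:3])  # -> [0, 2, 4]
```

The takeaway: workloads made of many identical, independent operations (matrix multiplies, pixel transforms, token scoring) map well to a GPU; branchy, sequential logic does not.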
2. Available GPU Plans & Limits

Full plan and pricing details are available at:
https://www.cloudpe.com/pricing/#Cloud-GPU
Where are L4 Plans Available?
NVIDIA L4 GPU plans are currently available only in the IN-WEST2 region. IN-WEST3 is a CPU-only region: GPU hardware (including NVIDIA L4 GPUs) has not been deployed there, or is temporarily exhausted. As a result, GPU-based virtual machines cannot be created in IN-WEST3.
3. Network Bandwidth on CloudPe GPU Instances
CloudPe L4 GPU VMs include dual-network connectivity:
| Network Type | Purpose | Typical Speed |
|---|---|---|
| Public Network | Internet access, public APIs, VPN connectivity | Up to 1 Gbps |
| Private Network | Traffic between VMs inside CloudPe datacenter | Up to 10 Gbps |
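The two link speeds matter most when moving large datasets or model weights between VMs. A quick back-of-envelope sketch, assuming the full advertised link rate is achieved (real-world throughput is lower due to protocol overhead):

```python
# Rough transfer-time estimate for the two CloudPe network paths.
# Assumes the full link rate is achieved; real throughput is lower.

def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Seconds to move size_gb gigabytes over a link_gbps link."""
    size_gbits = size_gb * 8          # gigabytes -> gigabits
    return size_gbits / link_gbps

dataset_gb = 100  # e.g. a training dataset or model checkpoint set

public = transfer_seconds(dataset_gb, 1)    # public net, up to 1 Gbps
private = transfer_seconds(dataset_gb, 10)  # private net, up to 10 Gbps

print(f"Public  (1 Gbps):  {public:.0f} s (~{public / 60:.1f} min)")
print(f"Private (10 Gbps): {private:.0f} s (~{private / 60:.1f} min)")
```

For a 100 GB dataset this works out to roughly 800 s over the public network versus roughly 80 s over the private network, which is why inter-VM data movement should stay on the private network where possible.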
4. Choosing the Right L4 Plan
Comparison: Inference vs Training
| Feature | Inference (Running Model) | Training (Learning Model) |
|---|---|---|
| Purpose | Use already-trained AI to predict results | Teach the AI model using large datasets |
| Workload Example | Ask model to identify a face in an image | Train model to learn faces from thousands of images |
| GPU Requirement | Low to Medium | High (more GPU Memory + CUDA cores) |
| Performance Focus | ⚡ Low latency, fast results | 🧮 High compute power, long runtimes |
| Storage Needs | Small (model already built) | Large datasets required |
| Typical GPU Usage | NLP chatbots, vision object detection, speech-to-text | ML model development, fine-tuning LLMs / Vision models |
5. How Many Models Per GPU?
The number of models that can run on a GPU depends mainly on VRAM (GPU Memory).
- Each AI model needs VRAM to load and process data
- If VRAM is full → GPU cannot load more models → performance drops or model fails to run
Approx. VRAM Needs for Common AI Models
| Model Type | Example Models | VRAM Required (Approx.) | Usage Type |
|---|---|---|---|
| Small Language Models | Llama-3 8B | 10–12 GB | Inference |
| Medium Vision Models | YOLOv8-L | 8–16 GB | Inference/Training |
| Large Speech Models | Whisper-Large-V3 | 16–20 GB | Real-time speech |
| Large Language Models | Llama-3 70B | 70–130 GB (requires multiple GPUs; exceeds a single L4's 24 GB VRAM) | Training/Enterprise |
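A common rule of thumb behind figures like those above is parameters × bytes-per-parameter, plus a working-memory overhead. The sketch below uses that rule with an assumed 20% overhead factor; real usage varies with framework, context length, and batch size, so treat these strictly as planning numbers (the table's 10–12 GB figure for an 8B model corresponds roughly to 8-bit weights plus overhead):

```python
# Back-of-envelope VRAM estimate: parameters x bytes-per-parameter,
# plus an assumed working-memory overhead (the 1.2 factor below is an
# illustrative assumption, not a measured figure).

def vram_gb(params_billion: float, bytes_per_param: float = 2.0,
            overhead: float = 1.2) -> float:
    """Approx. VRAM (GB) needed to load a model for inference.
    bytes_per_param: 2.0 for FP16/BF16, 1.0 for 8-bit, 0.5 for 4-bit."""
    return params_billion * bytes_per_param * overhead

L4_VRAM_GB = 24  # a single NVIDIA L4

for name, params, bpp in [("8B model, FP16 ", 8, 2.0),
                          ("8B model, 8-bit", 8, 1.0),
                          ("70B model, FP16", 70, 2.0)]:
    need = vram_gb(params, bpp)
    copies = int(L4_VRAM_GB // need) if need <= L4_VRAM_GB else 0
    print(f"{name}: ~{need:.1f} GB -> fits {copies}x on one L4")
```

This also answers "how many models per GPU": once the per-model estimate is known, divide the card's 24 GB by it. An FP16 8B model fills most of a single L4, while quantized variants leave room for multiple copies; a 70B model does not fit at all.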
6. Benchmark References (Latency & Throughput)
To understand performance on an NVIDIA L4 GPU, the most common metrics used in AI workloads are:
| Metric | Meaning | Example |
|---|---|---|
| Latency | How long one request or inference takes | “50 ms per image” |
| Throughput | How many requests can be processed per second | “200 images/sec” |
These help customers choose the right plan based on workload and user traffic.
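The two metrics are linked: if a batch of B requests completes in L seconds, throughput is B / L and per-request latency is roughly L. A short sketch of that arithmetic (the batched figures below are illustrative assumptions, not measured L4 benchmarks):

```python
# Relationship between the two metrics: with B requests processed
# together (a batch) in L seconds,
#   throughput = B / L    and    per-request latency ~= L.
# Numbers are illustrative, not measured L4 benchmarks.

def throughput_rps(batch_size: int, latency_s: float) -> float:
    """Requests (or images) served per second."""
    return batch_size / latency_s

# From the table: "50 ms per image", one image at a time.
single = throughput_rps(1, 0.050)    # -> 20 images/sec
# Batching amortizes overhead: say 8 images in 120 ms (assumed).
batched = throughput_rps(8, 0.120)   # ~66.7 images/sec

print(f"single: {single:.0f} img/s, batched: {batched:.1f} img/s")
```

The trade-off to weigh when sizing a plan: batching raises throughput (more users served per GPU) but raises per-request latency, so latency-sensitive workloads like chatbots usually run smaller batches than offline batch-processing jobs.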