1. Introduction to NVIDIA L4 GPUs on CloudPe

NVIDIA L4 GPUs on CloudPe are designed to accelerate AI inference, machine learning workloads, computer vision, and video processing applications. These GPUs provide a balance of high performance, energy efficiency, and cost-effectiveness, making them ideal for production-grade AI workloads as well as development and testing environments.

CloudPe offers L4 GPUs as part of its GPU-enabled virtual machine plans, which can be launched directly from the CloudPe Dashboard. Users can select an L4 GPU plan while creating a VM, choose a supported operating system, and deploy workloads without needing to manage underlying hardware.

Once deployed, the GPU is automatically attached to the VM and can be accessed using standard NVIDIA drivers and CUDA libraries, enabling seamless integration with popular AI frameworks such as TensorFlow, PyTorch, and OpenCV.
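Once the driver is installed, you can confirm the GPU is visible from inside the VM before installing any framework. A minimal sketch, assuming the standard `nvidia-smi` utility (shipped with the NVIDIA driver) is on the PATH; the function name is illustrative:

```python
import shutil
import subprocess

def l4_gpu_visible() -> bool:
    """Return True if nvidia-smi reports at least one NVIDIA GPU."""
    if shutil.which("nvidia-smi") is None:
        return False  # driver / utility not installed on this VM
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    # On an L4 instance this prints a line like: "NVIDIA L4, 23034 MiB"
    return result.returncode == 0 and "NVIDIA" in result.stdout

print("GPU visible:", l4_gpu_visible())
```

If this returns `False` on a GPU plan, reinstall the NVIDIA driver before troubleshooting TensorFlow or PyTorch, since both rely on the same driver stack.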

The NVIDIA L4 is a data-center GPU based on NVIDIA's Ada Lovelace architecture, with 24 GB of GDDR6 VRAM, designed for:

  • AI inference (running trained models)
  • Lightweight machine learning training
  • Video processing & streaming
  • Computer vision applications

It delivers much faster performance than a CPU for AI tasks, while also being power-efficient and affordable.

Common Use Cases

Task Type | Example | Benefit with L4
AI Inference | Chatbots, image recognition | Very fast response time
ML Training | Small/medium models | 3–10× faster vs CPU
Video Transcoding | 1080p → 4K conversion, streaming | Real-time performance

Recommended Guest Operating Systems

OS | Version
Ubuntu | 20.04 / 22.04 / 24.04
AlmaLinux | 8 / 9
Rocky Linux | 8 / 9
Windows Server | 2019 / 2022

GPU Architecture Basics

GPU = many small cores → great at doing the same task many times.

CPU = fewer powerful cores → great at logic, program control, OS operations.

➡ For AI:

  • Training + inference happen on the GPU
  • Application logic and control flow run on the CPU

Think of:

CPU = Brain 🧠
GPU = Muscle 💪

They work together.


2. Available GPU Plans & Limits

Full plan and pricing details available at:
https://www.cloudpe.com/pricing/#Cloud-GPU

Where are L4 Plans Available?

NVIDIA L4 GPU plans are currently available only in the IN-WEST2 region. IN-WEST3 is a CPU-only region: GPU hardware (including NVIDIA L4 GPUs) has either not been deployed there or is temporarily exhausted. As a result, GPU-based virtual machines cannot be created in IN-WEST3.

3. Network Bandwidth on CloudPe GPU Instances

CloudPe L4 GPU VMs include dual-network connectivity:

Network Type | Purpose | Typical Speed
Public Network | Internet access, public APIs, VPN connectivity | Up to 1 Gbps
Private Network | Traffic between VMs inside the CloudPe datacenter | Up to 10 Gbps
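To put those link speeds in perspective, the sketch below estimates how long a dataset transfer takes at each tier. This is pure arithmetic using the table's nominal maximums; real transfers will be slower due to protocol overhead, and the 100 GB dataset size is just an example:

```python
def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Estimate transfer time for size_gb gigabytes over a link_gbps link."""
    size_gigabits = size_gb * 8  # 1 byte = 8 bits
    return size_gigabits / link_gbps

dataset_gb = 100  # example training dataset
print(f"Public  (1 Gbps):  {transfer_seconds(dataset_gb, 1):.0f} s")   # 800 s
print(f"Private (10 Gbps): {transfer_seconds(dataset_gb, 10):.0f} s")  # 80 s
```

This is why multi-VM setups (e.g. a storage VM feeding a GPU VM) should move data over the private network whenever possible.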

4. Choosing the Right L4 Plan

Comparisons: Inference vs Training

Feature | Inference (Running a Model) | Training (Learning a Model)
Purpose | Use an already-trained AI model to predict results | Teach the AI model using large datasets
Workload Example | Ask the model to identify a face in an image | Train the model to learn faces from thousands of images
GPU Requirement | Low to Medium | High (more GPU memory + CUDA cores)
Performance Focus | ⚡ Low latency, fast results | 🧮 High compute power, long runtimes
Storage Needs | Small (model already built) | Large datasets required
Typical GPU Usage | NLP chatbots, vision object detection, speech-to-text | ML model development, fine-tuning LLMs / vision models

5. How Many Models Per GPU?

The number of models that can run on a GPU depends mainly on VRAM (GPU Memory).

  • Each AI model needs VRAM to load and process data
  • If VRAM is full → GPU cannot load more models → performance drops or model fails to run

Approx. VRAM Needs for Common AI Models

Model Type | Example Models | VRAM Required (Approx.) | Usage Type
Small Language Models | Llama-3 8B | 10–12 GB | Inference
Medium Vision Models | YOLOv8-L | 8–16 GB | Inference/Training
Large Speech Models | Whisper-Large-V3 | 16–20 GB | Real-time speech
Large Language Models | Llama-3 70B | 70–130 GB (requires multiple GPUs; does not fit in a single L4's 24 GB VRAM) | Training/Enterprise
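With the approximate figures above, the capacity check is simple division against the L4's 24 GB of VRAM, minus some headroom for the CUDA context and activations. A hedged sketch; the 2 GB headroom value is an assumption for illustration, not a CloudPe specification:

```python
L4_VRAM_GB = 24
RUNTIME_HEADROOM_GB = 2  # assumed reserve for CUDA context / activations

def models_per_l4(model_vram_gb: float) -> int:
    """How many copies of a model fit on one L4, by VRAM alone."""
    usable = L4_VRAM_GB - RUNTIME_HEADROOM_GB
    return max(int(usable // model_vram_gb), 0)

print(models_per_l4(11))  # Llama-3 8B at ~11 GB  -> 2 instances
print(models_per_l4(8))   # YOLOv8-L at ~8 GB     -> 2 instances
print(models_per_l4(70))  # Llama-3 70B           -> 0 (does not fit)
```

VRAM is only the first constraint; compute load and batch size also limit how many models can serve traffic concurrently.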

6. Benchmark References (Latency & Throughput)

To understand performance on an NVIDIA L4 GPU, the most common metrics used in AI workloads are:

Metric | Meaning | Example
Latency | How long one request or inference takes | "50 ms per image"
Throughput | How many requests can be processed per second | "200 images/sec"

These help customers choose the right plan based on workload and user traffic.
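The two metrics are linked: for a single request stream, throughput is roughly 1 / latency, and batching or concurrency multiplies it. A small sketch using the example numbers from the table (illustrative arithmetic only, not L4 benchmark data):

```python
def throughput_per_sec(latency_ms: float, concurrency: int = 1) -> float:
    """Requests per second sustainable at a given per-request latency."""
    return concurrency * 1000.0 / latency_ms

# "50 ms per image" with one request at a time -> 20 images/sec.
# Reaching "200 images/sec" needs ~10 requests in flight (or batching).
print(throughput_per_sec(50))       # 20.0
print(throughput_per_sec(50, 10))   # 200.0
```

When sizing a plan, start from the expected requests per second, then work backward to the concurrency and number of GPUs needed at your measured latency.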
