Introduction to NVIDIA L4 GPUs on CloudPe
NVIDIA L4 GPUs on CloudPe are designed to accelerate AI inference, machine learning workloads, computer vision, and video processing applications. These GPUs provide a balance of high performance, energy efficiency, and cost-effectiveness, making them ideal for production-grade AI workloads as well as development and testing environments.
CloudPe offers L4 GPUs as part of its GPU-enabled virtual machine plans, which can be launched directly from the CloudPe Dashboard. Users can select an L4 GPU plan while creating a VM, choose a supported operating system, and deploy workloads without needing to manage underlying hardware.
Once deployed, the GPU is automatically attached to the VM and can be accessed using standard NVIDIA drivers and CUDA libraries, enabling seamless integration with popular AI frameworks such as TensorFlow, PyTorch, and OpenCV.
The NVIDIA L4 is a cloud GPU with 24 GB GDDR6 VRAM, designed for:
- AI inference (running trained models)
- Lightweight machine learning training
- Video processing & streaming
- Computer vision applications
It delivers much faster performance than a CPU for AI tasks, while also being power-efficient and affordable.
Common Use Cases
| Task Type | Example | Benefit with L4 |
|---|---|---|
| AI Inference | Chatbots, image recognition | Very fast response time |
| ML Training | Small/medium models | 3–10× faster vs CPU |
| Video Transcoding | 1080p → 4K conversion, streaming | Real-time performance |
Recommended Guest Operating Systems
| OS | Version |
|---|---|
| Ubuntu | 20.04 / 22.04 / 24.04 |
| AlmaLinux | 8 / 9 |
| Rocky Linux | 8 / 9 |
| Windows Server | 2019 / 2022 |
GPU Architecture Basics
GPU = many small cores → great at doing the same task many times.
CPU = fewer powerful cores → great at logic, program control, OS operations.
➡ For AI:
- Training and inference run on the GPU
- Application and control logic run on the CPU
Think of:
CPU = Brain 🧠
GPU = Muscle 💪
They work together.
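The pattern GPUs exploit is data parallelism: the same small operation applied to many independent elements. A minimal sketch in plain Python (expressed serially here; a GPU would run thousands of these element-wise operations at once, while the CPU handles control flow and orchestration):

```python
# Data-parallel pattern: apply the SAME operation to many elements.
# A GPU runs thousands of these element-wise operations in parallel;
# plain Python expresses the pattern serially, one element at a time.

def scale_pixel(value, factor=2):
    """One small, uniform task -- the kind a single GPU core handles."""
    return value * factor

# An "image" of a million pixels: the same op repeated a million times.
pixels = range(1_000_000)
scaled = [scale_pixel(p) for p in pixels]

# Control flow, I/O, and program logic stay on the CPU side.
print(scaled[:3])  # -> [0, 2, 4]
```

The takeaway: workloads made of many identical, independent operations (matrix multiplies, pixel transforms, token scoring) map well to a GPU; branchy, sequential logic does not.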
2. Available GPU Plans & Limits

Full plan and pricing details are available at:
https://www.cloudpe.com/pricing/#Cloud-GPU
Where are L4 Plans Available?
NVIDIA L4 GPU plans are currently available only in the IN-WEST2 region. IN-WEST3 is a CPU-only region: GPU hardware (including NVIDIA L4 GPUs) has not been deployed there, or is temporarily exhausted. As a result, GPU-based virtual machines cannot be created in IN-WEST3.
3. Network Bandwidth on CloudPe GPU Instances
CloudPe L4 GPU VMs include dual-network connectivity:
| Network Type | Purpose | Typical Speed |
|---|---|---|
| Public Network | Internet access, public APIs, VPN connectivity | Up to 1 Gbps |
| Private Network | Traffic between VMs inside CloudPe datacenter | Up to 10 Gbps |
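The two link speeds matter most when moving large datasets or model weights between VMs. A quick back-of-envelope sketch, assuming the full advertised link rate is achieved (real-world throughput is lower due to protocol overhead):

```python
# Rough transfer-time estimate for the two CloudPe network paths.
# Assumes the full link rate is achieved; real throughput is lower.

def transfer_seconds(size_gb: float, link_gbps: float) -> float:
    """Seconds to move size_gb gigabytes over a link_gbps link."""
    size_gbits = size_gb * 8          # gigabytes -> gigabits
    return size_gbits / link_gbps

dataset_gb = 100  # e.g. a training dataset or model checkpoint set

public = transfer_seconds(dataset_gb, 1)    # public net, up to 1 Gbps
private = transfer_seconds(dataset_gb, 10)  # private net, up to 10 Gbps

print(f"Public  (1 Gbps):  {public:.0f} s (~{public / 60:.1f} min)")
print(f"Private (10 Gbps): {private:.0f} s (~{private / 60:.1f} min)")
```

For a 100 GB dataset this works out to roughly 800 s over the public network versus roughly 80 s over the private network, which is why inter-VM data movement should stay on the private network where possible.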
4. Choosing the Right L4 Plan
Comparison: Inference vs Training
| Feature | Inference (Running Model) | Training (Learning Model) |
|---|---|---|
| Purpose | Use already-trained AI to predict results | Teach the AI model using large datasets |
| Workload Example | Ask model to identify a face in an image | Train model to learn faces from thousands of images |
| GPU Requirement | Low to Medium | High (more GPU Memory + CUDA cores) |
| Performance Focus | ⚡ Low latency, fast results | 🧮 High compute power, long runtimes |
| Storage Needs | Small (model already built) | Large datasets required |
| Typical GPU Usage | NLP chatbots, vision object detection, speech-to-text | ML model development, fine-tuning LLMs / Vision models |
5. How Many Models Per GPU?
The number of models that can run on a GPU depends mainly on VRAM (GPU Memory).
- Each AI model needs VRAM to load and process data
- If VRAM is full → GPU cannot load more models → performance drops or model fails to run
Approx. VRAM Needs for Common AI Models
| Model Type | Example Models | VRAM Required (Approx.) | Usage Type |
|---|---|---|---|
| Small Language Models | Llama-3 8B | 10–12 GB | Inference |
| Medium Vision Models | YOLOv8-L | 8–16 GB | Inference/Training |
| Large Speech Models | Whisper-Large-V3 | 16–20 GB | Real-time speech |
| Large Language Models | Llama-3 70B | 70–130 GB (requires multiple GPUs; exceeds a single L4's 24 GB VRAM) | Training/Enterprise |
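A common rule of thumb behind figures like those above is parameters × bytes-per-parameter, plus a working-memory overhead. The sketch below uses that rule with an assumed 20% overhead factor; real usage varies with framework, context length, and batch size, so treat these strictly as planning numbers (the table's 10–12 GB figure for an 8B model corresponds roughly to 8-bit weights plus overhead):

```python
# Back-of-envelope VRAM estimate: parameters x bytes-per-parameter,
# plus an assumed working-memory overhead (the 1.2 factor below is an
# illustrative assumption, not a measured figure).

def vram_gb(params_billion: float, bytes_per_param: float = 2.0,
            overhead: float = 1.2) -> float:
    """Approx. VRAM (GB) needed to load a model for inference.
    bytes_per_param: 2.0 for FP16/BF16, 1.0 for 8-bit, 0.5 for 4-bit."""
    return params_billion * bytes_per_param * overhead

L4_VRAM_GB = 24  # a single NVIDIA L4

for name, params, bpp in [("8B model, FP16 ", 8, 2.0),
                          ("8B model, 8-bit", 8, 1.0),
                          ("70B model, FP16", 70, 2.0)]:
    need = vram_gb(params, bpp)
    copies = int(L4_VRAM_GB // need) if need <= L4_VRAM_GB else 0
    print(f"{name}: ~{need:.1f} GB -> fits {copies}x on one L4")
```

This also answers "how many models per GPU": once the per-model estimate is known, divide the card's 24 GB by it. An FP16 8B model fills most of a single L4, while quantized variants leave room for multiple copies; a 70B model does not fit at all.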
6. Benchmark References (Latency & Throughput)
To understand performance on an NVIDIA L4 GPU, the most common metrics used in AI workloads are:
| Metric | Meaning | Example |
|---|---|---|
| Latency | How long one request or inference takes | “50 ms per image” |
| Throughput | How many requests can be processed per second | “200 images/sec” |
These help customers choose the right plan based on workload and user traffic.
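The two metrics are linked: if a batch of B requests completes in L seconds, throughput is B / L and per-request latency is roughly L. A short sketch of that arithmetic (the batched figures below are illustrative assumptions, not measured L4 benchmarks):

```python
# Relationship between the two metrics: with B requests processed
# together (a batch) in L seconds,
#   throughput = B / L    and    per-request latency ~= L.
# Numbers are illustrative, not measured L4 benchmarks.

def throughput_rps(batch_size: int, latency_s: float) -> float:
    """Requests (or images) served per second."""
    return batch_size / latency_s

# From the table: "50 ms per image", one image at a time.
single = throughput_rps(1, 0.050)    # -> 20 images/sec
# Batching amortizes overhead: say 8 images in 120 ms (assumed).
batched = throughput_rps(8, 0.120)   # ~66.7 images/sec

print(f"single: {single:.0f} img/s, batched: {batched:.1f} img/s")
```

The trade-off to weigh when sizing a plan: batching raises throughput (more users served per GPU) but raises per-request latency, so latency-sensitive workloads like chatbots usually run smaller batches than offline batch-processing jobs.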