NVLink is a high-bandwidth, low-latency interconnect designed for fast GPU-to-GPU communication, used mainly in NVIDIA systems. PCIe, by contrast, is a general-purpose, widely supported interface with lower bandwidth and higher latency, used to connect GPUs to CPUs and other peripherals.
AI-based supercomputers are built to ingest vast data sets, process mountains of data and extract meaningful insights from them. Banks, car manufacturers, factories, hospitals and wholesale retailers are all adopting them to accelerate their businesses.
These powerful AI systems carry data over parallel paths at lightning speed to deliver actionable results faster. AI infrastructure is typically built around GPU and TPU processors and requires fast interconnects for accelerated computing, with NVLink regarded as the gold standard.
PCIe (Peripheral Component Interconnect Express) is a standard serial expansion bus; a PCIe 3.0 x16 link, for example, provides around 32 GB/s of aggregate bidirectional bandwidth, and each newer generation roughly doubles that figure.
In today’s article we will look at NVLink and PCIe: their key differences, characteristics and use cases.
What is NVLink
The rise of gigantic AI models is placing extraordinary demands on computing power. Simply deploying hundreds or thousands of GPUs does not improve performance unless they are backed by a high-speed data highway for ingesting and synchronizing data; it is like having a sports car in hand with a broken road ahead. NVIDIA NVLink establishes a direct, high-speed connection between GPUs, and between CPUs and GPUs, so that processors can send and receive data from shared memory pools at a rapid pace. It can connect host and accelerator processors at up to 900 GB/s.
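To see what this direct GPU-to-GPU path looks like in practice, here is a minimal sketch using the standard CUDA runtime API: it checks whether two GPUs can access each other's memory, enables peer access, and copies a buffer directly between them. When the GPUs are joined by NVLink the copy travels over that link; otherwise the driver falls back to the PCIe path. The buffer size and device indices below are purely illustrative.

```cuda
// Minimal sketch: peer-to-peer check, enable, and direct GPU 0 -> GPU 1 copy.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount < 2) {
        printf("Need at least two GPUs for a peer-to-peer copy.\n");
        return 0;
    }

    int canAccess01 = 0, canAccess10 = 0;
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);  // can GPU 0 reach GPU 1?
    cudaDeviceCanAccessPeer(&canAccess10, 1, 0);  // and the reverse direction?
    printf("P2P 0->1: %d, 1->0: %d\n", canAccess01, canAccess10);

    const size_t bytes = 256 << 20;  // 256 MiB test buffer (illustrative)
    float *buf0 = nullptr, *buf1 = nullptr;

    cudaSetDevice(0);
    if (canAccess01) cudaDeviceEnablePeerAccess(1, 0);  // flags must be 0
    cudaMalloc(&buf0, bytes);

    cudaSetDevice(1);
    if (canAccess10) cudaDeviceEnablePeerAccess(0, 0);
    cudaMalloc(&buf1, bytes);

    // Direct GPU 0 -> GPU 1 copy; no staging through host memory when P2P is on.
    cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
    return 0;
}
```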

Characteristics of NVLink
- High bandwidth: up to 1.8 TB/s bidirectional per GPU on the latest generation
- Low peer-to-peer latency, on the order of hundreds of nanoseconds
- Direct GPU-to-GPU connectivity
- Hardware-enforced cache coherence and a unified address space for memory access (see the sketch after this list)
- More energy efficient than PCIe for moving data
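As a hedged illustration of the unified address space mentioned above, the sketch below launches a kernel on GPU 0 that directly dereferences a pointer whose memory lives on GPU 1. Once peer access is enabled, CUDA's unified virtual addressing allows those remote loads, and on NVLink-connected GPUs they travel over NVLink. The kernel and buffer size are illustrative only.

```cuda
// Hedged sketch: a GPU 0 kernel reading memory that physically lives on GPU 1.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void sumRemote(const float* remote, float* out, int n) {
    // Deliberately naive single-thread loop, just to show the remote access.
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        float s = 0.0f;
        for (int i = 0; i < n; ++i) s += remote[i];  // reads GPU 1's memory
        *out = s;
    }
}

int main() {
    int count = 0, canAccess = 0;
    cudaGetDeviceCount(&count);
    if (count < 2) return 0;                      // needs two GPUs
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (!canAccess) return 0;                     // no peer-to-peer path available

    const int n = 1 << 20;
    float *onGpu1 = nullptr, *result = nullptr;

    cudaSetDevice(1);                             // data lives on GPU 1
    cudaMalloc(&onGpu1, n * sizeof(float));
    cudaMemset(onGpu1, 0, n * sizeof(float));

    cudaSetDevice(0);                             // kernel runs on GPU 0
    cudaDeviceEnablePeerAccess(1, 0);             // map GPU 1 memory into GPU 0
    cudaMalloc(&result, sizeof(float));

    sumRemote<<<1, 32>>>(onGpu1, result, n);
    cudaDeviceSynchronize();

    float host = 0.0f;
    cudaMemcpy(&host, result, sizeof(float), cudaMemcpyDeviceToHost);
    printf("Sum of remote (GPU 1) buffer read from GPU 0: %f\n", host);
    return 0;
}
```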
What is PCIe
PCIe is a serial bus expansion standard that connects GPUs, storage drives and other devices to the CPU. To build a high-performance computing node, multiple GPUs are typically linked to the CPU through a PCIe switch. As AI models scale to terabytes of data, however, this classic arrangement becomes a bottleneck: each round trip between GPU and CPU has to traverse system memory, which adds latency, and when scaling from 4 to 8 to 16 GPUs this delay compounds and drags down overall cluster throughput.
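The GPU-CPU path described above can be observed with a simple timed host-to-device copy. The sketch below (standard CUDA runtime API, illustrative buffer size) uses pinned host memory and CUDA events so the measured rate roughly reflects the PCIe link itself rather than pageable-memory staging; it is not a calibrated benchmark.

```cuda
// Rough sketch: timing a host-to-GPU copy that crosses the PCIe link.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1ull << 30;  // 1 GiB (illustrative)
    float *hostBuf = nullptr, *devBuf = nullptr;

    cudaMallocHost(&hostBuf, bytes);  // pinned (page-locked) host memory
    cudaMalloc(&devBuf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice);  // crosses PCIe
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gbPerSec = (bytes / 1e9) / (ms / 1e3);
    printf("Host -> GPU: %.2f GB/s over the PCIe link\n", gbPerSec);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(devBuf);
    cudaFreeHost(hostBuf);
    return 0;
}
```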

Characteristics of PCIe
- Supports data transfer rates up to 64 GT/s per lane (PCIe 6.0); see the bandwidth sketch after this list
- Point-to-point serial connection
- Uses multiple lanes (x1, x4, x8, x16) for data transfer
- Reduces lag and stuttering during resource-intensive tasks
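As a rough back-of-envelope check on the per-lane figures above, the small host-only snippet below estimates per-direction and bidirectional throughput for x16 links of PCIe 3.0 through 5.0, assuming 128b/130b line coding. The results are approximations, not measured values.

```cuda
// Back-of-envelope PCIe bandwidth arithmetic (host-only, no GPU required).
// Per-direction throughput ~ transfer rate (GT/s) * encoding efficiency / 8 * lanes.
#include <cstdio>

int main() {
    const double encoding = 128.0 / 130.0;  // 128b/130b line coding (Gen3-Gen5)
    const int lanes = 16;                   // a typical x16 GPU slot

    const double genRatesGT[] = {8.0, 16.0, 32.0};  // Gen3, Gen4, Gen5 per lane
    const char* names[] = {"PCIe 3.0 x16", "PCIe 4.0 x16", "PCIe 5.0 x16"};

    for (int i = 0; i < 3; ++i) {
        double perDirection = genRatesGT[i] * encoding / 8.0 * lanes;  // GB/s
        printf("%s: ~%.0f GB/s per direction, ~%.0f GB/s bidirectional\n",
               names[i], perDirection, 2.0 * perDirection);
    }
    return 0;
}
```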
Comparison: NVLink vs PCIe
| Features | NVLink | PCIe |
|---|---|---|
| Purpose | High-speed, direct GPU-to-GPU connectivity for AI and HPC workloads | General-purpose connectivity between the CPU and GPUs, network components and other peripherals |
| Design | Short-reach, purpose-built fabric for moving huge volumes of data at low latency | Long-reach, universal, plug-and-play I/O for any card, slot or OS, offering good bandwidth and broad compatibility |
| Architecture | Point-to-point mesh of direct GPU-to-GPU connections, with multiple links per GPU; proprietary NVIDIA technology | Hierarchical tree rooted at the CPU/chipset; universally compatible across vendors |
| Physical layer | High-speed differential signalling (NVHS) | Non-Return-to-Zero (NRZ) signalling or 4-level Pulse Amplitude Modulation (PAM4), depending on the PCIe generation |
| Bandwidth | Up to 900 GB/s per GPU (up to 1.8 TB/s bidirectional on the latest generation) | Around 128 GB/s bidirectional with a Gen5 x16 link |
| Latency | Very low, thanks to direct GPU-to-GPU (and GPU-CPU) links | Higher, as data typically passes through the CPU and system memory |
| Compatibility | Limited to selected high-end NVIDIA data center GPUs and systems | Universal; supported by most motherboards and GPUs |
| Use cases | Training large models with faster results; high-bandwidth memory pooling for larger batches, longer sequences and more complex architectures | Training smaller models, image classification, chatbot responses and query recommendations, AI running on a mix of machines, cloud servers, etc. |