NVLink vs PCIe: What is the difference?

NVLink is a high-bandwidth, low-latency interconnect designed for fast GPU-to-GPU communication, used mainly in NVIDIA systems. PCIe, by contrast, is a general-purpose, widely supported interface with lower bandwidth and higher latency, used to connect GPUs to CPUs and other peripherals.

AI supercomputers are built to ingest and process enormous datasets and to extract meaningful insights from them. Banks, car manufacturers, factories, hospitals, and wholesale retailers are all adopting them to accelerate their businesses.

These powerful AI systems carry data over parallel paths at lightning speed to fast-track actionable results. AI infrastructure typically uses GPU and TPU processors and requires fast interconnects for accelerated computing, for which NVLink is the gold standard.

PCIe (Peripheral Component Interconnect Express) is a standard serial expansion bus; a PCIe 4.0 x16 link, for example, provides roughly 32 GB/s of bandwidth in each direction.

In this article we will look at the differences between NVLink and PCIe: their key characteristics and use cases.

What is NVLink

The rise of gigantic AI models is placing extraordinary demands on computing power. Ingesting and synchronizing data across hundreds or thousands of GPUs achieves little without a fast interconnect: a high GPU count alone does not improve performance unless it is backed by a high-speed data highway, just as a sports car is of little use on a broken road. NVIDIA NVLink is a wire-based, high-speed interconnect that links GPUs directly to each other (and to supported CPUs), enabling processors to send and receive data from shared memory pools at a rapid pace. It can connect host and accelerator processors at up to 900 GB/s.

Characteristics of NVLink

  • High bandwidth: up to 1.8 TB/s bidirectional per GPU (NVLink 5)
  • Low peer-to-peer latency, on the order of hundreds of nanoseconds
  • Direct GPU-to-GPU connectivity
  • Hardware-enforced cache coherence for memory access through a unified address space
  • More energy efficient than PCIe for bulk data movement
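The headline bandwidth figures follow from simple per-link arithmetic: each GPU exposes a number of NVLink links, and the aggregate is their sum. A minimal sketch (link counts and per-link rates are NVIDIA's published figures for NVLink 4 and NVLink 5, used here for illustration):

```python
# Aggregate NVLink bandwidth = links per GPU x per-link bidirectional rate.
# The figures below are NVIDIA's published numbers for Hopper (NVLink 4)
# and Blackwell (NVLink 5); treat them as illustrative, not a spec.

def nvlink_aggregate_gbs(links: int, per_link_bidir_gbs: float) -> float:
    """Total bidirectional bandwidth per GPU in GB/s."""
    return links * per_link_bidir_gbs

# NVLink 4 (H100): 18 links x 50 GB/s bidirectional each
print(nvlink_aggregate_gbs(18, 50))    # 900 GB/s

# NVLink 5 (B200): 18 links x 100 GB/s bidirectional each
print(nvlink_aggregate_gbs(18, 100))   # 1800 GB/s, i.e. 1.8 TB/s
```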

What is PCIe

PCIe is a serial expansion bus standard that connects GPUs, storage drives, and other devices to the CPU. To build a high-performance computing node, multiple GPUs are typically linked to the CPU through a PCIe switch. But as AI models scale to terabytes of data, this classic arrangement becomes a bottleneck: when data is shared over PCIe, each GPU-to-GPU round trip has to traverse system memory, which adds latency, and as a node scales from 4 to 8 to 16 GPUs the delay compounds, reducing overall cluster throughput.
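One way to see why interconnect bandwidth dominates at scale: in a ring all-reduce (the common pattern for synchronizing gradients during training), each GPU must move roughly 2·(n−1)/n times the gradient size per step, so the same synchronization takes far longer over a slower bus. A rough back-of-the-envelope model, with illustrative per-direction bandwidth figures (450 GB/s for NVLink 4, ~63 GB/s for PCIe 5.0 x16):

```python
def allreduce_seconds(data_gb: float, n_gpus: int, bus_gbs: float) -> float:
    """Bandwidth-only lower bound for a ring all-reduce of data_gb
    gigabytes across n_gpus GPUs over a bus_gbs GB/s (per direction) link.
    Ignores latency and protocol overhead."""
    traffic_gb = 2 * (n_gpus - 1) / n_gpus * data_gb  # data each GPU sends
    return traffic_gb / bus_gbs

grads_gb = 10.0  # e.g. fp16 gradients of a ~5B-parameter model (assumed)
for n in (4, 8, 16):
    nvlink = allreduce_seconds(grads_gb, n, 450.0)  # NVLink 4
    pcie = allreduce_seconds(grads_gb, n, 63.0)     # PCIe 5.0 x16
    print(f"{n} GPUs: NVLink ~{nvlink*1e3:.1f} ms vs PCIe ~{pcie*1e3:.1f} ms")
```

The per-GPU traffic term grows toward 2x the data size as the GPU count rises, so the gap between the two buses persists at every scale.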

Characteristics of PCIe 

  • Data transfer rates up to 64 GT/s per lane (PCIe 6.0)
  • Point-to-point serial connection
  • Multiple lanes (x1, x4, x8, x16) aggregated per link
  • Reduced lag and stuttering during resource-intensive tasks
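The per-lane transfer rates translate into usable bandwidth once lane count and encoding overhead are factored in. A small sketch (transfer rates are from the PCIe specifications; the single efficiency constant is a simplifying assumption, since Gen6's PAM4/FLIT encoding has slightly different overhead than Gen3–Gen5's 128b/130b):

```python
# Per-direction bandwidth of a PCIe link:
#   (GT/s per lane) x lanes x encoding efficiency / 8 bits per byte.
# Gen3-Gen5 use 128b/130b encoding; Gen6 moves to PAM4 with FLIT
# encoding, whose overhead is roughly comparable (approximation here).

def pcie_gbs(gts_per_lane: float, lanes: int,
             efficiency: float = 128 / 130) -> float:
    """Approximate per-direction bandwidth in GB/s."""
    return gts_per_lane * lanes * efficiency / 8

print(round(pcie_gbs(16, 16), 1))  # PCIe 4.0 x16 -> ~31.5 GB/s
print(round(pcie_gbs(32, 16), 1))  # PCIe 5.0 x16 -> ~63.0 GB/s
print(round(pcie_gbs(64, 16), 1))  # PCIe 6.0 x16 -> ~126 GB/s
```

Doubling the per-direction figure gives the aggregate bidirectional number often quoted in marketing material (e.g. ~128 GB/s for Gen5 x16).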
| Features | NVLink | PCIe |
| --- | --- | --- |
| Purpose | High-speed, direct GPU-to-GPU connectivity for AI and HPC workloads | General-purpose connectivity between GPUs, network components, other peripherals, and the CPU |
| Design | Short-reach, purpose-built fabric for huge data transport at low latency | Long-reach, universal, plug-and-play I/O for any card, slot, and OS, with good bandwidth and compatibility |
| Architecture | Mesh networking with direct GPU-to-GPU connections; point-to-point with multiple links per GPU; proprietary NVIDIA technology | CPU/chipset-based hub system; universally compatible across vendors; hierarchical tree structure |
| Physical layer | High-speed differential signalling (NVHS) | Non-Return-to-Zero (NRZ) signalling through Gen5; Pulse Amplitude Modulation 4-level (PAM4) signalling from Gen6 |
| Bandwidth | Up to 900 GB/s per GPU (NVLink 4); 1.8 TB/s (NVLink 5) | 128 GB/s bidirectional with a Gen5 x16 link |
| Latency | Very low, thanks to direct GPU-to-GPU links | Higher, as data must traverse the CPU and system memory |
| Compatibility | Limited to selected high-end NVIDIA data center GPUs | Universal; supported by most motherboards and GPUs |
| Use cases | Training large models; high-bandwidth memory pooling for larger batches, longer sequences, and complex architectures | Training small models, image classification, chatbot responses or query recommendations, AI running on mixed machines and cloud servers |
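To make the bandwidth rows concrete, consider moving a large block of model weights between GPUs using the headline figures from the table above (an ideal-case illustration, not a benchmark):

```python
def transfer_seconds(size_gb: float, bandwidth_gbs: float) -> float:
    """Ideal transfer time, ignoring latency and protocol overhead."""
    return size_gb / bandwidth_gbs

weights_gb = 80.0  # e.g. an H100's full 80 GB of HBM (illustrative size)
print(f"NVLink 4 (900 GB/s):     {transfer_seconds(weights_gb, 900):.3f} s")
print(f"PCIe 5.0 x16 (128 GB/s): {transfer_seconds(weights_gb, 128):.3f} s")
```

The roughly 7x gap is why multi-GPU training nodes pool memory over NVLink rather than PCIe.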

Download the comparison table: NVLink vs PCIe
