NVIDIA GPU Specs

This article summarizes the technical specifications and key improvements of NVIDIA's data-center GPU lineup.

V100

V100 Key Improvements

  • Volta architecture
  • New SM architecture optimized for deep learning
  • Second-generation NVIDIA NVLink
  • HBM2 memory
  • Volta Multi-Process Service

V100 Technical Specifications

| Tesla Product | Tesla K40 | Tesla M40 | Tesla P100 | Tesla V100 |
|---|---|---|---|---|
| GPU | GK180 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GV100 (Volta) |
| SMs | 15 | 24 | 56 | 80 |
| TPCs | 15 | 24 | 28 | 40 |
| FP32 Cores / GPU | 2880 | 3072 | 3584 | 5120 |
| FP64 Cores / GPU | 960 | 96 | 1792 | 2560 |
| Tensor Cores / GPU | NA | NA | NA | 640 |
| GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz | 1530 MHz |
| Peak FP32 TFLOPS | 5 | 6.8 | 10.6 | 15.7 |
| Peak FP64 TFLOPS | 1.7 | 0.21 | 5.3 | 7.8 |
| Peak Tensor TFLOPS | NA | NA | NA | 125 |
| Memory Size | Up to 12 GB | Up to 24 GB | 16 GB | 16 GB |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 |
| TDP | 235 W | 250 W | 300 W | 300 W |
| Manufacturing Process | 28 nm | 28 nm | 16 nm FinFET+ | 12 nm FFN |
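
The peak-throughput rows follow directly from the core counts and boost clocks: each CUDA core retires one fused multiply-add (two floating-point operations) per clock, and each Volta Tensor Core performs a 4x4x4 matrix multiply-accumulate (64 FMAs) per clock. A quick sanity check in Python (the helper name is my own, not an NVIDIA API):

```python
def peak_tflops(units, boost_ghz, fma_per_unit_per_clk=1):
    """Peak TFLOPS = units x FMAs/clock x 2 ops/FMA x clock (GHz) / 1000."""
    return units * fma_per_unit_per_clk * 2 * boost_ghz / 1e3

# V100 (GV100) at its 1530 MHz boost clock:
print(round(peak_tflops(5120, 1.53), 1))     # FP32 cores   -> 15.7
print(round(peak_tflops(2560, 1.53), 1))     # FP64 cores   -> 7.8
print(round(peak_tflops(640, 1.53, 64), 1))  # Tensor Cores -> 125.3
```

The results reproduce the 15.7 / 7.8 / ~125 TFLOPS entries in the table above.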

Compute Capability Specifications

| GPU | Kepler GK180 | Maxwell GM200 | Pascal GP100 | Volta GV100 |
|---|---|---|---|---|
| Compute Capability | 3.5 | 5.2 | 6.0 | 7.0 |
| Threads / Warp | 32 | 32 | 32 | 32 |
| Max Warps / SM | 64 | 64 | 64 | 64 |
| Max Threads / SM | 2048 | 2048 | 2048 | 2048 |
| Max Thread Blocks / SM | 16 | 32 | 32 | 32 |
| Max 32-bit Registers / SM | 65536 | 65536 | 65536 | 65536 |
| Max Registers / Block | 65536 | 65536 | 65536 | 65536 |
| Max Registers / Thread | 255 | 255 | 255 | 255 |
| Max Thread Block Size | 1024 | 1024 | 1024 | 1024 |
| FP32 Cores / SM | 192 | 128 | 64 | 64 |
| Ratio of SM Registers to FP32 Cores | 341 | 512 | 1024 | 1024 |
| Shared Memory Size / SM | 16 KB / 32 KB / 48 KB | 96 KB | 64 KB | Configurable up to 96 KB |
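
One practical reading of the register numbers: the 64K-entry register file caps how many threads can be resident on an SM at once, which bounds occupancy. A simplified sketch of that arithmetic (helper names are mine; real allocation happens at warp granularity with per-architecture rounding, which this ignores):

```python
def resident_threads(regs_per_sm=65536, regs_per_thread=64,
                     max_threads_per_sm=2048):
    """Threads that fit in the SM register file, capped by the HW thread limit."""
    return min(regs_per_sm // regs_per_thread, max_threads_per_sm)

# A kernel using 64 registers/thread can fill only half of a 2048-thread SM:
print(resident_threads(regs_per_thread=64))  # -> 1024 (50% occupancy)
print(resident_threads(regs_per_thread=32))  # -> 2048 (full occupancy)
```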

System Specifications

| Specification | DGX-1 (Tesla P100) | DGX-1 (Tesla V100) |
|---|---|---|
| GPU | 8x Tesla P100 GPUs | 8x Tesla V100 GPUs |
| TFLOPS | 170 (GPU FP16) + 3 (CPU FP32) | 1 PFLOPS (GPU Tensor) |
| GPU Memory | 16 GB per GPU / 128 GB per DGX-1 node | 16 GB or 32 GB per GPU / 128-256 GB per DGX-1 node |
| CPU | Dual 20-core Intel® Xeon® E5-2698 v4 | Dual 20-core Intel® Xeon® E5-2698 v4 |
| FP32 CUDA Cores | 28,672 | 40,960 |
| System Memory | Up to 512 GB 2133 MHz DDR4 LRDIMM | Up to 512 GB 2133 MHz DDR4 LRDIMM |
| Storage | 4x 1.92 TB SSD RAID 0 | 4x 1.92 TB SSD RAID 0 |
| Network Interconnect | Dual 10 GbE, 4 IB EDR | Dual 10 GbE, 4 IB EDR |
| System Dimensions | 866 D x 444 W x 131 H (mm) | 866 D x 444 W x 131 H (mm) |
| System Weight | 80 lbs | 80 lbs |
| Max Power TDP | 3200 W | 3200 W |
| Operating Temp | 10-35 °C | 10-35 °C |

A100

A100 Key Improvements

  • Ampere architecture: MIG can partition an A100 into smaller instances, while NVLink can connect additional GPUs for larger jobs
  • Tensor Cores: up to 312 TFLOPS
  • NVLink: higher throughput
  • MIG (Multi-Instance GPU): one A100 can be partitioned into up to 7 hardware-isolated instances
  • HBM2e: larger HBM capacity, higher bandwidth, and better DRAM utilization efficiency
  • Structured sparsity: sparse operations can deliver a 2x compute speedup
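
The 2x sparsity speedup applies to the 2:4 structured pattern: in every contiguous group of four weights, at most two may be nonzero, so the tensor core can skip half the multiplies. A minimal NumPy illustration of pruning to that pattern (helper names are my own; production flows use cuSPARSELt or framework tooling):

```python
import numpy as np

def prune_2_to_4(w):
    """Zero the two smallest-magnitude values in each group of 4 (2:4 pattern).
    Assumes w.size is a multiple of 4."""
    groups = w.reshape(-1, 4)
    smallest = np.argsort(np.abs(groups), axis=1)[:, :2]  # two smallest per group
    pruned = groups.copy()
    np.put_along_axis(pruned, smallest, 0.0, axis=1)
    return pruned.reshape(w.shape)

def is_2_to_4_sparse(w):
    """True if every contiguous group of 4 has at most 2 nonzeros."""
    return bool((np.count_nonzero(w.reshape(-1, 4), axis=1) <= 2).all())

w = np.array([0.9, -0.1, 0.02, 0.5, -0.3, 0.7, 0.01, -0.8])
print(is_2_to_4_sparse(prune_2_to_4(w)))  # -> True
```

Note that the largest-magnitude weights survive pruning, which is why accuracy loss is often small after fine-tuning.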

A100 Technical Specifications

| Specification | A100 80GB PCIe | A100 80GB SXM |
|---|---|---|
| FP64 | 9.7 TFLOPS | 9.7 TFLOPS |
| FP64 Tensor Core | 19.5 TFLOPS | 19.5 TFLOPS |
| FP32 | 19.5 TFLOPS | 19.5 TFLOPS |
| Tensor Float 32 (TF32) | 156 TFLOPS \| 312 TFLOPS* | 156 TFLOPS \| 312 TFLOPS* |
| BFLOAT16 Tensor Core | 312 TFLOPS \| 624 TFLOPS* | 312 TFLOPS \| 624 TFLOPS* |
| FP16 Tensor Core | 312 TFLOPS \| 624 TFLOPS* | 312 TFLOPS \| 624 TFLOPS* |
| INT8 Tensor Core | 624 TOPS \| 1248 TOPS* | 624 TOPS \| 1248 TOPS* |
| GPU Memory | 80 GB HBM2e | 80 GB HBM2e |
| GPU Memory Bandwidth | 1,935 GB/s | 2,039 GB/s |
| Max Thermal Design Power (TDP) | 300 W | 400 W |
| Multi-Instance GPU | Up to 7 MIGs @ 10 GB | Up to 7 MIGs @ 10 GB |
| Form Factor | PCIe, dual-slot air-cooled or single-slot liquid-cooled | SXM |
| Interconnect | NVIDIA® NVLink® Bridge for 2 GPUs: 600 GB/s; PCIe Gen4: 64 GB/s | NVLink: 600 GB/s; PCIe Gen4: 64 GB/s |
| Server Options | Partner and NVIDIA-Certified Systems™ with 1-8 GPUs | NVIDIA HGX™ A100 Partner and NVIDIA-Certified Systems with 4, 8, or 16 GPUs; NVIDIA DGX™ A100 with 8 GPUs |

* With sparsity.

H100

H100 Key Improvements

  • Hopper architecture
  • Tensor Cores: more powerful tensor cores
  • Transformer Engine: accelerates training of transformer-based models
  • NVLink: 900 GB/s of bandwidth
  • Second-generation MIG: supports multi-tenant, multi-user workloads
  • DPX: accelerates dynamic programming algorithms via the DPX instruction set
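
DPX instructions accelerate the min/max-plus inner loops common to dynamic programming (e.g. Smith-Waterman alignment, shortest paths). For illustration, the recurrence pattern being accelerated looks like this classic edit-distance kernel, written here in plain Python rather than CUDA:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the O(len(a)*len(b)) dynamic program.
    The min(...)-plus-add inner step is the pattern DPX targets."""
    dp = list(range(len(b) + 1))        # dp[j] = distance(a[:i], b[:j])
    for i, ca in enumerate(a, start=1):
        prev, dp[0] = dp[0], i          # prev holds dp[i-1][j-1]
        for j, cb in enumerate(b, start=1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,          # delete ca
                        dp[j - 1] + 1,      # insert cb
                        prev + (ca != cb))  # substitute (or match)
            prev = cur
    return dp[-1]

print(edit_distance("kitten", "sitting"))  # -> 3
```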

H100 Technical Specifications

| Specification | H100 SXM | H100 NVL |
|---|---|---|
| FP64 | 34 TFLOPS | 30 TFLOPS |
| FP64 Tensor Core | 67 TFLOPS | 60 TFLOPS |
| FP32 | 67 TFLOPS | 60 TFLOPS |
| TF32 Tensor Core* | 989 TFLOPS | 835 TFLOPS |
| BFLOAT16 Tensor Core* | 1,979 TFLOPS | 1,671 TFLOPS |
| FP16 Tensor Core* | 1,979 TFLOPS | 1,671 TFLOPS |
| FP8 Tensor Core* | 3,958 TFLOPS | 3,341 TFLOPS |
| INT8 Tensor Core* | 3,958 TOPS | 3,341 TOPS |
| GPU Memory | 80 GB | 94 GB |
| GPU Memory Bandwidth | 3.35 TB/s | 3.9 TB/s |
| Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG |
| Max Thermal Design Power (TDP) | Up to 700 W (configurable) | 350-400 W (configurable) |
| Multi-Instance GPUs | Up to 7 MIGs @ 10 GB each | Up to 7 MIGs @ 12 GB each |
| Form Factor | SXM | PCIe, dual-slot air-cooled |
| Interconnect | NVIDIA NVLink™: 900 GB/s; PCIe Gen5: 128 GB/s | NVIDIA NVLink: 600 GB/s; PCIe Gen5: 128 GB/s |
| Server Options | NVIDIA HGX H100 Partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs; NVIDIA DGX H100 with 8 GPUs | Partner and NVIDIA-Certified Systems with 1-8 GPUs |
| NVIDIA AI Enterprise | Add-on | Included |

* With sparsity.

H200

H200 Key Improvements

  • Larger HBM capacity and higher memory bandwidth
  • Faster LLM inference

H200 Technical Specifications

| Specification | H200 SXM | H200 NVL |
|---|---|---|
| FP64 | 34 TFLOPS | 30 TFLOPS |
| FP64 Tensor Core | 67 TFLOPS | 60 TFLOPS |
| FP32 | 67 TFLOPS | 60 TFLOPS |
| TF32 Tensor Core* | 989 TFLOPS | 835 TFLOPS |
| BFLOAT16 Tensor Core* | 1,979 TFLOPS | 1,671 TFLOPS |
| FP16 Tensor Core* | 1,979 TFLOPS | 1,671 TFLOPS |
| FP8 Tensor Core* | 3,958 TFLOPS | 3,341 TFLOPS |
| INT8 Tensor Core* | 3,958 TOPS | 3,341 TOPS |
| GPU Memory | 141 GB | 141 GB |
| GPU Memory Bandwidth | 4.8 TB/s | 4.8 TB/s |
| Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG |
| Confidential Computing | Supported | Supported |
| Max Thermal Design Power (TDP) | Up to 700 W (configurable) | Up to 600 W (configurable) |
| Multi-Instance GPUs | Up to 7 MIGs @ 18 GB each | Up to 7 MIGs @ 18 GB each |
| Form Factor | SXM | PCIe, dual-slot air-cooled |
| Interconnect | NVIDIA NVLink™: 900 GB/s; PCIe Gen5: 128 GB/s | 2- or 4-way NVIDIA NVLink bridge: 900 GB/s per GPU; PCIe Gen5: 128 GB/s |
| Server Options | NVIDIA HGX H200 Partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs | NVIDIA MGX™ H200 NVL Partner and NVIDIA-Certified Systems with up to 8 GPUs |
| NVIDIA AI Enterprise | Add-on | Included |

* With sparsity.

Compared with the H100, the H200 upgrades HBM capacity and memory bandwidth.
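
That upgrade matters because single-stream LLM decoding is typically memory-bandwidth-bound: every generated token streams the full weight set out of HBM once, so tokens/s is roughly bandwidth divided by model size. A back-of-the-envelope comparison (my own helper; it ignores KV-cache traffic, batching, and kernel efficiency):

```python
def decode_tok_per_s_upper_bound(params_billion, bytes_per_param, bw_tb_s):
    """Crude ceiling: memory bandwidth / bytes read per generated token."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bw_tb_s * 1e12 / weight_bytes

# A 70B-parameter model in FP16 (2 bytes/param):
print(round(decode_tok_per_s_upper_bound(70, 2, 3.35)))  # H100 SXM -> ~24 tok/s
print(round(decode_tok_per_s_upper_bound(70, 2, 4.8)))   # H200     -> ~34 tok/s
```

The ~43% bandwidth increase translates almost directly into decode throughput in this regime.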

B200

B200 Key Improvements

  • Blackwell architecture: much more efficient GPU-to-GPU communication
  • Grace CPU: up to 900 GB/s of bidirectional bandwidth between the GPU and the Grace CPU
  • Fifth-generation NVIDIA NVLink: can connect up to 576 GPUs, with aggregate NVLink bandwidth of up to 130 TB/s
  • RAS engine: automatically identifies faults to improve efficiency
  • NVIDIA networking

B200 Technical Specifications

System specifications:

| Specification | GB200 NVL72 | GB200 NVL4 | HGX B200 |
|---|---|---|---|
| NVIDIA Blackwell GPUs \| Grace CPUs | 72 \| 36 | 4 \| 2 | 8 \| 0 |
| CPU Cores | 2,592 Arm® Neoverse V2 cores | 144 Arm Neoverse V2 cores | - |
| Total NVFP4 Tensor Core* | 1,440 \| 720 PFLOPS | 80 \| 40 PFLOPS | 144 \| 72 PFLOPS |
| Total FP8/FP6 Tensor Core* | 720 PFLOPS | 40 PFLOPS | 72 PFLOPS |
| Total Fast Memory | 31 TB | 1.8 TB | 1.4 TB |
| Total Memory Bandwidth | 576 TB/s | 32 TB/s | 62 TB/s |
| Total NVLink Bandwidth | 130 TB/s | 7.2 TB/s | 14.4 TB/s |

* With sparsity (paired values: with sparsity \| without sparsity).
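
The aggregate NVLink figures are mutually consistent with fifth-generation NVLink's 1.8 TB/s per GPU, which is a quick way to cross-check the table:

```python
def per_gpu_nvlink_tb_s(total_nvlink_tb_s, n_gpus):
    """Aggregate NVLink bandwidth divided across the GPUs in the system."""
    return total_nvlink_tb_s / n_gpus

print(round(per_gpu_nvlink_tb_s(130, 72), 1))  # GB200 NVL72 -> 1.8
print(round(per_gpu_nvlink_tb_s(7.2, 4), 1))   # GB200 NVL4  -> 1.8
print(round(per_gpu_nvlink_tb_s(14.4, 8), 1))  # HGX B200    -> 1.8
```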

Per-GPU specifications:

| Specification | GB200 NVL72 | GB200 NVL4 | HGX B200 |
|---|---|---|---|
| FP4 Tensor Core | 20 PFLOPS | 20 PFLOPS | 18 PFLOPS |
| FP8/FP6 Tensor Core* | 10 PFLOPS | 10 PFLOPS | 9 PFLOPS |
| INT8 Tensor Core* | 10 POPS | 10 POPS | 9 POPS |
| FP16/BF16 Tensor Core* | 5 PFLOPS | 5 PFLOPS | 4.5 PFLOPS |
| TF32 Tensor Core* | 2.5 PFLOPS | 2.5 PFLOPS | 2.2 PFLOPS |
| FP32 | 80 TFLOPS | 80 TFLOPS | 75 TFLOPS |
| FP64 / FP64 Tensor Core | 40 TFLOPS | 40 TFLOPS | 37 TFLOPS |
| GPU Memory \| Bandwidth | 186 GB HBM3E \| 8 TB/s | 186 GB HBM3E \| 8 TB/s | 180 GB HBM3E \| 7.7 TB/s |
| Multi-Instance GPU (MIG) | - | 7 | - |
| Decompression Engine | - | Yes | - |
| Decoders | - | 7 NVDEC, 7 nvJPEG | - |
| Max Thermal Design Power (TDP) | Configurable up to 1,200 W | Configurable up to 1,200 W | Configurable up to 1,000 W |
| Interconnect | - | Fifth-generation NVLink: 1.8 TB/s; PCIe Gen5: 128 GB/s | - |
| Server Options | NVIDIA GB200 NVL72 Partner and NVIDIA-Certified Systems™ with 72 GPUs | NVIDIA MGX Partner and NVIDIA-Certified Systems | NVIDIA HGX B200 Partner and NVIDIA-Certified Systems with 8 GPUs |

* With sparsity.

B300

B300 Key Improvements

  • Blackwell Ultra architecture
  • AI reasoning inference: supports test-time scaling, with speedups for both attention layers and raw FLOPs
  • HBM3e: supports larger batch sizes and higher throughput
  • ConnectX-8 SuperNIC: two ConnectX-8 devices per host, providing 800 Gb/s of GPU-to-GPU networking
  • Grace CPU: stronger performance and higher bandwidth
  • Fifth-generation NVIDIA NVLink: more efficient communication
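
A quick way to see why low-precision formats such as FP4 matter for the larger HBM: weight footprint scales linearly with bit width, so each halving of precision doubles the model size (or batch/KV-cache budget) that fits in a given memory capacity. A rough calculation (helper name is mine; real deployments add KV cache and activation overhead on top):

```python
def weight_footprint_gb(params_billion, bits_per_param):
    """Model weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# A 70B-parameter model at different precisions:
for bits in (16, 8, 4):
    print(f"FP{bits}: {weight_footprint_gb(70, bits)} GB")
# FP16: 140.0 GB, FP8: 70.0 GB, FP4: 35.0 GB
```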

B300 Technical Specifications

System specifications:

| Specification | GB300 NVL72 | HGX B300 |
|---|---|---|
| Blackwell Ultra GPUs \| Grace CPUs | 72 \| 36 | 8 \| 0 |
| CPU Cores | 2,592 Arm Neoverse V2 cores | - |
| Total FP4 Tensor Core* | 1,440 \| 1,080 PFLOPS | 144 \| 108 PFLOPS |
| Total FP8/FP6 Tensor Core* | 720 PFLOPS | 72 PFLOPS |
| Total Fast Memory | 37 TB | 2.1 TB |
| Total Memory Bandwidth | 576 TB/s | 62 TB/s |
| Total NVLink Switch Bandwidth | 130 TB/s | 14.4 TB/s |

* With sparsity (paired values: with sparsity \| without sparsity).

Per-GPU specifications:

| Specification | GB300 NVL72 | HGX B300 |
|---|---|---|
| FP4 Tensor Core* | 20 \| 15 PFLOPS | 18 \| 14 PFLOPS |
| FP8/FP6 Tensor Core* | 10 PFLOPS | 9 PFLOPS |
| INT8 Tensor Core* | 330 TOPS | 307 TOPS |
| FP16/BF16 Tensor Core* | 5 PFLOPS | 4.5 PFLOPS |
| TF32 Tensor Core* | 2.5 PFLOPS | 2.2 PFLOPS |
| FP32 | 80 TFLOPS | 75 TFLOPS |
| FP64 / FP64 Tensor Core | 1.3 TFLOPS | 1.2 TFLOPS |
| GPU Memory \| Bandwidth | 279 GB HBM3E \| 8 TB/s | 270 GB HBM3E \| 7.7 TB/s |
| Multi-Instance GPU (MIG) | 7 | 7 |
| Decompression Engine | Yes | Yes |
| Decoders | 7 NVDEC, 7 nvJPEG | 7 NVDEC, 7 nvJPEG |
| Max Thermal Design Power (TDP) | Configurable up to 1,400 W | Configurable up to 1,100 W |
| Interconnect | Fifth-generation NVLink: 1.8 TB/s; PCIe Gen6: 256 GB/s | Fifth-generation NVLink: 1.8 TB/s; PCIe Gen6: 256 GB/s |
| Server Options | NVIDIA GB300 NVL72 Partner and NVIDIA-Certified Systems™ | NVIDIA HGX B300 Partner and NVIDIA-Certified Systems |

* With sparsity (paired values: with sparsity \| without sparsity).

Licensed under CC BY-NC-SA 4.0
Last updated on January 14, 2026 at 11:10 AM