GPU vs TPU: A Comprehensive Guide to Specialized Hardware Accelerators

Understanding the architecture, performance characteristics, and optimal use cases for GPUs and TPUs in modern computing.

Updated May 14, 2026

Introduction

In earlier sections we covered scaling laws for large language models, such as the Kaplan scaling law (Kaplan et al., 2020) and the Chinchilla scaling law (Hoffmann et al., 2022). Their core conclusion is that a large language model's capability improves as compute, model size, and training data grow. Compute, in turn, is determined by the underlying GPU/TPU/NPU hardware, so in this section we introduce the relevant background on these accelerators.

  1. Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., de Las Casas, D., Hendricks, L. A., Welbl, J., Clark, A., Hennigan, T., Noland, E., Millican, K., van den Driessche, G., Damoc, B., Guy, A., Osindero, S., Simonyan, K., Elsen, E., … Sifre, L. (2022). Training Compute-Optimal Large Language Models. https://arxiv.org/abs/2203.15556
  2. Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling Laws for Neural Language Models. https://arxiv.org/abs/2001.08361
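To make the scaling-law claim concrete, here is a minimal sketch of the Chinchilla parametric loss $L(N, D) = E + A/N^{\alpha} + B/D^{\beta}$, using the fitted constants reported in Hoffmann et al. (2022) (E = 1.69, A = 406.4, B = 410.7, α = 0.34, β = 0.28), together with the standard C ≈ 6ND approximation for training FLOPs. The function names are illustrative, not from either paper:

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for a model with n_params parameters
    trained on n_tokens tokens, per the Chinchilla parametric fit
    (Hoffmann et al., 2022): L(N, D) = E + A/N^alpha + B/D^beta."""
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta


def train_flops(n_params: float, n_tokens: float) -> float:
    """Common approximation for dense-transformer training compute:
    C ~ 6 * N * D FLOPs (forward + backward pass)."""
    return 6.0 * n_params * n_tokens


# Chinchilla itself: 70B parameters trained on 1.4T tokens.
loss = chinchilla_loss(70e9, 1.4e12)
flops = train_flops(70e9, 1.4e12)
print(f"predicted loss ~ {loss:.3f}, training compute ~ {flops:.2e} FLOPs")
```

Doubling both N and D lowers the predicted loss while the irreducible term E stays fixed, which is the sense in which capability "improves with compute, model size, and data" and why the hardware supplying that compute matters.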

GPU

TPU