GPU vs TPU: A Comprehensive Guide to Specialized Hardware Accelerators

Understanding the architecture, performance characteristics, and optimal use cases for GPUs and TPUs in modern computing.

Introduction

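We previously introduced scaling laws for large language models, such as the Kaplan scaling law (Kaplan et al., 2020) and the Chinchilla scaling law (Hoffmann et al., 2022). Their core conclusion is that the capability of a large language model improves as compute, model size, and data volume increase. Of these, compute is determined by the GPU/TPU/NPU hardware, so this section covers the relevant background on these accelerators.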

Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., de Las Casas, D., Hendricks, L. A., Welbl, J., Clark, A., Hennigan, T., Noland, E., Millican, K., van den Driessche, G., Damoc, B., Guy, A., Osindero, S., Simonyan, K., Elsen, E., … Sifre, L. (2022). Training Compute-Optimal Large Language Models. https://arxiv.org/abs/2203.15556
Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling Laws for Neural Language Models. https://arxiv.org/abs/2001.08361
