Introduction
Earlier, we introduced scaling laws for large language models, such as the Kaplan scaling law (Kaplan et al., 2020) and the Chinchilla scaling law (Hoffmann et al., 2022). Their core conclusion is that the capability of a large language model improves as compute, model size, and training-data volume increase. Compute, in turn, is determined by accelerator hardware such as GPUs, TPUs, and NPUs, so in this section we cover the relevant background on these devices.
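As a rough illustration of how these laws tie loss to model size, data, and compute, the sketch below uses the parametric loss form L(N, D) = E + A/N^α + B/D^β fitted by Hoffmann et al. (2022), together with the common C ≈ 6ND estimate of training FLOPs. The constants are the published fits, but treat the whole thing as an order-of-magnitude illustration rather than a precise predictor:

```python
# Fitted constants from Hoffmann et al. (2022); illustrative, not exact.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for N parameters trained on D tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

def train_flops(n_params: float, n_tokens: float) -> float:
    """Standard C ~ 6*N*D estimate of total training compute in FLOPs."""
    return 6.0 * n_params * n_tokens

# Chinchilla itself: 70B parameters trained on 1.4T tokens.
print(f"predicted loss ~ {chinchilla_loss(70e9, 1.4e12):.3f}")
print(f"training compute ~ {train_flops(70e9, 1.4e12):.2e} FLOPs")
```

Note how both the 1/N^α and 1/D^β terms shrink only polynomially: halving the loss gap to the irreducible term E requires multiplicatively more parameters and tokens, which is why compute, i.e. the hardware discussed in this section, is the binding constraint.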
- Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., de Las Casas, D., Hendricks, L. A., Welbl, J., Clark, A., Hennigan, T., Noland, E., Millican, K., van den Driessche, G., Damoc, B., Guy, A., Osindero, S., Simonyan, K., Elsen, E., … Sifre, L. (2022). Training Compute-Optimal Large Language Models. https://arxiv.org/abs/2203.15556
- Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling Laws for Neural Language Models. https://arxiv.org/abs/2001.08361