DeepSeek-R1: the first open-source reasoning models

Introduction

DeepSeek-R1 (Guo et al., 2025) is an LLM released by DeepSeek in January 2025 that uses pure reinforcement learning to improve the model's reasoning ability.

Guo, D., Yang, D., Zhang, H., Song, J., Wang, P., Zhu, Q., Xu, R., Zhang, R., Ma, S., Bi, X., Zhang, X., Yu, X., Wu, Y., Wu, Z. F., Gou, Z., Shao, Z., Li, Z., Gao, Z., Liu, A., … Zhang, Z. (2025). DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature, 645(8081), 633–638. 10.1038/s41586-025-09422-z