Introduction
DeepSeek-R1 (Guo et al., 2025) is an LLM released by DeepSeek in January 2025 that uses pure reinforcement learning to improve the model's reasoning ability.
Method
Implementation
Result
References
- Guo, D., Yang, D., Zhang, H., Song, J., Wang, P., Zhu, Q., Xu, R., Zhang, R., Ma, S., Bi, X., Zhang, X., Yu, X., Wu, Y., Wu, Z. F., Gou, Z., Shao, Z., Li, Z., Gao, Z., Liu, A., … Zhang, Z. (2025). DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature, 645(8081), 633–638. https://doi.org/10.1038/s41586-025-09422-z