Overview of the MiniCPM series

MiniCPM-5

1.08B parameters, 128K context length

MiniCPM5-1B Training Pipeline. Sourced from https://huggingface.co/openbmb/MiniCPM5-1B

pretraining data: Ultra-FineWeb, Ultra-FineWeb-L3, UltraData-Math

post-training: multi-teacher OPD (on-policy distillation)
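
A rough sketch of what a multi-teacher OPD objective could look like, to make the idea concrete. This is not the MiniCPM5 implementation: the notes do not say how the teachers are combined, so the sketch assumes a weighted average of per-token reverse KL, KL(student || teacher), computed on sequences sampled by the student. The names `reverse_kl` and `multi_teacher_opd_loss` are hypothetical.

```python
import math


def reverse_kl(student_probs, teacher_probs, eps=1e-12):
    """KL(student || teacher) at one token position over a shared vocabulary."""
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(student_probs, teacher_probs)
        if p > 0.0
    )


def multi_teacher_opd_loss(student_seq, teacher_seqs, weights=None):
    """Weighted average over teachers of the mean per-token reverse KL.

    student_seq:  per-position probability vectors from the student, taken
                  along a sequence the student itself sampled (on-policy).
    teacher_seqs: one list per teacher, each holding that teacher's
                  probability vectors at the same positions.
    weights:      ASSUMPTION: teachers are mixed by a simple weighted
                  average; uniform weights if not given.
    """
    if weights is None:
        weights = [1.0 / len(teacher_seqs)] * len(teacher_seqs)
    total = 0.0
    for w, teacher_seq in zip(weights, teacher_seqs):
        per_token = [reverse_kl(s, t) for s, t in zip(student_seq, teacher_seq)]
        total += w * sum(per_token) / len(per_token)
    return total


# Toy example: 2-token vocabulary, 2 positions, 2 teachers.
student = [[0.5, 0.5], [0.5, 0.5]]
teacher_a = [[0.5, 0.5], [0.5, 0.5]]   # agrees with the student
teacher_b = [[0.9, 0.1], [0.9, 0.1]]   # disagrees with the student
loss = multi_teacher_opd_loss(student, [teacher_a, teacher_b])
```

The loss is zero against a teacher that matches the student exactly and grows as the student puts mass where a teacher does not. In practice the probability vectors would come from model logits over the full vocabulary rather than hand-written lists.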

SFT: 200B deep-thinking SFT, 200B hybrid thinking SFT (UltraData-SFT-2605)

RL:

performance

MiniCPM5-1B Score Gain from RL+OPD. Sourced from https://huggingface.co/openbmb/MiniCPM5-1B

MiniCPM5-1B overlong response ratio drop from RL+OPD. Sourced from https://huggingface.co/openbmb/MiniCPM5-1B
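
For reference, one plausible way to compute an overlong response ratio. The notes do not give the exact definition, so this sketch assumes a response counts as overlong when its token length reaches or exceeds the generation budget (i.e. it was cut off). `overlong_ratio` and `max_tokens` are hypothetical names.

```python
def overlong_ratio(response_lengths, max_tokens):
    """Fraction of responses whose token count hits or exceeds the budget.

    ASSUMPTION: "overlong" means length >= max_tokens (a truncated response);
    the source may use a different threshold or definition.
    """
    if not response_lengths:
        return 0.0
    overlong = sum(1 for n in response_lengths if n >= max_tokens)
    return overlong / len(response_lengths)


# Toy example: two of four responses reach the 4096-token budget.
ratio = overlong_ratio([100, 4096, 5000, 200], max_tokens=4096)
```

Under this definition, a drop in the ratio means fewer generations run into the length limit before finishing.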