1.08B parameters, 128K context length
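As a rough sanity check on what 1.08B parameters means for deployment, the sketch below estimates weight-only memory at a few common precisions. The dtype list is an assumption for illustration; these notes do not say which precision the released checkpoint uses, and the estimate ignores activations and the KV cache, which matter at 128K context.

```python
# Rough weight-only memory estimate for a 1.08B-parameter model.
# The dtypes listed are illustrative assumptions, not the checkpoint's format.
PARAMS = 1.08e9
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(n_params, dtype):
    """Return weight memory in GB (10^9 bytes); excludes activations and KV cache."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(PARAMS, dtype):.2f} GB")
```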
MiniCPM5-1B Training Pipeline. Sourced from https://huggingface.co/openbmb/MiniCPM5-1B
pretraining data: Ultra-FineWeb, Ultra-FineWeb-L3, UltraData-Math
post-training: multi-teacher OPD (on-policy distillation)
SFT: 200B deep-thinking SFT, 200B hybrid-thinking SFT (UltraData-SFT-2605)
RL:
- DAPO-math-17k (JustRL)
- TriviaQA
- NQ-Open
- LongWriter-Zero-RLData
performance
MiniCPM5-1B Score Gain from RL+OPD. Sourced from https://huggingface.co/openbmb/MiniCPM5-1B
MiniCPM5-1B overlong response ratio drop from RL+OPD. Sourced from https://huggingface.co/openbmb/MiniCPM5-1B
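One way to compute an overlong response ratio is sketched below. It assumes "overlong" means a response that reached the generation token cap without finishing; these notes do not define the term, and `overlong_ratio` and `max_tokens` are hypothetical names used only for illustration.

```python
def overlong_ratio(response_lengths, max_tokens):
    """Fraction of responses whose token count reached the generation cap.

    Assumption: a response counts as overlong when its length is >= max_tokens,
    i.e. it was cut off by the cap rather than ending on its own.
    """
    if not response_lengths:
        raise ValueError("response_lengths must not be empty")
    overlong = sum(1 for n in response_lengths if n >= max_tokens)
    return overlong / len(response_lengths)


if __name__ == "__main__":
    # Made-up lengths for illustration only; not measurements from the model.
    lengths = [100, 4096, 2000, 4096]
    print(overlong_ratio(lengths, max_tokens=4096))
```

Under this definition, a drop in the ratio after RL+OPD would mean fewer generations running into the length limit.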