Mao Song(毛松)'s Homepage

Fixed Point Theorem

The fixed point theorem

Notes on roofline model

The roofline model is a theoretical foundation for infra performance analysis and provides guidance for algorithm design and optimization.
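As a rough illustration of the idea, the standard roofline bound says attainable throughput is the smaller of peak compute and memory bandwidth times arithmetic intensity. The sketch below assumes that textbook form; the function names and the hardware numbers (100 TFLOP/s peak, 1 TB/s bandwidth) are hypothetical and not taken from the post.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline bound: min(peak compute, bandwidth * FLOPs per byte moved)."""
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def ridge_point(peak_flops, mem_bandwidth):
    """Arithmetic intensity (FLOP/byte) where a kernel stops being memory-bound."""
    return peak_flops / mem_bandwidth


# Hypothetical accelerator, chosen only for illustration.
PEAK = 100e12       # FLOP/s
BANDWIDTH = 1e12    # bytes/s

low_intensity = attainable_flops(PEAK, BANDWIDTH, 10)    # memory-bound kernel
high_intensity = attainable_flops(PEAK, BANDWIDTH, 400)  # compute-bound kernel
ridge = ridge_point(PEAK, BANDWIDTH)
```

A kernel whose arithmetic intensity is below the ridge point is limited by memory bandwidth, so raising intensity (for example by fusing operations or reusing data in cache) is what moves it toward peak.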

Notes on Step3-VL-10B

In January 2026, StepFun (阶跃星辰) introduced Step3-VL-10B, an open-source multimodal large model that emphasizes perception, complex reasoning, and human-centric alignment.

Notes on Kimi K2.5

In February 2026, Kimi released Kimi K2.5, a multimodal agentic model. Built on Kimi K2, it uses joint image-text training during pre-training, and zero-vision SFT and multimodal RL during post-training, to improve the model's reasoning and generalization. Kimi K2.5 also introduces Agent Swarm to solve complex tasks more efficiently.

Machine Learning math RL

Notes on KL divergence

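In reinforcement learning, KL divergence is commonly used as a policy regularizer, but many instabilities come not from KL itself but from how it is estimated. This post shows why an unbiased KL estimate does not guarantee an unbiased KL gradient, and systematically analyzes how different KL estimators behave in on-policy and off-policy settings. Through theoretical derivation and experiments, it reveals the essential difference between using KL as a loss and using it for reward shaping, and explains the reasoning behind the low-variance KL designs used in practice.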
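The gap between an unbiased estimate and an unbiased gradient can be checked on a tiny discrete example. The sketch below is an illustration under stated assumptions, not the post's own code: it assumes the commonly used per-sample estimators often called k1, k2, and k3 (the summary does not name which estimators the post covers), an on-policy setting where samples come from the policy q and the reference p is fixed, and a softmax-parameterized q. Expectations are computed exactly over three outcomes, and gradients use central finite differences.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exact_kl(q, p):
    """Exact KL(q || p) for discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))


# Per-sample estimators of KL(q || p) for x ~ q, with r = p(x) / q(x).
def k1(r):
    return -math.log(r)


def k2(r):
    return 0.5 * math.log(r) ** 2


def k3(r):
    return (r - 1.0) - math.log(r)


def expected_estimate(k, q, p):
    """Exact expectation of the estimator under x ~ q."""
    return sum(qi * k(pi / qi) for qi, pi in zip(q, p))


def shifted(logits, i, delta):
    out = list(logits)
    out[i] += delta
    return out


def expected_naive_grad(k, logits, p, eps=1e-6):
    """Expected gradient when the estimator is used directly as a loss:
    the sampling weights are held fixed and only r inside k is differentiated."""
    weights = softmax(logits)

    def loss(theta):
        return sum(w * k(pi / qi) for w, qi, pi in zip(weights, softmax(theta), p))

    return [(loss(shifted(logits, i, eps)) - loss(shifted(logits, i, -eps))) / (2 * eps)
            for i in range(len(logits))]


def true_kl_grad(logits, p, eps=1e-6):
    """Finite-difference gradient of the exact KL(q_theta || p)."""
    def loss(theta):
        return exact_kl(softmax(theta), p)

    return [(loss(shifted(logits, i, eps)) - loss(shifted(logits, i, -eps))) / (2 * eps)
            for i in range(len(logits))]


# Hypothetical policy logits and reference distribution.
logits = [0.5, -0.2, 0.1]
p = [0.2, 0.5, 0.3]
q = softmax(logits)

kl_value = exact_kl(q, p)
grad_true = true_kl_grad(logits, p)
grad_k1 = expected_naive_grad(k1, logits, p)
grad_k2 = expected_naive_grad(k2, logits, p)
```

In this setup the expected values of k1 and k3 equal the exact KL while k2 does not, yet the expected naive gradient of k1 is zero and the expected naive gradient of k2 matches the true KL gradient. An estimator's bias in value and its bias in gradient are therefore separate questions.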