Mao Song (毛松)'s Homepage

Browse

Archive

LLM · Scaling Law · May 11, 2026
Publish‑ready workflow that lets you focus on ideas, not infrastructure

tutorial

LLM · Hands on LLM (1) Tokenizer · May 11, 2026
A summary of tokenizers and an efficient implementation of a BPE tokenizer

Tokenizer

LLM · Hands on LLM (2) Transformer · May 11, 2026
An explanation of the Transformer architecture and its core code, based on Qwen3

cs336 Transformer

Overview of unified MLLM · May 11, 2026
tutorial

MLLM · Overview of Visual Foundation Models · May 11, 2026
Zhipu

Infra · Theory of scalability · May 11, 2026
roofline GPU

LLM · Overview of Policy Optimization Methods · May 11, 2026
Qwen RL

LLM · Overview of position encoding · May 11, 2026
position encoding

LLM · MoE tutorial · May 11, 2026
This post details key design choices of MoE models together with related experimental results, providing a foundation for studying MoE models.

MoE Architecture
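
LLM · Notes on olmoe · May 11, 2026
In September 2024, AllenAI introduced OLMoE, a fully open-source large language model based on the MoE architecture, with 7B total and 1B active parameters (7B-A1B). The authors describe the model design, data, and training strategy in detail. The paper received an oral presentation at ICLR 2025.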

MoE Oral Allen AI

LLM · Overview of optimizers for LLMs · May 11, 2026
tutorial

Infra · Overview of Parallelism · May 11, 2026
parallelism Google

LLM · Overview of Long Context capabilities · May 11, 2026
Long Context position encoding

Math · Math Foundations · May 11, 2026
tutorial

MLLM · Overview of MiMo series · May 11, 2026
xiaomi

MLLM · Overview of MiniMax series · May 11, 2026
Attention MoE MiniMax

MLLM · Overview of Kimi series · May 11, 2026
kimi Reasoning
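
Machine Learning · KL divergence from machine learning to reinforcement learning · May 11, 2026
In reinforcement learning, KL divergence is often used as a policy regularizer, yet many instabilities stem not from the KL term itself but from how it is estimated. This post shows why an "unbiased KL estimate" does not guarantee an "unbiased KL gradient," and systematically analyzes how different KL estimators behave in on-policy and off-policy settings. Through theoretical derivation and experiments, it reveals the essential difference between using KL as a loss and using it for reward shaping, and explains the reasoning behind the low-variance KL designs used in practice.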
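
LLM · LLM FLOPs Computation · May 11, 2026
We show how to compute the FLOPs of a Transformer-based LLM. From this calculation, we derive the relationship between compute $C$, the number of model parameters $N$, and the dataset size $D$ (in training tokens), namely $C\approx 6ND$.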

Transformer Scaling Law
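
LLM · LLM Memory Computation · May 11, 2026
In this post, we explain how to compute the memory requirements of an LLM during training and inference, and briefly introduce the corresponding optimization methods.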

Transformer Training Inference
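
LLM · LLM Parameter Computation · May 11, 2026
We explain how to compute the parameter count of an LLM. Starting from the Qwen3 architecture, we break the model down into its components and then give a formula for the total number of parameters.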

distributed training

MLLM · Overview of Gemini Series · May 11, 2026
Gemini 3.0 is Google's most capable new-generation model. The model card presents the evaluation results and core capabilities of the Gemini 3.0 series.

Google MoE

LLM · Overview of GPT series · May 11, 2026
OpenAI released the gpt-oss large language models in two sizes, 120B-A5.1B and 20.9B-A3.6B. The authors emphasize the models' instruction following, tool use, and adaptive thinking capabilities.

openAI

MLLM · Overview of InternVL series · May 11, 2026
InternVL

MLLM · Overview of Keye-VL series · May 11, 2026
Kuaishou Reasoning Video

Bringing paper to life: A modern template for scientific writing · May 11, 2026
Publish‑ready workflow that lets you focus on ideas, not infrastructure

research template

LLM · Overview of Attention Mechanism · May 11, 2026
A hands-on guide to understanding and implementing the Attention mechanism in deep learning models.

attention deep learning cs336

LLM · Overview of DeepSeek series · May 11, 2026
DeepSeek

LLM · Reinforcement Learning for Large Language Models: An Overview · May 11, 2026
tutorial

Overview of RLHF · May 11, 2026
tutorial

LLM · Overview of Qwen series · May 11, 2026
In early 2024, Qwen released Qwen1.5 in eight sizes (0.5B, 1.8B, 4B, 7B, 14B, 32B, 72B, and 110B), along with an MoE model.

Qwen

MLLM · Overview of Qwen-VL series · May 11, 2026
Qwen

LLM · Overview of Gemma series · May 11, 2026
Long context

MLLM · Overview of GLM series · May 11, 2026
Zhipu Reasoning

LLM · Overview of LLaMA series · May 11, 2026
LLaMA

MLLM · Overview of Multimodal Large Language Models · May 11, 2026
Apple

Overview of OPD · May 11, 2026
tutorial

LLM · Overview of Flash Attention series · May 11, 2026
The authors propose FlashAttention, a method that speeds up attention by reducing the memory-access overhead of multi-head attention.

Attention

© 2026 Mao Song (毛松)'s Homepage

Built with Astro