LLM
Unified perspective on dLLM and LLM
Derivation of the equivalence between MLE and KL divergence
Machine Learning
Relationship between MLE and KL divergence
Derivation of the equivalence between MLE and KL divergence
MLLM
Reasoning
Notes on MiMo-VL
MiMo-VL is a multimodal reasoning large language model built on MiMo-7B
LLM
Hands on LLM(1) Tokenizer
A summary of tokenizers and an efficient implementation of BPE
LLM
Notes on attention bias
Why transformers have no QKV bias