DeepSeek V4 paper full version is out, FP4 QAT details and stability tricks [D]
DeepSeek's V4 technical paper documents a working FP4 quantization-aware training pipeline for a frontier-scale mixture-of-experts model — a meaningful step beyond the FP8 approach the V3 paper introduced. The...
May 11, 2026 • 13 min read