DeepSeek V4-Pro's Post-Training on Huawei's Ascend 910C Is Not Just a PR Win

TL;DR

DeepSeek confirmed that post-training of its V4-Pro model was completed on Huawei's Ascend 910C chip, bypassing Nvidia hardware for that phase of development. That's a real milestone in China's AI self-reliance push: for post-training workloads, the gap between domestic and US-controlled silicon is narrower than it was two years ago. Whether full pre-training at frontier scale can follow remains genuinely open, and on that front the gap is still large.

Key Takeaways

DeepSeek confirmed post-training of V4-Pro on Huawei Ascend 910C chips — the first publicly acknowledged use of domestic Chinese silicon for a training phase on a frontier-class model
DeepSeek-V3, the predecessor model, required approximately 2,788,000 Nvidia H800 GPU-hours for its full training run, about 2,664,000 of them for pre-training, as detailed in DeepSeek's December 2024 technical report; the shift to Ascend for V4-Pro's post-training phase marks a deliberate departure
Post-training, which covers instruction fine-tuning, reinforcement learning from human feedback, and preference optimization, is typically at least one to two orders of magnitude less compute-intensive than pre-training (DeepSeek-V3's technical report lists roughly 5,000 H800 GPU-hours for post-training); the milestone is genuine, but the harder test (full pre-training at scale) remains unconfirmed on domestic chips
US export controls enacted in October 2022 and expanded in October 2023 barred Nvidia from shipping first the A100 and H100, and then the China-market A800 and H800, to Chinese buyers, according to the U.S. Bureau of Industry and Security, accelerating Chinese demand for Huawei's Ascend line
Specific performance comparisons between Ascend 910C and Nvidia H100 or H200 are not independently benchmarked in publicly available third-party research as of mid-2025; treat vendor-sourced comparisons with appropriate skepticism
Unitree Robotics, recently cleared for IPO by Chinese regulators, exemplifies the class of physical AI company now structurally dependent on domestic chip supply as export restrictions persist
The software side is maturing alongside hardware: Chinese labs have been rewriting training frameworks — including PyTorch-Ascend integrations — to run efficiently on Huawei silicon, a dynamic that has been underreported relative to the chip headlines

Hangzhou Ships a Milestone — and Context Is Everything

DeepSeek is based in Hangzhou, a city whose AI character runs closer to commercial pragmatism than to Beijing's policy apparatus or Shenzhen's hardware-first culture. Hangzhou produced Alibaba; it tends to ship things that work rather than announce things that sound impressive. When DeepSeek's team confirmed that post-training for V4-Pro was completed on Huawei's Ascend 910C, the signal worth paying attention to is not the headline — it's that a frontier Chinese AI lab treated domestic chips as production infrastructure, not as a demonstration.

The reflex reaction in Western coverage tends toward one of two framings: either this proves China has solved its chip problem, or it's a politically motivated press release with no technical substance. Both readings miss what's actually specific about this announcement.

DeepSeek's V3 was trained on approximately 2.788 million Nvidia H800 GPU-hours, nearly all of it pre-training. That foundational run happened on US-origin hardware that Chinese labs can no longer legally acquire. The question hanging over every lab in Beijing, Hangzhou, and Shenzhen since those controls bit is not "can we do fine-tuning on domestic chips" but "can we eventually close the loop entirely." V4-Pro's post-training on the Ascend 910C is a confirmed answer to the smaller version of that question: yes, for post-training. Pre-training at frontier scale is the harder question, and it is still open.

What the Ascend 910C Actually Is — and What We Don't Know

Huawei's Ascend line has iterated quickly. The 910B was positioned as a domestic alternative to the Nvidia A100 — functional for training medium-scale models, though Chinese labs documented reliability challenges and software ecosystem gaps throughout 2023 and 2024. The 910C is a step up: higher memory bandwidth, improved throughput, and better support for the mixed-precision operations that modern LLM training requires. Huawei's Ascend product documentation describes the architecture at a high level.

What I won't offer here are specific FLOP-per-second comparisons with Nvidia H100 or H200 positioned as settled fact, because independent third-party benchmarking of those comparisons isn't publicly available. What researchers at CSET (Georgetown's Center for Security and Emerging Technology) and others who track China's semiconductor capacity have noted is that the gap between Huawei's best accelerators and Nvidia's export-controlled tiers has been narrowing, particularly for inference and fine-tuning workloads — exactly the regime where V4-Pro's post-training would sit.

The software co-evolution story is the part most Western coverage underplays. CANN, Huawei's compute architecture for neural networks, has absorbed significant engineering investment. DeepSeek and other Chinese labs haven't been passively waiting for better chips; they've been rewriting training frameworks to extract usable efficiency from what's available. That's the same optimization instinct that made DeepSeek's compute-efficiency story compelling when V3 launched. The engineers who learned to do more with fewer H800s are now applying the same approach to Ascend hardware.

Why Post-Training Is the Right Starting Point — and What Comes Next

This distinction deserves more attention than it usually gets, because collapsing "post-training" and "pre-training" into "AI training" produces bad analysis.

Pre-training, which builds a base model from scratch on internet-scale corpora, requires sustained, massively parallel compute for weeks or months: thousands of high-end GPUs running reliably in concert, tight interconnects, very low failure rates. This is where the moat formed by Nvidia's hardware and the CUDA ecosystem is deepest. DeepSeek V3's full training run, dominated by pre-training, consumed 2.788 million H800 GPU-hours; at that scale, even modest efficiency losses on less mature hardware translate directly into longer timelines and higher costs.

Post-training — the instruction-tuning, RLHF, and alignment work that teaches a base model to behave coherently and follow instructions — runs on a fraction of that compute. Weeks rather than months, hundreds of GPUs rather than thousands. It's important work, but it's a different class of engineering problem.
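
To make the timeline arithmetic concrete, here is a rough Python sketch that converts a GPU-hour budget into wall-clock days and shows how an efficiency penalty stretches the schedule. The 2,048-chip cluster size and the 0.7 "relative efficiency" factor are illustrative assumptions, not measured properties of any Ascend or Nvidia system, and the function name is hypothetical.

```python
def wall_clock_days(gpu_hours, num_chips, relative_efficiency=1.0):
    """Estimate calendar days needed to deliver `gpu_hours` of baseline work.

    relative_efficiency is a hypothetical knob: 1.0 means each chip-hour does
    as much useful work as the baseline hardware, 0.7 means 70% as much.
    """
    if num_chips <= 0:
        raise ValueError("num_chips must be positive")
    if not 0 < relative_efficiency <= 1:
        raise ValueError("relative_efficiency must be in (0, 1]")
    # Baseline-equivalent chip-hours the cluster delivers per hour of wall time.
    effective_chip_hours_per_hour = num_chips * relative_efficiency
    return gpu_hours / effective_chip_hours_per_hour / 24


V3_GPU_HOURS = 2_788_000  # H800 GPU-hour figure cited in this article
CLUSTER_SIZE = 2_048      # assumed cluster size, in line with "2,000+ chips"

baseline_days = wall_clock_days(V3_GPU_HOURS, CLUSTER_SIZE)
slower_days = wall_clock_days(V3_GPU_HOURS, CLUSTER_SIZE, relative_efficiency=0.7)

print(f"baseline: {baseline_days:.1f} days")
print(f"at 70% relative efficiency: {slower_days:.1f} days")
```

Under these assumptions the baseline comes to roughly 57 days, and the 70% case to roughly 81 days. The same proportional penalty applied to a post-training run that is one or two orders of magnitude smaller costs days rather than weeks, which is part of why it is the lower-risk place to validate new hardware.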

Starting here was sensible. DeepSeek validated the Ascend 910C stack on a real production workload where engineering failures are recoverable, accumulated learnings about the chip's actual behavior at scale, and produced a public signal that domestic silicon is production-ready for at least one meaningful phase of frontier model development. The next test — whether a Chinese lab attempts full pre-training of a frontier-scale model on Ascend hardware and publishes the technical details — is when the story becomes genuinely decisive.

The Export Control Arc: What the Ban Actually Changed

The US controls on advanced AI chips, introduced in October 2022 and significantly expanded in October 2023, were designed to slow Chinese AI development by restricting access to the most capable training hardware. The effect was real but delayed: labs that had pre-purchased Nvidia H800 inventories kept using them, and H800s continued appearing in Chinese research papers published well into 2025. DeepSeek's V3 was trained on that stockpile.

The constraint becomes structural as stockpiles deplete and training compute requirements scale. According to Reuters and the Financial Times, Huawei has been delivering Ascend 910C units at volume to Chinese hyperscalers and AI labs throughout 2024 and 2025; exact shipment figures are disputed, but the directional story is observable in the output: Chinese labs are producing frontier-class models with decreasing stated dependence on US-origin chips.

For the strategic picture on what Huawei's involvement means specifically for DeepSeek's model development, the companion coverage we published on Huawei's role in refining DeepSeek's model stack goes deeper on the chip-software co-development dynamic — worth reading alongside this piece.

Unitree and Physical AI: Why This Story Extends Beyond LLMs

The Unitree reference in coverage of this announcement isn't a non-sequitur — it's context for where the hardware independence story is traveling.

Unitree Robotics, based in Hangzhou, builds quadrupedal robots and humanoid platforms that have become reference hardware for the global robotics research community. Its recent IPO clearance by Chinese regulators positions it alongside AGIBOT as part of what analysts are describing as an emerging domestic humanoid robot duopoly. Robotics inference — running the neural networks that control real-time movement, perception, and manipulation — has different compute requirements than LLM training: lower latency, tighter power budgets, edge deployment contexts.

But it still requires AI accelerators. A regulatory environment that blocks Nvidia chip sales affects robotics inference hardware as much as LLM training clusters, on a slightly different timeline. If Chinese AI labs can demonstrate that domestic chips handle LLM post-training reliably today, that's groundwork for the robotics inference use case at scale tomorrow. The physical AI story and the chip self-reliance story are the same story at different time horizons.

Chip Comparison: What's Confirmed and What Isn't

Chip	Vendor	Status for Chinese buyers	Primary use case	Key uncertainties
H100 / H200	Nvidia	Export-controlled (H100 since Oct 2022)	Pre-training, large-scale fine-tuning	Unavailable to Chinese labs
H800	Nvidia	Export-controlled since Oct 2023	Pre-training (reduced interconnect bandwidth)	Existing stockpiles only
A800	Nvidia	Export-controlled since Oct 2023	Mid-scale training	Existing stockpiles only
Ascend 910B	Huawei	Available	Fine-tuning, inference, mid-scale training	Reliability reports mixed; software gaps documented
Ascend 910C	Huawei	Available and ramping	Post-training, fine-tuning, inference	Full pre-training at frontier scale: unverified
Biren BR100	Biren Technology	Limited	Inference	Software ecosystem at early stage
Cambricon MLU series	Cambricon	Available	Inference, edge	Limited large-scale LLM training support

Performance figures for Ascend 910C against Nvidia H100 or H200 are not independently benchmarked in publicly available research as of mid-2025. This table reflects confirmed availability and use cases, not vendor-claimed performance parity.

What Would Have to Happen Before "Full Hardware Independence" Is the Right Frame

The V4-Pro post-training milestone is real. Here is the checklist of what would need to follow before the stronger claim can be made with confidence:

Pre-training confirmation: A Chinese lab publishes a credible technical report showing a frontier-scale model (100B+ effective parameters) was pre-trained from scratch on Huawei Ascend or other domestic chips, with compute and loss curves disclosed
MFU transparency: Model FLOPs utilization (MFU) figures for Ascend 910C in production training runs are published or independently verified; currently these numbers are opaque or vendor-sourced
Scale reliability: Reports of sustained multi-week training runs on Ascend clusters comparable in size to DeepSeek's V3 setup (2,000+ chips in parallel) with low hardware failure rates
Software stack parity: CANN and PyTorch-Ascend compatibility issues are documented and systematically resolved; third-party developer tooling reaches workable parity with CUDA-based workflows
Reproducibility across labs: Other frontier Chinese labs — Zhipu AI, Moonshot (Kimi), Baidu — independently confirm post-training workloads on Ascend without major undisclosed engineering workarounds
Policy counter-moves tracked: US government responses — further entity list expansions, equipment controls on Huawei's own chip fabrication supply chain — will shape how much runway Chinese labs have to close remaining gaps

None of these are reasons to dismiss what was demonstrated. They're reasons to read the next set of announcements carefully rather than extrapolating from this one.
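
For readers unfamiliar with the MFU item in the checklist, the metric is simply achieved training FLOP/s divided by the cluster's theoretical peak. The Python sketch below uses the common rule of thumb of about 6 FLOPs per active parameter per token for a combined forward and backward pass, ignoring attention-specific terms. Every number in the example is a made-up placeholder, not a published specification of the Ascend 910C or of any DeepSeek run, and the function name is hypothetical.

```python
def model_flops_utilization(active_params, tokens_per_second,
                            num_chips, peak_flops_per_chip):
    """Approximate MFU: achieved training FLOP/s over theoretical peak FLOP/s.

    Assumes ~6 FLOPs per active parameter per token (forward + backward),
    a widely used approximation that ignores attention-specific costs.
    """
    if min(active_params, tokens_per_second, num_chips, peak_flops_per_chip) <= 0:
        raise ValueError("all inputs must be positive")
    achieved_flops_per_second = 6.0 * active_params * tokens_per_second
    peak_flops_per_second = num_chips * peak_flops_per_chip
    return achieved_flops_per_second / peak_flops_per_second


# Placeholder inputs chosen only to illustrate the calculation.
mfu = model_flops_utilization(
    active_params=40e9,          # hypothetical active parameters per token
    tokens_per_second=1.0e6,     # hypothetical cluster-wide throughput
    num_chips=2_000,             # hypothetical cluster size
    peak_flops_per_chip=4.0e14,  # hypothetical peak FLOP/s, not a vendor spec
)
print(f"MFU: {mfu:.1%}")
```

In this made-up scenario the result is 30%. The point of the checklist item is that, for Ascend clusters, the inputs on the top line (measured throughput in real production runs) are exactly what outside observers currently lack.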

Where This Is Heading

The chip gap is a shrinking asymmetry, not a permanent wall. Nvidia's position was never a ceiling — it was a multi-year head start in hardware plus a decade of CUDA ecosystem lock-in. Export controls slowed Chinese access to that hardware but simultaneously accelerated domestic investment at a scale that wouldn't have materialized under normal commercial competition. The question is pace, not direction.

Software co-evolution is the underrated factor. Chinese AI labs are rewriting their training stacks around what's available domestically. That optimization pressure is precisely what drove DeepSeek's efficiency achievements in V3. Seeing the same instinct applied to Ascend hardware is no coincidence; it comes from the same engineering culture.

Physical AI creates the next test bed. Unitree, AGIBOT, and the cluster of robotics startups concentrated in Shenzhen and Hangzhou will need to train and deploy at scale on domestic chips. Whether Ascend and its successors can handle real-time inference for embodied AI — more demanding in latency and edge constraints than LLM serving — is the next performance test to watch.

The policy ratchet only tightens. US semiconductor export controls have become durable across administrations. The incentive for Chinese labs to close the hardware independence gap is structural and growing, not contingent on a single administration's posture.

The pre-training question is the next real signal. If DeepSeek or another lab announces that a next-generation base model's full training run was completed on domestic chips and backs it with a technical report, that changes the strategic calculus for anyone tracking China's AI trajectory. It's worth having that scenario in your planning horizon before the announcement arrives.

FAQ

Does V4-Pro's post-training on Ascend mean DeepSeek no longer needs Nvidia hardware? Not yet. Post-training is one phase. The pre-training of whatever base model underpins V4-Pro has not been confirmed as running on domestic chips. Concluding full Nvidia independence from a post-training announcement would significantly overstate what's been demonstrated.

Is the Ascend 910C comparable to the Nvidia H100? No independent third-party benchmark comparison is publicly available as of mid-2025. Huawei and allied sources claim competitive performance for certain workloads; those claims haven't been verified against the H100 in controlled settings. The honest answer is: we don't have confirmed numbers.

Should Western companies building on DeepSeek models worry about chip provenance? The model weights are unaffected by what hardware they were trained on. If you're using DeepSeek via API or running open-weight versions locally, the chip story is upstream of your deployment. The more substantive questions for enterprise users are around data governance, model auditability, and supply chain transparency — not chip origin.

How does this affect Western AI labs like OpenAI or Anthropic? Directly, it doesn't — they retain full access to the Nvidia stack. The strategic implication is longer-horizon: if Chinese labs can build and iterate frontier models without US-controlled hardware, export controls provide less durable competitive insulation than their architects intended.

What's the significance of "post-training" as the specific framing? It's precise language, and the precision is intentional. "Post-training on Ascend 910C" is a limited but credible claim; "trained on Ascend" would be a much stronger and currently unsupportable one. Chinese AI labs have learned — partly from the international scrutiny DeepSeek itself attracted after V3 — that overclaiming invites credibility damage. Careful framing suggests the team knows exactly what they've demonstrated and what they haven't.

Where should I track this story reliably? CSET at Georgetown publishes rigorous research on China's AI hardware ecosystem. Reuters and the Financial Times cover Huawei's chip business with sourced reporting. DeepSeek's GitHub repositories and technical papers remain the primary source for model-level claims; the lab has been notably transparent for a Chinese AI company, which makes its announcements more credible than most.

What's the connection to Unitree that keeps appearing in coverage of this story? It's a structural connection, not a direct one. Unitree represents the class of Chinese physical AI company that, like Chinese LLM labs, will be forced to build on domestic chips as Nvidia access remains restricted. The chip self-reliance question that DeepSeek is answering for LLMs is the same question Unitree and its robotics peers will need to answer for edge AI inference. The stories share an underlying driver.