DeepSeek-V4 and Huawei Ascend: the model, the compute stack, and the pressure on Nvidia
A grounded analysis of DeepSeek-V4, Huawei Ascend, coding agents, API pricing, and the pressure this combination puts on Claude, OpenAI, Nvidia, and the global AI market.
DeepSeek-V4 matters because it puts three things together that rarely arrive in one release: a one-million-token context window, stronger coding-agent capability, and API prices far below the leading closed frontier models. It also brings a second story into focus: Chinese domestic AI compute, especially Huawei Ascend, is moving from a backup option into the center of model training, adaptation, and inference economics.
What DeepSeek-V4 actually delivers
DeepSeek-V4 comes in two versions: DeepSeek-V4-Pro and DeepSeek-V4-Flash. The official model card lists V4-Pro as a MoE model with 1.6T total parameters and 49B activated parameters. V4-Flash has 284B total parameters and 13B activated parameters. Both support a 1M-token context window. DeepSeek’s API documentation also lists a 384K-token maximum output and support for both non-thinking and thinking modes.
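The gap between total and activated parameters is what drives MoE serving economics: only a small slice of the model runs per token. A quick sanity check on the figures quoted above, as illustrative arithmetic only:

```python
# Activated-parameter fractions implied by the model-card figures quoted above.
# All numbers come from the text; this is illustrative arithmetic, not a claim
# about how DeepSeek routes experts internally.

def activated_fraction(total_b: float, active_b: float) -> float:
    """Fraction of parameters touched per token in a MoE forward pass."""
    return active_b / total_b

v4_pro = activated_fraction(total_b=1600, active_b=49)   # 1.6T total, 49B active
v4_flash = activated_fraction(total_b=284, active_b=13)  # 284B total, 13B active

print(f"V4-Pro activates  ~{v4_pro:.1%} of its parameters per token")
print(f"V4-Flash activates ~{v4_flash:.1%} of its parameters per token")
```

Per-token compute scales with the activated slice, which is why a 1.6T-parameter model can still be cheap to serve.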
The point is not just size. The point is long-task usability. Many models look strong in short prompts, then become expensive or unstable when asked to handle full repositories, long documents, multi-step tool use, and multi-file code changes. V4 treats long context as a default capability rather than a rare premium feature.
DeepSeek says V4 uses a hybrid attention design that combines CSA and HCA, together with DSA-style sparse attention. In the 1M-token setting, DeepSeek reports that V4-Pro needs only 27% of the single-token inference FLOPs and 10% of the KV cache required by DeepSeek-V3.2. That is the technical reason the million-token window can become a practical product feature.
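To make the reported ratios concrete, here is the back-of-envelope implication, treating V3.2's per-token cost at 1M context as the baseline. The 0.27 and 0.10 ratios come from the text; the baseline values below are placeholders, not measurements:

```python
# Back-of-envelope view of DeepSeek's reported 1M-context efficiency ratios.
# The 0.27 (FLOPs) and 0.10 (KV cache) ratios are from the text; the V3.2
# baseline numbers below are arbitrary placeholders for illustration.

FLOPS_RATIO = 0.27     # V4-Pro single-token inference FLOPs vs V3.2 (reported)
KV_CACHE_RATIO = 0.10  # V4-Pro KV cache size vs V3.2 (reported)

def v4_cost(v32_flops_per_token: float, v32_kv_gb: float) -> tuple:
    """Scale a V3.2 baseline by the reported V4-Pro ratios."""
    return v32_flops_per_token * FLOPS_RATIO, v32_kv_gb * KV_CACHE_RATIO

# Hypothetical baseline: 1.0 unit of FLOPs per token, 400 GB of KV cache at 1M tokens.
flops, kv = v4_cost(1.0, 400.0)
print(f"V4-Pro per-token FLOPs at 1M context: {flops:.2f}x baseline")
print(f"V4-Pro KV cache at 1M tokens: {kv:.0f} GB (vs 400 GB baseline)")
```

A 10x smaller KV cache is what turns a million-token window from a memory-bandwidth problem into a product feature: the same accelerator can hold far more concurrent long-context sessions.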
Coding and agents: V4 is trying to win workflows, not demos
The most important shift is that DeepSeek-V4 is aimed at coding agents and real workflows, not just chat or benchmark answers. The official model card says V4-Pro-Max reaches top-tier coding benchmark performance and narrows the gap with leading closed models on reasoning and agentic tasks. DeepSeek also supports both OpenAI-style and Anthropic-style APIs, which makes it easier to plug V4 into coding-agent stacks such as Claude Code-like tools, OpenCode, OpenClaw, and CodeBuddy.
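Because the API is OpenAI-shaped, wiring V4 into an existing agent stack is mostly a matter of swapping the base URL and model identifier. A minimal sketch of the request body an OpenAI-compatible client would send; the base URL, model name, and `thinking` field here are assumptions for illustration, not taken from DeepSeek's documentation:

```python
import json

# Sketch of an OpenAI-compatible chat request aimed at DeepSeek's endpoint.
# BASE_URL and MODEL are assumptions for illustration; an OpenAI-style SDK
# pointed at this base URL would construct the same JSON body.
BASE_URL = "https://api.deepseek.com/v1"   # assumed OpenAI-compatible endpoint
MODEL = "deepseek-v4"                      # hypothetical model identifier

def build_chat_request(prompt: str, thinking: bool = False) -> dict:
    """Build the JSON body an OpenAI-style client POSTs to /chat/completions."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        # The article notes V4 exposes thinking and non-thinking modes; how
        # that is toggled over the API is an assumption in this sketch.
        "thinking": thinking,
    }

body = build_chat_request("Refactor utils.py to remove the global cache.")
print(json.dumps(body, indent=2))
```

The practical consequence: a team running a Claude Code-style or OpenAI-style agent harness can A/B a different backend model by changing configuration, not code.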
Coding capability has two layers. The first is whether a model can write code: functions, scripts, pages, and small fixes. The second is whether it can finish engineering work: understand a repo, edit multiple files, run commands, read errors, recover, and keep going. V4’s value is mainly in the second layer. A 1M-token context window gives it more room for repositories, docs, logs, and tool traces. Agent optimization makes it more useful in multi-step work. Lower pricing makes more attempts economically possible.
This is where it pressures Claude Code and OpenAI’s agent ecosystem. Claude Code remains stronger as a finished product across terminal, IDE, web, desktop, permissions, and diff review. OpenAI remains stronger as a broad platform with the Responses API, Agents SDK, web search, containers, computer use, and multimodal tooling. DeepSeek-V4 attacks from a different angle: open weights, lower prices, long context, and a model that teams can wire into their own systems.
Pricing is the immediate shock
DeepSeek’s official API pricing is per 1M tokens:
- DeepSeek-V4-Flash: $0.028 cached input, $0.14 cache-miss input, and $0.28 output.
- DeepSeek-V4-Pro: $0.145 cached input, $1.74 cache-miss input, and $3.48 output.
Anthropic lists Claude Opus 4.7 at $5 input and $25 output per 1M tokens, and Claude Sonnet 4.6 at $3 input and $15 output. OpenAI’s pricing page lists GPT-5.4 at $2.50 input, $0.25 cached input, and $15 output. GPT-5.5 appears on the same OpenAI pricing page but is marked “coming soon,” so it should not be treated as a generally available comparison point.
The gap is large. V4-Pro's $3.48 output price is under a quarter of the $15 per 1M tokens that both Claude Sonnet 4.6 and GPT-5.4 charge for output, and V4-Flash pushes even harder. For enterprise workloads, this is not a minor discount. Coding agents, long-document analysis, customer-support knowledge bases, research workflows, and batch content processing all burn tokens quickly. A step-change in price changes which applications are worth running continuously.
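To see how the per-token gap compounds, consider a single long agent run. The workload shape below is invented, the prices are the ones quoted above, and the comparison ignores the closed models' own caching discounts:

```python
# Cost of one hypothetical coding-agent run: 2.0M input tokens (80% served
# from cache) and 0.2M output tokens. Prices (per 1M tokens) are as quoted
# in the article; the workload shape is invented for illustration.

def run_cost(cached_m, miss_m, out_m, p_cached, p_miss, p_out):
    """Dollar cost of a run given token volumes (millions) and per-1M prices."""
    return cached_m * p_cached + miss_m * p_miss + out_m * p_out

# 2.0M input split 80/20 between cache hits and misses, plus 0.2M output.
v4_pro = run_cost(1.6, 0.4, 0.2, p_cached=0.145, p_miss=1.74, p_out=3.48)
# Claude Sonnet 4.6 at flat $3 input / $15 output, caching ignored for simplicity.
sonnet = run_cost(0.0, 2.0, 0.2, p_cached=0.0, p_miss=3.00, p_out=15.00)

print(f"V4-Pro:     ${v4_pro:.2f}")
print(f"Sonnet 4.6: ${sonnet:.2f}")
print(f"Ratio:      {sonnet / v4_pro:.1f}x")
```

Multiply one run by thousands of daily agent sessions and the ratio, not the absolute dollar figures, is what decides which workloads run continuously.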
Huawei Ascend: why the compute story matters
The hardware facts need to be handled carefully. DeepSeek’s official documents do not disclose the full training-cluster hardware stack for V4. However, 36Kr’s Intelligent Emergence reported that one reason for V4’s delay was the migration of the training framework from Nvidia to Huawei Ascend, and that DeepSeek hit a training failure and a chip re-adaptation problem in mid-2025. Zhidx also reported that DeepSeek-V4 had been adapted and optimized for domestic chips including Huawei Ascend and Cambricon, and that V4-Pro’s service throughput is currently constrained by high-end compute, with prices expected to fall further once Ascend 950 supernodes reach batch availability in the second half of the year.
That should not be rewritten as “DeepSeek officially confirmed every part of V4 was trained only on Ascend.” But it is enough to show the direction: DeepSeek-V4 has become a key test case for whether domestic compute can support frontier-model training, adaptation, and inference at scale.
Ascend is not just one chip. The HiAscend site lists a broader stack: Atlas 900 A3 SuperPoD, Atlas 800T/800I A3 supernode servers, CANN, MindSpore, PyTorch adaptation, MindSpeed, MindIE, and MindCluster. For large-model economics, the key is not only peak single-chip performance. It is the coordination among chips, interconnect, compilers, communication libraries, training frameworks, inference engines, and model architecture.
DeepSeek-V4’s low price follows that system logic. MoE reduces per-token compute by activating only part of the model. CSA/HCA/DSA reduce long-context FLOPs and KV cache pressure. If domestic compute adaptation matures, supply-chain risk and unit compute cost can fall further. The model architecture helps it “compute less.” Systems engineering helps it “compute reliably.” Domestic accelerators help it “compute affordably.” The combination is the real story.
What it means for Nvidia and the global AI market
DeepSeek-V4 plus Ascend does not instantly replace Nvidia. The Nvidia stack still has major strengths: high-end GPUs such as H100/H200/B200, a mature CUDA ecosystem, cloud availability, developer habit, and battle-tested engineering. Frontier labs outside China will not abandon Nvidia because of one model release.
But the China market is changing. If strong domestic models can train, adapt, and serve efficiently on Ascend-class hardware, Chinese companies have a stronger reason to reduce dependence on restricted Nvidia accelerators. Under export controls, domestic compute is not only a cost choice. It is a supply-chain security choice.
The global effect is more subtle but important. Frontier AI competition has often been reduced to “who has more Nvidia GPUs.” DeepSeek-V4 shows that model architecture, training methods, inference efficiency, open-weight distribution, and domestic chip adaptation can also reshape the field. It does not end Nvidia’s lead. It does force the market to reprice Nvidia’s moat: CUDA remains powerful, but it is no longer the only possible answer.
Compared with Claude and OpenAI, V4’s strengths and limits are clear
- Against Claude: Claude Opus 4.7 and Sonnet 4.6 remain strong in stable complex reasoning, product polish, and the full Claude Code workflow. DeepSeek-V4 is stronger on openness, price, 1M context, and deployability. If you want a finished coding assistant, Claude Code is easier. If you want a controllable, low-cost, replaceable model for your own agent stack, V4 is compelling.
- Against OpenAI: OpenAI remains stronger as a platform, especially around tool use, Agents SDK, containers, search, multimodal features, and ecosystem maturity. DeepSeek-V4 is stronger on price, open weights, and long-context economics. It may not win every capability, but it can sharply reduce total cost in many enterprise workloads.
- Against domestic Chinese models: V4 raises the bar by combining pricing, openness, long context, and agent capability. Chinese model companies now have to prove not only benchmark performance, but also low-cost serving, domestic-compute adaptation, and usefulness in real workflows.
Conclusion: V4 changes the default option
DeepSeek-V4 is not just a story about a Chinese model catching up with overseas models. It changes the default assumptions. Complex agents used to default to Claude or OpenAI. Long context used to default to expensive. Frontier AI training used to default to Nvidia. V4 loosens all three assumptions.
It is not a complete answer. Closed frontier models still have advantages in stability, tooling, multimodality, and product experience. Ascend still has to keep proving long-term stability at training and inference scale. But the question has changed. If an open model can offer million-token context, strong coding-agent capability, low API prices, and growing compatibility with domestic compute, then the cost structure, supply-chain structure, and competitive structure of global AI are all up for renegotiation.