Key Findings
  • After DeepSeek R1 was released in January 2025, it triggered a single-day loss of $589 billion in Nvidia's market capitalization — the largest single-day loss in stock market history — forcing a global reassessment of the industry narrative that "AI must depend on top-tier American compute"[1]; CNBC reports that DeepSeek V4 is imminent, with markets anticipating another shockwave hitting Nasdaq tech stocks[2]
  • DeepSeek V4 is expected to feature a next-generation dynamic computation architecture with 1 trillion (1T) parameters, incorporating a novel Multi-head Conditional Attention (mHC) mechanism, Engram Conditional Memory (ECM), and DeepSeek Sparse Attention (DSA), pushing the context window to 1 million tokens and claiming to surpass GPT-5 and Gemini 3 Ultra across multiple benchmarks[2]
  • The release of DeepSeek R2, the reasoning model, was delayed by several months, primarily because training on Huawei Ascend 910C chips failed — Ascend's inference performance reaches only 60% of the Nvidia H100[4], and the maturity gap between the CANN software stack and the CUDA ecosystem forced DeepSeek to fall back to Nvidia GPUs to complete training[3]
  • The Taiwanese government has banned the use of DeepSeek's cloud services across all government agencies[8], but on-premises deployment of open-source models falls outside the scope of the ban — enterprises can privately deploy DeepSeek's open-source models while ensuring data sovereignty and regulatory compliance, which is the core strategic recommendation this article proposes for Taiwanese businesses[10]

1. The Rise of DeepSeek: From Quantitative Hedge Fund to AGI Lab

The story of DeepSeek begins at an unconventional starting point. Its founder, Liang Wenfeng, was neither a Silicon Valley serial entrepreneur nor a well-known AI researcher from academia, but the founder of High-Flyer, a Chinese quantitative hedge fund. Established in 2015, High-Flyer rose quickly in the Chinese quantitative investment space, with assets under management surpassing tens of billions of RMB at its peak. Through quantitative trading, Liang developed a deep appreciation for the critical value of computing infrastructure. As early as 2021, he began large-scale procurement of Nvidia GPUs, and before the US imposed chip export controls on China, High-Flyer had accumulated more than 10,000 A100 GPUs — a stockpile of compute that would become the material foundation for DeepSeek's rise.

In May 2023, Liang formally established DeepSeek, positioning it as a pure research laboratory with artificial general intelligence (AGI) as its ultimate goal. This positioning stood in sharp contrast to most Chinese AI companies — Baidu, Alibaba, and ByteDance all developed their large models to serve their respective commercial ecosystems, while DeepSeek declared from day one that it would not pursue short-term commercialization and would instead focus on exploring the technical frontiers of AGI. In multiple internal memos, Liang emphasized that DeepSeek's mission was not to build a product but to answer a fundamental question: "How does general intelligence emerge in silicon-based systems?"

This pure research orientation, combined with the compute resources accumulated through the hedge fund, enabled DeepSeek to adopt a long-term strategy exceedingly rare in the Chinese tech industry. Its early team was primarily composed of top doctoral students from Tsinghua University, Peking University, and the Chinese Academy of Sciences — small in headcount but extremely high in technical density. Liang personally participated in the design and review of core algorithms, and his quantitative trading background gave him an almost obsessive pursuit of computational efficiency — how to extract maximum model performance from minimal compute. This DNA deeply shaped the technical trajectory of all subsequent DeepSeek models: rather than scaling up parameters or compute, the focus was on breakthroughs in architectural innovation and training efficiency.

From DeepSeek-Coder (a code generation model) in late 2023 to DeepSeek-V2 (which first introduced Multi-head Latent Attention and the DeepSeekMoE architecture) in mid-2024, DeepSeek iterated at an astonishing pace, with each generation of models delivering performance far exceeding expectations for their scale. But what truly captured global attention was the moment in January 2025 that changed the AI industry narrative — the release of DeepSeek R1.

2. DeepSeek R1 Retrospective: The Open-Source Reasoning Model That Shook the World

On January 20, 2025, DeepSeek released R1 with no advance notice — a 671-billion-parameter Mixture of Experts (MoE) reasoning model that activates only approximately 37 billion parameters per token[1]. R1's technical paper and model weights were released simultaneously under the MIT License, allowing fully free commercial use. On virtually all mainstream benchmarks, R1 achieved performance on par with — and in some cases surpassing — OpenAI's top reasoning model at the time, o1, while its reported training cost was approximately $5.9 million, a fraction of what OpenAI spent training GPT-4.

R1's core technical innovation lay in its training paradigm. Unlike traditional supervised fine-tuning (SFT), R1 adopted an "RL-first" strategy: it first trained the base model using pure reinforcement learning (GRPO — Group Relative Policy Optimization) on math and coding tasks, allowing the model to autonomously learn reasoning without human-annotated examples — including self-reflection, hypothesis testing, and backtracking. Only then was a small amount of curated Chain-of-Thought data used for supervised fine-tuning, followed by RL alignment with human preferences. The breakthrough of this pipeline was that it demonstrated high-quality reasoning capabilities can "emerge" from reinforcement learning rather than depending entirely on expensive human-annotated data.
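The group-relative advantage at the heart of GRPO can be sketched in a few lines. This is a minimal illustration of the published idea (normalize each sampled completion's reward against its own group, dispensing with a learned critic), not DeepSeek's actual training code:

```python
import statistics

def grpo_advantages(group_rewards):
    """Group Relative Policy Optimization (GRPO) normalizes each sampled
    completion's reward against the mean and std of its own group,
    removing the need for a separate learned value (critic) model."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # guard against zero std
    return [(r - mean) / std for r in group_rewards]

# Example: 4 completions sampled for one math prompt, scored by a
# rule-based verifier (1.0 = correct final answer, 0.0 = incorrect).
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])  # correct answers get positive advantage
```

Because the baseline is the group's own mean reward, verifiable tasks like math and coding need only an automatic checker, not human-annotated reasoning traces.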

Benchmark Performance

R1's benchmark performance stunned the entire industry:

| Benchmark | DeepSeek R1 | OpenAI o1 | Notes |
| --- | --- | --- | --- |
| AIME 2024 | 79.8% | 79.2% | American Invitational Math Exam; R1 narrowly surpasses o1 |
| MATH-500 | 97.3% | 96.4% | Mathematical reasoning benchmark; near-perfect score |
| Codeforces Rating | 1,962 | 1,891 | Competitive programming; expert-level performance |
| GPQA Diamond | 71.5% | 75.7% | Graduate-level science questions; the only area where R1 slightly trails |
| MMLU | 90.8% | 91.8% | Massive Multitask Language Understanding; virtually tied |
| C-Eval | 91.8% | 83.2% | Chinese comprehensive ability; substantial lead |

Market Impact: The $589B Shockwave

The market's reaction to R1's release was unprecedented. On January 27, 2025, Nvidia's stock plunged nearly 17% in a single day, wiping out approximately $589 billion in market value — the largest single-day market capitalization loss for any company in stock market history. The investor panic followed a clear and logical rationale: if a Chinese company could train an o1-tier model for less than $6 million using a batch of "outdated" A100 GPUs, did the entire "AI requires infinite compute" investment thesis need to be reconsidered? Were the hundreds of billions of dollars in projected Nvidia GPU demand severely overestimated?

R1 simultaneously dealt a devastating blow to the pricing structure of AI services. DeepSeek's API was priced at just $0.55 per million input tokens and $2.19 per million output tokens — roughly 96% cheaper than OpenAI o1's pricing. This was not an incremental cost optimization but a paradigm-shifting price disruption. OpenAI, Anthropic, and Google all reduced the prices of their respective reasoning models in the weeks following R1's release, and the entire industry was forced to redefine "the reasonable price point for AI reasoning services."
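The "roughly 96% cheaper" figure can be checked with simple arithmetic. The o1 list prices used below ($15 input / $60 output per million tokens) are our assumption for illustration; only the DeepSeek prices come from the article:

```python
def monthly_cost(in_tokens_m, out_tokens_m, in_price, out_price):
    """Cost in USD for a workload measured in millions of tokens."""
    return in_tokens_m * in_price + out_tokens_m * out_price

# Workload: 100M input tokens, 20M output tokens per month.
r1 = monthly_cost(100, 20, 0.55, 2.19)    # DeepSeek R1 list prices
o1 = monthly_cost(100, 20, 15.00, 60.00)  # assumed o1 list prices
savings = 1 - r1 / o1                     # fraction saved, ~0.96
```

For this workload mix the ratio lands at about 96%, matching the article's figure.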

The deeper impact lay in the shift of narrative. Before R1, the prevailing Silicon Valley thesis held that cutting-edge AI capabilities belonged exclusively to American tech giants with access to top-tier compute, and that China could only develop second-rate models under chip controls. R1 shattered this assumption with hard evidence — it proved that with the right architectural design and training strategies, compute disadvantages can be dramatically narrowed. The fully open-source approach under the MIT License further enabled researchers and enterprises worldwide to freely use, modify, and deploy R1, accelerating the global diffusion of reasoning model technology.

3. DeepSeek V4: The Coming Technical Breakthrough

After R1 triggered global tremors, the AI community's attention turned to DeepSeek's next move. In late February 2026, multiple sources confirmed that DeepSeek was preparing to release two new models: DeepSeek V4 (the fourth generation of its general-purpose foundation model) and DeepSeek R2 (the second generation of its reasoning model)[2]. Although the full technical specifications have not yet been officially disclosed, leaked internal information, preliminary academic papers, and industry insider accounts allow us to piece together V4's technical profile.

Architecture Scale: 1 Trillion Parameter MoE

DeepSeek V4 is expected to adopt a 1-trillion (1T) parameter MoE architecture, representing approximately a 50% increase over V3's 671 billion parameters. However, consistent with DeepSeek's long-standing efficiency-first philosophy, the number of active parameters per token is expected to be held to 50-60 billion — meaning that at inference time, V4's computational cost will not be significantly higher than V3's, while the model's knowledge capacity and expressive power will be dramatically enhanced. The core advantage of the MoE architecture lies in its ability to distribute knowledge across hundreds of expert sub-networks while maintaining inference efficiency, routing each token only to the small number of experts most relevant to it.
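The routing idea behind MoE efficiency can be sketched as a generic top-k softmax router. This is illustrative only, not DeepSeekMoE's actual implementation; the expert count of 256 is an assumption:

```python
import numpy as np

def topk_route(router_logits, k=8):
    """Pick the k highest-scoring experts for a token and renormalize
    their gate weights, so only a small fraction of the network runs."""
    idx = np.argsort(router_logits)[::-1][:k]  # top-k expert ids
    gates = np.exp(router_logits[idx])
    gates /= gates.sum()                       # renormalized mixture weights
    return idx, gates

rng = np.random.default_rng(0)
n_experts = 256                    # hypothetical expert count, for illustration
logits = rng.normal(size=n_experts)
experts, weights = topk_route(logits, k=8)     # only 8 of 256 experts execute
```

With 8 of 256 experts active, roughly 3% of expert parameters run per token, which is how a 1T-parameter model can keep per-token compute near a much smaller dense model's.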

Three Key Technical Innovations

Based on currently available information, V4 is expected to introduce three critical architectural innovations:

1. Multi-head Conditional Attention (mHC). The Multi-head Latent Attention (MLA) employed in V3 already dramatically reduced memory footprint during inference by compressing the Key-Value cache. V4's mHC evolves this concept further — it introduces conditional gating into the attention mechanism, enabling different attention heads to dynamically activate or deactivate based on the semantic characteristics of the input tokens. This means the model can use fewer attention heads when processing simple passages (reducing latency and energy consumption) while automatically engaging all attention heads when encountering critical passages requiring fine-grained understanding. This adaptive mechanism makes V4 far more efficient than traditional fixed-head architectures for long-context processing.
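Since mHC is unreleased, the following is a purely speculative toy of what conditional head gating could look like: each head carries a relevance query, and heads scoring below a threshold are skipped for that token:

```python
import numpy as np

def gate_heads(token_features, head_queries, threshold=0.0):
    """Toy conditional head gating: each attention head carries a learned
    'relevance query'; heads whose score against the current token's
    features falls below the threshold are skipped for that token."""
    scores = head_queries @ token_features  # one relevance score per head
    return scores > threshold               # boolean mask: which heads run

n_heads, d = 16, 64
head_q = np.eye(d)[:n_heads]     # toy learned queries: head i keys on feature i
feats = np.zeros(d)
feats[:4] = 1.0                  # a token that activates only features 0-3
mask = gate_heads(feats, head_q) # only 4 of 16 heads fire for this token
```

A simple token activates few heads (low latency); a dense, information-rich token would light up many or all of them.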

2. Engram Conditional Memory (ECM). This is V4's most ambitious innovation, inspired by the neuroscience concept of "engram memory." ECM introduces a set of learnable long-term memory vectors into the Transformer architecture that do not vary with sequence position but persist throughout the entire inference process. When the model processes ultra-long documents, ECM acts as a "working memory buffer" — key information is compressed and written into ECM vectors, and subsequent attention operations can directly query these memory vectors without revisiting the entire historical sequence. This design is the key technical foundation for V4's expansion of the context window to 1 million tokens — the computational cost of traditional full attention mechanisms at the million-token scale is O(n²), whereas ECM effectively reduces it to approximately O(n log n).

3. DeepSeek Sparse Attention (DSA). V3 already employed an early version of sparse attention, and V4's DSA represents a more systematic sparsification strategy. DSA combines three mechanisms: fixed-pattern sparsity (local sliding window), learnable sparsity (learning which tokens are important to one another), and hierarchical sparsity (shallow layers use local attention, deep layers use global attention). The net result is that within a 1-million-token context, each token needs to perform attention computation with only approximately 2-5% of all other tokens, with virtually no degradation in model quality.
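The fixed-pattern component of such schemes can be sketched as a boolean mask combining a local sliding window with a few global tokens. The learned and hierarchical sparsity DSA reportedly adds are omitted here, and the window and global-token counts are illustrative:

```python
import numpy as np

def sparse_mask(n, window=64, n_global=8):
    """Boolean attention mask: local sliding window plus a few global
    tokens, a common fixed-pattern sparsity scheme. True = attended."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    local = np.abs(i - j) <= window   # sliding-window neighborhood
    global_cols = j < n_global        # every token attends to global tokens
    global_rows = i < n_global        # global tokens attend to everything
    return local | global_cols | global_rows

mask = sparse_mask(4096, window=64, n_global=8)
density = mask.mean()                 # fraction of token pairs attended, ~3%
```

Even this crude pattern lands in the same 2-5% density range the article cites, and the saving grows with context length since the window stays fixed as n grows.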

Expected Performance

According to CNBC's reporting[2], internal testing at DeepSeek shows that V4 has already surpassed GPT-5 and Gemini 3 Ultra across multiple benchmarks. The specific figures circulating among industry insiders have not yet been independently verified.

Technical assessment note: The performance figures cited above come from unofficial channels and have not been independently verified by third parties. DeepSeek's track record suggests its published data is generally reliable, but enterprises should base strategic planning on third-party evaluations conducted after the official release. We recommend closely monitoring real-time rankings from independent evaluation platforms such as LMSYS Chatbot Arena and OpenCompass.

4. R2's Delays and the Huawei Ascend Dilemma

If V4 represents DeepSeek's ambition in architectural innovation, then R2 — the second generation of DeepSeek's reasoning model — exposes a deeper and more intractable structural challenge in Chinese AI development: the reliability of domestic compute infrastructure. R2 was originally scheduled for release in the second half of 2025, but has been delayed by more than six months, and the story behind that delay is far more complex than it appears on the surface[3].

The Huawei Ascend Training Failure

In early 2025, after R1 had captured global attention, the Chinese government placed high expectations on DeepSeek — it was seen as a flagship case of China's push for self-reliant, domestically controlled AI. Under the dual pressures of policy guidance and supply chain security, DeepSeek launched an ambitious initiative: training R2 on Huawei Ascend 910B/910C accelerators to reduce its dependence on Nvidia GPUs. This was not merely a technical validation exercise for DeepSeek — it was a critical litmus test for China's broader semiconductor "de-Americanization" strategy.

However, serious problems emerged quickly during training. According to SiliconAngle's reporting[3], DeepSeek's large-scale training on Ascend chips encountered frequent failures and stability issues. The Ascend 910C performed reasonably well on single-card inference tasks, but in distributed training scenarios involving thousands of cards — which are essential for training a model of R2's hundred-billion-plus parameter scale — inter-chip communication latency, memory consistency errors, and training interruptions piled up. Training jobs crashed frequently, completed training progress was repeatedly lost, and the overall effective training time ratio fell far below what could be achieved with Nvidia GPUs.

Huawei urgently dispatched a team of senior engineers to DeepSeek's training center to troubleshoot the stability issues on-site. But the root cause was not simply hardware defects — it was a systemic gap in the software ecosystem.

CANN vs. CUDA: A Generational Gap in Software Ecosystems

Huawei Ascend uses a software stack called CANN (Compute Architecture for Neural Networks), positioned as the counterpart to Nvidia's CUDA ecosystem. However, CUDA has undergone over 15 years of continuous iteration and has built a comprehensive ecosystem encompassing compilers, libraries, debugging tools, performance profilers, and distributed training frameworks (NCCL), with more than 4 million developers worldwide contributing accumulated practical experience and best practices. CANN has been available for only a few years, and its ecosystem depth lags behind CUDA by a significant generational margin.

Specifically, the software-layer issues that the DeepSeek team encountered during Ascend training included: the distributed training framework HCCL (Huawei's equivalent of NCCL) achieving 30-40% lower communication efficiency than NCCL in large-scale clusters, severely dragging down multi-node, multi-card training throughput; insufficient operator library coverage in CANN, requiring DeepSeek's custom operators (such as the custom kernels for the MLA attention mechanism) to be redeveloped and optimized on CANN at enormous engineering cost; and inadequate maturity of debugging and performance-tuning tools, meaning that when training encountered issues like NaN (Not a Number) values or gradient explosions, root cause analysis was far less efficient than in a CUDA environment.

Ultimately, after months of failed attempts to achieve stable training, DeepSeek made a pragmatic but politically awkward decision: to fall back to Nvidia GPUs to complete R2's training[3]. This decision forced R2's release schedule to slip by several months and simultaneously sent a clear signal to the entire industry — domestic substitution is theoretically viable, but in engineering practice, it still faces challenges that cannot be underestimated.

Ascend 910C Performance Positioning

Tom's Hardware's test report provides a more quantitative perspective[4]: the Huawei Ascend 910C delivers inference performance at approximately 60% of the Nvidia H100. This figure requires careful interpretation — it means that for inference scenarios (enterprise deployment, API services), Ascend is already a "usable" if not "optimal" option; however, in large-scale training scenarios, the 60% single-card performance gap is further amplified by the additional overhead of distributed communication, making real-world usability significantly lower than the 60% figure on paper.
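How a 60% single-card figure compounds into the far lower cluster-level numbers can be shown with simple arithmetic. The communication-efficiency and uptime factors below are illustrative assumptions, not measured values:

```python
def effective_training_ratio(single_card, comm_efficiency, uptime):
    """Cluster-level throughput relative to an H100 baseline: single-card
    speed, compounded by collective-communication efficiency at scale and
    the fraction of wall-clock time jobs stay up (no crash/restart loss)."""
    return single_card * comm_efficiency * uptime

# Illustrative assumptions: HCCL at ~70% of NCCL's large-cluster efficiency,
# and ~85% effective uptime after checkpoint restarts.
ratio = effective_training_ratio(0.60, 0.70, 0.85)  # ~0.36 of H100 baseline
```

Multiplying the three factors yields roughly 36%, consistent with the ~35-45% distributed-training range cited below, and it illustrates why each additional weakness compounds rather than adds.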

| Comparison Dimension | Nvidia H100 | Huawei Ascend 910C | Gap |
| --- | --- | --- | --- |
| FP16 inference throughput | Baseline (100%) | ~60% | 40% |
| Distributed training efficiency (1,000+ cards) | Baseline (100%) | ~35-45% | 55-65% (incl. communication overhead) |
| HBM memory bandwidth | 3.35 TB/s | ~2.0 TB/s | ~40% |
| Software ecosystem maturity | CUDA (15+ years, 4M+ developers) | CANN (3-4 years, early-stage ecosystem) | Order of magnitude |
| Supply availability (China market) | Export-controlled, inventory only | Domestically produced, stable supply | Ascend has the advantage |

Huawei's chip roadmap indicates that the next-generation Ascend 920 (expected in the second half of 2026) will adopt more advanced process technology, targeting inference performance at 80-90% of the H100. However, even if the hardware gap narrows, catching up on the CANN software ecosystem will still require years of sustained investment and industry-wide collaboration. The lesson from R2's training failure is crystal clear: chip performance is just the tip of the iceberg — the completeness and maturity of the software stack are the decisive factors determining the actual usability of compute infrastructure.

5. The US-China Chip War and Technological Sovereignty

R2's Ascend training predicament is not an isolated incident — it is a microcosm of the broader US-China technology competition. Since the US Department of Commerce first imposed AI chip export controls on China in October 2022, chips have become the most critical strategic asset in AI geopolitics — and the structural disadvantages China faces in this conflict are far more profound than most people realize[5].

Escalating Export Controls

US chip controls on China have gone through three waves of escalation. The first round in October 2022 banned the export of advanced AI chips (including the A100 and H100) and related semiconductor manufacturing equipment to China. Nvidia subsequently released downgraded versions — the A800 and H800 — to circumvent the controls, but the second round in October 2023 further tightened the compute threshold, bringing these downgraded versions under the ban as well. The third round in late 2024 extended restrictions to advanced packaging technology, HBM (High Bandwidth Memory), and certain EDA (Electronic Design Automation) tools, attempting to choke off China's AI compute upgrade path across the entire supply chain.

By early 2026, the policy landscape underwent a subtle shift. After the new US administration took office in January, it made strategic adjustments to chip control policies — maintaining the embargo on top-tier AI chips (such as the H200 and B200) while relaxing export restrictions on certain mid-to-low-end chips and manufacturing equipment. The stated rationale was "avoiding excessive controls that harm the global competitiveness of American semiconductor companies," but the deeper calculation was that overly harsh controls were actually accelerating China's domestic chip substitution efforts — R1 being the most powerful proof of this.

CFR Assessment: The 17x Gap Warning

The Council on Foreign Relations (CFR) published a widely noted report in early 2026[5] that systematically assessed the US-China AI compute gap. The report's core conclusion was sobering: measured by "effective compute available for frontier AI training," by the end of 2027, US available AI compute could reach 17 times that of China. This gap stems not only from single-chip performance differences but from systemic deficits across three dimensions: the generational gap in advanced process nodes (TSMC 3nm vs. SMIC 7nm), supply bottlenecks for critical components like HBM, and the maturity gap in software ecosystems.

However, the CFR report also included an important caveat: a compute gap does not directly equate to an AI capability gap. DeepSeek R1 already proved that, driven by architectural innovation and training efficiency, less compute can produce model performance equivalent to that achieved with top-tier compute. This means that even if the US maintains an overwhelming compute advantage, Chinese AI labs may still remain competitive at the model level through "efficiency innovation" — though the difficulty of this path will continue to increase as the compute gap widens.

"Operation Gatekeeper" and Gray-Market Supply Chains

In the second half of 2025, the US Bureau of Industry and Security (BIS) launched an enforcement initiative codenamed "Operation Gatekeeper," aimed at tracking and severing gray-market supply chains that route advanced AI chips to China through third countries — primarily Singapore, Malaysia, and the UAE. The operation has led to several intermediaries being placed on the Entity List and prompted the governments of Singapore and the UAE to strengthen their own export control compliance mechanisms.

For Taiwan, the geopolitical implications of this US-China chip war are self-evident. As the world's sole manufacturer of the most advanced AI chips, TSMC sits at the absolute center of this contest. Any adjustment to control policies — whether tightening or loosening — directly impacts TSMC's capacity allocation, customer structure, and geopolitical risk profile. When Taiwanese enterprises plan their AI strategies, they must factor in the geopolitical risks of the chip supply chain — this is not only a matter of cost but also of technology accessibility and long-term strategic autonomy.

6. The Rise of China's Open-Source AI Ecosystem

DeepSeek is not the sole representative of Chinese AI capabilities. In fact, from 2025 through early 2026, the entire Chinese open-source AI ecosystem experienced a systemic explosion in both scale and velocity that is reshaping the global AI model power map[6].

Qwen 3.5: Alibaba's Counteroffensive

In mid-February 2026, Alibaba's Tongyi Lab released Qwen 3.5 — a flagship model with 397 billion parameters[7]. Qwen 3.5 delivered outstanding performance across multiple benchmarks, particularly reaching new heights in Chinese comprehension, multi-turn dialogue, and function calling capabilities. The Qwen series likewise centers on an open-source strategy, offering a complete model family ranging from 0.5B to 397B parameters under the Apache 2.0 License.

Qwen's rise triggered a landmark shift in the global open-source AI community: on Hugging Face, the cumulative download count for the Qwen model family surpassed Meta's Llama series for the first time in January 2026, making it the most downloaded open-source AI model family in the world[6]. The symbolic significance of this data point is immense — it signals that in terms of actual adoption of open-source AI, Chinese models have transformed from "followers" to "front-runners." Qwen models are widely used in research projects, startups, and enterprise applications around the globe, with community activity and the number of derivative models experiencing explosive growth.

ByteDance and the Broader Ecosystem

ByteDance's Doubao large model has also been iterating rapidly. In early 2026, ByteDance released the Doubao Pro series for enterprise clients, offering general capabilities approaching GPT-4o at highly competitive prices. Unlike DeepSeek's pure research orientation, ByteDance's strategy is to deeply integrate large model capabilities into its massive commercial ecosystem — from Douyin's content recommendations, to Feishu's workplace intelligence, to Volcano Engine's enterprise AI platform. This "application-driven model iteration" approach complements DeepSeek's "research-driven" trajectory, jointly driving the prosperity of the Chinese AI ecosystem.

Additionally, Baidu's ERNIE, Zhipu's GLM series, Yi-34B and successors from 01.AI, and Moonshot's Kimi continue to iterate. MIT Technology Review's analysis notes[6] that the collective rise of China's open-source AI ecosystem is generating a "flywheel effect": open-sourcing models attracts global community feedback and improvements, improved models attract more users, a larger user base generates more training data and application insights, which in turn drives further model iteration. This virtuous cycle is causing the growth of China's open-source AI ecosystem to accelerate rather than decelerate.

Structural Shifts in the Ecosystem Landscape

Zooming out, the rise of China's open-source AI ecosystem is reshaping the global AI power structure. Before 2024, global open-source AI was essentially dominated by Meta's Llama series, supplemented by Mistral (France) and a handful of academic models. By early 2026, the landscape has been thoroughly transformed:

| Model Family | Organization | Country | Largest Model | Hugging Face Monthly Downloads (est.) |
| --- | --- | --- | --- | --- |
| Qwen | Alibaba | China | 397B (Qwen 3.5) | Highest |
| DeepSeek | DeepSeek | China | 671B (V3) / 1T (V4 expected) | Very high |
| Llama | Meta | USA | 405B (Llama 3.1) | High |
| Yi | 01.AI | China | 300B+ | Medium-high |
| Mistral | Mistral AI | France | 123B (Mistral Large) | Medium |
| Gemma | Google | USA | 27B (Gemma 2) | Medium |

This table makes clear that among the top six global open-source AI model families, China holds three seats (Qwen, DeepSeek, Yi) and leads the US in both download volume and community activity. The implications of this structural shift extend far beyond the technical — it means that a growing number of AI applications worldwide are being built on foundation models developed in China, and China's influence at the foundational AI technology layer is expanding rapidly.

7. Strategies for Taiwanese Enterprises: Risks and Opportunities

The imminent release of DeepSeek V4/R2, Huawei Ascend's advances and setbacks, and the rise of China's open-source AI ecosystem — these trends intertwine to present Taiwanese enterprises with a complex but navigable set of strategic challenges. The key is this: rather than making a binary choice of "use or don't use" Chinese AI models, the goal is to build a layered strategic framework that achieves a precise balance between risk management and technology dividends.

Scope and Boundaries of the Government Ban

In February 2025, Taiwan's Executive Yuan and the Ministry of Digital Affairs issued a directive banning all government agencies from using DeepSeek's cloud-based AI services[8]. The core rationale behind the ban is data security: all data transmitted through DeepSeek's API (including prompts, uploaded documents, and conversation logs) passes through servers located in China, subject to China's Data Security Law and National Intelligence Law, creating a legal risk that the data could be accessed by the Chinese government[9].

However, the ban has clearly defined boundaries: it targets only DeepSeek's cloud API services and does not cover on-premises deployment of open-source models. The model weights that DeepSeek has released under the MIT License (including R1, V3, and the upcoming V4/R2) can be legally downloaded and deployed by any organization on its own servers or in a cloud environment of its choosing. In an on-premises deployment scenario, all data processing occurs entirely within infrastructure controlled by the enterprise, with no data passing through DeepSeek's or any Chinese entity's servers, thereby eliminating the legal risk of data leakage to China.

A Framework for Data Sovereignty

IAPP (International Association of Privacy Professionals) analysis points out[9] that DeepSeek's data security risks can be entirely mitigated through architectural design — the key is decoupling "model capabilities" from "data flows." IBM's research team further elaborated on the "AI goes local" trend[10]: against a backdrop of escalating global geopolitical tensions, enterprises are increasingly inclined to deploy open-source models on-premises rather than relying on cross-border API services. DeepSeek's fully open-source strategy provides the ideal technical foundation for this "localized AI" demand.

We recommend that Taiwanese enterprises adopt the following three-tier data sovereignty architecture:

Tier 1: Highly sensitive data (trade secrets, defense-related information, personal data). Strictly prohibit the use of any cross-border AI API. Use only locally deployed models (DeepSeek R1-Distill, Qwen, Llama, or Taiwan-LLM), running on enterprise-owned GPU servers or Taiwan-region cloud environments (such as GCP Taiwan region or AWS Japan region). All inference data must remain within the enterprise-controlled boundary.

Tier 2: Moderately sensitive data (internal reports, general business documents). AI API services hosted in democratic, rule-of-law jurisdictions (such as OpenAI, Anthropic Claude, Google Gemini) may be used, but service agreements on data handling and retention policies should be verified. Avoid services where data is processed through servers in China or other countries with inadequate data protection regulations.

Tier 3: Low-sensitivity data (publicly available information, anonymized data, general Q&A). Various AI API services may be used flexibly, including the most cost-effective options. Even at this tier, it is advisable to avoid including personally or organizationally identifiable information in prompts.
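The three-tier policy above can be encoded as a simple routing table that fails closed. The endpoint names below are hypothetical placeholders, not real service identifiers:

```python
# Hypothetical routing table for the three-tier data sovereignty policy.
# Endpoint names are placeholders for illustration only.
POLICY = {
    1: {"allowed": ["onprem-deepseek-r1-distill", "onprem-qwen"]},  # local only
    2: {"allowed": ["claude-api", "gemini-api", "openai-api"]},     # vetted jurisdictions
    3: {"allowed": ["any"]},                                        # cheapest option wins
}

def route(tier, endpoint):
    """Return True if the policy permits sending this tier's data to the
    given endpoint; unknown tiers raise a KeyError (fail closed)."""
    rule = POLICY[tier]
    return "any" in rule["allowed"] or endpoint in rule["allowed"]

ok = route(1, "onprem-qwen")       # local model for highly sensitive data
blocked = route(1, "openai-api")   # cross-border API refused for tier 1
```

Centralizing the decision in one table makes the policy auditable and keeps individual applications from quietly making their own sovereignty calls.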

Practical deployment recommendation: For Taiwanese SMEs looking to deploy DeepSeek models on-premises, the most cost-effective starting point is DeepSeek R1-Distill-Qwen-32B — this distilled model from R1 achieves approximately 85-90% of the full R1's performance on Chinese reasoning tasks, yet can run on a single workstation equipped with 4x RTX 4090 GPUs (hardware cost approximately NT$250,000-300,000). For better-resourced enterprises, the full DeepSeek V3 (671B) can be deployed on a cluster of 8x A100/H100 GPUs, delivering top-tier Chinese language comprehension and generation capabilities. Once V4 is officially open-sourced, we recommend prioritizing evaluation of V4's distilled versions as the primary deployment model.
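A back-of-the-envelope VRAM estimate shows why the 4x RTX 4090 configuration works for a 32B model. This counts weights only; KV cache, activations, and runtime overhead add more on top:

```python
def weight_vram_gb(n_params_b, bytes_per_param):
    """Rough GPU memory for model weights alone: billions of parameters
    times bytes per parameter (1B params at 1 byte = 1 GB)."""
    return n_params_b * bytes_per_param

fp16 = weight_vram_gb(32, 2)    # 32B model in FP16   -> 64 GB of weights
int4 = weight_vram_gb(32, 0.5)  # 4-bit quantized     -> 16 GB of weights
cluster = 4 * 24                # 4x RTX 4090 at 24 GB -> 96 GB total VRAM
```

FP16 weights technically fit across the four cards, but leave little headroom for long-context KV cache, which is why quantized deployment is the common choice at this hardware tier.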

Strategic Action Checklist

Based on the analysis above, we propose the following specific strategic recommendations for Taiwanese enterprises:

1. Immediate actions (0-3 months):

2. Short-term planning (3-6 months):

3. Medium-to-long-term positioning (6-12 months):

Conclusion: Building Resilience Amid Uncertainty

The arrival of DeepSeek V4 and R2 marks yet another leap forward for Chinese AI capabilities. Huawei Ascend's training setbacks remind us that the road to compute self-reliance remains long and fraught with obstacles; yet the collective rise of China's open-source AI ecosystem — from DeepSeek to Qwen to ByteDance — is irreversibly reshaping the global AI power landscape.

For Taiwanese enterprises, the greatest risk is not choosing the wrong model but losing strategic agility amid a rapidly shifting AI landscape. Through a layered data sovereignty architecture, the technical capability for on-premises deployment, and continuous tracking of the global AI ecosystem, Taiwanese enterprises are fully equipped to capture the technology dividends of this Chinese AI wave while safeguarding data security. The key is to act now — because the chain reaction triggered by V4's release will leave an ever-shrinking window for latecomers to respond.