DeepSeek and the Sanctions Paradox: When Restrictions Drive Innovation

Antonio Troise
Jan 30, 2025


DeepSeek, an emerging Chinese startup, has captured global attention with its artificial intelligence models DeepSeek-V3 and DeepSeek-R1, both characterized by advanced reasoning capabilities and computational efficiency. While DeepSeek-V3 represents the company’s response to OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, DeepSeek-R1 has distinguished itself as a model specialized in step-by-step reasoning, outperforming open-source alternatives like Meta’s Llama 3 and Mistral. Despite U.S. restrictions on the export of advanced semiconductors, DeepSeek reportedly developed R1 at a cost of just $6 million, a fraction of the billion-dollar investments made by Western competitors. As detailed in the paper “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning”, available on arXiv, training relied on NVIDIA H800 GPUs, while reports point to Huawei Ascend 910C chips for inference.

Technological Innovations: MoE, Multi-Token Prediction, and the “Aha Moment”

Beyond the mixture of experts (MoE) architecture, which optimizes computational resources by activating only the necessary parts of the model and reducing GPU load, DeepSeek has introduced a significant innovation in step-by-step reasoning. The R1 model has demonstrated abilities similar to OpenAI’s o1 thanks to an advanced reinforcement learning system, enabling it to autonomously develop complex logical chains without the need for vast supervised datasets.
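To make the approach concrete, here is a minimal Python sketch of the group-relative advantage computation at the heart of GRPO, the reinforcement learning algorithm used in the R1 paper: each sampled answer is scored against the other answers drawn for the same prompt, which removes the need for a separate critic network. The rewards below are placeholder values for illustration.

    import statistics

    def group_relative_advantages(rewards):
        # GRPO-style advantages: normalize each sampled answer's reward
        # against the group sampled for the same prompt, so no separate
        # value network (critic) is needed.
        mean = statistics.mean(rewards)
        std = statistics.pstdev(rewards) or 1.0  # guard against zero spread
        return [(r - mean) / std for r in rewards]

    # Example: four sampled answers to one math prompt, rewarded 1.0 if the
    # final answer is correct and 0.0 otherwise (rule-based, no reward model).
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
    # -> [1.0, -1.0, -1.0, 1.0]

Answers that beat their own group’s average are reinforced, which is what gradually pushes the model toward longer and more careful chains of reasoning.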

Another key strength of DeepSeek-R1 is its use of a multi-token prediction system, allowing it to generate multiple tokens per cycle while maintaining high accuracy (85–90%), effectively doubling inference speed compared to traditional Transformer-based models. Additionally, DeepSeek’s MoE architecture has innovated load-balancing management, enabling the model to achieve impressive scalability: despite containing 671 billion parameters, only 37 billion are active during inference, reportedly making it possible to run on relatively affordable hardware, such as two NVIDIA RTX 4090 GPUs costing less than $2,000.
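The exact routing and load-balancing machinery is DeepSeek’s own (the V3 technical report describes an auxiliary-loss-free balancing scheme), but the core idea of sparse activation can be sketched in a few lines of PyTorch. Everything below uses toy sizes, not the real architecture.

    import torch
    import torch.nn.functional as F

    def moe_route(x, gate_weights, experts, k=8):
        # The gate scores every expert, but only the top-k per token run,
        # so most parameters stay idle on any given forward pass.
        scores = F.softmax(x @ gate_weights, dim=-1)   # [tokens, n_experts]
        topk_scores, topk_idx = scores.topk(k, dim=-1)
        out = torch.zeros_like(x)
        for t in range(x.size(0)):
            for score, idx in zip(topk_scores[t], topk_idx[t]):
                out[t] += score * experts[idx](x[t])
        return out

    # Toy scale: 64 tiny experts with 8 active per token -- the same principle
    # that lets DeepSeek keep roughly 37B of 671B parameters active.
    d, n_experts = 16, 64
    experts = [torch.nn.Linear(d, d) for _ in range(n_experts)]
    gate = torch.randn(d, n_experts)
    tokens = torch.randn(4, d)
    print(moe_route(tokens, gate, experts).shape)  # torch.Size([4, 16])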

One of the model’s most surprising aspects is the so-called “aha moment”, which spontaneously emerged during training. According to the paper, the DeepSeek-R1 model reportedly paused, flagged potential reasoning issues, and then resumed with a different approach, all without being explicitly trained to do so. This emergent behavior arose naturally from the interaction between the model and its reinforcement learning environment and marks a clear departure from traditional models, demonstrating a notable advancement in AI models’ self-verification and self-correction capabilities.
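Notably, the training signal behind this behavior is purely rule-based: the paper describes an accuracy reward for the final answer plus a format reward for enclosing the reasoning in think tags, with no learned reward model. A rough sketch follows; the reward magnitudes and the exact-match check are placeholder assumptions, not the paper’s implementation.

    import re

    def rule_based_reward(completion: str, reference_answer: str) -> float:
        # Format reward: the chain of thought must sit inside <think> tags.
        reward = 0.0
        if re.search(r"<think>.*?</think>", completion, re.DOTALL):
            reward += 0.5
        # Accuracy reward: compare whatever follows the reasoning block
        # against the reference answer (a simple string match here).
        final = completion.split("</think>")[-1].strip()
        if final == reference_answer:
            reward += 1.0
        return reward

    print(rule_based_reward("<think>3 * 4 = 12, half is 6</think>6", "6"))  # 1.5

Because nothing in this signal rewards any particular reasoning style, behaviors like pausing and re-deriving a step have to emerge on their own, which is exactly what the “aha moment” describes.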

Finally, for certain functions, DeepSeek appears to have used NVIDIA’s PTX (Parallel Thread Execution) programming, an assembly-like layer below CUDA. As an intermediate instruction set architecture, PTX enables fine-grained optimizations such as register allocation and thread/warp-level adjustments, providing more direct control over the hardware. This approach, although complex to manage, demonstrates the high level of expertise of DeepSeek’s engineers.

The results speak for themselves: DeepSeek-R1 achieved 79.8% accuracy on the AIME 2024 math benchmark, matching OpenAI’s o1, and scored exceptionally well elsewhere, reaching 97.3% on MATH-500 and ranking in the 96.3rd percentile in Codeforces programming competitions. Furthermore, the DeepSeek team managed to distill these capabilities into a version with only 14 billion parameters that outperforms much larger models, proving that reasoning ability depends not just on the number of parameters but on how a model is trained to process information. To make the technology even more accessible, DeepSeek has developed distilled variants of R1, such as DeepSeek-R1-Distill, available in lighter versions (1.5B, 7B, 8B, etc.), designed for developers who want to run the models locally on their own PCs without relying on expensive cloud infrastructure.
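For anyone who wants to try one of the distilled checkpoints locally, a minimal sketch with Hugging Face transformers might look like the following. The model ID is the 7B distill DeepSeek published on the Hub (worth verifying before use), and a recent GPU plus the accelerate package are assumed.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = "How many prime numbers are there below 30?"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(output[0], skip_special_tokens=True))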

The computational efficiency of DeepSeek-R1, reportedly more than 45 times greater than that of other cutting-edge models, demonstrates that advanced AI can be developed without extremely expensive hardware infrastructure. An even more impressive example is DeepSeek-V3, the company’s flagship model, which was trained on a budget of just over $5 million. This figure is unimaginable for models like GPT-4 or Llama 3: Meta alone is estimated to have spent around $60 million solely on training Llama 3.

If DeepSeek can further scale its technologies, it could redefine the industry, making artificial intelligence more accessible and increasing competitive pressure on giants like OpenAI, Google, Meta, and Anthropic.

Economic Impact, Open Source, and the New Global Competition

The economic impact of DeepSeek-R1’s success was immediate: the news caused a 3% drop in the Nasdaq, hitting companies like NVIDIA (which recorded a 17% loss) and reflecting investors’ concerns about China’s growing AI competitiveness. This ran contrary to market expectations, as just days earlier, on January 22, 2025, Donald Trump had announced the Stargate Project: a massive $500 billion investment plan to strengthen American AI leadership, in collaboration with the CEOs of SoftBank, OpenAI, and Oracle.

However, concerns about the potential negative impact of DeepSeek-R1 on the semiconductor market may be premature or misinterpreted. The drop in NVIDIA’s stock suggests that investors fear a reduced demand for GPUs due to DeepSeek’s more efficient training method. But according to some analysts, this fear is unfounded: DeepSeek-R1 does not decrease the need for GPUs; rather, it optimizes how they are utilized, making the process more scalable. In other words, increasing the number of GPUs still leads to performance improvements — an effect that revives scaling laws, which had previously shown signs of saturation. This dynamic ties into the Jevons Paradox, which states that technological progress that improves the efficiency of a resource’s use does not necessarily reduce its consumption but can actually increase it. If DeepSeek-R1 lowers the cost of AI training, it could encourage more companies to develop increasingly large and complex models, driving further demand for advanced hardware. Paradoxically, instead of harming the GPU market, DeepSeek’s innovation could strengthen it in the long run.
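A back-of-the-envelope model makes the Jevons effect concrete. All numbers here are illustrative assumptions, not market data: the point is only that when demand responds more than proportionally to efficiency gains (elasticity above 1), total GPU consumption rises even as each training run gets cheaper.

    def total_gpu_hours(baseline_hours, efficiency_gain, demand_elasticity):
        # Each run now needs baseline_hours / efficiency_gain GPU-hours,
        # while the number of runs grows as efficiency_gain ** elasticity.
        hours_per_run = baseline_hours / efficiency_gain
        runs = efficiency_gain ** demand_elasticity
        return hours_per_run * runs

    # 10x cheaper training, two hypothetical demand responses:
    print(total_gpu_hours(1_000_000, 10, 1.5))  # ~3.16M GPU-hours: demand grows
    print(total_gpu_hours(1_000_000, 10, 0.5))  # ~0.32M GPU-hours: demand shrinks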

At the same time, DeepSeek’s decision to release R1 as an open-source model could redefine industry strategies, accelerating innovation but also raising concerns about data security and ethical AI usage. Yann LeCun, Chief AI Scientist at Meta, emphasized that the real takeaway from DeepSeek’s success is not that China is overtaking the U.S., but that open-source models are overtaking proprietary ones.

DeepSeek has decided to open-source its model in a context that goes against traditional business strategies: typically, innovators strive to maintain their competitive advantage. However, as a Chinese company, DeepSeek faces trust challenges, especially in Western markets, where a Chinese AI API might raise skepticism. By making its code open, it instead offers transparency and control, allowing users to self-host the model or obtain it through third-party AI providers.

Having found more efficient ways to work around export restrictions on advanced chips, DeepSeek can price R1 far more aggressively: around $7 per million tokens compared to OpenAI’s $60. If the end result is similar for the user, paying such a high premium becomes hard to justify. This is especially relevant for AI application developers, for whom prompt engineering and customization are necessary regardless of the model. In this scenario, opting for a more affordable model like R1 can be a strategic decision without significant compromises on performance.
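At the quoted prices, the gap compounds quickly with volume. A trivial calculation (the monthly token volume is an assumed workload, and real price lists distinguish input from output tokens):

    def monthly_api_cost(tokens_millions, price_per_million_usd):
        # Cost scales linearly with token volume at a flat per-million price.
        return tokens_millions * price_per_million_usd

    volume = 500  # million tokens per month, an assumed workload
    print(f"R1:     ${monthly_api_cost(volume, 7):,}")   # $3,500
    print(f"OpenAI: ${monthly_api_cost(volume, 60):,}")  # $30,000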

In an interesting interview, DeepSeek’s founder, Liang Wenfeng, emphasized that the reduction in pricing for their artificial intelligence services was not primarily aimed at acquiring users but rather reflected a genuine decrease in internal costs and the belief that AI should be accessible to everyone. He explained that, initially, DeepSeek did not have the explicit goal of offering a drastically cheaper alternative to OpenAI and other competitors. However, through optimizations in the training and deployment process, they managed to lower operational costs far beyond expectations, a result that surprised him, as it had not been planned as an initial strategy.

Although OpenAI’s models still play a central role, not least as distillation sources for the rest of the industry, R1 could push the giants to optimize their own processes, combining DeepSeek-style efficiency with their vast resources.

The DeepSeek case highlights a key trend: while companies like OpenAI and Google keep their models closed, the open-source movement continues to gain ground, fostering faster and more accessible global innovation.

Sanctions and Innovation: The Technological Paradox

But amid all this, an even more surprising dynamic has emerged. The U.S. semiconductor restrictions, intended to slow China’s technological advancement, have paradoxically fueled innovation, forcing companies like DeepSeek to develop more efficient solutions to compensate for the lack of advanced hardware. DeepSeek-R1, born in response to these limitations, has achieved performance comparable to Western models despite using downgraded NVIDIA H800 GPUs and adopting Huawei Ascend 910C chips, with plans for future improvements using the 920C.

According to an analysis by MIT Technology Review, Chinese companies had to consume up to four times the computing power compared to their Western counterparts to achieve the same results, due to U.S. restrictions on advanced chip exports. However, DeepSeek compensated for this inefficiency through innovative hardware usage and model optimization techniques.

Beyond semiconductor restrictions, another key factor that has indirectly fueled DeepSeek’s growth is Presidential Proclamation 10043, introduced in 2020 by the Trump administration. This policy limited access to U.S. universities for Chinese students and researchers from institutions involved in Military-Civil Fusion, a Chinese government strategy to transfer civilian innovations to the military sector. Before then, top Chinese talent viewed earning a PhD in the United States as a prestigious achievement. However, with stricter visa restrictions and a growing climate of hostility, many began opting to stay in China or move to Europe. This shift kept a wave of highly skilled talent in the country, directly contributing to the rise of companies like DeepSeek.

This phenomenon demonstrates how sanctions and restrictions, instead of hindering China, have incentivized the development of alternative hardware and advanced computational strategies, strengthening its technological independence. The same dynamic is now occurring globally: even Switzerland, historically neutral, has recently faced similar restrictions from the U.S., limiting access to high-end AI chips. This raises concerns about the scope of U.S. measures, which risk not only obstructing China but also affecting strategic allies, calling into question international scientific cooperation.

Lessons from the Past: When Tariffs Failed

The DeepSeek case is not an isolated event in the history of innovation. There have been numerous precedents where restrictions and protectionism had unexpected effects, stimulating the development of alternative solutions. Here are some emblematic examples that demonstrate how necessity has often accelerated technological progress.

  • Rolex and British Tariffs (1914–1919): Rolex was not originally founded as a luxury brand. In 1905, when Hans Wilsdorf and Alfred Davis established the company in London, their initial goal was to produce affordable watches. Rolex’s original mission was to bring low-cost timepieces to the market, but this strategy became unsustainable due to the high tariffs and taxes imposed by the British government on imports. As a result, Rolex was forced to position itself as a premium-priced product from the start. In 1919, Wilsdorf relocated the company to Geneva, Switzerland, where he could focus on producing high-quality precision watches, initiating Rolex’s transformation into a luxury brand. In 1926, it launched the Oyster, the world’s first waterproof watch, and in 1931, it introduced the Perpetual rotor, the first modern self-winding mechanism, marking its definitive shift to a brand synonymous with innovation and excellence. Today, Rolex is one of the most prestigious luxury watch brands, with annual revenues in the billions. This case demonstrates how protectionist policies paradoxically pushed a company initially set up to produce budget watches to become a symbol of luxury.
  • Germany and the Battleships Bismarck and Tirpitz: After World War I, the Treaty of Versailles imposed strict limits on Germany’s ability to build large warships, banning the production of units exceeding 10,000 tons. To circumvent these restrictions, Germany designed the “pocket battleships” of the Deutschland class — smaller but technologically advanced vessels with lighter armor and 280mm guns capable of challenging much larger ships. However, after unilaterally abandoning the Treaty of Versailles in the 1930s, Germany launched an ambitious naval rearmament program, culminating in the construction of the battleships Bismarck and Tirpitz. These significantly larger and more powerful warships benefited from innovations developed during the design of their smaller predecessors. The Bismarck, in particular, proved to be one of the most formidable battleships ever built, inflicting heavy losses on the Royal Navy before being sunk in 1941. This example illustrates how the military restrictions imposed by the Allies forced Germany to innovate, leading to the creation of more advanced and deadly warships than the treaties had anticipated.
  • The Soviet Union and the Space Race: After World War II, the United States sought to limit the Soviet Union’s access to advanced missile technologies by imposing restrictions on the export of crucial components and maintaining a technological edge in the aerospace sector. However, rather than halting Soviet development, these measures incentivized the USSR to heavily invest in its own missile research. As a result, in 1957, the Soviet Union launched Sputnik 1, the first artificial satellite in history. This event marked the beginning of the space race, catching the United States off guard and proving that technological restrictions had pushed the USSR to develop its own capabilities independently, accelerating its space program and temporarily surpassing American advancements.

Conclusion

The history of innovation is full of paradoxes, and DeepSeek is yet another example. If it continues to refine its models and prove that it can compete with OpenAI, Google, Meta, and Anthropic with a fraction of the resources, we could witness a radical transformation of the industry, with a shift toward more accessible and efficient open-source models. At the same time, the Western technological dominance built over years of strategic advantage may be more vulnerable than ever.

The historical lesson is clear: attempts to contain innovation through restrictions and protectionism rarely achieve the intended effect. Rather than blocking a competitor, they often force it to find alternative, more efficient, and independent solutions. Thus, instead of slowing China down, the barriers imposed by the U.S. may have accelerated its race toward technological self-sufficiency, with consequences that will extend far beyond the artificial intelligence market. If history teaches us anything, it is that real power does not lie in restricting access to technologies but in remaining the global benchmark for innovation.

Originally published at Levysoft.

Written by Antonio Troise

Blogger at levysoft.it and English edition curator on Medium for AI tech. Former founder of Gamertagmatch and Seguiprezzi. Sharing on levysoft.tumblr.com.
