OpenAI says it has evidence that Chinese AI company DeepSeek used its models in violation of its terms of service, and that it will take action against the firm
OpenAI recently told the Financial Times that it had found evidence that Chinese AI startup DeepSeek used OpenAI's proprietary models when training its own open-source competing products.
So... what exactly is going on?
The $589 billion butterfly effect
The story starts with the DeepSeek-R1 model.
This Chinese model, trained with only 2,048 Nvidia H800 graphics cards at a cost of US$5.6 million, actually matched o1 on reasoning benchmarks, and even surpassed o3, which has not yet been released for public use, in some scenarios!
What's even more striking is that its training cost is only about 1/60 that of GPT-4's.
As a result, NVIDIA's stock price plummeted 17% in a single day this past Monday, wiping US$589 billion off its market value.
Although the stock rebounded 9% the next day, the logic underpinning AI hardware investment has been shaken, and Silicon Valley's AI bubble looks like it may be starting to burst.
The trouble caused by "distillation"?
The key to the matter lies in a technique called "distillation".
So-called distillation means that developers use the outputs of a large model to train a smaller one, allowing the small model to achieve similar performance at a much lower cost.
This is a common practice in the industry.
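To make the idea concrete, here is a minimal, hypothetical sketch of distillation in PyTorch; the toy teacher and student models and the random inputs are placeholders of my own, not anything from OpenAI or DeepSeek. The student is trained to match the teacher's softened output distribution.

```python
# Minimal knowledge-distillation sketch (toy models, illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(128, 10)   # stands in for the large, frozen model
student = nn.Linear(128, 10)   # the smaller model being trained
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

T = 2.0  # temperature: softens the teacher's output distribution

for step in range(100):
    x = torch.randn(32, 128)                  # placeholder inputs
    with torch.no_grad():
        teacher_logits = teacher(x)           # "query the big model"
    student_logits = student(x)

    # KL divergence between softened teacher and student distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

With a closed API you usually cannot see the teacher's logits at all, so in the LLM setting "distillation" often just means fine-tuning a smaller model on text generated by the larger one, which is the practice at issue here.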
But here's the problem: DeepSeek may have used this method to train its own competing models, which would be a clear violation of OpenAI's terms of service.
"The key is whether you take it (the model output) out of the platform and use it to create your own model," said a person familiar with the matter at OpenAI.
"Dubious" low cost
DeepSeek's performance is indeed surprising. They claim to have only used:
- 2048 Nvidia H800 graphics cards
- Cost $5.6 million
- Trained a V3 model with 671 billion parameters
These numbers are a drop in the bucket compared to what OpenAI and Google spend to train models of similar size!
This struck industry experts as odd.
They found that DeepSeek's model responses suggested it might have been trained on GPT-4's outputs, which would be a clear violation of OpenAI's terms of service.
Complexity of the problem
Ritwik Gupta, an AI PhD candidate at the University of California, Berkeley, noted:
"It is common practice among startups and academics to use the outputs of a commercial LLM (such as ChatGPT) to train another model. It effectively gets you the human-feedback step for free."
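As a rough illustration of that practice (purely hypothetical: the prompts, output file name, and model name below are placeholders of my own, and this is not claimed to be what DeepSeek did), a developer can query a commercial API and save prompt/response pairs as supervised fine-tuning data for a smaller model:

```python
# Hypothetical sketch: harvesting a commercial model's outputs as training data.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
prompts = ["Explain quicksort briefly.", "Summarize the causes of inflation."]

with open("synthetic_train.jsonl", "w") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content
        # Each line becomes one fine-tuning example for the smaller model.
        f.write(json.dumps({"prompt": prompt, "response": answer}) + "\n")
```

Generating data this way is cheap; the dispute is over whether the resulting dataset is then used to build a competing model, which is what OpenAI's terms of service prohibit.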
This also highlights how difficult this kind of use is to prevent at a technical level.
OpenAI said in its latest statement: "We know that Chinese companies, as well as others, have been trying to distill the models of leading American AI companies."
To this end, OpenAI is taking a number of measures:
- Implementing countermeasures to protect its intellectual property
- Carefully deciding which cutting-edge capabilities to include in released models
- Working closely with the U.S. government to protect its most powerful models from theft by competitors
David Sacks, U.S. President Donald Trump's AI and crypto czar, also weighed in:
“There is substantial evidence that DeepSeek distills knowledge from OpenAI models, and I think OpenAI is very unhappy with that.”
The underlying tension: how do you draw the line between technological borrowing and intellectual-property plagiarism?
Last year, OpenAI and Microsoft blocked API accounts suspected of belonging to DeepSeek.
However, some industry insiders have commented: "Completely stamping out this kind of operation is harder than finding a needle in the Pacific Ocean."
Interestingly, OpenAI itself is facing copyright infringement lawsuits from the New York Times and well-known authors. The lawsuits accuse OpenAI of using their articles and books to train models without permission.
DeepSeek’s response and technical report
In the face of the accusations from Microsoft and OpenAI, the DeepSeek team stated clearly in the technical report for its latest model, R1, that no output data from OpenAI's models was used, and that its high performance was achieved through reinforcement learning and a distinctive training strategy.
DeepSeek adopts a multi-stage training method that includes base-model training, reinforcement learning (RL), and fine-tuning. Cycling through these stages helps the model absorb different knowledge and capabilities at each step, as the sketch below illustrates.
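As a very rough conceptual sketch of such a cycle (a toy model, toy data, and a toy rule-based reward of my own invention, not DeepSeek's actual recipe), alternating a supervised fine-tuning step with a REINFORCE-style RL step might look like this:

```python
# Conceptual multi-stage training loop: supervised step + RL step (toy example).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim, seq = 100, 32, 8
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Flatten(), nn.Linear(dim * seq, vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def sft_step(tokens, targets):
    """Supervised fine-tuning: cross-entropy on curated (input, target) pairs."""
    loss = F.cross_entropy(model(tokens), targets)
    opt.zero_grad(); loss.backward(); opt.step()

def rl_step(tokens, reward_fn):
    """REINFORCE-style update: sample an output, score it, reinforce it."""
    dist = torch.distributions.Categorical(logits=model(tokens))
    action = dist.sample()
    reward = reward_fn(action)                       # e.g. a rule-based verifier
    loss = -(dist.log_prob(action) * reward).mean()  # policy-gradient loss
    opt.zero_grad(); loss.backward(); opt.step()

# Cycle through the stages, alternating supervised and RL updates.
for stage in range(3):
    tokens = torch.randint(0, vocab, (16, seq))
    sft_step(tokens, torch.randint(0, vocab, (16,)))
    rl_step(tokens, lambda a: (a % 2 == 0).float())  # toy reward signal
```

A real pipeline would operate on full text sequences with learned or rule-based rewards, but the alternation of stages is the point.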
At the same time, DeepSeek's open-source V3 model disclosed deep, low-level optimization details in its technical report. To extract maximum performance, the V3 team even bypassed the standard CUDA layer and hand-tuned NVIDIA's low-level PTX assembly. Although this optimization strategy is effective, it also greatly increases development difficulty and maintenance costs.
Yann LeCun, Turing Award winner and Meta's chief AI scientist, also believes the market's reaction to DeepSeek's costs is unreasonable, though he was looking at it from the standpoint of inference rather than training. He pointed out that people often assume the huge investments go mainly toward training more powerful models, when in fact most of the money is spent on serving these AI systems reliably to billions of users. And as AI capabilities grow, the cost of keeping those services running will only rise; the real question is whether users are willing to pay for the enhanced functionality.
Is using the output of a large model to train a small model considered technological innovation or plagiarism?
This dispute over the intellectual property rights of AI models may have just begun.