TL;DR
- OpenAI's Mark Chen critiques the hype around DeepSeek's low-cost AI model training, calling it an overblown narrative.
- While acknowledging DeepSeek's achievements, he emphasizes that cost reductions do not inherently enhance capabilities in AI.
- Chen asserts that deeper insights into the complexities of cost and performance optimization are necessary for understanding AI advancements.
Amid the ongoing debate over DeepSeek's groundbreaking AI model, OpenAI Chief Research Officer Mark Chen has expressed skepticism about the public excitement surrounding DeepSeek's cost-effective approach to AI development.
As DeepSeek gains attention for its budget-friendly training of AI models, Chen has stepped forward to temper the narrative that portrays its cost efficiencies as revolutionary.
We will continue to improve our ability to serve models at lower cost, but we remain optimistic in our research roadmap, and will remain focused in executing on it. We're excited to ship better models to you this quarter and over the year!
— Mark Chen (@markchen90) January 28, 2025
“Congrats to DeepSeek on producing an o1-level reasoning model,” Chen noted. “However, I think the external response has been somewhat overblown, especially in narratives around cost.”
DeepSeek's developers have demonstrated that they can train their model for less than $6 million using second-tier chips, a significant departure from the far larger budgets typically associated with frontier AI development.
Chen argues that optimizations made possible by specific paradigms complicate these cost-cutting claims.
The OpenAI Chief Research Officer elaborated that working with two paradigms, pre-training and reasoning, allows optimization along two axes, which leads to lower costs but carries other implications.
“We can optimize for a capability over two axes instead of one,” he added, emphasizing the broader context of scalability and capability that must accompany such developments.
While acknowledging DeepSeek’s accomplishments, he warns that the relationship between cost and capabilities in AI is more complex than current buzz suggests.
“As research in distillation matures, pushing on cost and capabilities are increasingly decoupled,” he pointed out.
“The ability to serve at lower cost (especially at higher latency) doesn’t imply the ability to produce better capabilities.”
Chen remained optimistic about OpenAI's own roadmap, which calls for continuing to advance its models while finding ways to cut costs without compromising capabilities.
Meanwhile, DeepSeek's ascent is raising eyebrows across the AI sphere and drawing comparisons to OpenAI's ChatGPT, in particular its reasoning model o1.
The Chinese-developed AI assistant recently wowed the market by climbing to the top of the U.S. iPhone App Store chart despite stringent U.S. restrictions on AI tech exports.
Just a day earlier, Chen's colleague, OpenAI CEO Sam Altman, similarly expressed confidence in OpenAI's roadmap while congratulating the Chinese rival that has recently risen to prominence.
“Deepseek’s r1 is an impressive model, particularly around what they’re able to deliver for the price,” he said. “We will obviously deliver much better models and also it’s legit invigorating to have a new competitor!”
Nvidia CEO Jensen Huang has also said that DeepSeek's success only underscores the need for his company's chips, even as market sentiment suggests otherwise.