OpenAI’s Deep Research may soon outshine even the most knowledgeable human experts, predicts Frank Downing, Director of Research at ARK Invest.
Downing suggests that OpenAI’s new AI agent is positioned to pass ‘Humanity’s Last Exam’, a benchmark built from the toughest academic questions, within a year.
The agent has already scored a groundbreaking 26.6% on the exam, far ahead of previous high scores of around 9% from models like OpenAI’s o1 and DeepSeek’s R1.
Deep Research is powered by OpenAI’s o3 model, touted for its web browsing and data analysis capabilities. ChatGPT Pro subscribers, paying $200 a month, can use the agent to perform exhaustive searches and produce detailed, well-cited reports in a fraction of the time it would take a human.
Yet, it’s not without flaws: hallucinations, sporadic response times, and challenges in assessing source credibility are among the drawbacks, Downing notes.
Advocates believe Deep Research is a leap forward. Downing points out that similar benchmarks, such as SWE-bench and ARC-AGI, were previously considered challenging until recent AI models overcame them with ease.
He anticipates a similar trajectory for Humanity’s Last Exam: if OpenAI maintains its current pace, its AI could surpass expert-level technical knowledge and reasoning within a year.
The Bigger Picture: OpenAI’s work with Deep Research underscores a transformative moment in AI-driven research capabilities.
As these systems grow more efficient and effective at handling complex information, demand for human researchers to solve intricate problems may decline, requiring adjustments in public policy and labor markets to cope with the changing landscape.
The rapid rise of reasoning models is fascinating. OpenAI released the first model in its reasoning line, o1, in preview mode in September, roughly six months ago.
Google followed with its own reasoning model, Gemini 2.0 Flash Thinking, released in experimental mode for developers.
OpenAI announced the o3 line in December, but before it could launch widely, DeepSeek happened.
DeepSeek, the Chinese company, has been around for a while; its V3 model, launched before the current hype, had already impressed people in AI.
Yet it was the R1 reasoning model, which surpassed o1 on benchmarks while being available at a fraction of the cost, that made the product a mainstream media darling (alongside some geopolitical spice).
OpenAI was forced into damage control, releasing o3 to Pro users (alongside Deep Research) and a free o3-mini to all its users, including non-paying ones.
It is clear that this is the front line for the next leg of the AI race, and the next few models (or, more likely, agents built on top of existing models) will compete to show competence at completing end-to-end tasks and solving complex problems.