In a remarkable feat of engineering, a team of five engineers from Hugging Face, including co-founder and Chief Scientist Thomas Wolf, managed to create a free, open-source version of OpenAI's Deep Research in just 24 hours. This new functionality, initially available only to subscribers of OpenAI's $200 per month ChatGPT Pro plan, now has a more accessible counterpart, Open Deep Research, which can browse the web, scroll through pages, handle files, and even perform calculations using data.
The team at Hugging Face has strived to enhance the user experience and has made the source code available on GitHub for inspection and feedback. They have also detailed the entire development process of Open Deep Research, with the team leader revealing more behind-the-scenes information in recent interviews.
The 24-hour "cloning" task saw the team designing a basic architecture at 2 am, integrating the o1 model at 7 am, achieving autonomous web scrolling technology by 3 pm, and completing a dynamic file parsing module by 9 pm. Similar to OpenAI's Deep Research and Google's Gemini-based "Deep Research" released in December last year, Hugging Face's solution involves adding an "agent" framework to existing AI models, enabling them to perform multi-step tasks such as collecting information, building reports, and presenting them to users.
Open Deep Research consists of an AI model (OpenAI's o1) and an open-source "agent framework" that helps the model plan its analysis and guide its use of tools like search engines. Despite many excellent large models being freely available in open-source form, OpenAI has not disclosed much about the agent framework behind Deep Research. Thus, the team embarked on a 24-hour mission to replicate their results and open-source the required framework.
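The division of labor described above — a model that proposes actions and a framework that executes tools and feeds the results back — can be sketched as a toy loop. Everything here (`FakeModel`, the `search` stub, the dictionary of tools) is illustrative only; it is not the actual smolagents API.

```python
def search(query):
    # Stand-in for a real web-search tool.
    return "Paris is the capital of France."

TOOLS = {"search": search}

class FakeModel:
    """Scripted stand-in for an LLM: first requests a tool, then answers."""
    def __init__(self):
        self.turn = 0
    def reply(self, history):
        self.turn += 1
        if self.turn == 1:
            return {"action": "search", "input": "capital of France"}
        return {"action": "final_answer", "input": "Paris"}

def run_agent(model, task, max_steps=5):
    history = [task]
    for _ in range(max_steps):
        step = model.reply(history)
        if step["action"] == "final_answer":
            return step["input"]
        # Execute the requested tool and feed the observation back.
        observation = TOOLS[step["action"]](step["input"])
        history.append(observation)
    return None

print(run_agent(FakeModel(), "What is the capital of France?"))
```

The framework's real value is in this loop: the model never touches the web directly; it only emits actions, and the framework decides how to run them and what context to return.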
The primary change the team made to traditional AI agent systems was the adoption of so-called "code agents." They argue that having agents express their actions in code has many advantages, the most significant being that code is inherently designed to express complex sequences of actions.
Hugging Face recreated a core component that keeps the project running smoothly: its open-source "smolagents" library, which uses code agents instead of JSON-based agents. Code agents write their actions as program code, which the team says improves task-completion efficiency by roughly 30%, letting the system express complex sequences of actions far more concisely.
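The difference is easy to see in a toy comparison. A JSON-style agent must emit one action object per model turn, so a three-lookup task costs three round trips; a code agent can express the whole plan in a single snippet that the framework executes once. The tool names and data below are hypothetical, chosen only to illustrate the contrast.

```python
populations = {"France": 68, "Germany": 84, "Italy": 59}  # toy data, millions

def lookup(country):
    return populations[country]

# JSON-style agent: one action object per model turn, three round trips.
json_actions = [
    {"tool": "lookup", "arg": "France"},
    {"tool": "lookup", "arg": "Germany"},
    {"tool": "lookup", "arg": "Italy"},
]
results = [lookup(a["arg"]) for a in json_actions]

# Code agent: the model writes one snippet that loops and aggregates,
# so the framework executes it in a single turn.
code_action = "total = sum(lookup(c) for c in populations)"
namespace = {"lookup": lookup, "populations": populations}
exec(code_action, namespace)

print(sum(results), namespace["total"])  # both compute 211
```

Collapsing multi-step plans into one executable action is the source of the efficiency gain the team describes: fewer model calls, and control flow (loops, conditionals) comes for free from the language.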
As for tools, the developers behind Open Deep Research, like those of other open-source AI projects, avoided reinventing components and instead built on existing work with help from external contributors, significantly reducing development time. While optimal performance might eventually require full web-browser interaction along the lines of Operator, they started with an extremely simple text-based web browser for the initial proof of concept, along with a basic text inspector for reading various text file formats.
These tools were derived from Microsoft Research's outstanding Magentic-One agents, with minimal modifications, as their goal was to achieve high performance while keeping complexity as low as possible. Their short-term roadmap for improving these tools includes increasing the number of readable file formats, proposing more refined file processing methods, and replacing the existing browser with a vision-based browser.
The Hugging Face team also recognizes that OpenAI's Deep Research tool may benefit from the excellent web browser introduced with Operator. They plan to create a Graphical User Interface (GUI) agent, an agent that "can view the user's screen and operate directly through mouse and keyboard actions." To this end, they are hiring a full-time engineer to help advance this project and more.
In terms of model selection, Open Deep Research stands on the shoulders of OpenAI's large language models and reasoning models via API. However, it is also adaptable to other open-weight AI models. The innovation lies in Open Deep Research's agent structure, which integrates everything together and allows the AI model to complete research tasks autonomously.
Aymeric Roucher of Hugging Face, who worked on the Open Deep Research project, explained how the team selected its AI models: "We did not choose open models, because in our tests the closed models performed better. But we will make the entire development process public and show the code. Anyone can switch to any other model; the process itself is fully open source."
Roucher added, "I tried various large models, including DeepSeek R1 and o3-mini. For this use case, o1 worked the best. But as we proceed with the open-R1 initiative, we will consider replacing o1 with better open models." Regarding o3-mini, the team said, "It is indeed fast, but it does not perform as well as o1 and gpt-4o. I think if the model is too small, it still can't handle tough tasks." Speaking of DeepSeek R1, they said, "Its performance is not as good as o1. It's not 'dumb' like many large language models, but rather due to insufficient adaptation to framework guidelines. So we are considering fine-tuning to address this issue!"
While the core large language model or reasoning model behind such a research agent is crucial, Open Deep Research demonstrates that building the right agent layer is key. Benchmark tests bear this out: multi-step agent methods significantly enhance the capabilities of large language models. OpenAI's GPT-4o scored an average of 29% on the GAIA benchmark without an agent framework, while OpenAI Deep Research scored as high as 67%.
Notably, besides Open Deep Research, there are other OpenAI Deep Research tool "replicas" that rely on open-source models and tools, including node-DeepResearch and OpenDeepResearcher. However, the original Deep Research is supported by a version of the o3 model, and these alternatives may not match up without a model comparable to o3 behind them.
On the GAIA benchmark for general artificial intelligence assistants, Open Deep Research achieved a 54% accuracy score. In comparison, OpenAI's Deep Research tool scored 67.36%. Hugging Face explained in their post that the GAIA test includes complex multi-step questions, such as:
In the 2008 painting "Embroidery from Uzbekistan," some of the fruits depicted also appeared in the October 1949 breakfast menu of an ocean liner that was later used as a floating prop in the 1960 film "The Last Voyage." List those fruits in a comma-separated list, ordered clockwise from the 12 o'clock position of the painting, using the plural form of each name.
To answer such questions correctly, AI agents must search multiple different sources and combine them into a coherent answer. Many problems in GAIA are quite challenging even for humans, effectively testing the capabilities of agent-based AI.
While the performance of this open-source research agent may not yet truly rival OpenAI, its emergence at least gives more developers the freedom to research and improve the technology. The Open Deep Research project also demonstrates the research community's ability to quickly replicate and publicly share proprietary AI features, which were previously only available from commercial providers.
Some netizens exclaimed, "This is significant! Open-source alternatives are just what the field of artificial intelligence needs. Considering the development timeline, achieving a 55% score in the GAIA benchmark test is already quite remarkable—I'm looking forward to seeing its future development."
Roucher summarized: "I think the benchmark results are a useful guide for tackling hard problems. But in terms of speed and user experience, our solution still can't match the level of optimization of the proprietary offering." According to him, Hugging Face's next improvements include not only supporting more file formats and vision-based web browsing but also attempting to clone OpenAI's Operator, which can perform many other kinds of tasks in a web-browser environment (such as viewing the computer screen and controlling mouse and keyboard input).
Furthermore, Roucher said, "The response has been great. Many new contributors have joined and have made supplementary suggestions. It really feels like surfing on the wave of technological development, thanks to the strong support from the community!" Some netizens commented, "This is a classic Streisand effect (note: trying to prevent the public from learning about certain information, but instead making the information more widely known). You've annoyed a group of excellent engineers who spend all day coding at their companies and then continue to code for free at home at night. You tell them they can't do something, and they are determined to do it. We don't need OAI (abbreviation for OpenAI)."
It is worth noting that the astonishing pace of open-source AI development also seems to have unsettled OpenAI, which has been keen to shift toward a profit-oriented model. Previously, after witnessing DeepSeek's popularity, OpenAI CEO Sam Altman said that OpenAI had been "on the wrong side of history" with respect to open-source AI. And just two days after the launch of Open Deep Research, OpenAI announced that the ChatGPT search function was officially available to all users, with no registration or login required — meaning anyone can now use ChatGPT for web searches.