Human Scientists Are Still Better Than AI Ones – For Now

New research indicates that human scientists and engineers still outperform AI models at doing science. Even the most advanced AI models fall short when tested in DiscoveryWorld, a new simulator for scientific discovery. The key human advantage lies in creativity: forming hypotheses and designing experiments to test them. AI has achieved notable successes, such as predicting protein interactions, but largely by focusing on narrow, well-defined problems.


DiscoveryWorld was developed by Peter Jansen and his team at the Allen Institute for Artificial Intelligence in Washington State. The platform lets researchers test whether AI can reproduce the scientific discovery process. “It’s a virtual environment where we can evaluate different AI systems’ ability to conduct research without the cost and timeline of real-world experiments,” says Jansen.

It covers eight research areas: archaeology, chemistry, physics, plant biology, epidemiology, rocket science, language translation, and proteomics (the study of proteins). It also offers three difficulty levels (easy, regular, and challenging), presenting real-world-style scientific problems in a game-like format. The researchers tested three AI agents based on OpenAI’s GPT-4o, each taking a different approach to solving problems in DiscoveryWorld. The first, “ReAct,” decides what to do at each step based on its latest observations.

The second, “Plan+Exec,” plans ahead before following an approach similar to ReAct’s. The third, “Hypothesizer,” has a memory feature: it keeps track of its central hypothesis while analyzing ongoing results. In addition to these AI agents, 11 researchers with advanced degrees in the natural sciences tackled the same DiscoveryWorld challenges.

The results were telling: the AI agents solved only 15% to 20% of the regular- and challenging-level tasks, whereas the human scientists and engineers solved about 66% on average. The work, presented at the NeurIPS conference in Vancouver, underscores the value of human intuition in research.

Experts such as Jeff Clune of the University of British Columbia point out that real-world science is full of complexities that are difficult to simulate. He calls the project ambitious but notes that replicating actual scientific processes in a virtual environment remains challenging. Other researchers, such as Molly Crockett and Lisa Messeri, worry that relying on AI-driven discovery will narrow the range of perspectives brought to a question, a diversity they consider necessary for deep insights. For now, although AI is advancing at lightning speed, human scientists remain central to driving scientific discovery.
