Xinzhiyuan Report
Editors: YHluck, Dinghui
【Xinzhiyuan Guide】Google, Stanford, and others have launched "AI scientists" to assist human researchers and push a paradigm shift in scientific research. After trying them firsthand, scientists were either stunned by the systems' deep insights or questioned their lack of inspiration and human warmth. Can AI replace human thought?
Thomas Montine, a pathologist at Stanford University, held a meeting "as usual" on a Sunday morning in April.
He first assigned tasks to several "neuroscientists," a "neuropharmacologist," and a "medicinal chemist" — to research potential treatments for Alzheimer's disease.
A few minutes later, he received a research report of over ten thousand words.
In this meeting, no one interrupted, no one went off-topic, and no one played with their phones. How did "they" communicate?
Welcome to the daily life of science led by AI: "virtual AI scientists," an unprecedented redefinition of the basic unit of scientific research.
With the help of LLMs, "AI scientists" are reshaping the research process.
From Google and Stanford to the Shanghai AI Laboratory, scientists are testing AI teams composed of virtual scientists.
These research teams, made up of "chatbots," are assisting scientists with brainstorming, experiment design, literature integration, and even proposing research hypotheses.
Can this new human-AI "co-research" model become the prototype for future scientific paradigms?
A team of computational biologists at Stanford University announced the Virtual Lab system in November 2024, and Montine was using a version of it.
Paper address: https://www.biorxiv.org/content/10.1101/2024.11.11.623004v1.full.pdf
Coincidentally, a research group at the Shanghai AI Laboratory also launched a similar virtual scientist system, VirSci, in October 2024.
Open-source address: https://github.com/open-sciencelab/Virtual-Scientists?tab=readme-ov-file
Paper address: https://arxiv.org/pdf/2505.12039
Most notably, Google researchers are exploring this concept.
In February of this year, Google launched a multi-agent AI system built on Gemini 2.0, serving as a "virtual scientific collaborator."
These "virtual scientists" help real scientists generate novel hypotheses and research proposals, thereby accelerating the process of scientific and biomedical discovery.
Paper address: https://arxiv.org/pdf/2502.18864
In these systems, AI scientists can not only exchange "ideas" but also access the internet and write code.
Multiple "AI scientists" can be combined into a larger system, able to focus on their respective problems without distraction.
Rick Stevens, a computer scientist at the University of Chicago and Argonne National Laboratory in Illinois, said:
In a sense, this is essentially not much different from having more colleagues.
Except they don't get tired and are trained across the board.
Recently, Nature published an article delving into the most authentic feelings of scientists about these "AI scientists."
What does a meeting look like when the "research team" is composed entirely of AI chatbots? A room full of Nobel laureates, or a group of undergraduates?
Are these "AI scientists" just simple chatbots, or do they possess more complex underlying technology?
What are the differences between the three "AI Collaborative Scientist" systems?
Stanford University's system was built by James Zou's team using GPT-4o.
It defaults to two agents working together: one acts as the "Principal Investigator," leading the brainstorming, while the other acts as a "Critic," offering concrete suggestions for improvement.
Open-source address: https://github.com/zou-group/virtual-lab
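As a rough illustration, the PI-plus-Critic loop can be sketched in a few lines of Python. This is a hypothetical sketch built on the OpenAI chat API, not code from the virtual-lab repository; the prompts and the `run_meeting` helper are invented for illustration.

```python
# Minimal sketch of a two-agent "PI + Critic" meeting (illustrative only).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(system_prompt: str, user_prompt: str) -> str:
    """Send one turn to a role-conditioned GPT-4o and return its reply."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content

def run_meeting(agenda: str, rounds: int = 3) -> str:
    """Alternate a Principal Investigator agent with a Critic agent."""
    pi = "You are the Principal Investigator leading a scientific brainstorm."
    critic = "You are a scientific critic. Point out flaws and suggest fixes."
    plan = ask(pi, f"Agenda: {agenda}\nPropose a research plan.")
    for _ in range(rounds):
        critique = ask(critic, f"Critique this plan:\n{plan}")
        plan = ask(pi, f"Revise the plan to address this critique:\n{critique}")
    return plan

print(run_meeting("Potential treatments for Alzheimer's disease"))
```

The real Virtual Lab adds domain-specialist agents, like the "neuroscientists" and "medicinal chemist" Montine assembled, to the same meeting loop.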
Google's system was created by DeepMind's Alan Karthikesalingam and Vivek Natarajan, among others, using Gemini 2.0.
Compared to Stanford's system, it is more academic-oriented, specializing in biomedical research.
The system architecture is as follows:

[Figure: AI Co-Scientist system architecture]
According to Google CEO Sundar Pichai, it is a "virtual assistant for scientists" that uses advanced reasoning to synthesize vast amounts of literature, accelerating scientific breakthroughs by "generating novel hypotheses" and proposing detailed "research strategies."
The difference between Google's and Stanford's systems is that the former does not allow users to assign scientific specialties to agents.
Simply put, Google's AI system divides the work among specialized agents: it generates ideas (Generation), analyzes and critiques them (Reflection), transforms old ideas into new ones (Evolution), checks for near-duplicate ideas (Proximity), ranks all the candidates (Ranking), and finally reflects on its overall performance (Meta-review).
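To make that division of labor concrete, here is a toy pipeline in the same spirit. The stage names follow the paper, but all of the code below is a placeholder sketch, not Google's implementation, and the scoring logic is deliberately trivial.

```python
# Toy sketch of a generate -> critique -> evolve -> deduplicate -> rank loop.
from dataclasses import dataclass, field

@dataclass
class Idea:
    text: str
    critiques: list[str] = field(default_factory=list)
    score: float = 0.0

def generate(goal: str, n: int = 4) -> list[Idea]:
    """Generation: draft n candidate hypotheses for the research goal."""
    return [Idea(f"Hypothesis {i} for: {goal}") for i in range(n)]

def reflect(idea: Idea) -> Idea:
    """Reflection: attach a critique to the idea."""
    idea.critiques.append(f"Check novelty and feasibility of '{idea.text}'")
    return idea

def evolve(idea: Idea) -> Idea:
    """Evolution: rewrite an old idea into an (allegedly) improved one."""
    return Idea(idea.text + " (refined)", critiques=idea.critiques)

def deduplicate(ideas: list[Idea]) -> list[Idea]:
    """Proximity: drop near-duplicate hypotheses (exact match as a stand-in)."""
    seen, unique = set(), []
    for idea in ideas:
        if idea.text not in seen:
            seen.add(idea.text)
            unique.append(idea)
    return unique

def rank(ideas: list[Idea]) -> list[Idea]:
    """Ranking: order candidates by score (a stub for the real tournament)."""
    return sorted(ideas, key=lambda i: i.score, reverse=True)

goal = "drugs to reverse liver fibrosis"
best = rank(deduplicate([evolve(reflect(i)) for i in generate(goal)]))
print([i.text for i in best])  # a Meta-review step would summarize this run
```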
The VirSci system from Shanghai AI Laboratory was proposed by Nanqing Dong et al.
It acts like an organizer, marshaling a whole "army" of agents to get things done.
According to the team, VirSci involves five key steps: collaborator selection, topic discussion, idea generation, novelty evaluation, and summary generation.
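Schematically, those five steps chain into a single pipeline. The sketch below uses the paper's step names, but every function body is a stub invented for illustration; it is not the open-sciencelab code.

```python
# Hedged sketch of VirSci's five-step pipeline (stubbed placeholder logic).
def select_collaborators(pool: list[str], team_size: int) -> list[str]:
    """Step 1: assemble a team of agent 'scientists' from a candidate pool."""
    return pool[:team_size]

def discuss_topic(team: list[str], domain: str) -> str:
    """Step 2: the team converges on a concrete research topic."""
    return f"A topic in {domain}, chosen by {', '.join(team)}"

def generate_ideas(topic: str, n: int = 3) -> list[str]:
    """Step 3: rounds of discussion yield candidate ideas."""
    return [f"Idea {i}: {topic}" for i in range(n)]

def evaluate_novelty(ideas: list[str]) -> list[str]:
    """Step 4: filter ideas against prior literature (stubbed here)."""
    return [idea for idea in ideas if not idea.startswith("Idea 0")]

def summarize(ideas: list[str]) -> str:
    """Step 5: condense the surviving ideas into an abstract-style summary."""
    return " | ".join(ideas)

team = select_collaborators(["Agent A", "Agent B", "Agent C", "Agent D"], 3)
topic = discuss_topic(team, "cancer genomics")
print(summarize(evaluate_novelty(generate_ideas(topic))))
```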
In these systems, then, the LLMs not only exchange ideas with one another but also search the internet, execute code, and interact with other software tools, making them a form of "autonomous AI."
So, how do they differ from human scientists?
Human Scientists vs. "AI Scientists"
What happens when human scientists truly begin to work with these "virtual colleagues"?
Are the ideas proposed by AI scientists inspiring and amazing, or merely logically consistent but lacking practical value?
Is their existence a catalyst for amplifying inspiration, or does it become another form of information noise?
Gary Peltz: I almost fell out of my chair
Gary Peltz, a medical researcher at Stanford University, frequently uses AI and was one of the first testers of Google's AI Co-Scientist project.
He hoped to use the system to find drugs for treating liver fibrosis.
At the time, Google's system was not yet fully developed, so he sent his requirements to a Google staff member.
About a day later, he received the output from Google's AI Co-Scientist system.
"When I read it, I almost fell out of my chair," Peltz said.
Peltz had just written a proposal emphasizing the importance of epigenetic changes in liver fibrosis, and this "AI Co-Scientist" surprisingly targeted the same topic for its suggested treatments.
The AI Co-Scientist proposed three drugs, while Peltz proposed two others (all these drugs are already approved for treating other diseases).
To accelerate system development and testing, Google hired Peltz.
Over the next few months, Peltz's lab tested these five drugs in their human organoid models.
Two of the three suggestions made by AI showed potential for promoting liver regeneration and inhibiting fibrosis, while neither of Peltz's two suggestions worked.
Peltz said the experience left a deep impression on him: "These large language models are to us what fire was to early human society."
Of course, not everyone agreed; other liver researchers commented that the drug suggestions proposed by the AI were neither particularly innovative nor deep enough.
Researchers at the Icahn School of Medicine at Mount Sinai believed that "these suggestions are quite common sense and don't offer much profound insight."
But what particularly shocked Peltz was that, as he put it, "the AI did not prioritize what I valued."
Reading the AI's report, he said, felt like discussing the problem with a postdoctoral researcher.
"The AI sees the problem completely differently from me."
Francisco Barriga: AI thinks like me
Francisco Barriga, of the Cancer Genomics group at the Vall d'Hebron Institute of Oncology in Barcelona, is a biochemist who specializes in mouse models and genome engineering; he does not code and had virtually no AI experience.
He joined the trial with hesitation, suspecting he had been recruited as the technologically unsophisticated control.
Barriga asked the AI to design mouse-model experiments testing, with the minimum number of mice, whether specific biochemical compounds affect tumor or immune cells.
This is a topic he is very familiar with.
Barriga stated that the "AI scientist" team's proposed solution was exactly what he would have done:
The AI scientist team chose "the right model, the right experiment."
However, Barriga said he always felt that something crucial was missing from the process.
"There was absolutely no human participation in this process."
These AI agents took turns "speaking," often in numbered lists, and were never rude, never interrupted, and never argued.
"It lacked that intuitive leap, like the inspiration you might gain from a casual chat with a plant biologist over coffee in the hallway at three in the afternoon."
Of course, Barriga could add a plant biologist—or a quantum physicist, or anyone else—to his virtual team, but he has not tried doing so yet.
"Perhaps it can be used for bouncing ideas off. But will it change my daily way of working? I doubt it," Barriga added.
The system might instead serve as a sounding board for his doctoral students when they run into difficulties:
"If they have a problem and I'm too busy to attend to it, perhaps it can take my place."
Peltz and Barriga represent two attitudes of real human scientists towards "AI scientists" — surprise and hesitation.
As Barriga joked, perhaps AI scientists can at most replace him in giving advice to his doctoral students.
Catherine Brownstein, who studies rare diseases at Boston Children's Hospital in Massachusetts, perhaps knows better how to interact with these "AI scientists."
Large language models (LLMs) can increase speed and efficiency and broaden one's ways of thinking.
But she cautioned that users generally need their own expertise to spot the models' errors.
You must have a general understanding of what you are talking about; otherwise, it's easy to be completely misled.
Although Brownstein's attitude seems to be "moderate," she is very grateful for these "virtual scientists."
When Brownstein used AI to review a paper she was writing, the AI suggested she ask the patients which direction they thought the next step of the research should take.
The suggestion left her surprised and grateful: the AI seemed to show more human warmth than the humans did, or perhaps it simply considered things more comprehensively.
She said she should have thought of it herself, but she hadn't.
I felt so embarrassed then, I paused and stared at the screen for a full minute, thinking:
My goodness, how did I stray so far from my initial passion for patient-centered research?
This type of experience seems to illustrate the prototype of a new type of collaboration: AI is not meant to replace scientists, but to become an always-on, always-focused, unbiased thought partner.
Perhaps the future of science will not be dominated by AI, but will instead be a symphony in which humans' "imperfect" debates and intuitions collide with AI's "perfect" calculation and analysis.
The ultimate great discoveries may emerge from the orderly and disorderly interplay between AI assistants and human scientists.
References:
https://www.nature.com/articles/d41586-025-02028-5
https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/