Typically, in the weeks leading up to I/O, the outside world wouldn't hear much about the conference, because Google usually saves its best models for the event itself. But in the Gemini era, Google is just as likely to release its strongest AI models on a random Tuesday in March, or to announce breakthroughs like AlphaEvolve a week early.
In the era of large models, getting the best models and products into users' hands as quickly as possible has itself become a demonstration of a company's technological capability.
At 1:00 AM Beijing time on May 21, wave after wave of enthusiastic applause erupted at the venue as one product after another was unveiled at Google I/O 2025.
During the keynote, Google CEO Sundar Pichai spent over an hour racing through Google's many updates in areas such as AI, mobile operating systems, and search. By a preliminary count, Gemini was mentioned 95 times during the event and artificial intelligence 92 times.
Below are some important updates from this conference, starting with the model layer.
Deep Think Reasoning for Gemini 2.5 Pro and a Better 2.5 Flash
The highlight of the conference was Google's announcement of Deep Think, an enhanced reasoning mode for Gemini 2.5 Pro, alongside an improved 2.5 Flash.
Google announced at the conference that it has started testing an enhanced reasoning mode called “Deep Think” for Gemini 2.5 Pro. DeepMind CEO Demis Hassabis said it incorporates “cutting-edge research results,” enabling the model to weigh multiple hypotheses before responding to queries.
2.5 Pro Deep Think achieved impressive results on the 2025 USAMO, one of the most challenging math benchmarks currently available. It also took the lead on LiveCodeBench, a difficult benchmark for competitive programming, and scored 84.0% on MMMU, which tests multimodal reasoning.
However, Google stated that further in-depth safety assessments and expert opinions are needed before widespread release, so it will first be open to trusted testers through the Gemini API.
Google also released a more powerful Gemini 2.5 Flash model, significantly optimized for speed and efficiency: it improves inference efficiency, reduces token consumption, and surpasses the previous generation on benchmarks for multimodal processing, code generation, and long-context understanding.
In Google's words, 2.5 Flash is its most efficient workhorse model, designed for speed and low cost, and it has now improved across several dimensions: it performs better on key benchmarks for reasoning, multimodality, code, and long context, while becoming even more efficient, using 20-30% fewer tokens in Google's evaluations.
The official version will be launched in early June. Currently, developers can preview it through Google AI Studio, enterprise users can experience it through Vertex AI, and general users can try it in the Gemini application.
Although I/O primarily showcased the efficiency gains of 2.5 Flash, Google announced that it will bring the “thinking budgets” concept from this model to the more advanced 2.5 Pro as well. The feature lets users balance token consumption against output quality and latency.
In addition, Google is integrating “Project Mariner” into the Gemini API and Vertex AI. Developed based on Gemini, this project enables navigation and completion of user-specified tasks through a browser and is expected to be widely available to developers this summer. Concurrently, Google is also launching a text-to-speech preview feature for the 2.5 Pro/Flash models through the Gemini API, supporting two speaker voices in 24 languages.
It is worth mentioning that the Gemini 2.5 series introduces several new features.
First are native audio output and Live API improvements. The Live API now offers a preview of audio-video input and native audio output, letting developers build conversational experiences with a more natural, expressive Gemini.
Users can control the model's tone, accent, and speaking style; for example, they can ask it to use a dramatic voice when telling a story. The model can also use tools, such as running searches on the user's behalf.
A range of early features that users can now try includes:
Emotional conversation, where the model can detect the user's emotions in their voice and respond appropriately.
Proactive audio, where the model will ignore background conversations and know when to respond.
Thinking in the Live API, where the model leverages Gemini's thinking capabilities to support more complex tasks.
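To make the Live API description above more concrete, here is a minimal sketch of what a native audio conversation could look like with the google-genai Python SDK; the preview model name, config fields, and method names are assumptions based on preview documentation and may change before general availability.

```python
import asyncio

from google import genai
from google.genai import types

# Hypothetical sketch of a native-audio Live API session with the google-genai SDK.
# The preview model name and config fields are assumptions and may change.
client = genai.Client(api_key="YOUR_API_KEY")

MODEL = "gemini-2.5-flash-preview-native-audio-dialog"  # illustrative preview name
CONFIG = types.LiveConnectConfig(response_modalities=["AUDIO"])

async def main():
    async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:
        # Send a single text turn; a real app would stream microphone audio instead.
        await session.send_client_content(
            turns=types.Content(
                role="user",
                parts=[types.Part(text="Tell me a short story in a dramatic voice.")],
            ),
            turn_complete=True,
        )
        # Collect the raw audio bytes the model streams back for this turn.
        audio = bytearray()
        async for message in session.receive():
            if message.data:
                audio.extend(message.data)
        print(f"Received {len(audio)} bytes of audio")

asyncio.run(main())
```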
Google will also release a new text-to-speech preview for the 2.5 Pro and 2.5 Flash models. For the first time, it supports multiple speakers, enabling two-speaker text-to-speech through native audio output.
Similar to Native Audio conversations, the text-to-speech feature is expressive and can capture very subtle nuances, such as whispering. It supports over 24 languages and can seamlessly switch between languages.
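As a rough illustration of the multi-speaker capability, the following sketch shows how a two-speaker request might be made through the Gemini API with the google-genai Python SDK; the preview model name and voice names are assumptions and may differ from what finally ships.

```python
from google import genai
from google.genai import types

# Hypothetical sketch: a two-speaker TTS request through the google-genai SDK.
# The preview model name and voice names below are assumptions and may change.
client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",  # illustrative preview model name
    contents=(
        "TTS the following conversation:\n"
        "Joe: How's it going today, Jane?\n"
        "Jane: Not too bad. How about you?"
    ),
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
                speaker_voice_configs=[
                    types.SpeakerVoiceConfig(
                        speaker="Joe",
                        voice_config=types.VoiceConfig(
                            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
                        ),
                    ),
                    types.SpeakerVoiceConfig(
                        speaker="Jane",
                        voice_config=types.VoiceConfig(
                            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Puck")
                        ),
                    ),
                ]
            )
        ),
    ),
)

# The synthesized speech comes back as inline PCM bytes on the first part.
audio_bytes = response.candidates[0].content.parts[0].inline_data.data
with open("dialogue.pcm", "wb") as f:
    f.write(audio_bytes)
```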
Second is the enhancement of computer-use capabilities. Google is bringing Project Mariner's computer-use capabilities into the Gemini API and Vertex AI. It supports multitasking, allowing up to 10 tasks to run simultaneously, and adds a “teach and repeat” feature that lets the AI learn a task once and then complete similar repetitive tasks automatically.
Third is a significant strengthening of protection against security threats such as indirect prompt injection, where malicious instructions are embedded in the data an AI model retrieves. Google says its new security approach significantly improves Gemini's protection rate against indirect prompt injection attacks during tool use, making Gemini 2.5 its safest model series to date.
Fourth is the addition of three major practical features to enhance the developer experience:
Enhanced thought summarization. The Gemini API and Vertex AI now offer a “thought summaries” feature for the 2.5 Pro and Flash models, which structures the model's raw reasoning process into a clear format with headings, key details, and information about model actions (such as when a tool is called). The goal is to help developers understand the model's decision logic more intuitively, improving interpretability and debugging efficiency.
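For illustration, here is a minimal sketch of how thought summaries might be requested with the google-genai Python SDK, assuming an include_thoughts flag as described in the preview documentation; the model name is illustrative.

```python
from google import genai
from google.genai import types

# Hypothetical sketch: requesting thought summaries via the google-genai SDK.
# The include_thoughts flag and model name follow preview docs and may change.
client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",  # illustrative model name
    contents="What is the sum of the first 50 prime numbers?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(include_thoughts=True)
    ),
)

# Summarized reasoning is returned as parts flagged with thought=True,
# separate from the final answer text.
for part in response.candidates[0].content.parts:
    if not part.text:
        continue
    label = "Thought summary" if part.thought else "Answer"
    print(f"{label}: {part.text}")
```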
Expanded thinking budget mechanism. Following 2.5 Flash, the thinking budget feature now covers the 2.5 Pro model, letting developers balance response quality against latency and cost by adjusting token allocation. Developers can freely control how deeply the model thinks, and can even turn thinking off entirely. Full thinking budget support for the stable release of Gemini 2.5 Pro will roll out in the coming weeks.
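A hedged sketch of what setting a thinking budget could look like with the google-genai Python SDK follows; the model name and the specific budget value are illustrative.

```python
from google import genai
from google.genai import types

# Hypothetical sketch of setting a thinking budget; the model name and the
# specific token value are illustrative, and a budget of 0 is documented as
# disabling thinking on 2.5 Flash.
client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize the trade-offs between quadtrees and k-d trees.",
    config=types.GenerateContentConfig(
        # Cap how many tokens the model may spend on internal reasoning.
        thinking_config=types.ThinkingConfig(thinking_budget=1024)
    ),
)
print(response.text)
```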
Gemini SDK compatibility with MCP tools. The Gemini API adds native SDK support for MCP, simplifying integration with open-source tools. Google is also exploring hosted options, such as deploying MCP servers, to accelerate the development of agent applications. The team says it will continue to optimize model performance and the developer experience while strengthening fundamental research to expand Gemini's capabilities, with more updates coming soon.
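As a rough sketch of the kind of integration this enables, the example below assumes the experimental MCP support in the google-genai Python SDK, where an MCP ClientSession can be passed directly as a tool; the server command and model name are placeholders.

```python
import asyncio

from google import genai
from google.genai import types
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

client = genai.Client(api_key="YOUR_API_KEY")

# Launch an MCP server over stdio; the command here is only a placeholder.
server_params = StdioServerParameters(
    command="npx", args=["-y", "@modelcontextprotocol/server-everything"]
)

async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Experimental: pass the MCP session directly as a tool so the SDK
            # can route the model's tool calls to the server.
            response = await client.aio.models.generate_content(
                model="gemini-2.5-flash",
                contents="Use the available tools to answer my question.",
                config=types.GenerateContentConfig(tools=[session]),
            )
            print(response.text)

asyncio.run(main())
```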
Regarding the next steps for Google Gemini, Google DeepMind CEO Hassabis stated that they are working to expand their best Gemini model into a “world model,” enabling it to plan and imagine new experiences by understanding and simulating the world like a human brain.
AI Mode is the future of Google Search
As one of Google's core businesses, every iteration of Google Search attracts industry attention.
Google stated that the Gemini model is helping Google Search become smarter, more agentic, and more personalized.
Since launching last year, AI Overviews has reached more than 1.5 billion users across 200 countries and regions. Google finds that people who use AI Overviews are more satisfied with their search results and search more often. In Google's largest markets, such as the US and India, AI Overviews has driven more than 10% growth in usage for the types of queries that show them, and that growth continues to increase over time.
Pichai called it one of the most successful product launches in search over the past decade.
Now, for users who want a complete, end-to-end AI search experience, Google is launching a new AI Mode that completely reshapes the search experience. With more advanced reasoning capabilities, users can ask longer, more complex queries in AI Mode.
In fact, early testers are asking queries two to three times longer than traditional search queries, and they can dig deeper with follow-up questions. All of this is available directly in a new tab within Search.
Pichai said, “I've been using AI Mode frequently, and it has completely changed the way I use Google Search. I'm excited to tell you that AI Mode will be available to all users in the US starting today. With our latest Gemini model, our AI responses not only meet the quality and accuracy you expect from Google Search, but are also the fastest in the industry. Starting this week, Gemini 2.5 will also be available in Google Search in the US.”
Introducing Video Model Veo 3
On the multimodal side, Google announced its latest advanced video model, Veo 3, which now features native audio generation. Google also launched Imagen 4, its latest and most capable image generation model. Both are available in the Gemini app, opening up a whole new world of creativity.
Google brings these possibilities to filmmakers with a new tool called Flow. Users can create movie clips and extend short segments into longer scenes.
Prompt: A wise old owl soaring high, peering through the moonlit clouds above the forest. This wise old owl carefully circles the clearing, looking down at the forest floor. After a moment, it swoops down to the moonlit path and lands beside a badger. Audio: Wing flapping, bird calls, loud and pleasant rustling of wind, and intermittent buzzing, breaking branches underfoot, and squawks. This is a light orchestral piece, with woodwinds throughout, a cheerful and optimistic rhythm, full of innocent curiosity.
A wise old owl and a nervous badger sit on a moonlit forest path. "They left a 'ball' today. It bounced higher than I could jump," stammered the badger, trying to understand what this meant. "What magic is this?" hooted the owl thoughtfully. Audio: Owl hooting, badger's nervous chirping, rustling leaves, cricket chirping.
A wise old owl flies out of the frame, and a nervous young badger runs in the other direction. In the background, a squirrel scurries past, making a rustling sound on dry autumn leaves. Audio: Bird calls, loud rustling of falling leaves and intermittent buzzing, branches breaking underfoot, and the sound of a squirrel moving through dry fallen leaves. In the distance, the hooting of the owl, the nervous chirping of the badger, the rustling of leaves, and the chirping of crickets can be heard, all filled with innocent curiosity.
Coding Assistant Jules Enters Public Beta
At the conference, Google announced that Jules is officially entering public beta, and developers worldwide can experience it directly.
Jules is an asynchronous, agentic coding assistant that integrates directly with developers' existing codebases. It clones the codebase into a secure Google Cloud virtual machine (VM), understands the full context of the project, and performs tasks such as writing tests, building new features, generating audio changelogs, fixing bugs, and bumping dependency versions.
Jules runs asynchronously, allowing developers to focus on other tasks while it runs in the background. When finished, it presents its plan, reasoning process, and the diffs of the changes made. Jules is private by default; it does not use the user's private code for training, and the user's data remains isolated in the execution environment.
Jules uses Gemini 2.5 Pro, enabling it to utilize some of the most advanced coding reasoning techniques available today. Combined with its cloud VM system, it can quickly and accurately handle complex multi-file changes and concurrent tasks.
Specifically, what can Jules do?
Works with real codebases: Jules doesn't need a sandbox. It can leverage the full context of existing projects to intelligently infer changes.
Parallel execution: Tasks run inside cloud VMs, enabling concurrent execution. It can handle multiple requests simultaneously.
Visible workflow: Jules shows you its plan and rationale before making changes.
GitHub integration: Jules works directly within the user's GitHub workflow. No context switching or extra setup is required.
User controllability: Modify the presented plan before, during, and after execution to maintain control over your code.
Audio summaries: Jules provides audio changelogs of recent commits, turning your project history into a contextual summary you can listen to.
Project Astra, Prototype of Google's General AI Assistant
At last year's Google I/O developer conference, one of the most interesting demos was Project Astra, an early version of a multimodal AI that could recognize its surroundings in real time and answer related questions conversationally. While the demo offered a glimpse into Google's plans for a more powerful AI assistant, the company carefully pointed out that what we saw was just a “research preview.”
However, a year later, Google has outlined the vision for Project Astra, hoping that it will power a future version of Gemini, making it a “universal AI assistant.” To achieve this goal, Project Astra has undergone some significant upgrades. Google has been upgrading Astra's memory—the version we saw last year could only “remember” for 30 seconds at a time—and has added computer control capabilities, allowing Astra to now perform more complex tasks.
This multimodal, all-seeing assistant is not a real consumer product and, in the short term, will not be available to anyone beyond a small number of testers. Astra represents Google's grandest, wildest, and most ambitious dream for how AI can serve humanity in the future. Google DeepMind research director Greg Wayne said he sees Astra as the “concept car for a universal AI assistant.”
Ultimately, the features available in Astra will be ported to Gemini and other applications. This already includes the team's work on voice output, memory, and some basic computer usage capabilities. As these features become mainstream, the Astra team finds new directions to work on.
Project Aura Smart Glasses Are Back
Let's look at hardware. It seems the era of Google smart glasses is back. Today, Google and Xreal announced a strategic partnership at the conference to jointly develop a new Android XR device called Project Aura.
This is the second device officially launched since the release of the Android XR platform in December last year. The first was Samsung's Project Moohan, but that is an XR headset more similar to Apple Vision Pro. Project Aura, on the other hand, maintains a close relationship with Xreal's other products. The technically accurate term should be “optical see-through XR” device. More colloquially, it is a pair of immersive smart glasses.
Xreal's existing glasses, such as the Xreal One, are essentially a pair of slightly bulky sunglasses with two mini displays embedded in them. They can connect to a phone or laptop to show what's on the screen, whether that's a show you're watching or confidential documents you want to edit on a plane. Their advantage is that users can adjust the opacity to see, or block out, the surrounding world. Project Aura follows the same philosophy.
However, Google did not reveal more information about this hardware at the conference. Xreal spokesperson Ralph Jodice stated that more information will be released at the Augmented World Expo next month. Some known information indicates it will have Gemini built-in and a larger field of view. In the product renders, we can see cameras on the hinges and nose bridge, as well as microphones and buttons on the temples.
This suggests the hardware will be upgraded compared with Xreal's existing devices. Project Aura will be powered by a Qualcomm chipset optimized for XR. As with Project Moohan, Google hopes developers will start building applications and use cases for Project Aura now, so that they are ready before a consumer product actually ships. On that note, Google and Xreal said in a press release that Android XR applications developed for headsets can be easily ported to devices like Project Aura.
Interestingly, Google's strategy for the next era of smart glasses is similar to how it initially launched Wear OS—Google provides the platform, and third parties are responsible for the hardware. While details are scarce, this will be the second official device launched on the Android XR platform.
Disclaimer: This article is translated and compiled by InfoQ and does not represent the platform's views. Unauthorized reproduction is prohibited.