Google has just wrapped its I/O 2025 conference, and the overwhelming impression is that Google is back in the lead in AI. It is building a true AI operating system centered on Gemini, and the first outlines of a "world model" are taking shape.
This year's Google I/O delivered an enormous batch of technology updates and releases in one go.
First, to be clear: the highly anticipated Gemini 2.5 Ultra model did not arrive. What we got is a $250-per-month "Ultra" subscription tier, not the Ultra model itself. However, with the launch of Gemini 2.5 Pro "Deep Think", the Pro model has taken a major leap, and its actual capability is arguably Ultra-class.
So, what exactly is new? (The list is indeed very long):
Models and Agent Tools
Gemini 2.5 Pro "Deep Think": Adds parallel thinking for complex math and coding tasks, plus a configurable "thinking budget" for finer control, pushing Gemini 2.5 Pro to its limits.
Gemini 2.5 Flash (May 20 update): Faster and cheaper, with new "thought summaries" for transparency; its performance now approaches that of Gemini 2.5 Pro. Both the thinking budget and the thought summaries are exposed through the API, as sketched below.
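As a concrete illustration, here is a minimal sketch of setting a thinking budget and requesting thought summaries via the google-genai Python SDK. The model id, budget value, and prompt are illustrative assumptions, not settings confirmed in the keynote.

```python
# Minimal sketch: thinking budget + thought summaries with the google-genai
# SDK. Model id and budget value are illustrative assumptions.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # assumes a Gemini API key

response = client.models.generate_content(
    model="gemini-2.5-flash",  # assumed model id
    contents="Prove that the sum of two even numbers is even.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=1024,   # cap on internal "thinking" tokens
            include_thoughts=True,  # return a summary of the reasoning
        )
    ),
)

# Thought summaries come back as content parts flagged with `thought=True`.
for part in response.candidates[0].content.parts:
    label = "[thought summary]" if part.thought else "[answer]"
    print(label, part.text)
```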
Gemini Diffusion: Google's first application of diffusion techniques to text generation, released as an experimental model that generates text roughly 5 times faster than its previous fastest models. (A toy illustration of the underlying idea follows.)
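Diffusion models generate by iteratively refining noise in parallel rather than emitting tokens one at a time. The toy Python sketch below illustrates that general idea with a stub "model"; it is purely pedagogical and bears no relation to Gemini Diffusion's actual architecture.

```python
# Toy illustration of parallel, iterative denoising for text generation.
# The "model" is a stub that fills masked slots at random; a real diffusion
# model would jointly predict tokens at each refinement step.
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]
MASK = "<mask>"

def denoise_step(tokens: list[str]) -> list[str]:
    """Stub model step: unmask a batch of positions in parallel."""
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    if not masked:
        return tokens
    # Resolve about half of the remaining masked positions each step.
    for i in random.sample(masked, k=max(1, len(masked) // 2)):
        tokens[i] = random.choice(VOCAB)  # a real model would predict here
    return tokens

def generate(length: int = 6, steps: int = 4) -> list[str]:
    tokens = [MASK] * length  # start from pure "noise" (all masked)
    for step in range(steps):
        tokens = denoise_step(tokens)
        print(f"step {step}: {' '.join(tokens)}")
    return tokens

generate()
```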
Jules: Google's counterpart to OpenAI's Codex: an asynchronous coding agent that fixes bugs and prototypes features in the background. Currently requires signing up for a waitlist.
Multimodal Capability Explosion
Google Meet: Adds real-time speech translation.
Veo 3: Significantly improved video generation, producing realistic 4K-quality video and, for the first time, native audio: dialogue, sound effects, and ambient noise synthesized alongside the footage. (A hedged API sketch follows.)
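Video generation is exposed as a long-running operation in the Gemini API. Below is a minimal sketch using the google-genai Python SDK; the Veo 3 model id is a placeholder assumption, and the config mirrors the existing Veo endpoints.

```python
# Minimal sketch of calling a Veo model through the google-genai SDK.
# The model id below is a placeholder; check ai.google.dev for the real one.
import time
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",  # hypothetical Veo 3 model id
    prompt="A barista pours latte art in slow motion, soft morning light.",
    config=types.GenerateVideosConfig(number_of_videos=1),
)

# Video generation is asynchronous: poll the operation until it completes.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("latte.mp4")
```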
Imagen 4: Image generation that rivals, and by Google's account surpasses, OpenAI's GPT-4o, while running about 3 times faster. A 2K-resolution model with markedly better layout and text rendering. (See the sketch below.)
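Image generation uses the same SDK with a synchronous call. A minimal sketch follows; the Imagen 4 model id is a placeholder assumption.

```python
# Minimal sketch of image generation with the google-genai SDK.
# The Imagen 4 model id is a placeholder assumption.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_images(
    model="imagen-4.0-generate-preview",  # hypothetical model id
    prompt="A poster that reads 'HELLO I/O' in bold retro typography",
    config=types.GenerateImagesConfig(number_of_images=1, aspect_ratio="1:1"),
)

# Each result carries raw image bytes; write them straight to disk.
with open("poster.png", "wb") as f:
    f.write(response.generated_images[0].image.image_bytes)
```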
Flow: A brand-new filmmaking tool created with input from Hollywood directors. Flow combines the capabilities of Veo 3 and Gemini and can build complete film scenes from text prompts.
Flow lets creators "direct" the AI more intuitively: upload your own character and scene assets, or generate them on the spot with Imagen; describe the shot you want with precise camera instructions, and Flow generates the clip while keeping characters and scenes consistent. You can iterate endlessly, adjusting shots and extending or trimming clips, just as in a traditional editor. Flow's goal is to bring filmmaking into a new "flow" state where creativity grows naturally, turning film creation from a step-by-step grind into bursts of inspiration.
Google Search Completely Reshaped: A Brand New "AI Mode"
Longer, more complex queries: Users can now ask questions two to three times longer than traditional searches, such as "I have a light gray sofa and want a blanket that will brighten the room; I have 4 active kids at home, and friends visit often." AI Mode dynamically generates a response with text and images, including links, business information, and ratings.
Deep Search: For questions that need more thorough answers, AI Mode can run a "Deep Search": it issues dozens or even hundreds of queries at once, integrates data from across the web, the Knowledge Graph, the Shopping Graph, and Maps community data, and produces an expert-level, fully cited report within minutes, saving hours of research.
Complex analysis and visualization: AI Mode can analyze complex data and generate charts. For example, ask for the batting average and on-base percentage of well-known players using "torpedo bats" this season versus last, and it immediately builds a table, then charts the data in follow-up questions, like having a dedicated sports analyst on call.
Search Live: Project Astra's real-time capabilities come to Search. Through your phone's camera you can effectively have a "video call" with Search, letting it see what you see and help in real time. Whether it's a DIY home repair, a tough homework problem, or learning a new skill, it becomes your "remote expert".
Agentic Checkout: AI Mode can also complete shopping tasks. It browses multiple websites, analyzes hundreds of options, filters and compares prices, and can even take you straight to the checkout page to grab tickets fast. Restaurant reservations and local service bookings are planned next.
Google Joins the AI Glasses Race: AI Will Reshape Not Just the Digital World but the Physical One
Immersive headset: Project Moohan, built in collaboration with Samsung, is the first Android XR device and offers an "infinite screen" experience. In the XR version of Google Maps, just tell Gemini where you want to go and you can "teleport" to any corner of the world; in the MLB app you can watch a game as if sitting in the front row while discussing player stats with Gemini. It ships later this year.
Lightweight glasses: Google demonstrated its latest Android XR glasses prototype: light enough to wear all day, with integrated cameras, microphones, and speakers, plus an optional in-lens display that can privately surface information when needed. This means your AI assistant can truly "see" and "hear" what you do, providing real-time, context-aware help, like wearing "superpower glasses". In the live demo, the glasses identified a coffee shop's name from a coffee cup, handled navigation and a coffee order, and even performed real-time cross-language translation. Google announced that Warby Parker and Gentle Monster will be among the first eyewear brands to build on Android XR, so stylish AI glasses that fit your personal style are coming, and developers can start building for the glasses platform later this year.
Other
Gemma 3n: An ultra-lightweight multimodal model (text, image, audio, and video), designed specifically for smartphones and edge devices.
Lyria RealTime: An interactive music-generation model that supports live performance and can be steered in real time via API.
MedGemma & SignGemma: Two open specialized models, for medical image analysis and sign language translation respectively.
Agentic Colab: A notebook environment capable of self-repairing code and automating tasks.
Gemini Code Assist (now powered by Gemini 2.5): Free coding assistant and code-review agent, with support for a 2-million-token context window.
Firebase Studio: AI workspace that converts Figma designs into full-stack applications and automatically sets up the backend.
Stitch: Can generate UI designs and frontend code based on descriptions or images.
Google AI Studio Upgrade: Integrates Gemini 2.5 Pro, Imagen 4, and Veo 3 directly in the editor, and ships with the GenAI SDK.
New Gemini API features: Native audio output, the real-time Live API, asynchronous function calling, a computer-use API, URL context, and MCP support. (A hedged function-calling sketch follows.)
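Of these, function calling is the easiest to show compactly. Below is a minimal sketch using the google-genai Python SDK's automatic function calling, where a plain Python function is passed as a tool; the function, model id, and prompt are illustrative assumptions.

```python
# Minimal function-calling sketch with the google-genai SDK. The SDK can
# invoke a plain Python function automatically when the model requests it.
from google import genai
from google.genai import types

def get_weather(city: str) -> str:
    """Hypothetical stub: return current weather for a city."""
    return f"Sunny and 24°C in {city}"

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash",  # assumed model id
    contents="What's the weather like in Taipei right now?",
    # Passing the function itself enables automatic function calling:
    # the SDK executes get_weather and feeds the result back to the model.
    config=types.GenerateContentConfig(tools=[get_weather]),
)

print(response.text)  # final answer, grounded in the stub's return value
```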
Project Beam: Successor to Project Starline; 3D video-calling hardware developed in collaboration with HP.
Project Astra Upgrade: A proactive multimodal assistant that can see, hear, and speak.
That is a brief summary of everything announced at this Google event.
Concluding Remarks
First, this clearly shows that Google is going all-in on its AI ecosystem. Where Apple was once known for its tightly coordinated device ecosystem, Google is now taking that concept to a new level through AI. Specifically: Gemini can now act proactively within the system.
Furthermore, with one natively integrated model coordinated across all products, Gemini is now deeply embedded in almost every Google product. Whether it's a Pixel Watch, XR glasses, or a Pixel phone, Gemini adapts to each device and extends its capabilities accordingly (the map overlay on XR devices, for example, looks amazing).
Therefore, if Apple previously achieved interconnection across its devices through iCloud, Google is now going a step further: connecting its devices through AI itself.
During the keynote, DeepMind CEO and Nobel laureate Demis Hassabis said they are working to extend Gemini into a "world model", which he defined as "a model capable of making plans and imagining new experiences by simulating aspects of the world, just like the brain". Google is clearly pursuing this internally; it is the endgame move toward AGI.
Google, the king is back.