Xinzhiyuan Report
Editor: Aeneas KingHZ
【Xinzhiyuan Guide】Future AI roadmap exposed! Google invented Transformer, but admits in its roadmap: the existing attention mechanism cannot achieve "infinite context", which means the next generation of AI architecture must be "rewritten from scratch". Is the era of Transformer really coming to an end? What are Google's plans for the future?
Just recently, Google's future AI roadmap was revealed!
Logan Kilpatrick, Google's Product Lead, introduced the future of the Gemini model in his speech at the AI Engineer World's Fair.
In the future, Gemini's full modality will be the focus, and the model is gradually becoming an agent, with reasoning capabilities continuing to expand.
Key points at a glance:
Full Modality (r) - Natively supports image + audio generation, with video coming next.
Early Diffusion Experiments (r) - Related to diffusion models.
Default Agent Capabilities (m) - First-class tool invocation and usage capabilities, but more importantly, the model is gradually becoming an agent.
Reasoning Capabilities Continue to Expand (s) - Research breakthroughs are coming one after another.
More Small Models (s) - More content will be shared soon.
Infinite Context (r) - This is impossible with the current attention mechanism and context processing methods. We need entirely new innovations at the core architecture level to achieve this goal.
Large Models - Scale is everything.
Note: (r), (s), and (m) indicate the progress of each item in Google's roadmap:
(s) = short: Short-term / coming soon - indicates projects already in progress or about to be launched.
(m) = medium: Mid-term - projects still under development, to be launched in the coming quarters.
(r) = research: Research / long-term projects - still in experimental phase or require breakthrough advancements before release.
Silicon Valley Tech Giants' Battle
Mid-Year AI Report Card Review
Google is clearly riding high: with Gemini 2.5 Pro it has firmly regained ground and reasserted its status as a leader in the AI field.
Chubby, a popular figure on X, also conducted a "mid-year review" of Silicon Valley's tech giants.
OpenAI
Still in the lead, with o3, o3 pro, and the upcoming GPT-5, their position remains solid. They maintain regular updates, frequently release AI tools, and their ever-growing user base speaks for itself.
DeepSeek
DeepSeek has shipped significant updates since the considerable success of r1, but the world is still awaiting its successor, r2. It is not yet clear how DeepSeek will continue to advance.
Anthropic
Still a leader in software engineering (SWE). If its CEO's words hold true, agents and further developments will automate all of these processes in the coming years, handled by general agents. Currently, Anthropic is focused on the business sector (as its lower rate limits also suggest) and continues to maintain a strong position.
Google
However, the biggest winner this year might be Google, which has all but leaped from underdog to a leading position. Gemini has achieved remarkable success. Regular product updates and many announcements, including excellent TPU positioning, make Google's future look bright.
Meta
Undeniably, Meta has fallen behind. Llama 4 failed, and Behemoth has not yet been released. Mark Zuckerberg has formed a new super intelligence team to try and catch up again. Whether Alexandr Wang joining Meta from Scale AI will be a turning point remains to be seen.
Grok
Grok 3.5 is also coming soon. It's difficult to evaluate at the moment. Grok is clearly in an advantageous position with its Colossus cluster. However, whether it can train better models remains to be seen.
Google received the highest praise among them, so what major moves will it make next?
Let's take a closer look at Logan Kilpatrick's speech to find key clues.
Company-wide Consensus: Gemini 2.5 Pro is a Major Turning Point for Google
At this conference, Logan Kilpatrick, former OpenAI member and Google AI Studio Product Lead, delivered a packed speech, revealing many details about Gemini 2.5 Pro and Google's future plans for Gemini.
Regarding Logan Kilpatrick, there's an interesting anecdote: it's said that Gemini's joke-making ability was trained entirely on his tweets, which is why they aren't funny.
Currently, Logan Kilpatrick is responsible for Gemini API development and AGI research.
In his speech, Logan Kilpatrick quickly covered three parts:
Some interesting announcements about Gemini 2.5 Pro;
A review of Gemini's progress over the past year;
A look into the future – the model itself, the Gemini App, and future plans for the developer platform.
Regarding Gemini 2.5 Pro, he believes it is considered a "turning point" by both Google internally and the external developer ecosystem –
It has achieved standout performance in mathematics, programming, and reasoning, firmly holding the top spot on the major leaderboards.
It has laid a solid foundation for Gemini's future.
Gemini's Vision
"Unified Assistant"
Logan Kilpatrick posed a question to the audience: What is the connection between Google's various products in the past?
Most people would think of: Google Account. But the Google Account itself doesn't "retain state"; its role is just to let you log in to various independent products.
Now, Gemini is becoming the "unified thread" – the line connecting all Google services.
The Gemini App is very interesting and cool, reflecting how Google thinks about the future of AI products.
He believes that Google's future will look like this:
Gemini will become the unified interface, connecting all Google products, forming a true "universal assistant."
Most AI products currently still involve "user active operation" – you have to actively ask questions, actively request functions.
But the most exciting part is the next stage of AI:
"Proactive AI" – AI actively discovers problems for you, provides suggestions, and automatically handles tasks.
And now, Google is fully betting on a new paradigm shift:
Multimodal Capabilities: Native audio processing already supports Astra and Gemini Live, Veo technology remains industry-leading, and video integration will be the next key focus.
Model Evolution: Transitioning from a pure token processor to an agent with systematic reasoning capabilities, with "reasoning expansion" being particularly noteworthy.
Architectural Innovation: Including a small model ecosystem, infinite context solutions (requiring breakthroughs in existing attention mechanisms), and the astonishing token processing capabilities demonstrated in early diffusion experiments.
Advancing Towards a "Full-Modality Unified Model"
From a model perspective, Gemini was originally conceived as a unified multimodal model: capable of processing audio, images, and video.
Google has made great strides in this area:
Google I/O announced Gemini's native speech capabilities (text-to-speech (TTS) and speech interaction);
It already supports natural conversation, sounding very natural;
These capabilities have been integrated into Astra and Gemini Live.
Astra is Google's research prototype, exploring ways to bring breakthrough capabilities into its products.
Astra currently integrates a range of these capabilities.
Google is also advancing "Veo" related capabilities (Video + Other), which have achieved SOTA levels on multiple metrics and will be integrated into the main Gemini model in the future.
Additionally, Google is researching "diffusion-based reasoning" – Gemini Diffusion. However, this project is still at the research frontier and has not yet entered the main line, but its prospects are exciting.
Gemini Diffusion has extremely high throughput, capable of sampling over 1000 tokens per second.
Agents Becoming Mainstream
Recently, Logan Kilpatrick has been thinking: As system reasoning capabilities become stronger, what will future AI products look like?
In the past, developers always treated models as black-box tools:
Input tokens, output tokens;
Then build various scaffolding externally to enhance functionality.
But now, the situation has changed:
Models themselves are becoming more systematized and increasingly capable of autonomous actions, no longer just "passive calculators."
He believes that the "reasoning process" will become a core turning point: how to expand the model's reasoning capabilities.
He is very much looking forward to the question:
Will much of the scaffolding done externally in the past be integrated into the model's internal reasoning process in the future? This will completely change how developers build products.
More Roadmaps: Small Models, Large Models, Infinite Context
In addition, Google will also focus on the following new products and research:
More "small models" - lightweight, suitable for mobile and low-power devices;
Larger models - to meet user expectations for ultimate capabilities;
More importantly: a research breakthrough in "infinite context."
One of the significant flaws of current AI model architectures (such as Transformer) is their inability to effectively support infinite context.
Google believes that since the attention mechanism cannot be infinitely expanded, a new structure must be created.
They are actively exploring: how to enable models to incorporate, understand, and efficiently process ultra-large scale context.
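The cost the roadmap alludes to is easy to see in code. Below is a minimal NumPy sketch of scaled dot-product attention (an illustrative toy, not Gemini's implementation): the score matrix is n × n, so memory and compute grow quadratically with context length, which is why "infinite context" is expected to need a new architecture rather than simply a bigger Transformer.

```python
import numpy as np

def attention(Q, K, V):
    """Naive scaled dot-product attention.

    The score matrix has shape (n, n), so memory and compute grow
    quadratically with sequence length n -- the core obstacle the
    roadmap cites for "infinite context".
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (n, n): quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # (n, d)

# Just storing the float64 score matrix blows up quadratically:
for n in (10_000, 100_000, 1_000_000):
    print(f"n={n:>9}: score matrix ~{n * n * 8 / 1e9:.1f} GB")
```

Each 10x increase in context length multiplies the score-matrix memory by 100, which is the scaling wall the "entirely new innovations at the core architecture level" would have to remove.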
Key upcoming developer features include:
Embeddings: Although they feel like "early AI tools," they remain a core component. Most RAG applications rely on embeddings. Google is about to release a state-of-the-art Gemini embedding model and expand it to more developers.
Deep Research API: Users love the "deep research" functionality. Google is consolidating these capabilities into a dedicated API interface for developers of research-oriented products.
Veo 3 and Imagen 4 API access: coming soon.
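Most RAG applications, as noted above, boil down to embedding documents and ranking them by similarity to an embedded query. Here is a minimal sketch with toy hand-written vectors; the `cosine_top_k` helper and the three-dimensional "embeddings" are illustrative stand-ins, not the Gemini embedding API:

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                     # cosine similarity per document
    return np.argsort(-sims)[:k]     # indices of the k best matches

# Toy stand-in embeddings (a real system would call an embedding model).
docs = np.array([[0.9, 0.1, 0.0],
                 [0.1, 0.8, 0.1],
                 [0.0, 0.2, 0.9]])
query = np.array([0.85, 0.15, 0.05])
print(cosine_top_k(query, docs))  # index 0 ranks first
```

In production the document vectors would come from an embedding model and live in a vector store, but the ranking step stays the same.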
The last key point: Google plans to reposition "AI Studio":
No longer a consumer-facing (2C) product, but clearly positioned as a "developer platform."
In the future, AI Studio will become a true development tool platform, embedding Agent building capabilities, such as Jules or developer-specific code Agents, to provide developers with a complete building experience.
2024: Gemini's Most Exciting Year
For Google's Gemini team, the past year can be described as "the most insane year."
At Google I/O, Sundar Pichai showed a slide: In the past 12 months, the Google Gemini team seemed to have compressed 10 years of development work.
From a personal perspective, Logan Kilpatrick believes Google's true strength lies in:
Not only doing fundamental AI research, but also advancing research in multiple fields such as science, geometry, and robotics,
These research efforts will eventually feed back into the main Gemini model.
In his Google I/O speech, Pichai also showed another slide: in the past year, the volume of AI inference tasks processed on Google's servers grew 50-fold!
Logan Kilpatrick believes: "This indicates an explosive growth in demand for the Gemini model from the external developer ecosystem."
Actually, the key behind this is not just technology, but organizational structural change.
In early 2023, Google integrated multiple AI research teams into DeepMind, setting a new direction:
No longer limited to theoretical research, but to create truly practical models that serve both Google's internal and external developer ecosystems.
Afterward, they took a second step, integrating the product teams into DeepMind as well. This means:
DeepMind is responsible for model research and development;
And also for building products and delivering them to global users.
Recently, Google also appointed Koray Kavukcuoglu, DeepMind's Chief Technology Officer, to a new Senior Vice President position – Chief AI Architect.
Koray Kavukcuoglu
Works closely with the research team to bring cutting-edge model capabilities into the real world –
Logan Kilpatrick personally enjoys this process of "frontier collaboration."
This pace of innovation is very exciting, and he believes it's just the beginning.
The internal formula for Google DeepMind is simple, summarized in one sentence:
Find the best people, leverage infrastructure advantages, then... keep releasing!
References:
https://www.youtube.com/watch?v=U-fMsbY-kHY&t=1676s
https://www.semafor.com/article/06/11/2025/google-names-new-chief-ai-architect-to-advance-developments