Unveiling Google's AI Revolution: Beam, Gemini, and the Future of Communication and Productivity
June 2, 2025

Discover how Google's latest AI advancements, including Beam, real-time speech translation, and powerful AI assistants, are revolutionizing communication, collaboration, and productivity. Explore the future of search with AI Mode, personalized recommendations, and agentic capabilities that make finding information and completing tasks effortless.
Google Beam: A Groundbreaking 3D Video Communication Platform
Real-Time Speech Translation in Google Meet
Project Astra: An Advanced AI Assistant that Understands the World Around You
Project Mariner: AI Agents with Computer Use Capabilities
Agent Mode: Automating Complex Tasks in the Gemini App
Personalized Context: Bringing Relevant Context Across Google Apps
Gemini 2.5 Flash: A Powerful and Efficient Language Model
Gemini 2.5 Pro Deep Think: Pushing the Boundaries of Model Performance
Gemini Diffusion: Revolutionizing Text and Code Generation
Native Audio Output: Expressive and Multilingual Text-to-Speech
Coding with Gemini 2.5 Pro: Bringing Ideas to Life with AI
Jules: The Coding Agent for Effortless Code Maintenance
AI Overviews: Intelligent and Personalized Google Search
AI Mode: The Future of Search, Powered by Gemini
Personal Context: Customized Recommendations in AI Mode
Deep Search: Thorough, Expert-Level Responses
Agentic Capabilities in AI Mode: Ticket Booking and Price Tracking
Search Live: Multimodal Search with Live Camera and Screen Sharing
AI Try-On: Visualizing Apparel in the Gemini App
Gemini Live: Conversational AI Assistance with Camera and Screen Sharing
Imagen 4: Powerful Image Generation in the Gemini App
Veo 3: The Next Generation of AI-Powered Video Creation
Flow: A Creative Tool for AI Filmmaking
Conclusion
Google Beam: A Groundbreaking 3D Video Communication Platform
Google has announced the launch of Google Beam, a revolutionary video communication platform that utilizes advanced technology to provide a truly immersive 3D experience. Beam uses an array of six cameras to capture users from different angles, and then employs AI to merge these video streams and render them on a 3D light field display. This results in a natural and deeply engaging conversational experience, with near-perfect head tracking down to the millimeter and at 60 frames per second.
The first Google Beam devices will be available for early customers later this year, in collaboration with HP. This innovative technology promises to redefine the way we communicate and interact with one another, breaking down barriers and creating a more natural and seamless video experience.
Real-Time Speech Translation in Google Meet
Google is bringing real-time speech translation to Google Meet, helping to break down language barriers. Here's how it works:
Say you're booking a vacation rental in South America but don't speak the language: you can turn on speech translation in Google Meet. The translation matches the speaker's tone and expressions, allowing for a natural, free-flowing conversation across languages.
English and Spanish translation is now available for Google Meet subscribers, with more languages rolling out in the coming weeks. The feature leverages Google's underlying technology from Project Starline to provide real-time, high-quality speech translation, bringing us closer to natural conversations across languages.
Project Astra: An Advanced AI Assistant that Understands the World Around You
Project Astra is an early research project that explores the future capabilities of a universal AI assistant. It can understand the world around you by leveraging camera and screen sharing capabilities.
Some key capabilities of Project Astra include:
- Correcting you when you're mistaken, such as when you confuse a garbage truck for a convertible, or assume palm trees are short when they're actually tall.
- Helping with tasks by looking up information, finding relevant videos, and even making phone calls on your behalf.
- Integrating with your apps and devices to provide a seamless, hands-free experience. For example, it can add calendar events, control your computer, and more.
Project Astra's capabilities are rolling out to everyone on Android and iOS. The goal is to make interacting with technology more natural and intuitive by using AI that understands the world around you.
Project Mariner: AI Agents with Computer Use Capabilities
Stepping back, we think of agents as systems that combine the intelligence of advanced AI models with access to tools, so they can take actions on your behalf and under your control. Computer use is an important agentic capability - it's what enables agents to interact with and operate browsers and other software.
Project Mariner was an early step forward in testing computer use capabilities. We released it as an early research prototype in December and have made a lot of progress since:
- Multitasking: Mariner can now oversee up to ten simultaneous tasks.
- Teach and Repeat: show Mariner a task once, and it can learn plans for similar tasks in the future.
We are bringing Mariner's computer use capabilities to developers via the Gemini API. Trusted testers like Automation Anywhere and UiPath are already starting to build with it, and it will be available more broadly this summer.
Computer use is part of a broader set of tools we will need to build for an agent ecosystem to flourish, like our open Agent2Agent (A2A) protocol and the Model Context Protocol (MCP) introduced by Anthropic. We are excited to see these technologies work together to make agents even more useful.
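To make the tool-access idea concrete, here is a minimal sketch using the function-calling interface of the google-genai Python SDK. The `open_page` tool and its wiring to an actual browser are hypothetical stand-ins; Mariner's real computer-use API was only available to trusted testers at the time of the announcement.

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Hypothetical browser-control tool exposed to the model.
open_page = types.FunctionDeclaration(
    name="open_page",
    description="Open a URL in a managed browser session.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={"url": types.Schema(type=types.Type.STRING)},
        required=["url"],
    ),
)

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Open the pricing page for example.com.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[open_page])]
    ),
)

# Instead of plain text, the model may return a structured function call
# that an agent runtime executes on the user's behalf.
for part in response.candidates[0].content.parts:
    if part.function_call:
        print(part.function_call.name, dict(part.function_call.args))
```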
Agent Mode: Automating Complex Tasks in the Gemini App
In the Gemini app, we're introducing a powerful new feature called Agent Mode. This mode harnesses the capabilities of our advanced AI models, including Project Mariner, to automate complex tasks on your behalf.
Say you're looking to find an apartment for you and two roommates in Austin, each with a budget of $1,200 per month. You want a place with a washer/dryer or nearby laundromat. Normally, this would require hours of scrolling through endless listings on sites like Zillow.
With Agent Mode, the Gemini app goes to work behind the scenes. It scours listings that match your criteria, using Project Mariner to adjust very specific filters as needed. If it finds an apartment you want to check out, Gemini can even access the listing details and schedule a tour on your behalf.
And it doesn't stop there. Gemini will continue browsing for new listings that meet your requirements, freeing you up to focus on the fun stuff, like planning the housewarming party.
This is just one example of how Agent Mode can streamline your life. Whether you're researching travel options, managing your finances, or tackling any other complex task, Gemini's AI agents are here to assist you, working tirelessly in the background to get the job done.
We're excited to put the power of agents into your hands, and we can't wait to see how you'll use this feature to simplify your day-to-day. Agent Mode is coming soon to Gemini app subscribers, so stay tuned for more updates.
Personalized Context: Bringing Relevant Context Across Google Apps
With your permission, Gemini models can use relevant context across your Google apps in a way that is private, transparent, and fully under your control.
For example, in Gmail, personalized smart replies can now sound like you. If your friend writes asking for advice on a road trip to Utah, Gemini can look up your past notes and emails to generate a helpful reply that captures your typical tone and word choices. It can include details like your recommended driving times and favorite adjectives to make the response more personal.
This personalized context will be available in Gmail this summer for subscribers, allowing you to be a better friend and communicator without having to do all the work yourself. Gemini handles the heavy lifting of finding and applying the relevant information from your past interactions.
The goal is to make your Google apps more intelligent and tailored to you, while maintaining transparency and control over how your personal data is used. You can choose to connect or disconnect this feature at any time.
Gemini 2.5 Flash: A Powerful and Efficient Language Model
Today, I'm thrilled to announce that we're releasing an updated version of Gemini 2.5 Flash, our most efficient workhorse model. The new Flash is better in nearly every dimension, improving across key benchmarks for reasoning, code, and long context. In fact, it's second only to 2.5 Pro on the LMArena leaderboard.
Gemini Flash has been incredibly popular with developers who love its speed and low cost. The new 2.5 Flash will be generally available in early June, with Pro soon after. This updated model delivers groundbreaking performance, making it the go-to choice for developers who need a powerful yet efficient language model.
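For developers, calling the model is only a few lines through the Gemini API. A minimal sketch using the google-genai Python SDK, assuming a `gemini-2.5-flash` model identifier and a valid API key:

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # assumes a Gemini API key

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Give me three one-line tips for writing faster SQL queries.",
)
print(response.text)
```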
Gemini 2.5 Pro Deep Think: Pushing the Boundaries of Model Performance
We've been busy exploring the frontiers of thinking capabilities in Gemini 2.5. As we know from our experience with AlphaGo, responses improve when we give these models more time to think. Today we're making 2.5 Pro even better by introducing a new mode we're calling Deep Think.
Deep Think pushes model performance to its limits, delivering groundbreaking results. It uses our latest cutting-edge research in thinking and reasoning, including parallel techniques. So far, we've seen incredible performance:
- It achieves an impressive score on USAMO 2025, currently one of the hardest math benchmarks.
- It leads on LiveCodeBench, a difficult benchmark for competition-level coding.
- And since Gemini has been natively multimodal from the start, it also excels on MMMU, which measures multimodal reasoning.
Because we're defining the frontier with 2.5 Pro Deep Think, we're taking a little bit of extra time to conduct more frontier safety evaluations and get further input from safety experts. As part of that, we're going to make it available to trusted testers via the Gemini API to get their feedback before making it widely available.
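Deep Think itself is gated to trusted testers, but the thinking behavior of the 2.5 series can already be steered through the API. A hedged sketch using the google-genai SDK's thinking configuration; the model name and budget value here are assumptions:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Grant the model a larger "thinking" budget (in tokens) so it can
# reason longer before producing its final answer.
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Prove that the product of any four consecutive integers "
             "is one less than a perfect square.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=8192)
    ),
)
print(response.text)
```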
Gemini Diffusion: Revolutionizing Text and Code Generation
Unlike traditional language models that generate text one token at a time, Gemini Diffusion generates text by iteratively refining random noise into coherent outputs. This parallel processing approach makes it significantly faster than previous models.
Gemini Diffusion leverages the power of diffusion modeling, a technique pioneered by Google DeepMind for image and video generation. By applying this approach to text and code generation, Gemini Diffusion excels at tasks like editing, including in the context of math and code.
The parallel generation process allows Gemini Diffusion to iterate on a solution very quickly and error-correct during generation. This results in extremely low latency: the version being released today generates text five times faster than even the fastest Gemini model to date, 2.0 Flash-Lite, while matching its coding performance.
Gemini Diffusion is currently being tested with a small group of users, and Google will continue to work on lowering latency across all Gemini models, with a faster 2.5 Flash-Lite version coming soon.
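To build intuition for how this differs from left-to-right decoding, here is a deliberately toy illustration of parallel iterative refinement. It has nothing to do with Gemini Diffusion's actual architecture (a real model learns the denoising step; this one cheats by knowing the target); it only shows how every position of a sequence can be refined simultaneously across steps:

```python
import random

# Toy sketch of diffusion-style generation: start from pure noise and
# refine ALL positions in parallel each step, instead of emitting
# tokens strictly left to right. Illustrative only.
TARGET = list("print('hello, diffusion')")
ALPHABET = sorted(set(TARGET))

def denoise_step(state, correct_prob=0.35):
    # Every position updates in parallel; wrong positions are only
    # sometimes corrected, mimicking gradual refinement over steps.
    return [
        t if s == t or random.random() < correct_prob else random.choice(ALPHABET)
        for s, t in zip(state, TARGET)
    ]

state = [random.choice(ALPHABET) for _ in TARGET]  # pure noise
for step in range(1, 60):
    state = denoise_step(state)
    if state == TARGET:
        print(f"converged in {step} steps: {''.join(state)}")
        break
```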
Native Audio Output: Expressive and Multilingual Text-to-Speech
In addition to the new 2.5 Flash model, Google is also introducing new previews for text-to-speech. These offer first-of-its-kind multi-speaker support, built on native audio output, so the model can converse in more expressive ways and capture the subtle nuances of how we speak. It can even seamlessly switch between languages, all with the same voice.
The new text-to-speech previews work in over 24 languages and can easily transition between them, allowing for more natural and fluid conversations. The model can even switch to a whisper-like tone when appropriate. This advanced text-to-speech capability is a significant step forward, enabling more expressive and multilingual voice interactions in Google's AI applications.
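A hedged sketch of what driving these previews looks like through the google-genai SDK; the preview model identifier and the `Kore` voice name are assumptions that may change. The API returns raw 16-bit PCM at 24 kHz, which we wrap in a WAV container:

```python
import wave
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",  # preview name may change
    contents="Say warmly: Bonjour! Welcome back, ready to continue?",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(
                    voice_name="Kore"
                )
            )
        ),
    ),
)

# Wrap the raw PCM bytes in a playable WAV file.
pcm = response.candidates[0].content.parts[0].inline_data.data
with wave.open("greeting.wav", "wb") as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(24000)
    f.writeframes(pcm)
```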
Coding with Gemini 2.5 Pro: Bringing Ideas to Life with AI
As you heard from Demis, Gemini 2.5 Pro is incredible at coding. Let me show you how you can take any idea you have and bring it to life.
If you've ever been to the American Museum of Natural History in New York City, it has a set of amazing exhibits. To bring that to you today, I got 2.5 Pro to code me a simple web app in Google AI Studio to share some photos and learn more.
Here's what I have so far. I want to make it more interactive, and while I'm still brainstorming the design, I've got some ideas.
Standard two-dimensional web design is one thing, but I wanted to make it 3D. Luckily for me, 2.5 Pro can help. I'm going to add the image of the sphere and ask 2.5 Pro to update my code based on the image.
Here's what Gemini generates. We went from that rough sketch directly to code, updating multiple files. It thought for 37 seconds and you can see the changes it made.
We did all of this in AI Studio, so once I finished prototyping, I can simply deploy the code along with my Gemini API key. Here's our final app in Chrome - look at these animations. I didn't need advanced knowledge of three.js or have to figure out the complex 3D math to build this.
I can make this experience even richer with multimodality. I used 2.5 Flash to add a question to each photo, inviting you to learn more. And with Gemini's native audio, I can make it talk.
[Audio plays] That's a pangolin, and its scales are made of keratin, just like your fingernails. Wow, now we're talking. You can hear how you can add expressive audio right into your apps.
And before I share more, I'll leave this demo with another fun layout that 2.5 Pro coded just for us.
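Outside AI Studio, you can reproduce the sketch-to-code step directly against the API by sending an image alongside existing source. A minimal sketch using the google-genai SDK; the file names `sphere_sketch.png` and `app.js` are hypothetical stand-ins for the demo's assets:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Hypothetical files standing in for the demo's sketch and app code.
sketch = types.Part.from_bytes(
    data=open("sphere_sketch.png", "rb").read(), mime_type="image/png"
)
app_code = open("app.js").read()

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[
        "Update this web app so the photo gallery is laid out on a 3D "
        "sphere like the attached sketch. Return the modified files.",
        sketch,
        app_code,
    ],
)
print(response.text)
```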
Jules: The Coding Agent for Effortless Code Maintenance
Google has introduced Jules, a coding agent that can handle complex tasks in large codebases with ease. Jules integrates with GitHub and can tackle a variety of code-related tasks, from fixing bugs to making updates, in a matter of minutes.
Some key capabilities of Jules include:
- Automated Code Maintenance: Jules can plan the necessary steps, modify files, and execute updates to keep your codebase up to date. For example, it can update an older version of Node.js in your project.
- Seamless Integration: Jules integrates directly with GitHub, so you can submit tasks and have Jules handle the rest without disrupting your existing workflow.
- Efficient Execution: By leveraging the power of Google's Gemini models, Jules can complete complex tasks in a fraction of the time it would take a human developer.
- Public Beta Access: Jules is now in public beta, so anyone can sign up and start using the coding agent at jules.google.
With Jules, developers can focus on higher-level tasks and leave the tedious code maintenance work to the AI agent. This can significantly improve productivity and free up time for more strategic initiatives.
AI Overviews: Intelligent and Personalized Google Search
Our Gemini models are helping to make Google search more intelligent, agentic, and personalized. One great example of progress is our AI Overviews.
Since launching at I/O last year, AI Overviews have scaled up to over 1.5 billion users every month in more than 200 countries and territories. As people use AI Overviews, we see they are happier with their results and they search more often. In our biggest markets like the US and India, AI Overviews are driving over 10% growth in the types of queries that show them.
What's particularly exciting is that this growth increases over time. AI Overviews are one of the most successful launches in search in the past decade.
AI Overviews are also one of the strongest drivers of growth for visual searches in Google Lens. Lens grew 65% year-over-year with more than 100 billion visual searches already this year. People are asking more queries and they're also asking more complex queries with our latest Gemini models.
Our AI Overviews deliver the quality and accuracy you've come to expect from search, and they're the fastest in the industry. For those who want an end-to-end AI search experience, we are introducing an all-new AI Mode: a total reimagining of search with more advanced reasoning capabilities, powered by Gemini 2.5.
AI Mode: The Future of Search, Powered by Gemini
Google is introducing a revolutionary new search experience called AI Mode, powered by their advanced Gemini AI models. AI Mode represents a total reimagining of search, offering users more intelligent, personalized, and capable search capabilities.
Some key features of AI Mode include:
Query Fan-Out and Deep Search
AI Mode uses a technique called "query fan-out" to break complex queries into multiple sub-queries, searching across the web and various data sources to provide comprehensive, expert-level responses. The Deep Search capability can issue hundreds of searches to create fully cited, in-depth reports on a topic.
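A conceptual sketch of the fan-out pattern, not Google's implementation: the model-driven split into sub-queries is faked with a fixed list, and the search backend is a stub, but it shows the split-search-synthesize shape of the technique.

```python
import asyncio

# Conceptual sketch of query fan-out: split a complex question into
# sub-queries, search them concurrently, then synthesize one answer.

async def search(sub_query: str) -> str:
    await asyncio.sleep(0.1)          # stand-in for a real search backend
    return f"results for {sub_query!r}"

async def fan_out(question: str) -> str:
    sub_queries = [                   # in AI Mode this split is model-driven
        f"{question} reviews",
        f"{question} pricing",
        f"{question} alternatives",
    ]
    results = await asyncio.gather(*(search(q) for q in sub_queries))
    return " | ".join(results)        # a model would reason across these

print(asyncio.run(fan_out("foldable camping chair")))
```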
Personal Context
By connecting to users' Google accounts, AI Mode can leverage personal data and history to provide highly customized recommendations and suggestions, always under the user's control.
Agentic Capabilities
Integrating Project Mariner's agentic abilities, AI Mode can now perform tasks on the user's behalf, such as finding and booking event tickets, without the user having to manually navigate multiple websites.
Multimodal Search
AI Mode seamlessly combines text, images, and even live camera feeds to provide the most relevant and helpful information for a wide range of queries, from DIY projects to sports analysis.
Advanced Reasoning and Visualization
Powered by Gemini 2.5 models, AI Mode can tackle complex, open-ended questions, providing dynamic visualizations and data analysis to deliver insightful responses.
Google is rolling out AI Mode to all users in the US starting today, providing a glimpse into the future of search - an intelligent, personalized, and capable assistant that can handle any query with ease.
Personal Context: Customized Recommendations in AI Mode
Starting soon in Labs, AI Mode will be able to make its responses even more helpful with personalized suggestions based on your past searches. You can also opt in to connect other Google apps, starting with Gmail, to enable personal context.
With personal context, AI Mode can draw on relevant information from your Gmail, such as recent restaurant bookings, subscriptions, and travel plans, to provide customized recommendations. For example, based on your recent searches and bookings, AI Mode may suggest outdoor seating options and nearby art exhibits that align with your preferences and upcoming trip to Nashville.
Personal context in AI Mode is always under your control - you can choose to connect or disconnect at any time. This feature aims to make search truly yours, with recommendations tailored specifically for you.
Deep Search: Thorough, Expert-Level Responses
For questions where you want an even more thorough response, Google is bringing deep research capabilities into AI Mode. People already come to search to really unpack a topic, but this takes it to a much deeper level.
Google is calling this Deep Search. Deep Search uses the same query fan-out technique, but multiplied: it can issue dozens or even hundreds of searches on your behalf, then reason across all those disparate pieces of information to create an expert-level, fully cited report in just minutes. It includes links to the web throughout, so you can easily explore and take action.
This is a core part of how Google has built AI Mode overall, and how they've always thought about AI and search. They believe AI will be the most powerful engine for discovery the web has ever seen, helping people discover even more of what the web has to offer and find incredible, hyper-relevant content.
Agentic Capabilities in AI Mode: Ticket Booking and Price Tracking
Search can now take work off your plate while keeping you in control. With the new agentic capabilities in AI Mode, you can simply say, "Find two affordable tickets for this Saturday's Reds game in the lower level."
Search kicks off a query fan-out, looking across several sites to analyze hundreds of potential ticket options. It does the tedious work of filling in forms with all the criteria you asked for, then puts it all together, reasoning across the results to analyze real-time pricing and inventory.
Once the task is complete, you get great ticket options with helpful context so you can make an informed decision: the seats have a good view and come at a reasonable price. Search helps you skip a bunch of steps, linking you right to checkout. Ticket secured.
Additionally, Search can now help you track prices. You can set a target price for an item, and Search will continuously check the websites where it's available, then notify you when the price drops. You can then review the details yourself or let the agent complete the purchase for you.
These new agentic capabilities in AI Mode make search more intelligent and personalized, taking work off your hands while still keeping you in control.
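The price-tracking flow described above boils down to a simple poll-and-notify loop. A toy sketch, with `get_price` as a hypothetical stand-in for real retail data:

```python
import random
import time

# Toy sketch of agentic price tracking: poll a price source and
# notify once the user's target price is reached. Illustrative only.

def get_price(item: str) -> float:
    # Stand-in for fetching and parsing a retailer's page.
    return round(random.uniform(80, 120), 2)

def track(item: str, target: float, interval_s: float = 0.1) -> None:
    while True:
        price = get_price(item)
        if price <= target:
            print(f"{item}: dropped to ${price:.2f} - review or check out")
            return
        time.sleep(interval_s)  # a real agent would poll far less often

track("travel bag", target=85.0)
```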
Search Live: Multimodal Search with Live Camera and Screen Sharing
Google is introducing a new feature called Search Live, which brings the capabilities of Project Astra's live camera and screen sharing into the search experience. With Search Live, users can use their camera and screen to interact with search in real-time, making it easier to get help with tasks like DIY home repairs, school assignments, or learning new skills.
Some key highlights of Search Live:
- Users can use their camera to show what they're working on, and search can provide helpful information in real-time, like a video call with search.
- Screen sharing allows users to share their screen with search, so the AI can see what they're looking at and provide more contextual assistance.
- Search Live works across over 45 languages and 150 countries, making it a globally accessible feature.
- Conversations with Search Live are 5x longer than text-based interactions, as users can more naturally discuss and work through their queries.
- Search Live integrates with other Google apps like Calendar, Maps, and Keep, allowing users to take actions directly from the search experience.
Overall, Search Live represents a significant step forward in making search more interactive and multimodal, allowing users to leverage visual information and real-time collaboration to get the help they need.
AI Try-On: Visualizing Apparel in the Gemini App
Google is bringing advanced 3D shape understanding to the Gemini app, allowing users to better visualize how clothing will look on their own body. This AI-powered try-on experience works with your photo, not a pre-captured image or model.
The AI model is able to show how the material will fold, stretch, and drape on the user's body shape. This state-of-the-art technology allows visualizing billions of apparel products on a wide variety of people.
Users can now get a realistic sense of how a dress or other clothing item might look on them before making a purchase. The Gemini app can also help find the desired item at your target price, with a new agentic checkout feature that continuously tracks prices and notifies you when the price drops.
This AI-powered try-on and shopping experience is a game-changer, making online apparel shopping more personalized and efficient for users.
Gemini Live: Conversational AI Assistance with Camera and Screen Sharing
Gemini Live is a powerful AI assistant that combines natural language conversations with camera and screen sharing capabilities. Some key features of Gemini Live include:
- Conversational AI: Gemini Live enables intuitive, interactive conversations in over 45 languages across 150+ countries. Conversations are 5x longer than typical text-based interactions.
- Camera and Screen Sharing: Users can leverage their device's camera and screen to engage with Gemini Live, enabling hands-on assistance for tasks like DIY projects, homework help, and more.
- Integrations: Gemini Live can be connected to popular apps like Calendar, Maps, Keep, and Tasks, allowing users to seamlessly complete tasks without switching between apps.
- Continuous Improvements: Gemini Live's capabilities are continuously expanded through Project Astra, with new features like visual conceptualization and AI filmmaking being added over time.
Gemini Live is available for free in the Gemini app on Android and iOS, providing an engaging, multimodal AI assistant experience for users.
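Developers can build similar streaming conversations with the Gemini Live API. A minimal text-only sketch using the google-genai SDK's async client; the live model identifier is an assumption that changes between previews, and a real camera or screen-sharing app would stream media frames into the same session:

```python
import asyncio
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

async def main():
    config = types.LiveConnectConfig(response_modalities=["TEXT"])
    # Live model names change frequently; check current docs.
    async with client.aio.live.connect(
        model="gemini-2.0-flash-live-001", config=config
    ) as session:
        await session.send_client_content(
            turns=types.Content(
                role="user",
                parts=[types.Part(text="Walk me through fixing a bike chain.")],
            )
        )
        # Stream the model's reply as it arrives.
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```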
Imagen 4: Powerful Image Generation in the Gemini App
Starting today, we're bringing our latest and most capable image generation model, Imagen 4, into the Gemini app. Imagen 4 represents a big leap forward in image quality and capabilities.
The images generated by Imagen 4 are richer, with more nuanced colors and fine-grained details. The shadows, water droplets, and other subtle elements come through beautifully in the photos.
Imagen 4 has also significantly improved at text and typography. In the past, adding text to generated images didn't always work quite right. But now, Imagen 4 makes creative choices like using dinosaur bones in the font or adjusting the spacing and layout to create visually striking text-based designs, as seen in the music festival poster example.
In addition to the higher image quality, Imagen 4 is also much faster, with a super-fast variant that is 10 times faster than the previous model. This allows you to quickly iterate through many ideas and find the perfect design.
With Gemini's native image generation capabilities, you can easily edit and refine the generated images right within the app. The combination of powerful image generation and seamless editing makes Imagen 4 a game-changer for creating visuals, posters, and more.
We're excited to put this latest advancement in image generation technology into the hands of Gemini app users today.
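Imagen models are also exposed through the Gemini API. A hedged sketch using the google-genai SDK; the `imagen-4.0-generate-001` identifier is an assumption, so check the current docs for the model name available to your account:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Generate a couple of candidate posters to iterate on.
result = client.models.generate_images(
    model="imagen-4.0-generate-001",  # assumed identifier
    prompt="Music festival poster with bold dinosaur-bone lettering, "
           "warm dusk palette",
    config=types.GenerateImagesConfig(number_of_images=2),
)

for i, generated in enumerate(result.generated_images):
    with open(f"poster_{i}.png", "wb") as f:
        f.write(generated.image.image_bytes)
```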
Veo 3: The Next Generation of AI-Powered Video Creation
Google has unveiled Veo 3, their latest state-of-the-art video generation model that takes video creation to new heights. Veo 3 builds upon the success of its predecessor, Veo 2, by introducing several groundbreaking capabilities:
- Native Audio Generation: Veo 3 can now generate sound effects, background sounds, and even dialogue for your videos. This means your characters can speak naturally, adding a new level of immersion to your creations.
- Enhanced Visual Quality: The visual quality of Veo 3-generated videos has been further improved, with an even better understanding of physics and more realistic rendering.
- Seamless Integration with Imagen 4: Veo 3 integrates seamlessly with Google's latest image generation model, Imagen 4, allowing you to easily incorporate high-quality visuals into your videos.
To showcase the power of Veo 3, Google has introduced a new AI filmmaking tool called "Flow". Flow combines the best of Veo 3, Imagen 4, and Gemini, empowering creators to bring their ideas to life with unprecedented ease and creativity.
With Flow, you can:
- Easily upload or generate custom images and assets using Imagen 4.
- Assemble your scenes and shots with a single prompt, and Flow will handle the camera work and character consistency.
- Extend or trim your clips as needed, iterating until you achieve the perfect result.
- Download your final video, complete with the native audio generated by Veo 3.
The integration of Veo 3's audio capabilities with the visual prowess of Imagen 4 and the intelligent automation of Flow creates a truly transformative experience for video creators. No longer are you limited by technical constraints; Veo 3 and Flow empower you to focus on your creative vision, letting the AI handle the heavy lifting.
Google's announcement of Veo 3 and the Flow filmmaking tool marks a significant milestone in the evolution of AI-powered content creation. By seamlessly blending advanced video, audio, and image generation, they are redefining the boundaries of what's possible in the world of digital storytelling.
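Veo models are exposed through the Gemini API as long-running operations that you poll until the render completes. A hedged sketch using the google-genai SDK; the `veo-3.0-generate-preview` identifier is an assumption:

```python
import time
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

# Kick off a video render; prompts can include spoken dialogue.
operation = client.models.generate_videos(
    model="veo-3.0-generate-preview",  # assumed identifier
    prompt="An old sailor on a pier describes the sea. He speaks: "
           "'The ocean never forgets a storm.' Gulls cry in the background.",
)

# Video generation is a long-running operation; poll until done.
while not operation.done:
    time.sleep(10)
    operation = client.operations.get(operation)

video = operation.response.generated_videos[0]
client.files.download(file=video.video)
video.video.save("sailor.mp4")
```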
Flow: A Creative Tool for AI Filmmaking
Google has introduced a new AI filmmaking tool called Flow, which combines the best of Veo 3, Imagen, and Gemini to create a powerful platform for creatives. Flow is inspired by the magical feeling of getting lost in the creative zone, where time slows down and the pieces start falling into place.
With Flow, users can easily upload their own images or generate them on the fly using Imagen, which is built right into the tool. They can then assemble the clips together with a single prompt, describing what they want, including precise camera controls. Flow puts everything in place, ensuring consistency in characters and scenes.
The tool also allows users to iterate and refine their work, with the ability to go back and trim or extend clips as needed. Once the clips are assembled, users can download the files and bring them into their favorite editing software, adding music from Lyria to complete the final product.
One of the standout features of Flow is its integration with Veo 3, Google's latest state-of-the-art video generation model. Veo 3 not only offers improved visual quality and understanding of physics, but it also comes with native audio generation. This means that users can prompt their characters to speak, with Veo 3 generating the necessary sound effects, background sounds, and dialogue.
The combination of these powerful AI tools within the Flow platform allows creators to explore infinite possibilities, with the narrative growing naturally and spontaneously, rather than being forced. It's a tool that empowers creatives to find their path, rather than trying to build it brick by brick.
Overall, Flow represents a significant step forward in the field of AI-powered filmmaking, offering a seamless and intuitive way for creators to bring their visions to life.
Conclusion
Google's announcements at this year's I/O event showcase their continued advancements in AI technology. Some key highlights include:
- Google Beam: A new 3D video communication platform that provides a more immersive and natural conversational experience.
- Real-Time Speech Translation: The ability to translate conversations between different languages in real time, breaking down language barriers.
- Project Astra: An advanced AI assistant that can understand the world around you and assist with a wide range of tasks.
- Project Mariner: An AI agent that can interact with and operate browsers and other software, enabling new levels of automation and productivity.
- Gemini 2.5 Flash: A highly efficient and performant language model that excels across key benchmarks.
- Gemini Diffusion: An experimental research model that applies diffusion modeling to text and code generation, enabling faster and more efficient text production.
- Native Audio Output: Expressive and natural-sounding text-to-speech capabilities in over 24 languages.
- AI Mode in Google Search: A reimagined search experience powered by Gemini 2.5, offering advanced reasoning, personalization, and new ways to interact with search.
- Search Live: Integrating Project Astra's live capabilities into Google Search, allowing users to interact with search using their camera and voice.
- Imagen 4 and Veo 3: Significant advancements in image and video generation, including the ability to generate native audio and dialogue.
These innovations showcase Google's commitment to pushing the boundaries of AI and making it more accessible and useful for a wide range of applications and user experiences.