A year ago on the I/O stage, Google first shared plans for Gemini, a family of natively multimodal AI models that could reason across text, images, video, code, and more. When we launched the Gemini era in December 2023, it marked a big step in turning any input into any output — an “I/O” for a new generation.
Google I/O showed how we are fully in the Gemini era, showcasing AI innovations across products, research, and infrastructure, and how this work brings us closer to our ultimate goal of making AI helpful for everyone. [Sundar’s Blog Post]
Every Google product with more than two billion users is built with Gemini. Today we shared how that helps us create new experiences and make our products even more helpful:
- Expanding AI Overviews in Search. With a new customised Gemini model — capable of multi-step reasoning, planning and multimodality — combined with best-in-class Search, you’ll soon be able to ask complex, multi-step questions, customise search results, and even ask questions with videos. [Blog Post]
- Introducing Ask Photos. Over six billion photos are uploaded every day to Google Photos. With Gemini’s multimodal capabilities, we’re redefining how you can search your photos and videos. Want to find a specific memory or recall information hidden in your gallery? Just Ask Photos. [Blog Post]
- New ways to engage with Gemini in Workspace. Gemini’s capabilities will expand to more users and integrate into the side panel of Gmail, Docs, Drive, Slides, and Sheets. Gemini features will also be added to the Gmail mobile app. [Blog Post]
- Gemini for Android. We’re building AI right into the Android operating system. Students can now get homework help by circling problems with Circle to Search. And Gemini’s overlay will provide dynamic suggestions related to what’s on your screen — summarise a PDF or “ask this video” — while TalkBack with Gemini will be capable of even more detailed image descriptions. [Blog Post]
We’re also bringing Gemini 1.5 Pro to Gemini Advanced subscribers in more than 35 languages, along with a 1 million token context window — the longest context window of any widely available consumer chatbot in the world. This means it can understand more information than ever before, like a 1,500-page PDF and, soon, 30,000 lines of code or an hour-long video.
- Gemini Advanced subscribers will also soon get access to Live, a new mobile conversational experience. With Live, you can talk to Gemini and choose from different natural-sounding voices. You can speak at your own pace and even interrupt with questions, making conversations more intuitive. [Blog Post]
All of this work is underpinned by our technical leadership in building the world’s most advanced AI.
And we’re also looking ahead to the next models: we shared more details about Gemini 1.5 Flash, a more cost-efficient, lower-latency model built in response to user feedback; and Project Astra, our vision for the next generation of AI assistants, a responsive agent that can understand and react to the context of conversations. [Blog Post]
We’ve also been working closely with the creative community to explore how generative AI can best support the creative process, and to make sure our AI tools are as useful as possible at each stage:
- Today, we’re introducing Veo, our most capable model for generating high-definition video, and Imagen 3, our highest-quality text-to-image model. We’re also sharing new demo recordings — with global artists — created with our Music AI Sandbox. [Blog Post]
Of course, these advancements in AI are only made possible by truly cutting-edge infrastructure technology. Training state-of-the-art models requires a lot of computing power.
- Today we unveiled the 6th generation of our TPUs, called Trillium, which delivers a 4.7x improvement in compute performance per chip over the previous generation, TPU v5e, and which we’ll make available to Cloud customers later this year. [Blog Post]
Bold innovation must be underpinned by responsible innovation. So we’re developing a cutting-edge technique we call AI-assisted red teaming, which draws on Google DeepMind’s gaming breakthroughs like AlphaGo, and we’re expanding our technical watermarking innovations like SynthID to two new modalities — text and video — so AI-generated content is easier to identify. [Blog Post]
By using the power of Gemini, we plan to make AI helpful for everyone. Google’s mission is to organise the world’s information across every input, make it accessible via any output, and combine the world’s information with the information in your world, in a way that’s truly useful for you. Gemini will help us towards that goal.