Google Announces Gemini 1.5, Gemini 1.5 Pro multimodal generative AI model available now
Google has announced Gemini 1.5, the next generation of its flagship multimodal generative AI model family.

Google has announced Gemini 1.5, the next-generation version of its flagship multimodal generative AI model line. The unveiling comes only one week after Google made Gemini 1.0 Ultra generally available to the public, and as competition intensifies with AI rivals like OpenAI, Microsoft, and Amazon. Google also rebranded several generative AI tools, like Bard and Duet AI, to Gemini for uniformity across its cloud solutions.

Gemini 1.5 supports up to 1 million tokens

Gemini 1.0 comes in three sizes: Nano, Pro, and Ultra. With today’s announcement, Google is focusing specifically on Gemini 1.5 Pro and the advances it delivers.

Google states that Gemini 1.5 Pro supports a context window of up to 1 million tokens, a substantial increase from the 32,000 tokens of Gemini 1.0 Pro. The 1 million token context window is an experimental feature in private preview with select developers and enterprise customers. By default, Gemini 1.5 Pro comes with a standard 128,000 token context window.

Gemini 1.5 Pro is the “sweet spot” among the three Gemini model sizes and performs at a similar level to 1.0 Ultra. Google classifies Gemini 1.5 Pro as a “mid-size” multimodal model, yet it is the company’s largest model in size, capabilities, and performance to date.

The new model is more efficient to train and serve because it combines the Transformer architecture with a new Mixture-of-Experts (MoE) architecture. Google DeepMind researchers explain that “while traditional Transformer functions as one large neural network, MoE models are divided into smaller ‘expert’ neural networks.”

As a result, MoE models learn to selectively activate only the most relevant expert pathways in their neural network, significantly improving model efficiency and performance.
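
To make the idea of selectively activating experts concrete, here is a minimal toy sketch of top-k expert routing in plain Python. It is not Google's implementation, and Gemini's actual architecture details are not public in this article. The names `moe_forward`, `make_expert`, and `gate_weights`, the choice of `top_k=2`, and the "experts" that simply scale their input are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical 'expert': a tiny function standing in for a sub-network."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs.

    gate_weights holds one vector per expert; the router score for an expert
    is the dot product of that vector with the input (an assumed, simple router).
    """
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)

    # Keep only the most relevant experts; the rest are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)

    output = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm  # renormalize over the selected experts
        expert_out = experts[i](x)
        output = [o + weight * v for o, v in zip(output, expert_out)]
    return output, chosen


if __name__ == "__main__":
    experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
    gate_weights = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
    output, chosen = moe_forward([1.0, 0.0], gate_weights, experts)
    print("experts used:", chosen)
    print("mixed output:", output)
```

In this sketch only two of the four experts run for a given input, which is the source of the efficiency gain: the model holds many parameters overall but spends compute on a small subset per input.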

A Gemini 1.5 Pro demo of long context understanding using a 44-minute silent Buster Keaton movie, Sherlock Jr., and a series of multimodal prompts. (source: YouTube/Google)

Complex reasoning, understanding across modalities, problem-solving with coding

Gemini 1.5 Pro also improves on the rest of the Gemini family in complex reasoning across modalities, problem-solving with dense blocks of code, and the amount of context it can handle.

A model’s context window comprises tokens, which can be entire parts or subsets of words, images, videos, audio, or code. The larger the model’s context window, the more information it can process for any prompt.

Google states that Gemini 1.5 Pro supports 1 hour of video, 11 hours of audio, and codebases with over 30,000 lines of code or over 700,000 words in a single prompt.

When given a prompt with more than 100,000 lines of code, the model can better reason across examples, offer suggestions for code modifications, or explain how different portions of the code work.
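
As a rough illustration of what these context window sizes mean in practice, the sketch below estimates whether a text of a given length would fit in each window mentioned in this article. The tokens-per-word ratio is an assumption derived only from the figures quoted above (about 700,000 words in 1 million tokens, or roughly 10 tokens per 7 words); real tokenizers vary by language and content, so treat the result as a ballpark. The function names are hypothetical.

```python
# Context window sizes, in tokens, as quoted in this article.
CONTEXT_WINDOWS = {
    "Gemini 1.0 Pro": 32_000,
    "Gemini 1.5 Pro (standard)": 128_000,
    "Gemini 1.5 Pro (private preview)": 1_000_000,
}

# Assumption: 1,000,000 tokens per 700,000 words, reduced to 10 tokens per 7 words.
TOKENS_PER_7_WORDS = 10


def estimate_tokens(word_count):
    """Estimate the token count for a word count, rounding up (integer math)."""
    return (word_count * TOKENS_PER_7_WORDS + 6) // 7


def windows_that_fit(word_count):
    """Return the names of the context windows large enough for the text."""
    needed = estimate_tokens(word_count)
    return [name for name, limit in CONTEXT_WINDOWS.items() if needed <= limit]


if __name__ == "__main__":
    for words in (20_000, 100_000, 700_000):
        print(words, "words ->", estimate_tokens(words), "tokens:", windows_that_fit(words))
```

Under this assumed ratio, a 100,000-word document would need about 143,000 tokens, which exceeds the standard 128,000 token window and would fit only in the 1 million token preview window.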

How to access Google Gemini 1.5 Pro

Today, Google is limiting Gemini 1.5 Pro to a private preview. Invited and approved developers and enterprise customers can access 1.5 Pro via AI Studio and Vertex AI, and early testers can experiment with it at no cost.

Disclaimer: The author of this article is a current employee of Google. This article does not represent the views or opinions of his employer and is not meant to be an official statement for Google, or Google Cloud.
