Gemini 3.5 Flash Lands - Google's Cheaper Tier Outperforms Its Own 3.1 Pro Flagship

0:00 / 1:00

Sources

X mail Email auto_stories NbLM

Actions

smart_display VIEW ON YOUTUBE arrow_back BACK TO NEWS

News

Gemini 3.5 Flash Lands - Google's Cheaper Tier Outperforms Its Own 3.1 Pro Flagship

calendar_today Date: MAY 20, 2026

schedule Duration: 1:00

visibility Views: 96

database

Summary Report

Google DeepMind has launched Gemini 3.5 Flash, the first model in the new 3.5 family. The cheaper Flash tier already outperforms Gemini 3.1 Pro, Google's flagship from February, on coding and agentic benchmarks.

01. Gemini 3.5 Flash is the first model in Google's new 3.5 family, live today in the Gemini app and AI Mode in Search.
02. It scores 76.2 percent on Terminal-Bench 2.1, beating Gemini 3.1 Pro's 70.3 percent on coding.
03. Runs at 289 tokens per second, around four times faster than other frontier models.
04. API pricing is roughly 40 percent cheaper than 3.1 Pro at 1.50 dollars input and 9 dollars output per million tokens.
05. Google is positioning the model around agentic work - codebase planning, parallel subagents, and long-horizon tasks. Gemini 3.5 Pro is delayed to June.

Google DeepMind has released Gemini 3.5 Flash, the first model in its new 3.5 family that the company positions as combining frontier intelligence with practical applications. The release creates an unusual situation where Google's cheaper Flash tier now outperforms its own flagship 3.1 Pro model, which launched just months earlier in February. On Terminal-Bench 2.1, Gemini 3.5 Flash achieves a score of 76.2%, compared to 3.1 Pro's 70.3%. The performance gains extend beyond accuracy, with Flash delivering 289 tokens per second—roughly four times faster than competing frontier models—whilst the API costs approximately 40% less than the model it has just surpassed. Google's strategic focus has shifted markedly towards agentic capabilities rather than traditional chatbot interactions. The company emphasises Flash's ability to plan across large codebases, spawn subagents for parallel task execution, and maintain context over extended periods. This positioning suggests Google views the model as an autonomous coding teammate rather than a conversational assistant. The timing raises questions about Google's model hierarchy, particularly with Gemini 3.5 Pro delayed until June. If the Flash variant can already exceed the previous Pro model's performance at a fraction of the cost, industry observers will be watching closely to see what capabilities the Pro version delivers when it arrives next month.