
Google Launches Gemini 3.1 Flash-Lite, Its Fastest and Cheapest AI Model Yet
The efficiency-focused model delivers 2.5x faster response times at just $0.25 per million input tokens, matching Gemini 2.5 Flash quality.
Google has released Gemini 3.1 Flash-Lite in preview, a new efficiency-focused model that represents its cheapest and fastest offering yet. The model is available to developers through the Gemini API in Google AI Studio and to enterprises through Vertex AI.
Speed and Cost
The numbers are striking. Gemini 3.1 Flash-Lite delivers a 2.5x improvement in time to first token and 45 percent faster output generation compared with earlier Gemini versions, according to benchmarks from Artificial Analysis. It is priced at $0.25 per million input tokens and $1.50 per million output tokens, roughly one-eighth the cost of Gemini Pro.
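At these rates, per-request costs are easy to estimate. A minimal sketch using the quoted preview prices (the workload figures below are illustrative assumptions, not Google's numbers):

```python
# Estimate per-request cost for Gemini 3.1 Flash-Lite at the quoted
# preview rates: $0.25 per million input tokens, $1.50 per million output tokens.

INPUT_PRICE_PER_M = 0.25   # USD per million input tokens
OUTPUT_PRICE_PER_M = 1.50  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single API call."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# A hypothetical moderation-style workload: 10M requests per day,
# ~500 input tokens and ~50 output tokens per request.
daily = 10_000_000 * request_cost(500, 50)
print(f"${daily:,.2f} per day")  # -> $2,000.00 per day
```

Even at ten million calls a day, a short-prompt workload like this stays in the low thousands of dollars, which is why per-token pricing dominates the economics at this tier.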
Despite the dramatic cost reduction, Google says the model matches Gemini 2.5 Flash performance across key capability areas, a significant quality improvement over both of its predecessors, Gemini 2.0 Flash-Lite and Gemini 2.5 Flash-Lite.
New Capabilities
The model introduces several notable features. Expanded thinking support lets developers control how much reasoning the model performs by choosing a minimal, low, medium, or high thinking level, a tunable tradeoff between response quality and latency.
Improved instruction following makes the model a more reliable option for complex chatbot and workflow applications. Enhanced audio input quality improves tasks like automated speech recognition, broadening the model's applicability beyond text-only workloads.
Targeting the Cost-Sensitive Tier
Google is explicitly positioning Flash-Lite for high-volume, cost-sensitive use cases: translation, content moderation, user interface generation, and simulation. These are workloads where per-token costs matter enormously at scale, and where the model's speed advantage translates directly into lower infrastructure costs.
Competitive Context
The release arrives as the AI model market increasingly bifurcates between frontier capability models and efficiency-optimized alternatives. With OpenAI, Anthropic, and DeepSeek all competing on both fronts, Google's aggressive pricing signals a willingness to compete on cost, particularly in the enterprise deployment tier, where millions of API calls per day make per-token economics decisive.
The model is currently in preview, with general availability expected in the coming weeks.