
Google Launches Gemini 3.1 Flash-Lite, Its Fastest and Cheapest AI Model Yet
The efficiency-focused model delivers 2.5x faster response times at just $0.25 per million input tokens, matching Gemini 2.5 Flash quality.
Google has released Gemini 3.1 Flash-Lite in preview, a new efficiency-focused model that represents its cheapest and fastest offering yet. The model is available to developers through the Gemini API in Google AI Studio and to enterprises through Vertex AI.
Speed and Cost
The numbers are striking. Gemini 3.1 Flash-Lite delivers a 2.5x improvement in time to first token and 45 percent faster output generation compared with earlier Gemini versions, according to benchmarks from Artificial Analysis. It is priced at $0.25 per million input tokens and $1.50 per million output tokens, roughly one-eighth the cost of Gemini Pro.
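At these rates, per-request costs are easy to estimate. A minimal sketch using the quoted preview prices (the workload figures below are illustrative assumptions, not Google's numbers):

```python
# Estimate per-request cost for Gemini 3.1 Flash-Lite at the quoted
# preview rates: $0.25 per million input tokens, $1.50 per million output tokens.

INPUT_PRICE_PER_M = 0.25   # USD per million input tokens
OUTPUT_PRICE_PER_M = 1.50  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a single API call."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# A hypothetical moderation-style workload: 10M requests per day,
# ~500 input tokens and ~50 output tokens per request.
daily = 10_000_000 * request_cost(500, 50)
print(f"${daily:,.2f} per day")  # -> $2,000.00 per day
```

Even at ten million calls a day, a short-prompt workload like this stays in the low thousands of dollars, which is why per-token pricing dominates the economics at this tier.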
Despite the dramatic cost reduction, Google says the model matches Gemini 2.5 Flash performance across key capability areas, a significant quality improvement over both of its predecessors, Gemini 2.0 Flash-Lite and Gemini 2.5 Flash-Lite.
New Capabilities
The model introduces several notable features. Expanded thinking support lets developers control how much reasoning the model performs by choosing a minimal, low, medium, or high thinking level, a tunable tradeoff between response quality and latency.
Improved instruction following makes the model a more reliable option for complex chatbot and workflow applications. Enhanced audio input quality improves tasks like automated speech recognition, broadening the model's applicability beyond text-only workloads.
Targeting the Cost-Sensitive Tier
Google is explicitly positioning Flash-Lite for high-volume, cost-sensitive use cases: translation, content moderation, user interface generation, and simulation. These are workloads where per-token costs matter enormously at scale, and where the model's speed advantage translates directly into lower infrastructure costs.
Competitive Context
The release arrives as the AI model market increasingly bifurcates between frontier capability models and efficiency-optimized alternatives. With OpenAI, Anthropic, and DeepSeek all competing on both fronts, Google's aggressive pricing signals a willingness to compete on cost, particularly in the enterprise deployment tier, where millions of API calls per day make per-token economics decisive.
The model is currently in preview, with general availability expected in the coming weeks.