China's Navy Deploys AI to Eliminate Air Defense Blind Spots on New FrigateDeepSeek V4 to Launch on Huawei Chips With One Trillion ParametersFoxconn Posts Record Q1 Revenue as AI Server Demand Surges 30 PercentAsia's AI Boom Faces Its First Real Stress Test as Iran War Disrupts Energy and ChipsThe Physical AI Era Is Here: Why Robots Are Moving From Simulation to Factory FloorsAI Captured 80 Percent of Global Venture Funding in Q1 2026 — What That Means for Everything ElseAI Virtual Try-On Startups Take On Retail's Multibillion-Dollar Returns ProblemEclipse Raises $1.3 Billion to Build the 'Physical AI' EconomyChina's Navy Deploys AI to Eliminate Air Defense Blind Spots on New FrigateDeepSeek V4 to Launch on Huawei Chips With One Trillion ParametersFoxconn Posts Record Q1 Revenue as AI Server Demand Surges 30 PercentAsia's AI Boom Faces Its First Real Stress Test as Iran War Disrupts Energy and ChipsThe Physical AI Era Is Here: Why Robots Are Moving From Simulation to Factory FloorsAI Captured 80 Percent of Global Venture Funding in Q1 2026 — What That Means for Everything ElseAI Virtual Try-On Startups Take On Retail's Multibillion-Dollar Returns ProblemEclipse Raises $1.3 Billion to Build the 'Physical AI' EconomyChina's Navy Deploys AI to Eliminate Air Defense Blind Spots on New FrigateDeepSeek V4 to Launch on Huawei Chips With One Trillion ParametersFoxconn Posts Record Q1 Revenue as AI Server Demand Surges 30 PercentAsia's AI Boom Faces Its First Real Stress Test as Iran War Disrupts Energy and ChipsThe Physical AI Era Is Here: Why Robots Are Moving From Simulation to Factory FloorsAI Captured 80 Percent of Global Venture Funding in Q1 2026 — What That Means for Everything ElseAI Virtual Try-On Startups Take On Retail's Multibillion-Dollar Returns ProblemEclipse Raises $1.3 Billion to Build the 'Physical AI' Economy
Google Gemini 3.1 Flash-Lite promotional graphic
Google
News

Google Launches Gemini 3.1 Flash-Lite, Its Fastest and Cheapest AI Model Yet

The efficiency-focused model delivers 2.5x faster response times at just $0.25 per million input tokens, matching Gemini 2.5 Flash quality.

M
Maya SantosSenior Reporter
3 min read

Google has released Gemini 3.1 Flash-Lite in preview, a new efficiency-focused model that represents its cheapest and fastest offering yet. The model is available to developers through the Gemini API in Google AI Studio and to enterprises through Vertex AI.

Speed and Cost

The numbers are striking. Gemini 3.1 Flash-Lite delivers a 2.5x faster time-to-first-answer-token and 45 percent faster output generation compared to earlier Gemini versions, according to benchmarks from Artificial Analysis. It is priced at $0.25 per million input tokens and $1.50 per million output tokens — roughly one-eighth the cost of Gemini Pro.

Despite the dramatic cost reduction, Google says the model matches Gemini 2.5 Flash performance across key capability areas, representing a significant quality improvement over both Gemini 2.0 Flash-Lite and Gemini 2.5 Flash-Lite predecessors.

New Capabilities

The model introduces several notable features. Expanded thinking support allows developers to control how much reasoning the model performs by choosing from minimal, low, medium, or high thinking levels — offering a tunable tradeoff between response quality and latency.

Improved instruction following makes the model a more reliable option for complex chatbot and workflow applications. Enhanced audio input quality improves tasks like automated speech recognition, broadening the model's applicability beyond text-only workloads.

Targeting the Cost-Sensitive Tier

Google is explicitly positioning Flash-Lite for high-volume, cost-sensitive use cases: translation, content moderation, user interface generation, and simulation. These are workloads where per-token costs matter enormously at scale, and where the model's speed advantage translates directly into lower infrastructure costs.

Competitive Context

The release arrives as the AI model market increasingly bifurcates between frontier capability models and efficiency-optimized alternatives. With OpenAI, Anthropic, and DeepSeek all competing on both fronts, Google's aggressive pricing signals a willingness to compete on cost — particularly for the enterprise deployment tier where millions of API calls per day make per-token economics decisive.

The model is currently in preview, with general availability expected in the coming weeks.

Newsletter

Get Lanceum in your inbox

Weekly insights on AI and technology in Asia.

Share

More in News

Lanceum

Independent coverage of AI and technology across Asia. We go beyond headlines to explain what matters.

Colophon

Typeset in Space Grotesk & DM Serif Display. Built with Nuxt & Tailwind. Powered by curiosity.

© 2026 Lanceum. All rights reserved.

Independent • Rigorous • Asia-Focused