In response to a global shortage of computing resources, Google has enforced token-based limits on its Gemini AI platform since May 2026. The move caps how much free and paid AI usage is available, affecting users worldwide and signaling tighter access in India’s growing AI market.
- Gemini AI now uses token-based compute limits tied to usage complexity and model choice.
- Major enterprise customers like Meta faced early caps due to compute shortages.
- India’s free-access AI growth phase is giving way to monetised, metered compute usage.
What happened
Since May 17, 2026, Google has enforced token-based compute limits on its Gemini AI apps, restricting the volume of AI interactions based on factors such as prompt complexity and the model selected. These limits refresh every five hours, with usage accumulating toward weekly caps, marking a shift from the previous, more generous free and paid usage models. The change reflects an ongoing capacity shortage in Google's underlying computing infrastructure that has been constraining enterprise clients for months.
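To make the metering concrete, the sketch below models a budget that refreshes every five hours and also counts toward a weekly cap. Only the five-hour refresh and the existence of a weekly cap come from the reporting above. The class name, the limit values, and the idea that cost equals prompt tokens times a per-model weight are illustrative assumptions, not Google's actual accounting.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 5 * 60 * 60      # five-hour refresh, as reported
WEEK_SECONDS = 7 * 24 * 60 * 60   # weekly cap period


def request_cost(prompt_tokens: int, model_weight: int) -> int:
    """Hypothetical cost rule: heavier models multiply the token count."""
    return prompt_tokens * model_weight


@dataclass
class ComputeBudget:
    """Hypothetical two-level budget: a short window plus a weekly cap."""
    window_limit: int          # assumed units allowed per five-hour window
    weekly_limit: int          # assumed units allowed per week
    window_start: float = 0.0
    window_used: int = 0
    week_start: float = 0.0
    weekly_used: int = 0

    def _roll(self, now: float) -> None:
        # Reset whichever counters have aged past their period.
        if now - self.window_start >= WINDOW_SECONDS:
            self.window_start = now
            self.window_used = 0
        if now - self.week_start >= WEEK_SECONDS:
            self.week_start = now
            self.weekly_used = 0

    def try_spend(self, cost: int, now: float) -> bool:
        """Charge the request if both limits allow it; otherwise refuse."""
        self._roll(now)
        if self.window_used + cost > self.window_limit:
            return False
        if self.weekly_used + cost > self.weekly_limit:
            return False
        self.window_used += cost
        self.weekly_used += cost
        return True


if __name__ == "__main__":
    budget = ComputeBudget(window_limit=100, weekly_limit=250)
    hour = 3600
    print(budget.try_spend(request_cost(30, 2), now=0))   # within both limits
    print(budget.try_spend(60, now=1 * hour))             # exceeds the window
    print(budget.try_spend(100, now=5 * hour))            # window has refreshed
```

Under these assumptions, a user who exhausts a five-hour window simply waits for the refresh, but one who hits the weekly cap stays blocked even after the window resets, which is why heavy users feel the weekly figure most.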
The shortage was first revealed when Google had to cap Meta’s access to Gemini models in March 2026 due to insufficient capacity, hindering Meta’s internal AI initiatives. Other customers experienced restrictions to a lesser extent. Google CEO Sundar Pichai confirmed near-term compute constraints during the company's Q1 2026 earnings announcement, highlighting a growing backlog of cloud contracts that exceeds the company’s ability to deliver the necessary compute resources.
Why it matters
The move to token-based limits represents a broader shift in AI economics from open free access to metered, paid models that directly reflect the costly infrastructure required to operate advanced AI systems. Providers like Google now face significant inference costs, driven by the volume and complexity of user prompts. This scarcity funnels demand and spending toward more efficient models, while heavier users bear higher costs, reshaping consumer expectations around free AI availability.
For India, this transition carries particular implications. Google's early AI adoption strategy in India relied heavily on free access to build scale, similar to earlier telecom market approaches. Compute-based rationing marks the beginning of a monetisation phase that could limit widespread usage or increase costs for Indian users. Moreover, India’s own investments in domestic AI compute capacity lag far behind commitments made by global cloud providers, raising concerns about national digital autonomy and reliance on foreign compute infrastructure.
What to watch next
Observers should monitor how Google adjusts token limits for different user segments and whether further tier differentiation emerges to balance demand with infrastructure constraints. The resilience and competitiveness of AI offerings in India will also depend on domestic compute investments, policy incentives to foster local capacity, and responses from other major AI players facing similar compute shortages.
Additionally, ongoing partnerships to lease compute from entities like SpaceX highlight a structural change in how AI companies source resources. This external reliance could influence pricing, access, and control over AI infrastructure globally. India’s ability to build sufficient compute capacity locally will be critical to avoid dependency risks and ensure equitable AI access as global providers increasingly ration compute amid surging demand.