Every AI interaction your organisation runs consumes tokens. A token is the unit of work that AI models process: roughly three quarters of a word for English text. When your team asks Copilot to summarise a meeting, it consumes tokens. When an AI agent reasons across your SharePoint documents, it consumes tokens. When a workflow routes an approval through a language model, tokens are consumed and billed.
Tokens are the electricity bill of AI. They are variable and consumption-based, scaling with every new use case your organisation enables. The difference between tokens and electricity is that token costs are falling by a factor of 10 to 200 per year, depending on the benchmark, while total enterprise AI spending rose 320 per cent in 2025. Both of those facts are true simultaneously. Understanding why is the starting point for every CEO who wants to govern AI as an operating cost rather than manage it as a software subscription.
For the first time in the history of knowledge work, the cost of producing a unit of cognitive output is measurable, comparable and declining on a predictable curve. A token is that unit. The pricing is transparent. The cost trajectory is documented.
Stanford's AI Index 2025 found that the inference cost for a system performing at the level of GPT-3.5 dropped over 280-fold between November 2022 and October 2024. Epoch AI, analysing six benchmarks independently, found that the cost to run a model at a fixed performance level has been halving every two months. Andreessen Horowitz coined the term "LLMflation" to describe the dynamic: for models of equivalent capability, inference cost is decreasing by approximately 10 times every year.
The practical consequence is visible in the pricing. A frontier reasoning model costs between $1.25 and $15.00 per million input tokens today. An efficiency model costs between $0.05 and $1.00 per million input tokens. The 300-fold price differential between the cheapest and most expensive option means model selection is a financial decision with material budget impact, and most organisations are making that decision by default rather than by design.
| Task | AI cost (tokens only) | Human cost |
|---|---|---|
| Customer service inquiry | $0.25 – $2.00 | $5.00 – $15.00 |
| Document drafting (per page) | $0.02 – $0.15 | $35.00 – $80.00 |
| Data synthesis (per report) | $0.50 – $5.00 | $200.00 – $500.00 |
| Code review (per function) | $0.01 – $0.10 | $40.00 – $120.00 |
These figures carry an important caveat. The AI cost column reflects token consumption alone. It does not include the human time required for prompt design, quality review, hallucination remediation or governance oversight. MIT research found that only 23 per cent of the wages currently paid for AI-automatable tasks represent work that is cost-effective to automate today, once the full workflow is accounted for. The gap is closing as token costs fall, but it remains real. The honest comparison is total workflow cost, not token cost in isolation.
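The caveat can be made concrete with a short sketch. All figures below are illustrative assumptions, not sourced rates: a per-task token cost plus the human minutes spent on prompt design and quality review, priced at a fully loaded hourly rate.

```python
# Illustrative sketch: total workflow cost vs raw token cost.
# All rates here are assumptions for the example, not sourced figures.

def total_workflow_cost(token_cost, human_minutes, hourly_rate):
    """Token spend plus the human time wrapped around the task."""
    return token_cost + (human_minutes / 60) * hourly_rate

# A document draft: $0.10 in tokens, plus 10 minutes of prompt design
# and quality review by an employee costing $60/hour fully loaded.
tokens_only = 0.10
with_oversight = total_workflow_cost(0.10, human_minutes=10, hourly_rate=60)

print(f"Token cost alone:    ${tokens_only:.2f}")
print(f"Total workflow cost: ${with_oversight:.2f}")  # $10.10
```

The token line is two orders of magnitude smaller than the workflow line, which is exactly why comparing token cost against a human wage overstates the savings.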
The cost of producing a unit of knowledge work via AI is now measurable, declining on a documented curve and lower than the equivalent human cost for a growing range of tasks. This is a new economic reality. It requires a new operating model response.
This is the fact that catches every CFO off guard. Token prices fell roughly 1,000-fold over the past three years. Total enterprise spending on AI inference surged 320 per cent in 2025. Both are true. The dynamic has a name: the Jevons Paradox.
William Stanley Jevons observed in 1865 that as steam engines became more efficient, total coal consumption rose rather than fell, because cheaper energy made new applications economically viable. The same dynamic is now operating in AI. As inference becomes cheaper, organisations deploy AI in more workflows, use more capable models with longer reasoning chains, build multi-agent systems that consume tokens at scale, and expand from pilot teams to entire functions. Demand has risen approximately 10,000-fold while unit costs fell 1,000-fold. The net result is a significant increase in total spending.
A January 2026 paper by Zhang and Zhang formalised this as a "Structural Jevons Paradox": falling API prices systematically induce deeper reasoning loops, larger context windows and tool-augmented multi-agent workflows that multiply token consumption per task. The more capable and affordable the model, the more tokens each interaction consumes.
Gartner's March 2026 forecast is explicit: while lower token unit costs will enable more advanced AI capabilities, those advancements will drive disproportionately higher token demand. As token consumption rises faster than token costs fall, overall inference costs are expected to increase.
The FinOps Foundation reports that 63 per cent of enterprises exceed their AI budgets by at least 30 per cent within the first year of deployment. Beyond the initial estimate, organisations typically spend 40 to 60 per cent more on data preparation, governance, integration and unused licences. The organisations that budgeted AI as a fixed software cost are the ones most consistently exceeding their forecasts.
The implication for every CEO: AI cost governance requires a fundamentally different approach from software procurement. Token consumption is variable, usage-driven and scales with every new workflow your organisation enables. Treating it as a subscription line item in the IT budget produces the same result as treating electricity as a one-time purchase. The bill arrives. It is larger than expected. And the organisation has no framework for understanding why.
The software industry is responding to this reality. The share of software companies using seat-based pricing dropped from 21 per cent to 15 per cent in just 12 months. Hybrid and usage-based pricing surged from 27 per cent to 41 per cent. Intercom now charges $0.99 per fully resolved customer issue. Zendesk charges $1.50 per AI-resolved ticket. The direction is clear: from per-seat to per-agent to per-outcome. Your vendors are already pricing on token consumption. The question is whether your organisation is budgeting for it.
Deloitte's 2025 US Technology Value survey found that nearly half of leaders expect up to three years to see ROI from basic AI automation. Only 28 per cent of global finance leaders report clear, measurable value from AI investments. The organisations that can connect their token spend to business outcomes have a structural advantage. The ones that cannot are spending on a trajectory they do not yet understand.
The FinOps Foundation's State of FinOps 2026 report shows that 98 per cent of organisations now actively manage AI spend, up from 31 per cent two years prior. AI cost management is the number one skillset that technology teams are seeking to add. The discipline is emerging. The frameworks are still immature.
The core shift: AI cost is moving from a technology line item to an operational cost governed with the same rigour as headcount, energy or capital allocation. This requires three structural changes that most mid-market organisations have yet to make.
Visibility. Most organisations cannot tell you how many tokens they consumed last month, which teams consumed them, or which business outcomes those tokens produced. The FinOps Foundation reports that IT teams oversee only 15 per cent of overall SaaS spend in the average large enterprise. Token consumption is distributed across functions, individuals and AI agents. Without a centralised gateway that tracks consumption by team, use case and model, the organisation is flying blind.
Allocation. Once visible, token spend requires allocation: which function owns which budget, at what threshold does consumption require approval, and how does the organisation charge back or show back AI costs to the departments generating them. The emerging practice is hierarchical budget enforcement: organisation-level ceilings, department-level allocations, team-level limits and individual-level monitoring. The same logic organisations apply to cloud computing spend.
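The hierarchical enforcement described above can be sketched minimally. Every scope name and ceiling below is invented for illustration; in practice this logic would live in the AI gateway that already tracks consumption.

```python
# Minimal sketch of hierarchical token-budget enforcement.
# All scope names and limits are illustrative, not prescriptive.

MONTHLY_LIMITS = {  # tokens per month, at each level of the hierarchy
    "org": 500_000_000,
    "dept:finance": 40_000_000,
    "team:finance/reporting": 10_000_000,
}

def approve_request(usage, scope_chain, requested_tokens):
    """Check a request against every ceiling in the hierarchy.

    usage       -- tokens already consumed this month, per scope
    scope_chain -- e.g. ["org", "dept:finance", "team:finance/reporting"]
    """
    for scope in scope_chain:
        limit = MONTHLY_LIMITS.get(scope)
        if limit is not None and usage.get(scope, 0) + requested_tokens > limit:
            return False, f"would exceed {scope} ceiling"
    return True, "approved"

usage = {"org": 120_000_000, "dept:finance": 39_500_000}
ok, reason = approve_request(usage, ["org", "dept:finance"], 1_000_000)
print(ok, reason)  # the department ceiling binds even though the org has headroom
```

The point of the hierarchy is visible in the example: organisation-level headroom does not override a department-level ceiling, so spend overruns surface at the level that owns the budget.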
Forecasting. Token demand is not linear. It compounds. Every new workflow, every new agent, every expansion from pilot to production multiplies consumption. The forecasting model requires: expected users, multiplied by sessions per user, by interactions per session, by average tokens per interaction, adjusted for seasonality and growth. This is demand planning for a new type of resource, and most organisations are running on estimates rather than models.
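That demand plan can be written down directly. The input values below are placeholders chosen for illustration, with an explicit sessions-per-user term so the units multiply out to tokens per month; the shape of the calculation is what matters.

```python
# Token demand forecast: users x sessions x interactions x tokens,
# compounded for adoption growth. All input values are placeholders.

def monthly_token_forecast(users, sessions_per_user, interactions_per_session,
                           tokens_per_interaction, monthly_growth=0.0, months=1):
    """Project monthly token demand, compounding user growth each month."""
    forecast = []
    for _ in range(months):
        tokens = (users * sessions_per_user * interactions_per_session
                  * tokens_per_interaction)
        forecast.append(round(tokens))
        users *= 1 + monthly_growth
    return forecast

# 200 users, 20 sessions a month, 5 interactions per session,
# 1,500 tokens per interaction, adoption growing 15 per cent a month.
plan = monthly_token_forecast(200, 20, 5, 1_500, monthly_growth=0.15, months=3)
print(plan)  # consumption compounds even with flat per-user behaviour
```

Even before any new workflows are added, the growth term alone moves the monthly bill by double digits, which is why a static annual estimate drifts from reality within a quarter.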
Here is where the economics become an operating model question. AI agents are increasingly priced, deployed and managed as workforce equivalents. An autonomous AI agent costs approximately $500 per month: roughly $6,000 per year, compared to $50,000 to $80,000 fully loaded for a human employee performing equivalent routine tasks. KPMG is building "build, buy, borrow, or bot" decision frameworks for workforce planning. Deloitte has launched AI agent platforms designed to operate alongside human teams within established governance structures.
The World Economic Forum's October 2025 survey of over 1,000 C-suite executives found that 92 per cent reported up to 20 per cent workforce overcapacity. By 2028, nearly half expect more than 30 per cent excess capacity. This is the structural context: organisations are simultaneously building AI capability that performs knowledge work and accumulating human capacity that will need to be redeployed, reskilled or restructured.
The question is how to plan for both. Workforce planning has historically operated with one input: human hours. A function needs 40 hours of work per week; the organisation hires one full-time equivalent. That model now has a second input. A function needs 40 hours of work per week; the organisation can allocate some portion to human hours and some to token throughput. The ratio depends on the nature of the work, the quality requirements, the governance constraints and the cost trade-off.
Workforce planning with one input produces a headcount plan. Workforce planning with two inputs produces a capacity model. The second version governs both human hours and token throughput within the same framework. That is the structural shift underway.
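A two-input capacity model can be sketched as follows. The hourly rate, the token-equivalence figure and the token price are all assumptions for illustration; each organisation would calibrate them from its own workflows.

```python
# Sketch of a two-input capacity model: a fixed weekly workload split
# between human hours and AI token throughput. All rates are assumptions.

def weekly_capacity_cost(total_hours, ai_share, human_rate_per_hour,
                         tokens_per_hour_equivalent, cost_per_million_tokens):
    """Cost of covering `total_hours` of work at a given AI share.

    ai_share -- fraction of the workload routed to AI (0.0 to 1.0)
    """
    human_hours = total_hours * (1 - ai_share)
    ai_tokens = total_hours * ai_share * tokens_per_hour_equivalent
    human_cost = human_hours * human_rate_per_hour
    ai_cost = ai_tokens / 1_000_000 * cost_per_million_tokens
    return human_cost + ai_cost

# 40 hours of weekly work; $60/hour fully loaded; assume one human-hour of
# routine work maps to ~200,000 tokens priced at $2 per million tokens.
for share in (0.0, 0.5, 0.8):
    cost = weekly_capacity_cost(40, share, 60, 200_000, 2.00)
    print(f"AI share {share:.0%}: ${cost:,.2f}/week")
```

The interesting output of a model like this is not the cost curve itself but the governance question it forces: the `ai_share` parameter is exactly the boundary decision the operating model has to own.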
Gartner forecasts that 40 per cent of enterprise applications will feature task-specific AI agents by the end of 2026, up from less than 5 per cent in 2025. The WEF Future of Jobs Report 2025 projects 92 million roles displaced by 2030, with 170 million new roles emerging: a net gain of 78 million jobs, but a fundamental recomposition of what those jobs contain. Fifty-two per cent of leaders rank job redesign as their top workforce priority. Only 46 per cent currently integrate workforce planning into their AI roadmaps.
The "AI-FTE" does not yet exist as a standardised unit. This is an evidence gap, and it is also an opportunity. The organisations that build capacity planning models incorporating both human and AI throughput will have a structural advantage in resource allocation, cost governance and workforce design. The organisations that continue planning solely around headcount will consistently misjudge both their costs and their capacity.
An emerging signal reinforces the shift. In March 2026, Jensen Huang proposed that engineers should receive roughly half their base salary in AI token credits, framing compute access as a productivity multiplier rather than a tool subscription. Over 40 per cent of technology companies now include AI credit allocations in their benefits packages, up from under 5 per cent eighteen months ago. Tokens are becoming a factor of productive capacity for individuals, and the implications for workforce planning at the organisational level follow directly.
The token economy creates a temptation: if AI can perform a task at a fraction of the human cost, automate it entirely. The evidence says otherwise.
Klarna deployed an AI chatbot in February 2024 that handled 2.3 million conversations, performing the equivalent work of 700 full-time agents. Resolution time dropped from 11 minutes to under two minutes. The estimated profit improvement was $40 million. Twelve months later, Klarna began rehiring human agents. Customer satisfaction on complex cases had declined. The full-automation model produced speed and cost savings. It eroded quality and trust where human judgement was required.
MIT research confirms the pattern. Companies that deploy hybrid models, where AI handles routine reasoning and data synthesis while humans govern decisions and handle complex exceptions, achieve approximately 40 per cent greater ROI than organisations that pursue either all-human or maximum-automation approaches. The hybrid model compounds because it allocates each type of work to the resource best suited to perform it: tokens for speed, scale and pattern recognition; human hours for judgement, context and accountability.
The operating model implication is precise. Every AI-enabled workflow requires a design decision: where does the AI act, where does the human govern, and where do the two interact? That design decision is not a technology question. It is an operating model question. It determines the quality of the output, the cost of the workflow, the governance posture and the long-term sustainability of the system.
The highest-performing organisations design workflows where AI handles the work that scales with tokens (data synthesis, pattern recognition, document generation, routing) and humans hold the decisions that carry consequence (approvals, exceptions, client relationships, strategic judgement). The boundary between the two is a governance decision. It belongs in the operating model, where it can be reviewed, adjusted and institutionalised.
The five sections above describe a new economic reality: the cost of intelligence is measurable, variable and scaling. The organisations that compound from this reality are the ones that wrap an operating model around it. The token economy requires governance decisions, budget architecture, workflow design and workforce planning that function as a single system.
Establish token visibility across the organisation. Audit current AI consumption by function, model and use case. Define the governance principles: who authorises new AI workflows, at what cost threshold, with what human checkpoints. Build the foundation that every budget and capacity decision rests on.
Design the token budget architecture: departmental allocations, model routing policies, cost-per-outcome measurement frameworks. Build the capacity planning model that accounts for both human hours and token throughput. Design the hybrid workflows function by function, specifying where AI acts, where humans govern and where the boundary sits.
Deploy the governed workflows into production. Establish the quarterly review rhythm for token spend, capacity ratios and outcome measurement. Build the AI-Enabled Playbook that captures every governance decision, every workflow design and every budget model the organisation has built. The capability compounds when the operating model holds it.
The token economy is here. The costs are real, the trajectory is clear and the governance requirements are structural. Three questions will tell you whether your organisation is managing this as an operating model variable or absorbing it as an unexamined cost.
The organisation that answers these three questions honestly and finds its answers concerning is in the right position to act. The token economy is a structural shift in how knowledge work is priced, performed and governed. The organisations that build the operating model around it will compound the advantage. The ones that treat it as a technology cost will spend more each quarter while understanding less about why.
The tools and the pricing are available today. The governance frameworks are emerging. The workforce planning models are being designed. What remains is the operating model decision to treat token capacity as a strategic variable with the same rigour the organisation applies to every other factor of production.
If that is the conversation your organisation needs to have, it is the conversation setmode.io exists to facilitate.