In Defence of Tokenmaxxing

Looksmaxxing, careermaxxing, Londonmaxxing – I am not online enough to understand the latest internet discourse du jour. One trend that has not escaped me, however, is tokenmaxxing, and in particular the criticism levelled against the practice as a symptom of just how performative and superficial AI rollout in the enterprise has become. Those criticising seemingly pointless token consumption are right that tokenmaxxing does not drive direct commercial value, and that it is yet another apathetic workplace practice couched in the ironic language of social media. Still, I’m keen to push back on the idea that it is necessarily ‘valueless’. Seeing its value just requires us to understand the fundamentally emergent, unstructured and messy nature of agentic AI adoption in the enterprise.

Tokens are the chunks of data that a model takes as input for processing: pixels, waveforms, or, given the dominance of LLMs, most commonly characters (usually around 3-5 constitute a token). With the transition into agentic AI came a switch from the usual SaaS seat-based revenue model to a token-based one, with ravenous products like Claude Code able to consume hundreds of thousands of tokens in a session. The Information first reported that a Meta employee had built an internal leadership dashboard tracking each employee’s token usage, and dubbed it “Claudenomics”. Meta wasn’t the first; it appears that Microsoft, Salesforce, Shopify and others all had similar token consumption leaderboards, built either with or without executive approval.

Going further, some of the hyperscalers built token usage into performance reviews, with some engineers given token thresholds that they were expected to hit as a pillar of their KPIs. Inevitably, engineers admitted to burning tokens to jack up their scores: some ran OpenClaw-like agents that shredded huge amounts of tokens performing peripheral tasks in the background (The Pragmatic Engineer), while others purposefully built line after line of useless code into products. Tokenmaxxing was here.

Once these stories leaked, the criticism was immediate. Tokenmaxxing became a symptom of a more fundamental malaise in agentic AI’s role in the enterprise: that the rollout conflates consumption or usage with productivity and, so far, has delivered little or no tangible return for the companies pouring millions into it. On this view, tokenmaxxing is a feature, not a bug, incentivising devs to burn through millions of tokens on activities that drive zero financial gain for their companies. If we estimate that an engineer running background agents to juice their score consumes around 10 billion tokens per month (it could run higher), then at current Anthropic API prices for output tokens, some back-of-envelope maths would land the bill at around $250,000 per month (a rate of roughly $25 per million tokens is what these figures imply). Multiply that across hundreds of engineers, and it’s pretty easy to argue that tokenmaxxing doesn’t just offer no tangible return for these companies but is instead a significant net drain on the balance sheet.
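The back-of-envelope maths above can be sketched in a few lines of Python. This is a rough illustration only: the rate of $25 per million output tokens is an assumption, chosen because it is what the figures in the paragraph imply, and real pricing varies by model and changes over time. Treating every token as an output token is a simplification too, and the function and constant names are hypothetical.

```python
# Back-of-envelope estimate of a monthly token bill.
# ASSUMPTION: $25 per million output tokens, the rate implied by the
# article's own figures. Real pricing differs by model and over time.
ASSUMED_USD_PER_MILLION_OUTPUT_TOKENS = 25.0


def monthly_token_bill(
    tokens_per_month: int,
    usd_per_million: float = ASSUMED_USD_PER_MILLION_OUTPUT_TOKENS,
) -> float:
    """Return the estimated monthly cost in USD for a given token volume."""
    return tokens_per_month / 1_000_000 * usd_per_million


# One engineer consuming around 10 billion tokens per month.
per_engineer = monthly_token_bill(10_000_000_000)

# The same habit across a hypothetical group of 100 engineers.
across_100_engineers = per_engineer * 100

print(f"Per engineer:  ${per_engineer:,.0f} per month")
print(f"100 engineers: ${across_100_engineers:,.0f} per month")
```

Under these assumptions a single engineer lands at $250,000 per month, and a hundred of them at $25 million, which is the scale the critics have in mind.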

For the critics, tokenmaxxing is a manifestation of Goodhart’s Law – economist Charles Goodhart’s principle that “when a measure becomes a target, it ceases to be a good measure.” The idea is that those who ‘play’ in a system set up with specific rewards and punishments will inevitably optimise their actions to ‘game’ that system, creating a series of unintended consequences for those setting the rules. Execs should not be surprised, then, when they set token consumption as a KPI and find their engineers spending tokens building useless tools.

And that is exactly the point – is the time spent burning tokens on the side necessarily useless? Potentially not. Nvidia’s Jensen Huang, for one, offered up a fervent defence of the practice, arguing that he would be “deeply alarmed” if top engineers did not consume at least $250,000 worth of tokens (Business Insider). It’s hard to believe that this will persuade anyone, given that tokenmaxxing is essentially revenuemaxxing for Nvidia. As someone who (sadly) has no skin in the Nvidia game, let me offer my own defence of the practice.

Value is contextual. Some contexts have clearly desirable endpoints, and so the value of an activity is assigned based on how close it gets you to these outcomes. A sales team’s activity is measured against a quota; every call, demo and email is judged against this metric. There is no equivalent for agentic AI to hit: no quota, no designator of value. This is felt particularly acutely in the enterprise, where, even before agentic AI tools arrived, executives had struggled for years to integrate AI into their day-to-day operations in a way that’s more fundamental than workers running their emails through ChatGPT. What those criticising tokenmaxxing miss is that right now, with agentic AI in an emergent state and despite what the LinkedIn gurus claim, no one can predict what the north star metric for agentic AI rollout will even look like.

Goodhart’s Law is similarly subject to context. Goodhart was writing about monetary policy, and the difficulties the UK faced in devising new methods of credit control. This is a ‘game’ with a clear value metric: macroeconomic stability. Unintended consequences are necessarily undesirable because they will cause drift from this. The world of agentic AI rollout in large corporations is wholly different, because there is no corresponding endpoint. In this messy environment, genuine value lies in encouraging exploratory experimentation, geared towards the future and potentially untethered from immediate commercial concerns, rather than in indexing towards activities that drive direct financial gains in the here and now.

In this more uncertain, unknown playing field, unintended consequences can (though not always) have a positive impact. Encouraging devs and engineers to just build, without concern for generating an immediate return on the tokens that they consume, can then be a valuable activity – it’s just one that can’t be judged on the basis of quarterly revenue growth. Obviously, engineers burning tokens to game leaderboards are not doing work that explores uses for agentic AI. However, the conditions for genuine employee experimentation do require the kind of permissive, low-accountability environment that employers encouraging tokenmaxxing create.

There is a mirror here with the training of the neural networks themselves – the employees are engaging in unstructured, self-supervised learning, just like the models that they are using. This is the stage in which a model is pretrained on unlabelled data, undergoing the necessary but fundamentally messy process of enriching its latent space and producing good embeddings. Here, exploratory but potentially useless learning is a necessary feature of the process – the absence of labels or specific value metrics is the whole point. Because the data is raw, models are forced to learn more abstract, fundamental properties before they can be fine-tuned to specific tasks. Devs and engineers working with agentic AI are operating in a similarly messy and unstructured context. As such, burning through tokens by building new tools and getting to grips with novel systems is not inherently wasteful, and value is not necessarily lost. Instead, it’s the first stage of pretraining.