A security leader at a major retailer signed an AI contract with a fixed token allotment. Four months later, the invoice landed: half a million dollars in overages. The team hadn't done anything unusual—they'd investigated alerts, run hunts, queried threat intel. They'd used the platform the way it was designed to be used. The meter just ran faster than anyone modeled.

Stories like that one are circulating in every CISO peer group right now. At security conferences this year, token economics dominated the hallway conversations. We’re hearing that SOC analysts are burning through thousands of dollars in tokens every month—each.

The "find out" phase of enterprise AI adoption has arrived.

The Build Instinct (And What Kills It)

The first reaction to a runaway AI bill is predictable: "We'll build this ourselves."

Enterprise security teams look at the token invoices, look at what they're getting back, and conclude the margin is indefensible. The math seems simple—buy model access directly, build the orchestration layer, control the spend.

Then they start building.

The "tokenmaxxing" problem—teams spinning up experiments, agents, and automations with no governance layer to meter what's accretive and what's waste—turns out to be harder to solve internally than externally. The questions compound fast: Which tasks justify a frontier model? Which ones can run on something lighter? How do you A/B test model performance continuously without a dedicated ML engineering team? How do you enforce cost governance without throttling the security work that actually matters?

Most teams that start down the BYO path arrive at the same conclusion within three months: the orchestration and infrastructure layer is where the real complexity lives, and building it from scratch is a multi-year engineering commitment that competes directly with their core security mission.

So they turn to vendors. And that's where the next problem starts.

The AI SOC Vendor Fee: Per-Token Pricing Rations Security

Here’s how AI vendor pricing tends to work: You pay for a certain token allotment every month, then pay an additional fee for overages. The pitch sounds rational: pay only for what you use. Usage-based pricing aligns cost to value. Every finance team nods.

For security, this model is dangerous. SOC activity scales with the threat level, so analysts naturally use more tokens during an attack. Say your organization is under active attack, but you've exhausted your monthly token allotment. The decision becomes financial: do you pay the overage or run the investigation manually? Detection works the same way: if you run out of tokens, do you accept the blind spots or pay up?
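To make the arithmetic concrete, here is a minimal sketch of an allotment-plus-overage bill. The function name and every figure (allotment size, base fee, overage rate, token volumes) are hypothetical and chosen only for illustration; they are not taken from any vendor's price list.

```python
def monthly_bill(tokens_used: int, allotment: int,
                 base_fee: float, overage_rate_per_1k: float) -> float:
    """Base fee covers the allotment; tokens beyond it are billed per 1,000."""
    overage_tokens = max(0, tokens_used - allotment)
    return base_fee + (overage_tokens / 1000) * overage_rate_per_1k


# Hypothetical contract: 50M tokens/month for $20,000, then $0.50 per 1K tokens.
ALLOTMENT = 50_000_000
BASE_FEE = 20_000.0
OVERAGE_RATE = 0.50

quiet_month = monthly_bill(40_000_000, ALLOTMENT, BASE_FEE, OVERAGE_RATE)
attack_month = monthly_bill(120_000_000, ALLOTMENT, BASE_FEE, OVERAGE_RATE)

print(f"quiet month:  ${quiet_month:,.0f}")
print(f"attack month: ${attack_month:,.0f}")
```

Under these assumed numbers, the month with the most investigative activity is also the most expensive one, which is the pressure the paragraph above describes.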

When the allotment runs low, analysts start to hesitate on the extra pivot. The third enrichment query gets skipped. The proactive hunt gets deferred because no one can predict what it'll cost until it's already running.

You're rationing security decisions based on a billing model designed for cloud compute—applied to a domain where the highest-cost moments are also the highest stakes.

Why It's Structurally Broken

Most AI security platforms lock into one model at configuration. That creates a lose-lose:

Pick a lightweight model, and complex security tasks—multi-step investigations, behavioral correlation, nuanced threat intel analysis—degrade in quality. Pick a frontier model, and every task, including the ones a smaller model handles fine, runs at premium token rates.

Single-model architectures also lock teams out of improvements. Better models ship monthly. Cheaper models emerge quarterly. A fixed-model platform can't adopt them without re-procurement, re-integration, or customer-managed configuration. And if that single model degrades or goes down, there’s no failover—the entire operation degrades with it.

Even some vendors that have moved to multi-model approaches have yet to solve the routing problem.

In one variant, the customer chooses which model handles which task—meaning your team absorbs the optimization burden, needs to understand task-to-model fit, and reconfigures manually as models change.

In another, the vendor routes automatically but provides zero visibility into why a given model was selected or what it cost for a specific query. The result: "dynamic token usage based on the complexity of the query" with no way for the customer to audit, predict, or optimize spend.

In every case, variable model cost flows downstream to the customer—whether through per-token billing, opaque complexity-based pricing, or manual optimization overhead.

This is the root cause of per-token pricing: when a vendor has no mechanism to control its own model cost, it passes variable cost directly to the customer.

The Architectural Answer: Controlling Cost at the Infrastructure Layer

The causal chain runs in one direction: uncontrolled model cost → variable pricing passed to the customer → behavioral rationing of security operations.

Breaking the chain requires solving the cost problem at the infrastructure layer—intelligently routing every task to the optimal model for cost, speed, and accuracy.

A triage task that a lightweight model handles at 99%+ accuracy doesn't consume frontier-model resources. A complex multi-step investigation that demands reasoning depth gets routed to the model built for it.
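One simple way to picture this kind of routing is "cheapest model that clears the task's bar." The sketch below assumes each task can be tagged with a required capability level; the model names, capability scores, and prices are hypothetical placeholders, and a production router would also weigh latency and measured accuracy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    capability: int             # hypothetical 1-10 reasoning-depth score
    cost_per_1k_tokens: float   # hypothetical price in dollars


# Illustrative catalog; none of these names or prices are real.
MODELS = [
    Model("light", capability=3, cost_per_1k_tokens=0.10),
    Model("mid", capability=6, cost_per_1k_tokens=0.60),
    Model("frontier", capability=9, cost_per_1k_tokens=3.00),
]


def route(required_capability: int, models=MODELS) -> Model:
    """Return the cheapest model whose capability meets the task's requirement."""
    eligible = [m for m in models if m.capability >= required_capability]
    if not eligible:
        raise ValueError(f"no model meets capability {required_capability}")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


print(route(2).name)  # alert triage stays on the lightweight model
print(route(8).name)  # multi-step investigation goes to the frontier model
```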

This approach is what makes flat, unlimited-usage pricing economically viable. It optimizes cost continuously at the infrastructure layer, so the pricing layer above it can remain predictable regardless of volume.
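The economics can be sketched as a blended cost per task. The task mix, token counts, and prices below are assumptions made up for illustration, not measurements from any platform; the point is only how the weighted average moves when most tasks land on a cheaper model.

```python
def blended_cost_per_task(mix):
    """mix: iterable of (share_of_tasks, tokens_per_task, price_per_1k_tokens).

    Returns the expected dollar cost of one task under that routing mix.
    """
    mix = list(mix)
    total_share = sum(share for share, _, _ in mix)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("task shares must sum to 1.0")
    return sum(share * tokens / 1000 * price for share, tokens, price in mix)


# Hypothetical: every task uses about 4,000 tokens.
all_frontier = blended_cost_per_task([(1.00, 4000, 3.00)])
routed = blended_cost_per_task([
    (0.80, 4000, 0.10),  # triage and enrichment on a lightweight model
    (0.15, 4000, 0.60),  # mid-complexity correlation
    (0.05, 4000, 3.00),  # deep investigations on a frontier model
])

print(f"all-frontier: ${all_frontier:.2f} per task")
print(f"routed:       ${routed:.2f} per task")
```

With these assumed inputs the routed mix costs roughly a tenth of the all-frontier baseline per task. A gap of that kind is what would let a vendor absorb volume swings under a flat price.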

What changes when the meter disappears:

  • Analysts stop self-rationing. Every alert gets investigated. Every hunt runs to completion.

  • New models enter production within days—no re-procurement, no customer-managed migration.

  • Model failover happens automatically mid-operation. If one model degrades, the routing layer sends work around it without human intervention.

  • Cost doesn't compound with scale. You know what you’re paying whether you have 9,000 endpoints or 90,000.
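The failover behavior in the list above can be sketched as a loop over backends in preference order. Everything here is hypothetical: the `ModelUnavailable` exception, the stub backends, and the function names stand in for whatever health signals and model clients a real platform would use.

```python
from __future__ import annotations

from typing import Callable, Sequence


class ModelUnavailable(Exception):
    """Raised by a backend that is down or degraded (hypothetical signal)."""


Backend = Callable[[str], str]


def call_with_failover(prompt: str,
                       backends: Sequence[tuple[str, Backend]]) -> tuple[str, str]:
    """Try each backend in order; return (backend_name, answer) from the first that works."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(prompt)
        except ModelUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stub backends standing in for real model clients.
def degraded_primary(prompt: str) -> str:
    raise ModelUnavailable("timeout")


def healthy_secondary(prompt: str) -> str:
    return f"summary of: {prompt}"


used, answer = call_with_failover(
    "alert 1234",
    [("primary", degraded_primary), ("secondary", healthy_secondary)],
)
print(used, "->", answer)
```

The caller never has to know that the primary backend failed, which is the "without human intervention" property the list describes.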

Where AI Security Pricing Goes Next

Per-token pricing is a bet against the trajectory of model economics. Every quarter, models get cheaper and more capable. Under per-token pricing, either the vendor pockets the savings while the customer's bill stays high, or the vendor drops prices and craters its own revenue. Neither outcome is sustainable.

Flat pricing bets with the trajectory. As models improve and costs drop, the customer's defense layer gets better continuously without a corresponding cost increase. That's a durable economic model because margins are generated at the architecture level instead of at the customer's expense.

The question every CISO should put to any AI security vendor in their next evaluation: "Show me what my worst-case month costs—the month I get hit hardest, investigate the most, and hunt the deepest. Is that the month I pay the most, or the same as every other month?"

The answer tells you whether the vendor's architecture can absorb its own AI costs—or whether your team's operational freedom is the relief valve.