Automatic AI Model Selection with ReliaQuest GreyMatter

GreyMatter brokers across 20+ AI models and uses automated evaluation to route each task to the best-performing model for that job. The routing pipeline evaluates every request against three criteria (cost, speed, and accuracy) and dispatches it to the best-suited model within milliseconds.

How It Works

The Multi-Stage Routing Pipeline

Every request flows through a multi-stage pipeline that decides, in milliseconds, which model handles it. The pipeline is a hybrid system by design: deterministic checks handle anything with a clear right answer, a lightweight classifier handles fuzzy semantic judgments, and a scoring function combines them. This keeps every routing decision debuggable and auditable post-hoc, which matters when each decision has security consequences.

01 Request Parsing

The pipeline separates the user's instruction from attachments and tenant constraints: data residency requirements, allowed providers, budget ceilings.
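As an illustration only, the parsed request could be represented as below. This is a minimal sketch: the field names (`residency_region`, `allowed_providers`, `budget_ceiling_usd`) and the raw request layout are assumptions made for the example, not GreyMatter's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class TenantConstraints:
    # Hypothetical field names; the source lists only the kinds of constraint.
    residency_region: Optional[str] = None
    allowed_providers: frozenset = frozenset()
    budget_ceiling_usd: Optional[float] = None


@dataclass(frozen=True)
class ParsedRequest:
    instruction: str
    attachments: tuple = ()
    constraints: TenantConstraints = field(default_factory=TenantConstraints)


def parse_request(raw: dict) -> ParsedRequest:
    """Separate the instruction from attachments and tenant constraints."""
    tenant = raw.get("tenant", {})
    return ParsedRequest(
        instruction=raw["instruction"].strip(),
        attachments=tuple(raw.get("attachments", [])),
        constraints=TenantConstraints(
            residency_region=tenant.get("residency_region"),
            allowed_providers=frozenset(tenant.get("allowed_providers", [])),
            budget_ceiling_usd=tenant.get("budget_ceiling_usd"),
        ),
    )
```

Keeping the instruction separate from the rest of the payload is what lets later stages read only the instruction.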

02 Hard Constraint Elimination

A deterministic config lookup removes models that cannot serve the request. Filters include context-window length, tool-call support, structured-output capability, vision/file support, and region rules. This is fast, binary, and non-negotiable.
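A deterministic filter of this kind can be sketched in a few lines. The `ModelSpec` and `Requirements` types and their fields are hypothetical stand-ins for whatever the real configuration holds; the point is only that every check is a binary pass or fail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSpec:
    name: str
    context_window: int
    supports_tools: bool
    supports_structured_output: bool
    supports_vision: bool
    regions: frozenset


@dataclass(frozen=True)
class Requirements:
    prompt_tokens: int
    needs_tools: bool = False
    needs_structured_output: bool = False
    needs_vision: bool = False
    region: Optional[str] = None  # None means no region rule applies


def eligible(model: ModelSpec, req: Requirements) -> bool:
    """Return True only if the model passes every hard constraint."""
    return (
        model.context_window >= req.prompt_tokens
        and (model.supports_tools or not req.needs_tools)
        and (model.supports_structured_output or not req.needs_structured_output)
        and (model.supports_vision or not req.needs_vision)
        and (req.region is None or req.region in model.regions)
    )


def filter_models(models, req: Requirements):
    """Drop every model that cannot serve the request at all."""
    return [m for m in models if eligible(m, req)]
```

Because no scoring happens here, a model that fails one check is removed regardless of how strong it is elsewhere.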

03 Agent Profile Matching

Each of GreyMatter's agents carries a precomputed profile: agent type, tool families it uses, expected output format, and preferred provider family. This narrows the candidate pool before any semantic analysis begins.
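One way to picture profile matching is sketched below. All type and field names are illustrative. The sketch also assumes the preferred provider family is a soft preference that falls back to any compatible model, which the source does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    agent_type: str
    tool_families: frozenset
    output_format: str
    preferred_provider_family: str


@dataclass(frozen=True)
class Candidate:
    name: str
    provider_family: str
    tool_families: frozenset
    output_formats: frozenset


def match_profile(profile: AgentProfile, candidates):
    """Narrow the pool to models compatible with a precomputed agent profile."""
    compatible = [
        c for c in candidates
        if profile.tool_families <= c.tool_families  # every needed tool family is covered
        and profile.output_format in c.output_formats
    ]
    preferred = [
        c for c in compatible
        if c.provider_family == profile.preferred_provider_family
    ]
    # Assumption: fall back to all compatible models if none is in the preferred family.
    return preferred or compatible
```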

04 Semantic Classification

A lightweight, specialized classifier reads the user's instruction (not the full prompt) and predicts:

Task type: Extraction, summarization, code generation, open-ended reasoning, narrative analysis

Complexity profile: Reasoning depth required, domain knowledge required, constraint density
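The classifier itself is a trained model and is not reproduced here. The sketch below shows only a plausible shape for its prediction. The 0-to-1 scales and the unweighted mean in `complexity()` are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class TaskType(Enum):
    EXTRACTION = "extraction"
    SUMMARIZATION = "summarization"
    CODE_GENERATION = "code_generation"
    OPEN_ENDED_REASONING = "open_ended_reasoning"
    NARRATIVE_ANALYSIS = "narrative_analysis"


@dataclass(frozen=True)
class Classification:
    task_type: TaskType
    # Assumed scale: 0.0 (trivial) to 1.0 (very demanding) on each axis.
    reasoning_depth: float
    domain_knowledge: float
    constraint_density: float

    def complexity(self) -> float:
        """Collapse the three axes into one number (unweighted mean, an assumption)."""
        return (self.reasoning_depth + self.domain_knowledge + self.constraint_density) / 3
```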

05 Scoring

Each model that survives the constraint filters is ranked on a weighted trade-off across expected quality, cost, and latency. The weighting shifts with the task: high-volume extraction prioritizes cost and speed, while complex incident correlation prioritizes quality, even at higher cost.

The highest-scoring model for this specific request wins.
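A weighted trade-off like this can be sketched as a linear score. The weight values, the 0-to-1 normalization, and the task-type keys below are invented for the example. The source states only that weighting shifts by task, not the actual formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEstimate:
    name: str
    quality: float  # 0..1, higher is better
    cost: float     # normalized 0..1, lower is better
    latency: float  # normalized 0..1, lower is better


# Hypothetical per-task weights: (quality, cost, latency), each row sums to 1.
WEIGHTS = {
    "extraction": (0.2, 0.4, 0.4),
    "incident_correlation": (0.8, 0.1, 0.1),
}


def score(model: ModelEstimate, task_type: str) -> float:
    """Weighted sum in which cost and latency are inverted so that higher is better."""
    w_quality, w_cost, w_latency = WEIGHTS[task_type]
    return (
        w_quality * model.quality
        + w_cost * (1 - model.cost)
        + w_latency * (1 - model.latency)
    )


def pick(models, task_type: str) -> ModelEstimate:
    """Return the highest-scoring surviving model for this task type."""
    return max(models, key=lambda m: score(m, task_type))
```

With these illustrative weights, a cheap and fast model wins extraction even when a premium model has higher quality, while the premium model wins incident correlation.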

06 Dispatch

The top-scoring model receives the request. The routing decision, the model selected, and the eventual outcome are all logged for continuous improvement.
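The logged record might look something like the following sketch. The record layout and function names are hypothetical. The sketch shows only that the decision, the selected model, and the later outcome can live in one auditable entry.

```python
import json
import time
import uuid


def routing_record(request_id: str, task_type: str, scores: dict) -> dict:
    """Build an audit entry for one routing decision (outcome filled in later)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {
        "decision_id": str(uuid.uuid4()),
        "request_id": request_id,
        "task_type": task_type,
        "selected_model": ranked[0][0],
        "ranked_candidates": ranked,  # kept so the decision can be explained post-hoc
        "timestamp": time.time(),
        "outcome": None,
    }


def attach_outcome(record: dict, outcome: str) -> dict:
    """Return a copy of the record with the eventual outcome attached."""
    return {**record, "outcome": outcome}


def to_log_line(record: dict) -> str:
    """Serialize the record as one JSON line."""
    return json.dumps(record, sort_keys=True)
```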

Model Evaluation & Promotion Pipeline

Every model earns its routing position through standardized evaluation. No model enters production routing without meeting predefined evaluation criteria.

Fingerprinting via Probe Prompts

Each model is evaluated against a standardized set of test prompts spanning security-relevant task domains. Results produce a performance fingerprint—a numerical feature vector representing where the model excels and where it falls short. For example, some models score high on structured extraction but produce loose reasoning, while others handle ambiguous narrative analysis well but generate verbose output. The fingerprint captures these profiles quantitatively.
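To make the idea of a fingerprint concrete, the sketch below averages probe scores per domain into a fixed-order vector. The domain names, the 0-to-1 score scale, and the choice to record 0.0 for an unprobed domain are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical task domains; the real probe set is not described in detail.
DOMAINS = ("structured_extraction", "reasoning", "narrative_analysis", "conciseness")


def fingerprint(probe_results):
    """Turn (domain, score) pairs into a fixed-order feature vector.

    Scores are assumed to lie in [0, 1]. A domain with no probes gets 0.0.
    """
    by_domain = defaultdict(list)
    for domain, probe_score in probe_results:
        by_domain[domain].append(probe_score)
    return [mean(by_domain[d]) if by_domain[d] else 0.0 for d in DOMAINS]
```

Because every model's vector uses the same domain order, two fingerprints can be compared position by position.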

Shadow Traffic

After fingerprinting, candidate models run in shadow against production traffic—processing real requests in parallel with the currently routed model. Automated scoring compares outputs against the incumbent's results on the same requests, validating that fingerprint-predicted performance holds under real workload conditions.
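The comparison loop can be sketched as follows. For simplicity the sketch calls the two models one after the other rather than in parallel. The `incumbent`, `candidate`, and `scorer` callables are placeholders supplied by the caller, not real GreyMatter interfaces.

```python
def shadow_compare(requests, incumbent, candidate, scorer):
    """Score a shadow candidate against the incumbent on the same requests.

    incumbent, candidate: callables mapping a request to an output.
    scorer: callable (request, output) -> number, where higher is better.
    Only the incumbent's output would ever reach the user; the candidate's
    output is scored and then discarded.
    """
    wins = ties = losses = 0
    for request in requests:
        served = incumbent(request)
        shadow = candidate(request)
        incumbent_score = scorer(request, served)
        candidate_score = scorer(request, shadow)
        if candidate_score > incumbent_score:
            wins += 1
        elif candidate_score < incumbent_score:
            losses += 1
        else:
            ties += 1
    total = wins + ties + losses
    return {
        "wins": wins,
        "ties": ties,
        "losses": losses,
        "win_rate": wins / total if total else 0.0,
    }
```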

Scoreboard & Promotion

Results populate a scoreboard that ranks models per task type across cost, speed, and accuracy. When a candidate consistently outperforms the incumbent on a task type, an operator reviews the results and promotes the candidate into active routing.
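The source does not quantify "consistently outperforms". As a hedged sketch, the check below flags a candidate for operator review once it has enough shadow samples and a high enough win rate. Both thresholds are invented defaults, and promotion itself stays a human decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowStats:
    task_type: str
    candidate: str
    incumbent: str
    samples: int
    win_rate: float  # share of shadow requests where the candidate scored higher


def ready_for_review(stats: ShadowStats, min_samples: int = 500,
                     min_win_rate: float = 0.6) -> bool:
    """Flag a candidate for operator review; it never promotes automatically."""
    return stats.samples >= min_samples and stats.win_rate >= min_win_rate
```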

Ground Truth Feedback

Security work has a structural advantage for AI evaluation: ground truth exists. Every alert eventually resolves to true positive, false positive, or escalation. Every detection fires or doesn't. These outcomes feed directly back into model scoring, so each model's measured performance reflects actual security results.
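One simple way such feedback could work is an exponential moving average over whether the model's verdict matched the final resolution. This is a swapped-in illustration, not the documented scoring method. The `alpha` smoothing factor and the function name are assumptions.

```python
RESOLUTIONS = {"true_positive", "false_positive", "escalation"}


def update_accuracy(current: float, predicted: str, resolved: str,
                    alpha: float = 0.05) -> float:
    """Blend one resolved alert into a model's running accuracy estimate.

    current: running accuracy in [0, 1].
    predicted: the verdict the model produced for the alert.
    resolved: the alert's final resolution (the ground truth).
    alpha: weight given to the newest observation (assumed value).
    """
    if resolved not in RESOLUTIONS:
        raise ValueError(f"unknown resolution: {resolved!r}")
    hit = 1.0 if predicted == resolved else 0.0
    return (1 - alpha) * current + alpha * hit
```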

Automatic Model Adoption and Failover

The architecture treats models as configuration, not code. Adding a new model to the evaluation pipeline requires no re-engineering: it enters fingerprinting as soon as it is available, progresses through shadow traffic, and is promoted into routing once it earns its position on the scoreboard.

What this means operationally:

When a stronger or more cost-efficient model emerges from any provider, GreyMatter evaluates it independently.

Customers face no re-procurement cycle, integration management, or pushed configuration changes.

The defense layer improves continuously alongside AI model capabilities.

Mid-operation failover: If a model degrades or goes down during an active workflow, the pipeline automatically dispatches to the next-best model for that task type. Workflows continue without interruption or restart.
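Failover to the next-best model can be sketched as a walk down the ranked candidate list. The `ModelUnavailable` exception and the `call` function are hypothetical placeholders for however a real provider failure would surface.

```python
class ModelUnavailable(Exception):
    """Raised by the caller-supplied `call` when a model is degraded or down."""


def dispatch_with_failover(ranked_models, call, request):
    """Try models in ranked order and return the first successful result.

    ranked_models: model names, best first, for this task type.
    call: callable (model_name, request) -> output that may raise ModelUnavailable.
    Returns (model_name, output, failures), where failures lists skipped models.
    """
    failures = []
    for name in ranked_models:
        try:
            return name, call(name, request), failures
        except ModelUnavailable as exc:
            failures.append((name, str(exc)))
    raise RuntimeError(f"all candidate models failed: {failures}")
```

Returning the failure list alongside the result keeps the fallback visible in the audit trail instead of hiding it.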

Scenario: AI Model Broker + Detection Engineering Teammate

GreyMatter's six Agentic Teammates (IR, detection engineering, threat hunting, threat intel, IT, and OT) each decompose jobs into hundreds of single-task agents. Every one of those agents executes through the model broker.

Consider a detection engineering request. The Teammate breaks it into component tasks: one agent writes the detection logic while another validates coverage against known attack patterns. Each of those agents may route to a different model—the logic-writing agent to a model strong at structured code generation, the validation agent to a model strong at reasoning over pattern sets.

Flat Pricing: How Model Selection Makes It Possible

Roughly three-quarters of tasks completed by GreyMatter resolve on lightweight, inexpensive models. High-volume, well-bounded work (log normalization, event summaries, IOC extraction, field mapping) produces accurate results without frontier-model compute. The remaining 25%—complex reasoning, incident correlation, ambiguous narrative extraction—routes to premium models where the need for accuracy justifies the cost.

This per-task cost control is what makes flat, unlimited-usage pricing viable. Customers pay one price regardless of volume because GreyMatter manages model economics internally rather than passing variable AI costs through.
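The arithmetic behind this can be sketched with placeholder numbers. The roughly 75/25 split comes from the text above. The per-task unit costs (1 and 20) are purely hypothetical and are not real model prices.

```python
def blended_cost(light_share: float, light_cost: float, premium_cost: float) -> float:
    """Average cost per task when a share of tasks runs on lightweight models."""
    return light_share * light_cost + (1 - light_share) * premium_cost


# Hypothetical unit costs: 1 per lightweight task, 20 per premium task.
routed = blended_cost(0.75, 1.0, 20.0)       # 0.75 * 1 + 0.25 * 20 = 5.75
all_premium = blended_cost(0.0, 1.0, 20.0)   # every task on the premium model = 20.0
```

Under these made-up unit costs, routing brings the average cost per task to 5.75 units instead of 20. The real ratio depends entirely on actual model pricing and task mix.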

The pricing model is a direct consequence of routing architecture:

	Single-Model Platforms	Customer-Choice Platforms	GreyMatter’s Approach
Model architecture	One fixed model for all tasks.	Multiple models; customer selects.	20+ models; automatic per-task routing.
Pricing model	Per token / per query / per investigation.	Per token / per query (across chosen model).	Flat, unlimited usage.
Cost as usage scales	Linear increase with volume.	Linear increase (customer absorbs optimization burden).	Flat. Routing absorbs cost optimization internally.
When a better model emerges	Re-procurement or vendor dependency.	Customer re-evaluates, reconfigures, re-procures.	Automatic evaluation and promotion; no customer action.
Per-task optimization	None. Same model regardless of task complexity.	Manual. Customer must understand task-to-model fit.	Automatic. Scoring function matches task complexity to model capability.
Accuracy under task diversity	Degrades. One model handles everything from triage to complex reasoning.	Depends on customer's configuration skill.	Maintained. Each task type routes to its strongest model.
Cost visibility	Unpredictable; scales with investigation volume.	Unpredictable; customer manages cost-quality tradeoffs.	Predictable; one price regardless of volume.

At a Glance

Attribute	Detail
Routing architecture	Hybrid (deterministic constraints + semantic classifier + scoring function).
Scoring formula	Weighted trade-off across quality, cost, and latency (per-task weighting).
Routing overhead	Milliseconds end-to-end.
Failover	Automatic, mid-operation, to next-best model per task type.
Ground truth signal	Alert resolutions (TP/FP/escalation) feed back into model scoring.
Explainability	Every routing decision logged and auditable post-hoc.
