Automatic AI Model Selection
Every task gets the right model. Every model earns its place through continuous evaluation.
GreyMatter brokers across 20+ AI models and uses automated evaluation to route each task to the best-performing model for that specific job. The routing pipeline weighs every request against three criteria (cost, speed, and accuracy) and, within milliseconds, dispatches it to the model best suited to it.
How It Works
The Multi-Stage Routing Pipeline
Every request flows through a multi-stage pipeline that decides, in milliseconds, which model handles it. The pipeline is a hybrid system by design: deterministic checks handle anything with a clear right answer, a lightweight classifier handles fuzzy semantic judgments, and a scoring function combines them. This keeps every routing decision debuggable and auditable post-hoc, which matters when each decision has security consequences.
The pipeline separates the user's instruction from attachments and tenant constraints: data residency requirements, allowed providers, budget ceilings.
A deterministic config lookup removes models that cannot serve the request. Filters include context-window length, tool-call support, structured-output capability, vision/file support, and region rules. This is fast, binary, and non-negotiable.
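The constraint filter can be sketched as a plain predicate over a model catalog. This is a minimal illustration, not GreyMatter's implementation: `ModelSpec`, `RequestNeeds`, the catalog entries, and the region codes are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelSpec:
    # Hypothetical capability record for one model in the catalog.
    name: str
    context_window: int              # maximum tokens the model accepts
    supports_tools: bool
    supports_structured_output: bool
    supports_vision: bool
    regions: frozenset               # regions where the model may be called

@dataclass(frozen=True)
class RequestNeeds:
    # Hypothetical hard requirements derived from the request and tenant config.
    prompt_tokens: int
    needs_tools: bool = False
    needs_structured_output: bool = False
    needs_vision: bool = False
    allowed_regions: frozenset = frozenset()  # empty set = no residency rule

def is_eligible(model, req):
    """Binary check: a model either can serve the request or it cannot."""
    if req.prompt_tokens > model.context_window:
        return False
    if req.needs_tools and not model.supports_tools:
        return False
    if req.needs_structured_output and not model.supports_structured_output:
        return False
    if req.needs_vision and not model.supports_vision:
        return False
    if req.allowed_regions and not (model.regions & req.allowed_regions):
        return False
    return True

def filter_models(catalog, req):
    """Return only the models that pass every hard constraint."""
    return [m for m in catalog if is_eligible(m, req)]

CATALOG = [
    ModelSpec("small-fast", 16_000, True, True, False, frozenset({"us", "eu"})),
    ModelSpec("large-reasoner", 200_000, True, True, True, frozenset({"us"})),
    ModelSpec("eu-vision", 128_000, False, True, True, frozenset({"eu"})),
]

request = RequestNeeds(
    prompt_tokens=50_000,
    needs_structured_output=True,
    allowed_regions=frozenset({"eu"}),
)
survivors = [m.name for m in filter_models(CATALOG, request)]
print(survivors)  # ['eu-vision']
```

Because every check is a simple comparison against static configuration, the reason a model was excluded can be reconstructed afterwards from the request and the catalog alone.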
Each of GreyMatter's agents carries a precomputed profile: agent type, tool families it uses, expected output format, and preferred provider family. This narrows the candidate pool before any semantic analysis begins.
A lightweight, specialized classifier reads the user's instruction (not the full prompt) and predicts:
Each model that survives the constraint filters is ranked on a weighted trade-off across expected quality, cost, and latency. The weighting shifts with the task: high-volume extraction prioritizes cost and speed, while complex incident correlation prioritizes quality, even at higher cost.
The highest-scoring model for this specific request wins and receives the request. The routing decision, the model selected, and the eventual outcome are all logged for continuous improvement.
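A weighted trade-off of this kind might look like the sketch below. The source does not give the formula itself, so the linear form, the normalized 0 to 1 inputs, the model names, and every number here are assumptions made purely for illustration.

```python
def score(model, weights):
    """Linear trade-off (assumed form): reward quality, penalize cost and latency.

    All three model attributes are assumed to be normalized to the range 0..1.
    """
    return (
        weights["quality"] * model["quality"]
        - weights["cost"] * model["cost"]
        - weights["latency"] * model["latency"]
    )

def pick_model(candidates, weights):
    """Return the candidate with the highest score for the given task weights."""
    return max(candidates, key=lambda m: score(m, weights))

# Hypothetical candidates that already passed the constraint filters.
CANDIDATES = [
    {"name": "light", "quality": 0.70, "cost": 0.10, "latency": 0.10},
    {"name": "premium", "quality": 0.95, "cost": 0.90, "latency": 0.60},
]

# Hypothetical per-task weightings: extraction favors cost and speed,
# incident correlation favors quality.
WEIGHTS = {
    "ioc_extraction": {"quality": 0.2, "cost": 0.4, "latency": 0.4},
    "incident_correlation": {"quality": 0.8, "cost": 0.1, "latency": 0.1},
}

for task, w in WEIGHTS.items():
    print(task, "->", pick_model(CANDIDATES, w)["name"])
# ioc_extraction -> light
# incident_correlation -> premium
```

The same two candidates produce different winners purely because the weights change with the task, which is the behavior the scoring step describes.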
Model Evaluation & Promotion Pipeline
Every model earns its routing position through standardized evaluation. No model enters production routing without meeting predefined evaluation criteria.
Fingerprinting via Probe Prompts
Each model is evaluated against a standardized set of test prompts spanning security-relevant task domains. Results produce a performance fingerprint—a numerical feature vector representing where the model excels and where it falls short. For example, some models score high on structured extraction but produce loose reasoning, while others handle ambiguous narrative analysis well but generate verbose output. The fingerprint captures these profiles quantitatively.
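As a rough sketch of what a fingerprint could be, the example below averages probe scores per task domain into a fixed-order vector. The domain names, the 0 to 1 scoring scale, and the use of a simple mean are assumptions; the real probe set and feature definitions are not described in the source.

```python
from statistics import mean

# Hypothetical task domains; the position in this tuple fixes the vector layout.
DOMAINS = ("structured_extraction", "reasoning", "narrative_analysis")

def fingerprint(probe_results):
    """Collapse (domain, score) probe results into one mean score per domain."""
    by_domain = {d: [] for d in DOMAINS}
    for domain, probe_score in probe_results:
        by_domain[domain].append(probe_score)
    missing = [d for d, scores in by_domain.items() if not scores]
    if missing:
        raise ValueError(f"no probe results for domains: {missing}")
    return [round(mean(by_domain[d]), 3) for d in DOMAINS]

# Hypothetical probe scores for one model.
results = [
    ("structured_extraction", 0.9),
    ("structured_extraction", 1.0),
    ("reasoning", 0.5),
    ("narrative_analysis", 0.6),
    ("narrative_analysis", 0.8),
]
vector = fingerprint(results)
print(dict(zip(DOMAINS, vector)))
```

A vector like this one would describe a model that is strong at structured extraction and weaker at reasoning, matching the kind of profile the paragraph above mentions.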
Shadow Traffic
After fingerprinting, candidate models run in shadow against production traffic—processing real requests in parallel with the currently routed model. Automated scoring compares outputs against the incumbent's results on the same requests, validating that fingerprint-predicted performance holds under real workload conditions.
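The shadow comparison can be pictured as a per-request tally of which model scored higher. In this sketch the `scorer` callback, the `ioc_recall` metric, and the sample data are all hypothetical; the source does not specify how automated scoring is computed.

```python
def shadow_compare(samples, scorer):
    """Tally candidate wins, ties, and losses against the incumbent.

    samples: iterable of (request, incumbent_output, candidate_output).
    scorer:  callable(request, output) -> float, higher is better.
    """
    tally = {"wins": 0, "ties": 0, "losses": 0}
    for request, incumbent_out, candidate_out in samples:
        incumbent_score = scorer(request, incumbent_out)
        candidate_score = scorer(request, candidate_out)
        if candidate_score > incumbent_score:
            tally["wins"] += 1
        elif candidate_score < incumbent_score:
            tally["losses"] += 1
        else:
            tally["ties"] += 1
    return tally

def ioc_recall(request, output):
    """Hypothetical scorer: fraction of expected IOCs present in the output."""
    expected = set(request["expected_iocs"])
    return len(expected & set(output)) / len(expected)

# Hypothetical shadow samples: the candidate's output is scored but never served.
samples = [
    ({"expected_iocs": ["1.2.3.4", "evil.example"]}, ["1.2.3.4"], ["1.2.3.4", "evil.example"]),
    ({"expected_iocs": ["bad.example"]}, ["bad.example"], ["bad.example"]),
    ({"expected_iocs": ["5.6.7.8"]}, ["5.6.7.8"], []),
]
result = shadow_compare(samples, ioc_recall)
print(result)  # {'wins': 1, 'ties': 1, 'losses': 1}
```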
Scoreboard & Promotion
Results populate a scoreboard that ranks models per task type across cost, speed, and accuracy. When a candidate consistently outperforms the incumbent on a task type, an operator reviews the results and promotes the candidate into active routing.
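One way to make "consistently outperforms" concrete is to require the candidate to beat the incumbent across several consecutive evaluation windows before it is flagged for operator review. The window count of three and the single combined score per window are assumptions for this sketch, not documented thresholds.

```python
def ready_for_review(candidate_scores, incumbent_scores, min_windows=3):
    """Return True when the candidate beat the incumbent in each of the
    last `min_windows` evaluation windows for one task type.

    Both lists hold one combined score per window, oldest first.
    """
    if len(candidate_scores) != len(incumbent_scores):
        raise ValueError("score histories must cover the same windows")
    recent = list(zip(candidate_scores, incumbent_scores))[-min_windows:]
    if len(recent) < min_windows:
        return False  # not enough history yet
    return all(cand > inc for cand, inc in recent)

# Hypothetical histories for one task type.
print(ready_for_review([0.70, 0.80, 0.82, 0.85], [0.75, 0.78, 0.80, 0.80]))  # True
print(ready_for_review([0.80, 0.70, 0.90], [0.75, 0.75, 0.75]))              # False
```

The function only flags a candidate; the promotion itself stays with the operator, as described above.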
Ground Truth Feedback
Security work has a structural advantage for AI evaluation: ground truth exists. Every alert eventually resolves to true positive, false positive, or escalation. Every detection fires or doesn't. These outcomes feed directly back into model scoring—meaning performance is informed by actual security results.
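A feedback loop of this shape could be as simple as comparing each model's verdict with the alert's final resolution and keeping a running accuracy per model and task type. `OutcomeTracker`, the verdict labels, and the model names below are hypothetical; how GreyMatter actually weights outcomes is not stated in the source.

```python
from collections import defaultdict

class OutcomeTracker:
    """Running accuracy per (model, task_type), fed by resolved alerts."""

    def __init__(self):
        self._hits = defaultdict(int)
        self._total = defaultdict(int)

    def record(self, model, task_type, predicted, resolved):
        """Store one outcome: the model's verdict versus the final resolution."""
        key = (model, task_type)
        self._total[key] += 1
        if predicted == resolved:
            self._hits[key] += 1

    def accuracy(self, model, task_type):
        """Return the observed accuracy, or None if nothing has resolved yet."""
        key = (model, task_type)
        total = self._total.get(key, 0)
        if total == 0:
            return None
        return self._hits.get(key, 0) / total

tracker = OutcomeTracker()
# Hypothetical resolutions: TP = true positive, FP = false positive.
tracker.record("light", "alert_triage", predicted="TP", resolved="TP")
tracker.record("light", "alert_triage", predicted="FP", resolved="TP")
tracker.record("light", "alert_triage", predicted="FP", resolved="FP")
print(tracker.accuracy("light", "alert_triage"))
```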
Automatic Model Adoption and Failover
The architecture treats models as configuration, not code. Adding a new model to the evaluation pipeline requires no re-engineering: it enters fingerprinting as soon as it becomes available, progresses through shadow traffic, and is promoted into routing once it earns its position on the scoreboard.
What this means operationally:
Mid-operation failover: If a model degrades or goes down during an active workflow, the pipeline automatically dispatches to the next-best model for that task type. Workflows continue without interruption or restart.
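Failover to the next-best model can be sketched as walking a ranked candidate list until one call succeeds. `ModelUnavailable`, `dispatch_with_failover`, and the fake `call_model` function are hypothetical stand-ins; the real dispatch and health-detection logic is not described in the source.

```python
class ModelUnavailable(Exception):
    """Hypothetical error raised when a model is down or degraded."""

def dispatch_with_failover(ranked_models, call, request):
    """Try models in ranked order and return (model, response) from the
    first one that succeeds."""
    failures = []
    for model in ranked_models:
        try:
            return model, call(model, request)
        except ModelUnavailable as exc:
            failures.append((model, str(exc)))  # kept for the audit log
    raise RuntimeError(f"all candidate models failed: {failures}")

def call_model(model, request):
    """Fake transport for the example: the top-ranked model is down."""
    if model == "model-a":
        raise ModelUnavailable("timeout")
    return f"{model} handled: {request}"

used, response = dispatch_with_failover(
    ["model-a", "model-b", "model-c"], call_model, "summarize alert"
)
print(used, "|", response)  # model-b | model-b handled: summarize alert
```

Because the ranked list already exists from the scoring step, falling back needs no re-routing work; the workflow simply continues with the next entry.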
Scenario: AI Model Broker + Detection Engineering Teammate
GreyMatter's 6 Agentic Teammates (IR, detection engineering, threat hunting, threat intel, IT, and OT) each decompose jobs into hundreds of single-task agents. Every one of those agents executes through the model broker.
Consider a detection engineering request. The Teammate breaks it into component tasks: one agent writes the detection logic while another validates coverage against known attack patterns. Each of those agents may route to a different model—the logic-writing agent to a model strong at structured code generation, the validation agent to a model strong at reasoning over pattern sets.
Flat Pricing: How Model Selection Makes It Possible
Roughly three-quarters of tasks completed by GreyMatter resolve on lightweight, inexpensive models. High-volume, well-bounded work (log normalization, event summaries, IOC extraction, field mapping) produces accurate results without frontier-model compute. The remaining 25%—complex reasoning, incident correlation, ambiguous narrative extraction—routes to premium models where the need for accuracy justifies the cost.
This per-task cost control is what makes flat, unlimited-usage pricing viable. Customers pay one price regardless of volume because GreyMatter manages model economics internally rather than passing variable AI costs through.
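The economics can be illustrated with a short blended-cost calculation. Only the 75/25 split comes from the text above; the per-task prices are invented placeholders, not GreyMatter or provider figures.

```python
def blended_cost(light_share, light_cost, premium_cost):
    """Average cost per task when `light_share` of tasks run on the light
    model and the remainder run on the premium model."""
    return light_share * light_cost + (1 - light_share) * premium_cost

# Hypothetical per-task costs in dollars, for illustration only.
LIGHT_COST = 0.001
PREMIUM_COST = 0.020

blended = blended_cost(0.75, LIGHT_COST, PREMIUM_COST)
ratio = blended / PREMIUM_COST  # compared with sending everything to premium

print(f"blended cost per task: ${blended:.5f}")
print(f"share of all-premium cost: {ratio:.1%}")
```

Under these assumed prices, routing three-quarters of tasks to the light model brings the average cost per task to well under a third of the all-premium figure, which is the kind of margin that makes a flat price workable.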
| | Single-Model Platforms | Customer-Choice Platforms | GreyMatter |
|---|---|---|---|
| Model architecture | One fixed model for all tasks. | Multiple models; customer selects. | 20+ models; automatic per-task routing. |
| Pricing model | Per token / per query / per investigation. | Per token / per query (across chosen model). | Flat, unlimited usage. |
| Cost as usage scales | Linear increase with volume. | Linear increase (customer absorbs optimization burden). | Flat. Routing absorbs cost optimization internally. |
| When a better model emerges | Re-procurement or vendor dependency. | Customer re-evaluates, reconfigures, re-procures. | Automatic evaluation and promotion; no customer action. |
| Per-task optimization | None. Same model regardless of task complexity. | Manual. Customer must understand task-to-model fit. | Automatic. Scoring function matches task complexity to model capability. |
| Accuracy under task diversity | Degrades. One model handles everything from triage to complex reasoning. | Depends on customer's configuration skill. | Maintained. Each task type routes to its strongest model. |
| Cost visibility | Unpredictable; scales with investigation volume. | Unpredictable; customer manages cost-quality tradeoffs. | Predictable; one price regardless of volume. |
GreyMatter’s Approach
The pricing model is a direct consequence of routing architecture:
At a Glance
| Attribute | Detail |
|---|---|
| Routing architecture | Hybrid (deterministic constraints + semantic classifier + scoring function). |
| Scoring formula | Weighted trade-off across quality, cost, and latency (per-task weighting). |
| Routing overhead | Milliseconds end-to-end. |
| Failover | Automatic, mid-operation, to next-best model per task type. |
| Ground truth signal | Alert resolutions (TP/FP/escalation) feed back into model scoring. |
| Explainability | Every routing decision logged and auditable post-hoc. |

