How do I automatically route AI calls to cheaper models?

Routing rules let you automatically redirect AI API calls to cheaper models based on feature, department, model name, or token count — without changing any application code. The most powerful way to operationalise cost recommendations at scale.

Routing rules redirect AI API calls to cheaper models based on feature, department, model name, or token count — without touching your application code. Apply a one-click recommendation from the AI Advisor and the rule is live immediately.

Metric	Value	Detail
Avg savings	31%	Monthly AI cost reduction after 3 rules applied
Setup time	2 min	From dashboard to active rule
Code changes	0	Rules are applied at proxy layer

How routing rules work

A routing rule is an if-then statement evaluated at the proxy layer before your request reaches the AI provider. If the call matches the rule conditions, the model is silently substituted. Your application receives a valid response — it has no knowledge of the redirect.

Rule: "sentiment-analysis optimisation"
  IF   X-Cost-Feature = "sentiment-analysis"
  AND  model = "gpt-4o"
  THEN redirect to "gpt-4o-mini"

Result:
  Your app calls:  gpt-4o   (still in the code)
  Proxy sends to:  gpt-4o-mini  (rule applied)
  You receive:     Valid gpt-4o-mini response
  Monthly saving:  $822/mo   (94% cost reduction)

The fastest way to create a routing rule is from the Recommendations page. Click Apply next to any model-mismatch recommendation — Cognocient creates the rule for you, pre-filled with the correct feature name and target model.

Creating a routing rule manually

Go to Control → Routing Rules → New Rule. The rule builder has four fields:

Name — A human-readable label for the rule, e.g. "Downgrade sentiment analysis to mini." Shown in the routing rules list and in the Live Calls view when a call is redirected.

Conditions — Any combination of: feature name (exact or wildcard), department name, source model (the model your app is calling), token count range (e.g., input < 1,000 tokens). Multiple conditions are AND-ed.

Action — The target model to redirect to. Cognocient validates that the target model is compatible with the source (same context window, same response format). Incompatible redirects are rejected at rule creation time.

Mode — Active (redirect all matching calls immediately) or Shadow (redirect but log both costs without changing behaviour — useful for validating the quality of the cheaper model before committing).

Shadow mode — test before committing

Shadow mode is the safest way to validate a routing rule. When a rule is in shadow mode:

The original call proceeds as normal — the user gets the response they expected
Cognocient also sends an identical request to the target model in the background
Both responses are stored for comparison in the rule's Shadow Report
You can review the quality difference before activating the rule

Shadow mode incurs costs for both the original and shadow model. Use it for 24–48 hours of validation only, then switch to Active mode once quality is confirmed.

Model compatibility reference

If your app calls	You can route to	Cost saving
`gpt-4o`	`gpt-4o-mini`	~94%
`gpt-4o`	`claude-haiku-4-5`	~90%
`claude-sonnet-4-6`	`claude-haiku-4-5`	~75%
`claude-opus-4-8`	`claude-sonnet-4-6`	~60%
`gpt-4o (async)`	`gpt-4o batch API`	50%

Cross-provider redirects (OpenAI → Anthropic) change the response object format. If your application parses specific provider fields, test with shadow mode first.

Next steps: Recommendations · Budgets · Feature Intelligence

How routing rules work

Creating a routing rule manually

Shadow mode — test before committing

Model compatibility reference

On this page