Optimization

How do I automatically route AI calls to cheaper models?

Routing rules let you automatically redirect AI API calls to cheaper models based on feature, department, model name, or token count — without changing any application code. The most powerful way to operationalise cost recommendations at scale.

Routing rules redirect AI API calls to cheaper models based on feature, department, model name, or token count — without touching your application code. Apply a one-click recommendation from the AI Advisor and the rule is live immediately.

MetricValueDetail
Avg savings31%Monthly AI cost reduction after 3 rules applied
Setup time2 minFrom dashboard to active rule
Code changes0Rules are applied at proxy layer

How routing rules work

A routing rule is an if-then statement evaluated at the proxy layer before your request reaches the AI provider. If the call matches the rule conditions, the model is silently substituted. Your application receives a valid response — it has no knowledge of the redirect.

Rule: "sentiment-analysis optimisation"
  IF   X-Cost-Feature = "sentiment-analysis"
  AND  model = "gpt-4o"
  THEN redirect to "gpt-4o-mini"

Result:
  Your app calls:  gpt-4o   (still in the code)
  Proxy sends to:  gpt-4o-mini  (rule applied)
  You receive:     Valid gpt-4o-mini response
  Monthly saving:  $822/mo   (94% cost reduction)

The fastest way to create a routing rule is from the Recommendations page. Click Apply next to any model-mismatch recommendation — Cognocient creates the rule for you, pre-filled with the correct feature name and target model.

Creating a routing rule manually

Go to Control → Routing Rules → New Rule. The rule builder has four fields:

Name — A human-readable label for the rule, e.g. "Downgrade sentiment analysis to mini." Shown in the routing rules list and in the Live Calls view when a call is redirected.

Conditions — Any combination of: feature name (exact or wildcard), department name, source model (the model your app is calling), token count range (e.g., input < 1,000 tokens). Multiple conditions are AND-ed.

Action — The target model to redirect to. Cognocient validates that the target model is compatible with the source (same context window, same response format). Incompatible redirects are rejected at rule creation time.

Mode — Active (redirect all matching calls immediately) or Shadow (redirect but log both costs without changing behaviour — useful for validating the quality of the cheaper model before committing).

Shadow mode — test before committing

Shadow mode is the safest way to validate a routing rule. When a rule is in shadow mode:

  1. The original call proceeds as normal — the user gets the response they expected
  2. Cognocient also sends an identical request to the target model in the background
  3. Both responses are stored for comparison in the rule's Shadow Report
  4. You can review the quality difference before activating the rule

Shadow mode incurs costs for both the original and shadow model. Use it for 24–48 hours of validation only, then switch to Active mode once quality is confirmed.

Model compatibility reference

If your app callsYou can route toCost saving
gpt-4ogpt-4o-mini~94%
gpt-4oclaude-haiku-4-5~90%
claude-sonnet-4-6claude-haiku-4-5~75%
claude-opus-4-8claude-sonnet-4-6~60%
gpt-4o (async)gpt-4o batch API50%

Cross-provider redirects (OpenAI → Anthropic) change the response object format. If your application parses specific provider fields, test with shadow mode first.


Next steps: Recommendations · Budgets · Feature Intelligence

On this page