How do I automatically route AI calls to cheaper models?
Routing rules let you automatically redirect AI API calls to cheaper models based on feature, department, model name, or token count — without changing any application code. The most powerful way to operationalise cost recommendations at scale.
Routing rules redirect AI API calls to cheaper models based on feature, department, model name, or token count — without touching your application code. Apply a one-click recommendation from the AI Advisor and the rule is live immediately.
| Metric | Value | Detail |
|---|---|---|
| Avg savings | 31% | Monthly AI cost reduction after 3 rules applied |
| Setup time | 2 min | From dashboard to active rule |
| Code changes | 0 | Rules are applied at proxy layer |
How routing rules work
A routing rule is an if-then statement evaluated at the proxy layer before your request reaches the AI provider. If the call matches the rule conditions, the model is silently substituted. Your application receives a valid response — it has no knowledge of the redirect.
The fastest way to create a routing rule is from the Recommendations page. Click Apply next to any model-mismatch recommendation — Cognocient creates the rule for you, pre-filled with the correct feature name and target model.
Creating a routing rule manually
Go to Control → Routing Rules → New Rule. The rule builder has four fields:
Name — A human-readable label for the rule, e.g. "Downgrade sentiment analysis to mini." Shown in the routing rules list and in the Live Calls view when a call is redirected.
Conditions — Any combination of: feature name (exact or wildcard), department name, source model (the model your app is calling), token count range (e.g., input < 1,000 tokens). Multiple conditions are AND-ed.
Action — The target model to redirect to. Cognocient validates that the target model is compatible with the source (same context window, same response format). Incompatible redirects are rejected at rule creation time.
Mode — Active (redirect all matching calls immediately) or Shadow (redirect but log both costs without changing behaviour — useful for validating the quality of the cheaper model before committing).
Shadow mode — test before committing
Shadow mode is the safest way to validate a routing rule. When a rule is in shadow mode:
- The original call proceeds as normal — the user gets the response they expected
- Cognocient also sends an identical request to the target model in the background
- Both responses are stored for comparison in the rule's Shadow Report
- You can review the quality difference before activating the rule
Shadow mode incurs costs for both the original and shadow model. Use it for 24–48 hours of validation only, then switch to Active mode once quality is confirmed.
Model compatibility reference
| If your app calls | You can route to | Cost saving |
|---|---|---|
gpt-4o | gpt-4o-mini | ~94% |
gpt-4o | claude-haiku-4-5 | ~90% |
claude-sonnet-4-6 | claude-haiku-4-5 | ~75% |
claude-opus-4-8 | claude-sonnet-4-6 | ~60% |
gpt-4o (async) | gpt-4o batch API | 50% |
Cross-provider redirects (OpenAI → Anthropic) change the response object format. If your application parses specific provider fields, test with shadow mode first.
Next steps: Recommendations · Budgets · Feature Intelligence
Related articles