How do I apply routing rules to automatically switch models?
Create a routing rule in the dashboard to redirect GPT-4o to GPT-4o-mini (or any cheaper model) for matching calls — no code changes, savings apply immediately.
Goal: Automatically redirect expensive model calls to cheaper equivalents for specific features — without changing a single line of application code.
Time: 2 minutes from dashboard to active rule. Savings appear on the next call.
Prerequisite: At least a few days of attributed calls so you can see which features are using expensive models unnecessarily. Run one-click recommendations first to identify candidates.
Step 1 — Identify the rule candidate
Go to Dashboard → AI Advisor → Recommendations. Look for model-mismatch cards — these are features using frontier models for simple tasks. Each card shows:
- Which feature and model are over-provisioned
- Estimated monthly saving
- Confidence level (based on call volume and output length data)
Alternatively, go to Dashboard → Spend by Feature and look for features with high cost but short average output tokens. Short outputs usually mean a simple task (classification, tagging, extraction) that a cheaper model is likely to handle just as well.
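The "high cost, short output" signal can be sketched as a small filter. This is only an illustration: the `FeatureStats` fields, the thresholds, and the sample rows are assumptions made up for the example, not a Cognocient export format.

```python
from dataclasses import dataclass


@dataclass
class FeatureStats:
    # Hypothetical per-feature summary; field names are illustrative only.
    feature: str
    model: str
    monthly_cost_usd: float
    avg_output_tokens: float


def find_candidates(stats, min_cost_usd=100.0, max_avg_output_tokens=150.0):
    """Return features with high spend but short average outputs, costliest first."""
    hits = [
        s for s in stats
        if s.monthly_cost_usd >= min_cost_usd
        and s.avg_output_tokens <= max_avg_output_tokens
    ]
    return sorted(hits, key=lambda s: s.monthly_cost_usd, reverse=True)


# Made-up sample rows for illustration.
sample = [
    FeatureStats("sentiment-analysis", "gpt-4o", 420.0, 12.0),
    FeatureStats("report-writer", "gpt-4o", 900.0, 1800.0),  # long outputs: skip
    FeatureStats("intent-tagger", "gpt-4o", 35.0, 8.0),      # too cheap to matter
]

candidates = find_candidates(sample)
print([c.feature for c in candidates])
```

Both thresholds are judgment calls; tune them to your own spend and output profile.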
Step 2 — Apply from the Recommendations page (fastest path)
Click Apply next to a model-mismatch recommendation. Cognocient pre-fills the rule with:
- The feature name from X-Cost-Feature
- The source model your code is currently calling
- The recommended target model
Click Confirm. The routing rule is active immediately — no deployment, no restart.
After applying, check Live Calls. Within seconds, matching calls should show the x-cog-redirected: true header, which confirms the rule is working.
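If you also want to check for the redirect from your own code, a helper along these lines could work. It assumes the x-cog-redirected header is returned to the client as well as shown in Live Calls, which this article does not confirm; `was_redirected` is a hypothetical helper name.

```python
def was_redirected(headers):
    """Return True if the response headers carry x-cog-redirected: true."""
    # HTTP header names are case-insensitive, so normalise keys before lookup.
    normalised = {str(k).lower(): str(v) for k, v in headers.items()}
    return normalised.get("x-cog-redirected", "").strip().lower() == "true"


# Works with any mapping of response headers, e.g. from your HTTP client.
print(was_redirected({"X-Cog-Redirected": "true"}))
print(was_redirected({"content-type": "application/json"}))
```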
Step 3 — Create a rule manually
For custom conditions (token count, department, time-of-day), create the rule from scratch:
Go to Control → Routing Rules → New Rule and fill in:
| Field | What to enter | Example |
|---|---|---|
| Name | Descriptive label | Downgrade sentiment-analysis to mini |
| Match: Feature | X-Cost-Feature value | sentiment-analysis |
| Match: Source model | Model your code calls | gpt-4o |
| Match: Max input tokens | Only redirect short inputs | < 500 |
| Action: Redirect to | Cheaper model | gpt-4o-mini |
| Mode | Start with Shadow | Shadow |
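To make the matching semantics concrete, here is a minimal sketch of how a rule with the fields above might be evaluated. It is a mental model only, not Cognocient's implementation; `RoutingRule` and `resolve_model` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoutingRule:
    # Mirrors the form fields above; the structure itself is an assumption.
    name: str
    feature: str
    source_model: str
    max_input_tokens: Optional[int]  # exclusive upper bound; None means no limit
    redirect_to: str
    mode: str  # "shadow" or "active"


def resolve_model(rule, feature, requested_model, input_tokens):
    """Return the model that should serve the user-facing response."""
    matches = (
        feature == rule.feature
        and requested_model == rule.source_model
        and (rule.max_input_tokens is None or input_tokens < rule.max_input_tokens)
    )
    # In shadow mode the user still receives the original model's response.
    if matches and rule.mode == "active":
        return rule.redirect_to
    return requested_model


rule = RoutingRule(
    name="Downgrade sentiment-analysis to mini",
    feature="sentiment-analysis",
    source_model="gpt-4o",
    max_input_tokens=500,
    redirect_to="gpt-4o-mini",
    mode="active",
)

print(resolve_model(rule, "sentiment-analysis", "gpt-4o", 120))
print(resolve_model(rule, "sentiment-analysis", "gpt-4o", 900))
```

All match conditions must hold at once, so a long input or a different feature falls through to the model your code requested.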
Step 4 — Validate with shadow mode before going live
Shadow mode runs the cheaper model in parallel without changing what your users receive. Use it for 24–48 hours to confirm output quality before activating the rule.
In Routing Rules → [your rule] → Shadow Report you can compare:
- Responses from the original model vs. the shadow model side by side
- Token counts and costs for both
- Whether the shadow model consistently produces equivalent outputs
Once satisfied, click Activate to switch the rule from Shadow to Active.
Shadow mode incurs charges for both the original and the shadow model call. Keep validation windows short (24–48 hours), then activate or discard the rule.
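A rough estimate of what a shadow window costs can be worked out before you start. The sketch below assumes both models see the same token counts, and the call volume and per-million-token prices are placeholder values; substitute your providers' current rates.

```python
def call_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost of one call given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def shadow_window_cost(calls, input_tokens, output_tokens, original_prices, shadow_prices):
    """Return (original, shadow, total) cost in USD for a shadow window."""
    original = calls * call_cost_usd(input_tokens, output_tokens, *original_prices)
    shadow = calls * call_cost_usd(input_tokens, output_tokens, *shadow_prices)
    return original, shadow, original + shadow


# Placeholder inputs: 20,000 calls over the window, 400 input / 20 output tokens each.
# Prices are (input, output) in USD per million tokens and are assumptions.
original, shadow, total = shadow_window_cost(
    calls=20_000,
    input_tokens=400,
    output_tokens=20,
    original_prices=(2.50, 10.00),
    shadow_prices=(0.15, 0.60),
)
print(f"original ${original:.2f} + shadow ${shadow:.2f} = ${total:.2f}")
```

With a much cheaper shadow model, the overhead is a small fraction of what the feature already costs, but it still accumulates if the rule is left in Shadow indefinitely.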
Model compatibility reference
| If your code calls | Route to | Typical saving |
|---|---|---|
| gpt-4o | gpt-4o-mini | ~94% |
| claude-sonnet-4-6 | claude-haiku-4-5 | ~75% |
| claude-opus-4-8 | claude-sonnet-4-6 | ~60% |
| gpt-4o (async job) | gpt-4o via Batch API | 50% |
Cross-provider redirects (e.g., OpenAI → Anthropic) change the response object format. If your code accesses provider-specific fields (like logprobs or usage.cache_read_input_tokens), test with shadow mode first.
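One way to make application code tolerant of either response shape is to read provider-specific fields defensively. The sketch below assumes responses have already been parsed into plain dicts; `get_path` and `cached_input_tokens` are hypothetical helpers, and the two field paths are the cached-token fields used by Anthropic-style and OpenAI-style usage objects.

```python
def get_path(obj, path, default=None):
    """Walk a dotted path through nested dicts; return default if any key is missing."""
    current = obj
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def cached_input_tokens(response):
    """Read the cached-input-token count from either response shape, else 0."""
    for path in (
        "usage.cache_read_input_tokens",             # Anthropic-style usage block
        "usage.prompt_tokens_details.cached_tokens",  # OpenAI-style usage block
    ):
        value = get_path(response, path)
        if value is not None:
            return value
    return 0


anthropic_style = {"usage": {"input_tokens": 400, "cache_read_input_tokens": 128}}
openai_style = {"usage": {"prompt_tokens": 400,
                          "prompt_tokens_details": {"cached_tokens": 64}}}

print(cached_input_tokens(anthropic_style))
print(cached_input_tokens(openai_style))
```

Falling back to a default instead of raising keeps a cross-provider redirect from crashing code paths that only use the field for logging or metrics.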
Step 5 — Monitor the impact
After activating, check Dashboard → Spend by Feature for the affected feature. Cost should drop within 24 hours. The Routing Rules page shows:
- Calls redirected (count and %)
- Cost saved vs. the original model
- Any calls that bypassed the rule (e.g., explicitly set to a different model)
If quality issues arise, click Pause on the rule — the original model resumes immediately.
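If you keep your own call logs, the redirect count, redirect percentage, and cost saved can be cross-checked with a few lines. The record layout here (`redirected`, `actual_cost_usd`, `original_cost_usd`) is an assumption for illustration, not a Cognocient log schema.

```python
def summarize(calls):
    """Summarise redirect count, redirect percentage, and cost saved."""
    total = len(calls)
    redirected = [c for c in calls if c["redirected"]]
    # Saving = what the original model would have cost minus what was actually paid.
    saved = sum(c["original_cost_usd"] - c["actual_cost_usd"] for c in redirected)
    pct = 100.0 * len(redirected) / total if total else 0.0
    return {
        "redirected": len(redirected),
        "redirected_pct": pct,
        "saved_usd": saved,
    }


# Made-up log records: three redirected calls and one that bypassed the rule.
log = [
    {"redirected": True, "actual_cost_usd": 0.000072, "original_cost_usd": 0.0012},
    {"redirected": True, "actual_cost_usd": 0.000072, "original_cost_usd": 0.0012},
    {"redirected": True, "actual_cost_usd": 0.000072, "original_cost_usd": 0.0012},
    {"redirected": False, "actual_cost_usd": 0.0012, "original_cost_usd": 0.0012},
]

summary = summarize(log)
print(summary["redirected"], f'{summary["redirected_pct"]:.0f}%')
```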
Related: Routing Rules · Cut Your AI Bill with One Click · Waste Detection