How do I apply routing rules to automatically switch models?

Create a routing rule in the dashboard to redirect GPT-4o to GPT-4o-mini (or any cheaper model) for matching calls — no code changes, savings apply immediately.

Goal: Automatically redirect expensive model calls to cheaper equivalents for specific features — without changing a single line of application code.

Time: 2 minutes from dashboard to active rule. Savings appear on the next call.

Prerequisite: At least a few days of attributed calls so you can see which features are using expensive models unnecessarily. Run one-click recommendations first to identify candidates.

Step 1 — Identify the rule candidate

Go to Dashboard → AI Advisor → Recommendations. Look for model-mismatch cards — these are features using frontier models for simple tasks. Each card shows:

Which feature and model are over-provisioned
Estimated monthly saving
Confidence level (based on call volume and output length data)

Alternatively, go to Dashboard → Spend by Feature and look for features with high cost but short average output tokens, a sign that a cheaper model would likely handle them just as well.
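The Spend by Feature heuristic can also be sketched offline, assuming you have copied per-feature totals into rows. The field names (`feature`, `model`, `cost_usd`, `calls`, `output_tokens`) and both thresholds are hypothetical; they are not a documented Cognocient export format.

```python
from dataclasses import dataclass


@dataclass
class FeatureSpend:
    feature: str          # value sent in X-Cost-Feature
    model: str            # model the code currently calls
    cost_usd: float       # total spend over the period
    calls: int            # number of calls over the period
    output_tokens: int    # total output tokens over the period


def downgrade_candidates(rows, min_cost_usd=100.0, max_avg_output_tokens=150):
    """Return features with high spend but short average outputs, costliest first."""
    picked = []
    for row in rows:
        if row.calls == 0:
            continue  # skip idle features to avoid dividing by zero
        avg_out = row.output_tokens / row.calls
        if row.cost_usd >= min_cost_usd and avg_out <= max_avg_output_tokens:
            picked.append(row)
    return sorted(picked, key=lambda r: r.cost_usd, reverse=True)


# Made-up numbers for illustration only.
rows = [
    FeatureSpend("sentiment-analysis", "gpt-4o", 420.0, 90_000, 1_800_000),
    FeatureSpend("report-writer", "gpt-4o", 610.0, 4_000, 6_400_000),
    FeatureSpend("tagging", "gpt-4o", 12.0, 3_000, 30_000),
]
candidates = [r.feature for r in downgrade_candidates(rows)]
```

With these sample rows only `sentiment-analysis` qualifies: `report-writer` produces long outputs, and `tagging` costs too little to be worth a rule.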

Step 2 — Apply from the Recommendations page (fastest path)

Click Apply next to a model-mismatch recommendation. Cognocient pre-fills the rule with:

The feature name from X-Cost-Feature
The source model your code is currently calling
The recommended target model

Click Confirm. The routing rule is active immediately — no deployment, no restart.

After applying, check Live Calls. Within seconds, matching calls should show the x-cog-redirected: true header, which confirms the rule is working.
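If you prefer to confirm the redirect from code, a small check on response headers may be enough. This is a sketch that assumes the `x-cog-redirected` header seen in Live Calls is also returned to your client, which this page does not state; `was_redirected` is a hypothetical helper name.

```python
def was_redirected(headers):
    """Return True if the response carries x-cog-redirected: true."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {str(k).lower(): str(v) for k, v in headers.items()}
    return lowered.get("x-cog-redirected", "").strip().lower() == "true"


# Plain dicts stand in for real response headers here.
redirected = was_redirected({"X-Cog-Redirected": "true"})
untouched = was_redirected({"content-type": "application/json"})
```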

Step 3 — Create a rule manually

For custom conditions (token count, department, time-of-day), create the rule from scratch:

Go to Control → Routing Rules → New Rule and fill in:

Field	What to enter	Example
Name	Descriptive label	`Downgrade sentiment-analysis to mini`
Match: Feature	`X-Cost-Feature` value	`sentiment-analysis`
Match: Source model	Model your code calls	`gpt-4o`
Match: Max input tokens	Only redirect short inputs	`< 500`
Action: Redirect to	Cheaper model	`gpt-4o-mini`
Mode	Start with Shadow	Shadow
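To make the match conditions concrete, the example values in the table can be read as a predicate over each call. The sketch below illustrates that reading and is not Cognocient's implementation; the `RoutingRule` and `Call` types and the `matches` and `resolve_model` functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoutingRule:
    name: str
    feature: str                      # Match: Feature
    source_model: str                 # Match: Source model
    max_input_tokens: Optional[int]   # Match: Max input tokens (exclusive); None means no limit
    target_model: str                 # Action: Redirect to
    mode: str = "shadow"              # "shadow" or "active"


@dataclass(frozen=True)
class Call:
    feature: str
    model: str
    input_tokens: int


def matches(rule, call):
    """True when every match condition on the rule holds for the call."""
    if call.feature != rule.feature or call.model != rule.source_model:
        return False
    if rule.max_input_tokens is not None and call.input_tokens >= rule.max_input_tokens:
        return False
    return True


def resolve_model(rule, call):
    """Model that serves the user: only an active, matching rule changes it."""
    if rule.mode == "active" and matches(rule, call):
        return rule.target_model
    return call.model


shadow_rule = RoutingRule("Downgrade sentiment-analysis to mini",
                          "sentiment-analysis", "gpt-4o", 500, "gpt-4o-mini")
active_rule = RoutingRule("Downgrade sentiment-analysis to mini",
                          "sentiment-analysis", "gpt-4o", 500, "gpt-4o-mini", mode="active")
short_call = Call("sentiment-analysis", "gpt-4o", 120)
long_call = Call("sentiment-analysis", "gpt-4o", 500)
```

In this reading, `short_call` matches both rules but is only rerouted by `active_rule`, while `long_call` fails the `< 500` condition and keeps its original model.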

Step 4 — Validate with shadow mode before going live

Shadow mode runs the cheaper model in parallel without changing what your users receive. Use it for 24–48 hours to confirm output quality before activating the rule.

In Routing Rules → [your rule] → Shadow Report you can compare:

Responses from the original model vs. the shadow model side by side
Token counts and costs for both
Whether the shadow model consistently produces equivalent outputs
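For short categorical outputs such as sentiment labels, "equivalent" can be approximated by an exact-match rate. The sketch below assumes you have copied (original, shadow) output pairs out of the Shadow Report by hand; it does not imply an export API, and exact matching is not a suitable check for free-form text.

```python
def agreement_rate(pairs):
    """Share of (original, shadow) output pairs that match after normalisation."""
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if a.strip().lower() == b.strip().lower())
    return same / len(pairs)


# Made-up label pairs for illustration.
pairs = [
    ("Positive", "positive"),
    ("negative", "negative "),
    ("neutral", "positive"),
    ("positive", "positive"),
]
rate = agreement_rate(pairs)
```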

Once satisfied, click Activate to switch the rule from Shadow to Active.

Shadow mode bills both the original and the shadow model call, so every matching request costs more than usual while the rule is in Shadow. Keep validation windows short (24–48 hours), then activate or discard the rule.
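As a rough worked example of that overhead, assume the shadow model costs a fixed fraction of the original for the same traffic. The function name and dollar figures are invented for illustration; the 0.06 ratio corresponds to the ~94% saving listed for `gpt-4o` to `gpt-4o-mini` in the compatibility table.

```python
def shadow_window_cost(daily_original_usd, shadow_cost_ratio, days):
    """Total spend while a rule is in Shadow: original calls plus shadow calls."""
    original = daily_original_usd * days
    shadow = original * shadow_cost_ratio
    return original + shadow


# $50/day on the original model, shadow model at 6% of that cost, 2-day window.
total = shadow_window_cost(50.0, 0.06, 2)
extra = total - 50.0 * 2   # what validation adds on top of normal spend
```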

Model compatibility reference

If your code calls	Route to	Typical saving
`gpt-4o`	`gpt-4o-mini`	~94%
`claude-sonnet-4-6`	`claude-haiku-4-5`	~75%
`claude-opus-4-8`	`claude-sonnet-4-6`	~60%
`gpt-4o` (async job)	`gpt-4o` via Batch API	50%
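The percentages follow from per-token prices. As a sketch of the arithmetic for the first row, the figures below are assumed list prices in USD per million tokens and may be out of date; check the provider's current price sheet before relying on them.

```python
# Assumed list prices in USD per 1M tokens: (input, output). Verify before use.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def call_cost(model, input_tokens, output_tokens):
    """Cost in USD of one call at the assumed prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def saving_fraction(source, target, input_tokens, output_tokens):
    """Fraction of cost saved by serving the same call with the target model."""
    before = call_cost(source, input_tokens, output_tokens)
    after = call_cost(target, input_tokens, output_tokens)
    return 1 - after / before


saving = saving_fraction("gpt-4o", "gpt-4o-mini", input_tokens=400, output_tokens=20)
```

At these assumed prices, both the input and output rates of the cheaper model are 6% of the original, so the saving comes out near 94% whatever the input/output mix. For model pairs where the two ratios differ, the saving depends on the mix.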

Cross-provider redirects (e.g., OpenAI → Anthropic) change the response object format. If your code accesses provider-specific fields (like logprobs or usage.cache_read_input_tokens), test with shadow mode first.
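One way to reduce exposure to format differences is to read usage fields defensively. The sketch below works on parsed JSON bodies and uses the token field names from OpenAI's Chat Completions responses (`prompt_tokens`, `completion_tokens`) and Anthropic's Messages responses (`input_tokens`, `output_tokens`). The helper names are hypothetical, and whether Cognocient translates response shapes on a cross-provider redirect is not covered here.

```python
def token_usage(response):
    """Return (input_tokens, output_tokens) from an OpenAI- or Anthropic-style body."""
    usage = response.get("usage") or {}
    if "prompt_tokens" in usage:                 # OpenAI Chat Completions naming
        return usage["prompt_tokens"], usage.get("completion_tokens", 0)
    if "input_tokens" in usage:                  # Anthropic Messages naming
        return usage["input_tokens"], usage.get("output_tokens", 0)
    raise ValueError("unrecognised usage format")


def cache_read_tokens(response):
    """Provider-specific field: default to 0 instead of raising when absent."""
    return (response.get("usage") or {}).get("cache_read_input_tokens", 0)


# Hand-written sample bodies, not captured responses.
openai_style = {"usage": {"prompt_tokens": 120, "completion_tokens": 8, "total_tokens": 128}}
anthropic_style = {"usage": {"input_tokens": 120, "output_tokens": 9,
                             "cache_read_input_tokens": 64}}
```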

Step 5 — Monitor the impact

After activating, check Dashboard → Spend by Feature for the affected feature. The feature's cost should show a drop within 24 hours. The Routing Rules page shows:

Calls redirected (count and %)
Cost saved vs. the original model
Any calls that bypassed the rule (e.g., explicitly set to a different model)
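These three figures relate to each other in a simple way. The sketch below shows the arithmetic with invented numbers; `rule_impact` and its parameters are hypothetical and do not correspond to a Cognocient API.

```python
def rule_impact(matching_calls, redirected_calls, original_cost_usd, actual_cost_usd):
    """Summarise a rule: redirect share, calls that bypassed it, and savings."""
    share = 100.0 * redirected_calls / matching_calls if matching_calls else 0.0
    return {
        "redirected_pct": share,
        "bypassed": matching_calls - redirected_calls,
        "saved_usd": original_cost_usd - actual_cost_usd,
    }


# 10,000 calls for the feature, 9,200 redirected; $300 projected vs. $40 actual.
impact = rule_impact(10_000, 9_200, 300.0, 40.0)
```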

If quality issues arise, click Pause on the rule — the original model resumes immediately.