How-to Guides

How do I apply routing rules to automatically switch models?

Create a routing rule in the dashboard to redirect GPT-4o to GPT-4o-mini (or any cheaper model) for matching calls — no code changes, savings apply immediately.

Goal: Automatically redirect expensive model calls to cheaper equivalents for specific features — without changing a single line of application code.

Time: 2 minutes from dashboard to active rule. Savings appear on the next call.

Prerequisite: At least a few days of attributed calls so you can see which features are using expensive models unnecessarily. Run one-click recommendations first to identify candidates.


Step 1 — Identify the rule candidate

Go to Dashboard → AI Advisor → Recommendations. Look for model-mismatch cards — these are features using frontier models for simple tasks. Each card shows:

  • Which feature and model are over-provisioned
  • Estimated monthly saving
  • Confidence level (based on call volume and output length data)

Alternatively, go to Dashboard → Spend by Feature and look for features with high cost but short average output tokens — a signal that a cheaper model would handle them identically.

Step 2 — Apply from the Recommendations page (fastest path)

Click Apply next to a model-mismatch recommendation. Cognocient pre-fills the rule with:

  • The feature name from X-Cost-Feature
  • The source model your code is currently calling
  • The recommended target model

Click Confirm. The routing rule is active immediately — no deployment, no restart.

After applying, check Live Calls — you'll see the x-cog-redirected: true header on matching calls within seconds. The rule is working.

Step 3 — Create a rule manually

For custom conditions (token count, department, time-of-day), create the rule from scratch:

Go to Control → Routing Rules → New Rule and fill in:

FieldWhat to enterExample
NameDescriptive labelDowngrade sentiment-analysis to mini
Match: FeatureX-Cost-Feature valuesentiment-analysis
Match: Source modelModel your code callsgpt-4o
Match: Max input tokensOnly redirect short inputs< 500
Action: Redirect toCheaper modelgpt-4o-mini
ModeStart with ShadowShadow

Step 4 — Validate with shadow mode before going live

Shadow mode runs the cheaper model in parallel without changing what your users receive. Use it for 24–48 hours to confirm output quality before activating the rule.

In Routing Rules → [your rule] → Shadow Report you can compare:

  • Responses from the original model vs. the shadow model side by side
  • Token counts and costs for both
  • Whether the shadow model consistently produces equivalent outputs

Once satisfied, click Activate to switch the rule from Shadow to Active.

Shadow mode costs money for both the original and shadow model calls. Use it for short validation windows (24–48 hours), then activate or discard the rule.

Model compatibility reference

If your code callsRoute toTypical saving
gpt-4ogpt-4o-mini~94%
claude-sonnet-4-6claude-haiku-4-5~75%
claude-opus-4-8claude-sonnet-4-6~60%
gpt-4o (async job)gpt-4o via Batch API50%

Cross-provider redirects (e.g., OpenAI → Anthropic) change the response object format. If your code accesses provider-specific fields (like logprobs or usage.cache_read_input_tokens), test with shadow mode first.

Step 5 — Monitor the impact

After activating, check Dashboard → Spend by Feature for the affected feature. Cost should drop within 24 hours. The Routing Rules page shows:

  • Calls redirected (count and %)
  • Cost saved vs. the original model
  • Any calls that bypassed the rule (e.g., explicitly set to a different model)

If quality issues arise, click Pause on the rule — the original model resumes immediately.


Related: Routing Rules · Cut Your AI Bill with One Click · Waste Detection

On this page