How-to Guides

How do I cut my AI bill with one-click recommendations?

Use Cognocient's AI Advisor to find your biggest cost savings and apply them without touching your code.

Goal: Identify and apply the top cost-saving opportunities in your AI spend — without any code changes.

Time: 30 minutes to apply. Savings appear in the next billing period.

Prerequisite: At least 7 days of API calls through the Cognocient proxy so the Advisor has enough data to analyse.


Step 1 — Open AI Advisor

Go to Dashboard → AI Advisor (or click Recommendations in the sidebar).

The Advisor analyses your call patterns and ranks recommendations by estimated monthly saving. Each card shows:

  • What the issue is — e.g. "chatbot uses GPT-4o for 3-token classification outputs"
  • Estimated saving — e.g. "$420/month"
  • Confidence — based on how many calls it's seen
  • One-click apply — creates a routing rule automatically

Start with the highest-saving recommendation at the top of the list.

Step 2 — Apply the top recommendation

Click Apply on the top recommendation. You'll see a preview of what changes:

  • Model downgrade — e.g. GPT-4o → GPT-4o-mini for the flagged feature
  • Scope — which feature and department the rule applies to
  • Estimated impact — how many calls per day will be affected

Confirm and click Apply. The routing rule is created instantly — no deployment, no code change.

How it works

Cognocient creates a routing rule that intercepts calls matching the scope and rewrites the model in the request before forwarding to the provider. Your code keeps sending gpt-4o. The proxy silently sends gpt-4o-mini. Your code sees the same response format.

Step 3 — Check the result after 48 hours

Come back to AI Advisor in 2 days. The recommendation will show:

  • Actual saving so far — real dollars from calls that hit the new routing rule
  • Quality signal — if you've tagged calls with X-Cost-Outcome, the Advisor shows whether quality held up

If the feature's output quality has held up, keep the rule. If something degraded, click Revert to restore the original model — one click, no deployment.

Step 4 — Work through the list

Repeat for each remaining recommendation. Common findings:

FindingTypical saving
GPT-4o used for short classification outputs60–85% cost reduction
Identical prompts sent repeatedly (no caching)75–90% via prompt cache
Batch-eligible workload using real-time endpoint50% with Batch API
Context window bloating after 5 turns20–40% with context trimming

If you don't see recommendations yet

The Advisor needs enough data to be confident. If your call volume is low:

  1. Make sure calls are going through the proxy (check Live Calls tab)
  2. Make sure you've added X-Cost-Feature headers — without them, all calls look like a single unattributed pool
  3. Wait 7+ days for call patterns to stabilise

See Tag Your First AI Call if attribution isn't set up yet.

On this page