How-to Guides

How do I set a monthly AI spending limit?

Create a hard budget that blocks or degrades AI calls before charges hit your provider bill — in under 5 minutes.

Goal: A monthly spending limit enforced at the proxy, so that overspend is stopped before it happens rather than only reported by an alert.

Time: 5 minutes.


Step 1 — Go to Budgets and create a new budget

Open Dashboard → Budgets → New Budget.

Fill in the form:

| Field | What to enter |
| --- | --- |
| Name | Something descriptive, e.g. Engineering monthly, chatbot feature, Org total |
| Monthly limit | Your budget in USD |
| Scope | See below |
| Enforcement | See below |
| Alert thresholds | 50, 80, 100 (you'll get an email alert at each percentage of the limit) |
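
To make the alert thresholds concrete, here is a minimal sketch of how thresholds of 50, 80, and 100 map onto spend. It assumes the thresholds are percentages of the monthly limit; the function name `crossed_thresholds` is hypothetical and is not part of the product.

```python
def crossed_thresholds(spend_usd, limit_usd, thresholds=(50, 80, 100)):
    """Return the alert thresholds (percent of limit) that spend has reached."""
    if limit_usd <= 0:
        raise ValueError("limit_usd must be positive")
    # Compare against the dollar value of each threshold to avoid
    # rounding surprises from computing a percentage first.
    return [t for t in thresholds if spend_usd >= limit_usd * t / 100]


# Illustrative numbers only: $410 spent against a $500 monthly limit
print(crossed_thresholds(410, 500))  # [50, 80]
```

With a $500 limit, alerts would be expected at $250, $400, and $500 of spend.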

Choosing the scope:

| Scope | Use when |
| --- | --- |
| Global | One limit for your entire organisation's AI spend |
| Department | Limit a team. Requires the header X-Cost-Department: engineering on your calls |
| Feature | Limit a single product feature. Requires the header X-Cost-Feature: chatbot on your calls |
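
A small helper can keep the scope headers consistent across call sites. The sketch below uses only the two header names from the table; the function name `scope_headers` is hypothetical, and sending both headers on one request is an assumption.

```python
def scope_headers(department=None, feature=None):
    """Build the cost-attribution headers for a single request."""
    headers = {}
    if department:
        headers["X-Cost-Department"] = department
    if feature:
        headers["X-Cost-Feature"] = feature
    return headers


# Pass the result as extra_headers on each call made through the proxy
print(scope_headers(department="engineering", feature="chatbot"))
```

A Global budget needs no header, so `scope_headers()` returns an empty dict in that case.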

Choosing enforcement:

| Mode | What happens when the limit is hit |
| --- | --- |
| Alert | Call goes through, you get a notification. Good for monitoring. |
| Degrade | Call is rerouted to a cheaper model. Good for production features. |
| Block | Call returns 429. Good for experimental features and agents. |
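
The three modes can be read as a simple decision rule. The following is an illustrative model of the table above, not the proxy's actual implementation: the function name `enforce`, the 200 status for successful calls, and the model names are all assumptions.

```python
from typing import Optional, Tuple


def enforce(mode: str, spent_usd: float, limit_usd: float,
            model: str, cheaper_model: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, model that serves the call) for one request."""
    if spent_usd < limit_usd:
        return 200, model            # under the limit: nothing changes
    if mode == "alert":
        return 200, model            # call goes through, a notification is sent
    if mode == "degrade":
        return 200, cheaper_model    # call is rerouted to a cheaper model
    if mode == "block":
        return 429, None             # call is rejected
    raise ValueError(f"unknown enforcement mode: {mode}")


# Limit already reached ($500 of $500), comparing the three modes
for mode in ("alert", "degrade", "block"):
    print(mode, enforce(mode, 500.0, 500.0, "gpt-4o", "gpt-4o-mini"))
```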

Start with Alert, graduate to Block

If you're not sure which to pick, start with Alert for 2 weeks to understand normal spend patterns. Then switch to Block or Degrade once you're confident in the limit.

Step 2 — Save and verify

Click Save. The budget appears on the Budgets page immediately.

To verify it's active, make a test API call through the proxy and check the response headers:

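# Assumes `client` is an OpenAI client already configured to route through the proxy.
# The parsed completion object has no headers, so request the raw HTTP response.
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
    extra_headers={"X-Cost-Feature": "chatbot"},
)
# Header shows remaining budget across all matching limits
print(raw.headers.get("x-cog-budget-remaining"))
# The usual completion object is still available if you need it
response = raw.parse()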

You'll see the remaining budget in USD for this call's scope.

Step 3 — Test the enforcement (optional)

To verify that Block mode works, temporarily set the budget's limit to $0.01, make a call, and confirm that it returns a 429. Then restore the real limit.
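
The 429 check can be scripted. The sketch below relies on the fact that the OpenAI Python SDK raises exceptions carrying a `status_code` attribute for HTTP error responses; the helper name `is_blocked` is hypothetical, and it is written against any callable so it can be tried without a live proxy.

```python
def is_blocked(make_call):
    """Return True if make_call() fails with HTTP 429, False if it succeeds."""
    try:
        make_call()
    except Exception as exc:
        if getattr(exc, "status_code", None) == 429:
            return True
        raise  # any other failure is not a budget block
    return False


# Stand-in for a real request, used here only to exercise the helper
class FakeBlocked(Exception):
    status_code = 429


def blocked_call():
    raise FakeBlocked("budget exceeded")


print(is_blocked(blocked_call))  # True
```

In a real test, `make_call` would wrap the `client.chat.completions.create(...)` request. The SDK retries 429 responses by default, so constructing the client with `max_retries=0` should make the check return faster.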


Useful follow-ups

  • Multiple levels — Add a department budget on top of feature budgets. The tightest limit always wins. See Hierarchical Budgets.
  • Agent workflows — Add a per-run limit so one runaway agent can't consume your entire feature budget. See Track Cost Per Agent Run.
  • Get alerted in Slack — Budget alerts can route to Slack instead of (or in addition to) email. See Slack Alerts.
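
As a rough illustration of "the tightest limit always wins" when several budgets match one call: the sketch below assumes "tightest" means the budget with the least remaining spend. The function name and the dollar figures are hypothetical.

```python
def tightest_budget(remaining_by_budget):
    """Return (name, remaining USD) of the budget closest to its limit."""
    if not remaining_by_budget:
        raise ValueError("no matching budgets")
    name = min(remaining_by_budget, key=remaining_by_budget.get)
    return name, remaining_by_budget[name]


# Illustrative remaining amounts for three budgets matching the same call
matching = {
    "Org total": 1200.0,
    "Engineering monthly": 310.5,
    "chatbot feature": 42.0,
}
print(tightest_budget(matching))  # ('chatbot feature', 42.0)
```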
