How do I enforce AI spend budgets before they exceed limits?
Set per-feature, per-department, and per-run budgets. Block, degrade to a cheaper model, or alert before AI spend exceeds your limit.
Cognocient checks every API call against your configured budgets before the call reaches the provider. Set a budget in the dashboard and choose your enforcement mode: Block (hard stop), Degrade (auto-switch to cheaper model), or Alert (allow through, flag it).
Most cost tools read billing exports that arrive 24–48 hours late, so they can only report overspend after the fact. Cognocient checks the budget before the call reaches the provider, so in Block mode overspend is prevented rather than merely reported. The check is a single atomic Redis operation that completes in under a millisecond on every request.
What are the three budget enforcement modes?
| Mode | What happens when the limit is hit | Best for |
|---|---|---|
| Alert | Call goes through, notification sent | Baseline measurement, new features |
| Block | HTTP 429 returned — provider never sees the call | Experiments, agents, dev/test |
| Degrade | Request silently rerouted to a cheaper model | Production features with SLA requirements |
The Base plan supports up to 3 active budgets. Growth and Business plans have unlimited budgets.
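The three modes can be pictured as a small decision function. This is only an illustrative sketch of the behaviour described in the table, not Cognocient's implementation; the names `Mode`, `Decision`, and `decide` are hypothetical, and treating "limit hit" as `spent >= limit` is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    ALERT = "alert"
    BLOCK = "block"
    DEGRADE = "degrade"


@dataclass
class Decision:
    forward: bool                # is the call sent on to the provider?
    model: str                   # model the call would run on
    http_status: Optional[int]   # status returned without forwarding, if any


def decide(mode: Mode, spent: float, limit: float,
           model: str, cheaper_model: str) -> Decision:
    """Hypothetical per-call decision given month-to-date spend in USD."""
    if spent < limit:
        # Under budget: every mode lets the call through unchanged.
        return Decision(forward=True, model=model, http_status=None)
    if mode is Mode.ALERT:
        # Limit hit: call still goes through (a notification is sent separately).
        return Decision(forward=True, model=model, http_status=None)
    if mode is Mode.BLOCK:
        # Limit hit: caller gets HTTP 429, provider never sees the call.
        return Decision(forward=False, model=model, http_status=429)
    # Mode.DEGRADE: limit hit, so reroute to the cheaper model.
    return Decision(forward=True, model=cheaper_model, http_status=None)
```

For example, `decide(Mode.DEGRADE, 120.0, 100.0, "gpt-4o", "gpt-4o-mini")` yields a decision that forwards the call on `gpt-4o-mini`.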
How do I create a budget in Cognocient?
Go to Dashboard → Budgets → New Budget and fill in:
- Name: descriptive label, e.g. chatbot feature, Engineering department, Org total
- Scope: Global (all traffic), Feature (by X-Cost-Feature), Department (by X-Cost-Department), Model, or User
- Monthly limit: the spending cap in USD
- Enforcement mode: Alert, Block, or Degrade
- Alert thresholds: default 50%, 80%, and 100%; you get an email or Slack notification at each
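To make the alert thresholds concrete, the sketch below works out which of the default thresholds a given month-to-date spend has crossed. The function name `crossed_thresholds` is hypothetical and the logic is an assumption about how percentage thresholds are usually evaluated, not a description of Cognocient's internals.

```python
from typing import List, Sequence


def crossed_thresholds(spent_usd: float, limit_usd: float,
                       thresholds: Sequence[int] = (50, 80, 100)) -> List[int]:
    """Return the percentage thresholds already reached by the current spend."""
    if limit_usd <= 0:
        raise ValueError("limit_usd must be positive")
    # Compare spent * 100 against threshold * limit to avoid dividing floats.
    return [t for t in thresholds if spent_usd * 100 >= t * limit_usd]
```

With an $800 monthly limit and $640 spent, `crossed_thresholds(640, 800)` returns `[50, 80]`, so the 50% and 80% notifications would already have fired.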
Alert mode
All API calls pass through normally. Cognocient sends a notification when your configured threshold is reached.
Example Slack notification:
Block mode
When the budget is exhausted, Cognocient returns HTTP 429 immediately. The AI provider never sees the request — overspend is structurally impossible.
Handle it gracefully in your application:
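A minimal sketch of one way to do that in Python. It assumes your HTTP client exposes a status code and body; the `Response` stand-in, `FALLBACK_MESSAGE`, and `complete_or_fallback` are hypothetical names used for illustration. Note that providers also use HTTP 429 for rate limiting, and this section does not say how to tell the two cases apart, so the sketch treats every 429 the same way.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Response:
    """Stand-in for an HTTP client response (hypothetical)."""
    status_code: int
    text: str


FALLBACK_MESSAGE = (
    "This feature has reached its AI budget for now. Please try again later."
)


def complete_or_fallback(send: Callable[[], Response]) -> str:
    """Call the AI endpoint once and degrade to a static message on HTTP 429."""
    response = send()
    if response.status_code == 429:
        # Budget exhausted: do not retry in a loop, the budget will not
        # recover until it resets or is raised.
        return FALLBACK_MESSAGE
    return response.text
```

Passing the request as a callable keeps the budget handling separate from whichever HTTP client or SDK the application already uses.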
Degrade mode (recommended for production)
When the budget threshold is reached, Cognocient transparently rewrites requests to a cheaper model. Your application keeps working — cost drops immediately.
| Original model | Degraded to | Cost saving |
|---|---|---|
| gpt-4o | gpt-4o-mini | ~94% |
| claude-opus | claude-sonnet-4-6 | ~80% |
| claude-sonnet-4-6 | claude-haiku | ~91% |
| gemini-1.5-pro | gemini-1.5-flash | ~87% |
The response includes headers so you can optionally surface a UI indicator:
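The header names are not listed in this section, so the sketch below uses clearly labelled placeholders (`X-Example-Budget-Degraded` and `X-Example-Served-Model`); substitute the real names from the Cognocient reference. It shows one possible way to turn such headers into a short UI notice.

```python
from typing import Mapping, Optional

# Placeholder header names only: the real names are not given in this section.
DEGRADED_FLAG_HEADER = "X-Example-Budget-Degraded"
SERVED_MODEL_HEADER = "X-Example-Served-Model"


def degrade_notice(headers: Mapping[str, str]) -> Optional[str]:
    """Return a user-facing notice if the response was degraded, else None."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get(DEGRADED_FLAG_HEADER.lower(), "").lower() != "true":
        return None
    model = lowered.get(SERVED_MODEL_HEADER.lower(), "a lower-cost model")
    return f"Answered by {model} to stay within budget."
```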
Budget scopes
| Scope | What it applies to | Example |
|---|---|---|
| Global | All traffic through your account | Monthly org cap: $5,000 |
| Feature | Calls with matching X-Cost-Feature | chatbot ≤ $800/mo |
| Department | Calls with matching X-Cost-Department | customer-success ≤ $2,000/mo |
| Model | All calls using a specific model | gpt-4o ≤ $1,000/mo |
| User | Calls from a specific X-Cost-User | Per-seat billing caps |
Budgets stack. A feature budget of $800/mo and a department budget of $2,000/mo can coexist: both are checked on every call, and the most restrictive one wins. See Hierarchical Budgets for how to set up nested limits.
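As a rough illustration of "the most restrictive wins", the sketch below picks the budget with the least remaining headroom among all budgets that match a call. Reading "most restrictive" as "least remaining USD" is an assumption, and `Budget` and `most_restrictive` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Budget:
    name: str
    limit_usd: float
    spent_usd: float

    @property
    def remaining_usd(self) -> float:
        # Never report negative headroom once a budget is overrun.
        return max(self.limit_usd - self.spent_usd, 0.0)


def most_restrictive(budgets: Sequence[Budget]) -> Budget:
    """Return the matching budget with the least remaining spend."""
    return min(budgets, key=lambda b: b.remaining_usd)


feature = Budget("chatbot", limit_usd=800, spent_usd=790)
department = Budget("customer-success", limit_usd=2000, spent_usd=1200)
binding = most_restrictive([feature, department])  # the chatbot budget
```

Here the department still has $800 of headroom, but the feature budget has only $10 left, so the feature budget is the one that triggers enforcement first.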
Per-run limits for agents
For agentic workloads, add a per-run limit so a single runaway agent execution can't consume your entire monthly budget.
Configure in Budgets → New Budget → Per-run limit. When a run exceeds its limit, only that run is blocked — other concurrent runs continue normally.
Pass X-Cost-Run-ID on all agent calls (see Track Cost Per Agent Run).
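A small sketch of attaching the run ID to every call in a run. The header names come from this page; using a UUID as the run ID value is an assumption (the expected format is not specified here), and `new_run_id` and `cost_headers` are hypothetical helpers.

```python
import uuid
from typing import Dict, Optional


def new_run_id() -> str:
    """Generate one ID per agent run (UUID format is an assumption)."""
    return uuid.uuid4().hex


def cost_headers(run_id: str, feature: Optional[str] = None,
                 department: Optional[str] = None) -> Dict[str, str]:
    """Build the cost-attribution headers to send with each agent call."""
    headers = {"X-Cost-Run-ID": run_id}
    if feature:
        headers["X-Cost-Feature"] = feature
    if department:
        headers["X-Cost-Department"] = department
    return headers
```

Generate the ID once at the start of a run and reuse the same headers for every step, so that all calls in the run count against the same per-run limit.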
Check remaining budget before your next agent step
For long-running agents, query remaining budget before each step to exit gracefully instead of being hard-stopped mid-execution:
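The budget query endpoint is not shown in this section, so the sketch below takes it as an injected callable, `get_remaining_usd`, which is a hypothetical stand-in for whatever call returns the remaining budget. It illustrates the control flow only: check before each step and stop cleanly when too little budget is left.

```python
from typing import Callable, Iterable, List


def run_agent(steps: Iterable[Callable[[], str]],
              get_remaining_usd: Callable[[], float],
              min_step_usd: float) -> List[str]:
    """Run steps in order, exiting early when remaining budget is too low."""
    results: List[str] = []
    for step in steps:
        # Query remaining budget before committing to the next step.
        if get_remaining_usd() < min_step_usd:
            break  # exit gracefully instead of being blocked mid-step
        results.append(step())
    return results
```

The `min_step_usd` cushion is a design choice: set it to roughly the cost of your most expensive step so the agent can save its partial results before the hard limit is reached.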
Next steps: Guardrails & Velocity Limits · Hierarchical Budgets · Waste Detection