Budgets & Control

How do I enforce AI budgets before spend exceeds the limit?

Set per-feature, per-department, and per-run budgets. Block, degrade to a cheaper model, or alert before AI spend exceeds your limit.

Cognocient checks every API call against your configured budgets before the call reaches the provider. Set a budget in the dashboard and choose your enforcement mode: Block (hard stop), Degrade (auto-switch to cheaper model), or Alert (allow through, flag it).

Most cost tools read billing exports that arrive 24–48 hours late, so they can only report overspend after it has happened. Cognocient enforces the budget before the call reaches the provider, so overspend is prevented rather than merely reported. The check is a single atomic Redis operation that completes in under a millisecond on every request.
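The check needs to be atomic because two concurrent requests must not both pass against the last few cents of a budget. The sketch below models that check-and-reserve step in plain Python, with a lock standing in for the atomicity Redis would provide. The `BudgetCounter` class, the `try_reserve` method, and the use of integer cents are assumptions made for illustration; this is not Cognocient's implementation.

```python
import threading


class BudgetCounter:
    """In-memory stand-in for an atomic check-and-increment budget counter."""

    def __init__(self, limit_cents: int) -> None:
        self.limit_cents = limit_cents
        self.spent_cents = 0
        self._lock = threading.Lock()  # plays the role of Redis atomicity

    def try_reserve(self, cost_cents: int) -> bool:
        """Reserve cost_cents if it fits under the limit; report whether it did."""
        with self._lock:
            if self.spent_cents + cost_cents > self.limit_cents:
                return False  # rejected before any provider call would be made
            self.spent_cents += cost_cents
            return True


def simulate(limit_cents: int, requests: int, cost_cents: int) -> int:
    """Fire concurrent reservations and return how many were allowed."""
    counter = BudgetCounter(limit_cents)
    results = []

    def worker() -> None:
        results.append(counter.try_reserve(cost_cents))

    threads = [threading.Thread(target=worker) for _ in range(requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

Because the comparison and the increment happen under one lock, `simulate(5000, 100, 100)` (100 concurrent $1.00 requests against a $50.00 limit) admits exactly 50 of them, regardless of thread scheduling.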

What are the three budget enforcement modes?

| Mode    | What happens when the limit is hit              | Best for                                  |
|---------|-------------------------------------------------|-------------------------------------------|
| Alert   | Call goes through, notification sent            | Baseline measurement, new features        |
| Block   | HTTP 429 returned; provider never sees the call | Experiments, agents, dev/test             |
| Degrade | Request silently rerouted to a cheaper model    | Production features with SLA requirements |
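To make the table concrete, here is a minimal sketch of how the three modes map to an outcome once a limit is hit. The `Mode` enum, the `decide` function, its return strings, and the choice of `spend >= limit` as the trigger are all illustrative assumptions, not part of the Cognocient API.

```python
from enum import Enum


class Mode(Enum):
    ALERT = "alert"
    BLOCK = "block"
    DEGRADE = "degrade"


def decide(mode: Mode, spend_usd: float, limit_usd: float) -> str:
    """Return the action taken for a request, given current spend and the limit."""
    if spend_usd < limit_usd:
        return "allow"  # under budget: every mode lets the call through
    if mode is Mode.ALERT:
        return "allow_and_notify"
    if mode is Mode.BLOCK:
        return "reject_429"
    return "reroute_to_cheaper_model"
```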

The Base plan supports up to 3 active budgets. Growth and Business plans have unlimited budgets.

How do I create a budget in Cognocient?

Go to Dashboard → Budgets → New Budget and fill in:

  • Name — descriptive label, e.g. chatbot feature, Engineering department, Org total
  • Scope — Global (all traffic), Feature (by X-Cost-Feature), Department (by X-Cost-Department), Model, or User
  • Monthly limit — the spending cap in USD
  • Enforcement mode — Alert, Block, or Degrade
  • Alert thresholds — default 50%, 80%, 100% — you get email/Slack at each

Alert mode

All API calls pass through normally. Cognocient sends a notification when your configured threshold is reached.

Example Slack notification:

⚠️ Budget Alert — customer-success
Feature ticket-resolver reached 90% of $500/month budget.
Current spend: $452.10 | Remaining: $47.90
Calls continuing normally.
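The figures in a notification like this follow directly from spend and limit. As a small sketch (the function name and return shape are assumptions for illustration), the percentage used, the remaining amount, and the thresholds already crossed can be computed as:

```python
def budget_summary(spend_usd: float, limit_usd: float, thresholds=(50, 80, 100)) -> dict:
    """Summarise budget usage and list the alert thresholds already crossed."""
    percent_used = spend_usd / limit_usd * 100
    return {
        "percent_used": round(percent_used, 2),
        "remaining_usd": round(limit_usd - spend_usd, 2),
        "crossed": [t for t in thresholds if percent_used >= t],
    }
```

With the numbers from the example above, `budget_summary(452.10, 500, thresholds=(50, 80, 90, 100))` reports $47.90 remaining and the 50%, 80%, and 90% thresholds as crossed.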

Block mode

When the budget is exhausted, Cognocient returns HTTP 429 immediately. The AI provider never sees the request — overspend is structurally impossible.

{
  "error": {
    "message": "Budget exceeded for feature 'chatbot'. Monthly limit: $500.00",
    "type": "budget_exceeded",
    "feature": "chatbot",
    "current_spend": 502.14,
    "limit": 500.00,
    "reset_at": "2026-07-01T00:00:00Z"
  }
}

Handle it gracefully in your application:
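One way to do that, as a minimal sketch: inspect the status code and the `error.type` field from the body shown above, and turn a budget block into a friendly fallback instead of an unhandled exception. The `handle_response` function, its return shape, and the user-facing message are assumptions for illustration; only the 429 status and the JSON fields come from the example response.

```python
import json


def handle_response(status_code: int, body: str) -> dict:
    """Turn a proxy response into something the application can act on."""
    if status_code == 429:
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            payload = {}
        error = payload.get("error", {}) if isinstance(payload, dict) else {}
        if error.get("type") == "budget_exceeded":
            return {
                "ok": False,
                "reason": "budget_exceeded",
                "user_message": "This feature is temporarily unavailable.",
                "retry_after": error.get("reset_at"),  # when the budget resets
            }
        # Any other 429 is treated as ordinary rate limiting.
        return {"ok": False, "reason": "rate_limited", "retry_after": None}
    return {"ok": status_code < 400, "reason": None, "retry_after": None}
```

Distinguishing `budget_exceeded` from other 429 responses matters because retrying with backoff will not help until `reset_at`, whereas an ordinary rate limit usually clears within seconds.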

Degrade mode

When the budget threshold is reached, Cognocient transparently rewrites requests to a cheaper model. Your application keeps working, and cost drops immediately.

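| Original model    | Degraded to       | Cost saving |
|-------------------|-------------------|-------------|
| gpt-4o            | gpt-4o-mini       | ~94%        |
| claude-opus       | claude-sonnet-4-6 | ~80%        |
| claude-sonnet-4-6 | claude-haiku      | ~91%        |
| gemini-1.5-pro    | gemini-1.5-flash  | ~87%        |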

The response includes headers so you can optionally surface a UI indicator:

x-cog-degraded: true
x-cog-original-model: gpt-4o
x-cog-degraded-model: gpt-4o-mini
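A minimal sketch of reading the headers above on the client side. The `degradation_info` helper and its return shape are assumptions for illustration; it takes any mapping of response headers and matches names case-insensitively, since HTTP header names are not case-sensitive.

```python
from typing import Mapping, Optional


def degradation_info(headers: Mapping[str, str]) -> Optional[dict]:
    """Return the original and served model if the response was degraded, else None."""
    normalized = {name.lower(): value for name, value in headers.items()}
    if normalized.get("x-cog-degraded", "").lower() != "true":
        return None
    return {
        "original_model": normalized.get("x-cog-original-model"),
        "served_model": normalized.get("x-cog-degraded-model"),
    }
```

An application could use the result to show a banner such as "Running in reduced-quality mode" whenever the function returns something other than `None`.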

Budget scopes

| Scope      | What it applies to                    | Example                       |
|------------|---------------------------------------|-------------------------------|
| Global     | All traffic through your account      | Monthly org cap: $5,000       |
| Feature    | Calls with matching X-Cost-Feature    | chatbot ≤ $800/mo             |
| Department | Calls with matching X-Cost-Department | customer-success ≤ $2,000/mo  |
| Model      | All calls using a specific model      | gpt-4o ≤ $1,000/mo            |
| User       | Calls from a specific X-Cost-User     | Per-seat billing caps         |

Budgets stack. A feature budget of $800/mo and a department budget of $2,000/mo can coexist: both are checked on every call, and the most restrictive one wins. See Hierarchical Budgets for how to set up nested limits.
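The "most restrictive wins" rule can be sketched as follows. The `Budget` dataclass, the helper names, and the spend figures in the usage note are hypothetical and chosen only to illustrate the rule; a request proceeds only if every applicable budget has room for it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Budget:
    name: str
    limit_usd: float
    spent_usd: float

    @property
    def remaining_usd(self) -> float:
        return self.limit_usd - self.spent_usd


def most_restrictive(budgets: List[Budget]) -> Budget:
    """Return the budget with the least headroom; it decides the outcome."""
    return min(budgets, key=lambda b: b.remaining_usd)


def can_proceed(budgets: List[Budget], estimated_cost_usd: float) -> bool:
    """A call passes only if it fits under every applicable budget."""
    return all(b.remaining_usd >= estimated_cost_usd for b in budgets)
```

For example, if the chatbot feature has spent $790 of its $800 and the department has spent $1,200 of its $2,000, the feature budget is the binding one: a call estimated at $15 is stopped even though the department still has $800 of headroom.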

Per-run limits for agents

For agentic workloads, add a per-run limit so a single runaway agent execution can't consume your entire monthly budget.

Configure in Budgets → New Budget → Per-run limit. When a run exceeds its limit, only that run is blocked — other concurrent runs continue normally.

Pass X-Cost-Run-ID on all agent calls (see Track Cost Per Agent Run).
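A small sketch of attaching the run ID consistently. The helper names and the choice of a UUID as the run ID format are assumptions; the header names are the ones used on this page. The key point is to generate the ID once per agent execution and reuse it on every call in that run.

```python
import uuid
from typing import Dict, Optional


def new_run_id() -> str:
    """Generate one ID per agent execution (UUID format is an assumption)."""
    return str(uuid.uuid4())


def cost_headers(feature: str, run_id: str, department: Optional[str] = None) -> Dict[str, str]:
    """Build the attribution headers to send with every call in a run."""
    headers = {"X-Cost-Feature": feature, "X-Cost-Run-ID": run_id}
    if department is not None:
        headers["X-Cost-Department"] = department
    return headers
```

These headers would be merged into whatever headers your HTTP client or provider SDK already sends for each request in the run.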

Check remaining budget before your next agent step

For long-running agents, query remaining budget before each step to exit gracefully instead of being hard-stopped mid-execution:

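import os

import httpx

# Assumes the API key is exported as COGNOCIENT_KEY in the environment.
COGNOCIENT_KEY = os.environ["COGNOCIENT_KEY"]

async def check_budget(feature: str, run_id: str) -> bool:
    # Use a context manager so the client's connections are closed after the call.
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://api.cognocient.com/api/budgets/status",
            headers={"Authorization": f"Bearer {COGNOCIENT_KEY}"},
            params={"feature": feature, "run_id": run_id},
        )
    status = response.json()

    if not status["can_proceed"]:
        return False
    # Stop if any budget has $0.05 or less remaining
    return all(b["remaining_usd"] > 0.05 for b in status.get("budgets", []))

# In your agent loop (execute_step is your own step runner):
async def run_agent(planned_steps, run_id: str) -> dict:
    completed = []
    for step in planned_steps:
        if not await check_budget("research-agent", run_id):
            return {"status": "budget_limit_reached", "completed": completed}
        completed.append(await execute_step(step))
    return {"status": "done", "completed": completed}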

Next steps: Guardrails & Velocity Limits · Hierarchical Budgets · Waste Detection
