How-to Guides

How do I set up hierarchical AI budgets?

Enforce spending limits at every level — per run, per feature, per department, and org-wide — so a single agent loop can never blow your quarterly AI budget regardless of what individual feature budgets allow.

Goal: A budget hierarchy where the org ceiling can never be breached, even if every individual child budget still has room.

The problem this solves: Without a hierarchy, 50 agent runs at $0.49 each add up to $24.50 against a $20 department budget. Every individual run passes because it is under its own $0.50 limit, yet the department budget is overspent by $4.50. With a hierarchy, the parent ceiling wins.
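The arithmetic above can be checked with a small simulation. This is only a sketch of the idea, not Cognocient's enforcement logic: the `simulate` function and the rule that a call is rejected when it would push spend past the parent limit are assumptions made for illustration. Amounts are in cents to avoid floating-point rounding.

```python
# Numbers from the example above, expressed in cents.
PER_RUN_LIMIT_CENTS = 50        # $0.50 per-run limit
DEPARTMENT_LIMIT_CENTS = 2_000  # $20 department budget
RUN_COST_CENTS = 49             # each run costs $0.49
RUNS = 50


def simulate(check_parent: bool) -> tuple[int, int]:
    """Return (runs_allowed, department_spend_cents)."""
    spent = 0
    allowed = 0
    for _ in range(RUNS):
        if RUN_COST_CENTS > PER_RUN_LIMIT_CENTS:
            break  # the run's own limit rejects it
        # Assumed rule: reject a run that would push the parent over its limit.
        if check_parent and spent + RUN_COST_CENTS > DEPARTMENT_LIMIT_CENTS:
            break
        spent += RUN_COST_CENTS
        allowed += 1
    return allowed, spent


flat_runs, flat_spend = simulate(check_parent=False)
tree_runs, tree_spend = simulate(check_parent=True)
print(f"flat: {flat_runs} runs, ${flat_spend / 100:.2f}")
print(f"hierarchical: {tree_runs} runs, ${tree_spend / 100:.2f}")
```

Without the parent check all 50 runs go through and the department spends $24.50. With it, the run that would cross $20 is stopped.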

Prerequisites: At least one application making calls through the proxy.


Step 1 — Create the org-level budget (the ceiling)

Go to Dashboard → Budgets → New Budget.

| Field | Value |
| --- | --- |
| Name | Org — AI Total |
| Scope | Global |
| Monthly limit | Your total quarterly AI budget ÷ 3 |
| Enforcement | Degrade (recommended for org level) |
| Alerts | 70%, 85%, 95% |

This is the hard ceiling. No combination of child budgets can exceed it.
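To make the monthly limit and alert settings concrete, here is a small worked calculation. The $30,000 quarterly figure and the helper names `monthly_limit` and `alert_thresholds` are made up for illustration; only the "÷ 3" rule and the 70/85/95 percentages come from the table above.

```python
def monthly_limit(quarterly_budget: float) -> float:
    """Monthly limit = quarterly budget divided by 3, rounded to cents."""
    return round(quarterly_budget / 3, 2)


def alert_thresholds(limit: float, percents=(70, 85, 95)) -> dict[int, float]:
    """Dollar amount at which each alert percentage is reached."""
    return {p: round(limit * p / 100, 2) for p in percents}


limit = monthly_limit(30_000)  # hypothetical $30,000 quarterly AI budget
thresholds = alert_thresholds(limit)
print(limit)
print(thresholds)
```

With these example numbers the monthly limit is $10,000, so the alerts correspond to $7,000, $8,500, and $9,500 of spend.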

Step 2 — Create department budgets

Create one budget per department that uses AI. Each is scoped to a department header value.

| Field | Value |
| --- | --- |
| Name | Engineering |
| Scope | Department |
| Scope value | engineering (matches the header X-Cost-Department: engineering) |
| Monthly limit | Department's AI allocation |
| Enforcement | Alert (teams manage their own spend) |

Repeat for each department: customer-success, product, sales, etc.
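Because each department budget is matched on a header value, one option is to build the cost headers in a single place and reject unknown department names early. The sketch below is an assumption-laden convenience, not part of any Cognocient SDK: `cost_headers` and `KNOWN_DEPARTMENTS` are hypothetical names, and the department set simply mirrors the examples listed above. The resulting dict can be passed as `extra_headers` on each call.

```python
from typing import Optional

# Hypothetical list: should mirror the scope values of your department budgets.
KNOWN_DEPARTMENTS = {"engineering", "customer-success", "product", "sales"}


def cost_headers(
    department: str,
    feature: Optional[str] = None,
    session: Optional[str] = None,
) -> dict[str, str]:
    """Build the X-Cost-* headers for one call."""
    if department not in KNOWN_DEPARTMENTS:
        # A typo here would presumably not match any department budget.
        raise ValueError(f"unknown department: {department!r}")
    headers = {"X-Cost-Department": department}
    if feature is not None:
        headers["X-Cost-Feature"] = feature
    if session is not None:
        headers["X-Cost-Session"] = session
    return headers


print(cost_headers("engineering", feature="document-processor"))
```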

Step 3 — Create feature budgets for high-risk features

For agent workflows, experimental features, or anything that can loop, add a feature-level budget.

| Field | Value |
| --- | --- |
| Name | Agent — document-processor |
| Scope | Feature |
| Scope value | document-processor |
| Monthly limit | $200 (conservative while monitoring) |
| Enforcement | Block |

Use Block mode for experimental features and agents. Use Degrade for production features where you want graceful degradation instead of hard failures. See enforcement modes for the full trade-off.

Step 4 — Add per-run limits for agents

For any agent that runs multi-step workflows, add a per-run budget. This caps the cost of a single agent execution, independent of the monthly feature budget.

Go to the feature budget you created, click Add Sub-limit, and set a per-session limit:

| Field | Value |
| --- | --- |
| Sub-limit type | Per session |
| Limit | $0.50 per run |
| Enforcement | Block |

Tag agent runs with a session ID so Cognocient can group calls per run:

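import uuid

# Assumes `client` is an OpenAI SDK client already routed through the proxy,
# and `agent_steps` is your own list of steps, each with a `.messages` list.
run_id = f"doc-proc-{uuid.uuid4()}"  # one unique ID per agent run

for step in agent_steps:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=step.messages,
        extra_headers={
            "X-Cost-Feature": "document-processor",
            "X-Cost-Session": run_id,   # groups all steps in this run
        },
    )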
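To illustrate what grouping by session ID means for a per-run limit, here is a local model of the bookkeeping. It is a sketch under stated assumptions, not the proxy's implementation: `RunLedger` is a hypothetical class, the step costs are invented, and the rule that a run may make another call while its accumulated spend is below the limit is an assumption.

```python
from collections import defaultdict


class RunLedger:
    """Hypothetical local model: sums call costs per session ID (in cents)."""

    def __init__(self, per_run_limit_cents: int = 50) -> None:
        self.per_run_limit_cents = per_run_limit_cents
        self.spent_cents = defaultdict(int)

    def allows(self, session_id: str) -> bool:
        # Assumed rule: a run may make another call while below its limit.
        return self.spent_cents[session_id] < self.per_run_limit_cents

    def record(self, session_id: str, cost_cents: int) -> None:
        self.spent_cents[session_id] += cost_cents


ledger = RunLedger(per_run_limit_cents=50)  # $0.50 per run
steps_run = 0
for cost in [20, 20, 20, 20]:  # made-up step costs for one run
    if not ledger.allows("doc-proc-a"):
        break  # this run is blocked; other runs are unaffected
    ledger.record("doc-proc-a", cost)
    steps_run += 1

print(steps_run, ledger.spent_cents["doc-proc-a"])
print(ledger.allows("doc-proc-b"))  # a different run starts from zero
```

Under this assumed rule the last permitted call can carry a run slightly past its limit, because the cost of a call is only known after it completes. Reusing one session ID across runs would merge their spend, which is why each run gets a fresh UUID.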

Step 5 — Test the hierarchy

Make a test call tagged with a feature that has budgets at all levels:

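# Assuming `client` is an OpenAI Python SDK (v1+) client: the parsed completion
# object does not carry HTTP headers, so request the raw response instead.
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "test"}],
    extra_headers={
        "X-Cost-Department": "engineering",
        "X-Cost-Feature":    "document-processor",
        "X-Cost-Session":    "test-run-001",
    },
)
print(raw.headers.get("x-cog-budget-remaining"))
response = raw.parse()  # the usual chat completion object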

The response header x-cog-budget-remaining shows the most restrictive remaining budget across all matching levels. A call is only allowed if every matching budget level has room.
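The "most restrictive level wins" rule can be sketched as a minimum over the matching budgets. This is an illustration of the rule as described above, not the proxy's code: the `Budget` class, the `evaluate` function, and all limits and spend figures (in cents) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    level: str
    limit_cents: int
    spent_cents: int

    @property
    def remaining_cents(self) -> int:
        return max(self.limit_cents - self.spent_cents, 0)


def evaluate(budgets: list[Budget]) -> tuple[bool, Budget]:
    """Return (allowed, tightest budget) across all matching levels."""
    tightest = min(budgets, key=lambda b: b.remaining_cents)
    # Allowed only if every level has room, i.e. the tightest one does.
    return tightest.remaining_cents > 0, tightest


budgets = [
    Budget("run", 50, 12),
    Budget("feature", 20_000, 19_990),
    Budget("department", 500_000, 120_000),
    Budget("org", 1_000_000, 640_000),
]
allowed, tightest = evaluate(budgets)
print(allowed, tightest.level, tightest.remaining_cents)
```

Here the feature budget is the tightest level with 10 cents left, so that is the figure a header like x-cog-budget-remaining would be expected to reflect. If any level reached zero, the call would be refused even though the other levels still have room.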

Step 6 — Check budget status before each agent step (advanced)

For long-running agents, poll the budget status API before each step to avoid being stopped mid-execution:

import os

import httpx

# Assumed environment variable name for your Cognocient API key
COG_API_KEY = os.environ["COG_API_KEY"]

def check_budget(feature: str, session_id: str) -> bool:
    resp = httpx.get(
        "https://api.cognocient.com/api/budgets/status",
        headers={
            "Authorization": f"Bearer {COG_API_KEY}",
            "X-Cost-Feature": feature,
            "X-Cost-Session": session_id,
        },
    )
    status = resp.json()
    if not status.get("can_proceed", True):
        return False
    # Stop gracefully once any budget has less than 10% remaining
    return all(
        b["percent_used"] < 90
        for b in status.get("budgets", [])
    )

def run_agent(agent_steps, run_id: str) -> dict:
    completed = 0  # number of steps finished so far
    for step in agent_steps:
        if not check_budget("document-processor", run_id):
            return {"status": "budget_limit_reached", "steps_completed": completed}
        # ... proceed with step
        completed += 1
    return {"status": "completed", "steps_completed": completed}

Done

Your hierarchy is active. The enforcement chain is: run → feature → department → org. Any level being exhausted stops the call — the most restrictive level always wins.

Monitor budget consumption across all levels in Dashboard → Budgets. The parent budget cards show aggregate child spend in real time.
