How do I debug and stop runaway agent loops?
An agent loop that calls GPT-4o 400 times in 60 seconds is a $200 mistake. Here's how to find it in the dashboard, stop it, and prevent it from happening again.
Goal: Identify which agent is looping, stop the active spend, and set up enforcement so it can't recur.
Prerequisites: Agent calls going through the Cognocient proxy (so they're visible in the dashboard).
Step 1 — Spot the loop in Live Calls
Go to Dashboard → Live Calls. Sort by timestamp (newest first). A runaway loop shows up as a dense cluster of calls from the same session ID, hitting the same model repeatedly, each within a few seconds of the last.
Signs of a loop:
- Same `session_id` appearing 50+ times in under 5 minutes
- Each call has very similar token counts (the prompt hasn't changed)
- Latency is low (the model is responding fine — it's your code that keeps re-calling)
Click any call in the cluster to see its full metadata, including the X-Cost-Feature and X-Cost-Session headers. These tell you exactly which feature and which specific run is looping.
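The signs above can also be checked programmatically if you export your call records. The following is a minimal sketch, assuming each record is a dict with `session_id` and `timestamp` (in seconds) keys; those field names and the export itself are assumptions for illustration, not a documented Cognocient schema. The 50-call and 5-minute thresholds come from the list above.

```python
from collections import defaultdict, deque

def find_looping_sessions(calls, min_calls=50, window_s=300):
    """Return session IDs with at least min_calls calls in under window_s seconds.

    `calls` is an iterable of dicts with hypothetical keys
    'session_id' and 'timestamp' (seconds).
    """
    recent = defaultdict(deque)  # session_id -> timestamps inside the current window
    flagged = set()
    for call in sorted(calls, key=lambda c: c["timestamp"]):
        window = recent[call["session_id"]]
        window.append(call["timestamp"])
        # Drop timestamps that have fallen out of the sliding window.
        while call["timestamp"] - window[0] >= window_s:
            window.popleft()
        if len(window) >= min_calls:
            flagged.add(call["session_id"])
    return flagged

# Example: one session calling every 2 seconds, one calling every 2 minutes.
calls = [{"session_id": "run-a", "timestamp": i * 2} for i in range(60)]
calls += [{"session_id": "run-b", "timestamp": i * 120} for i in range(10)]
print(find_looping_sessions(calls))  # {'run-a'}
```

Similar token counts and low latency could be added as extra filters if your records carry those fields.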
Step 2 — Check if the circuit breaker fired
Go to Dashboard → Engineering Dashboard and check the Circuit Breaker metric in the System Health bar. If it shows a trip count > 0, the velocity limit already caught this loop and started blocking calls.
If it didn't fire, the loop may be under the default velocity threshold. Continue to Step 3 to tighten it.
Step 3 — Stop the active loop immediately
If the loop is still running:
Option A — Block the specific session (fastest): In Live Calls, click the session ID → Block session. All further calls from this session ID return 429 immediately.
Option B — Cut the feature budget: In Dashboard → Budgets, find the budget for this feature and temporarily set it to $0. All calls tagged with that feature are blocked instantly until you raise the limit.
Option C — Revoke the proxy key: In Settings → API Keys, click the key being used and toggle it off. All calls using that key stop immediately. Use this for emergencies — it affects all features on that key.
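Blocking at the proxy only helps if the agent stops when it is refused. Option A states that blocked calls return 429, so the agent code should treat a 429 as a stop signal rather than something to retry. A minimal sketch, assuming a hypothetical `send_call(step)` callable that returns an HTTP status code and a "done" flag; `AgentStopped` and the step cap are illustrative additions.

```python
class AgentStopped(RuntimeError):
    """Raised when the run must stop: blocked by the proxy or step cap reached."""

def run_agent(send_call, max_steps=20):
    """Run agent steps until done, a 429 from the proxy, or the step cap.

    `send_call(step)` is a hypothetical callable returning (status_code, done).
    Returns the number of steps executed on success.
    """
    for step in range(max_steps):
        status, done = send_call(step)
        if status == 429:
            # A block is deliberate: do not retry, surface it to the operator.
            raise AgentStopped(f"proxy returned 429 at step {step}; stopping run")
        if done:
            return step + 1
    raise AgentStopped(f"no result after {max_steps} steps; stopping run")
```

The hard cap on steps is a second line of defense: even if no block arrives, the loop cannot run indefinitely.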
Step 4 — Set velocity limits to prevent recurrence
Go to Dashboard → Budgets → Guardrails and configure velocity enforcement for the affected feature:
| Setting | Recommended value |
|---|---|
| Velocity window | 60 seconds |
| TPM threshold | 3× your normal baseline |
| Action | Block (not just alert) |
| Alert | Slack notification on trip |
The circuit breaker uses a sliding 60-second window. If tokens per minute (TPM) exceed your baseline multiplied by the configured factor, calls are blocked and you get a Slack alert.
Set the multiplier to 3× rather than 10×. A factor-of-3 spike is almost always a loop, not a legitimate traffic surge. With a 10× threshold, a loop has usually already cost you hundreds of dollars before the alert fires.
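To make the sliding-window behavior concrete, here is an illustrative client-side model of a velocity breaker. It is a sketch of the general technique under stated assumptions, not the proxy's actual implementation; the class name, parameters, and the choice not to count blocked calls are all hypothetical.

```python
from collections import deque

class VelocityBreaker:
    """Blocks a call when tokens inside the sliding window would exceed the limit."""

    def __init__(self, baseline_tpm, multiplier=3.0, window_s=60.0):
        # Scale the per-minute limit to the window length (equal at 60 seconds).
        self.limit = baseline_tpm * multiplier * window_s / 60.0
        self.window_s = window_s
        self.events = deque()  # (timestamp, tokens) pairs inside the window
        self.total = 0

    def allow(self, tokens, now):
        """Record a call of `tokens` at time `now` (seconds); return False if blocked."""
        # Evict calls that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window_s:
            _, old_tokens = self.events.popleft()
            self.total -= old_tokens
        if self.total + tokens > self.limit:
            return False  # blocked calls are not counted (an assumption of this sketch)
        self.events.append((now, tokens))
        self.total += tokens
        return True

# Baseline 10,000 TPM at 3x gives a 30,000-token limit per 60 seconds.
breaker = VelocityBreaker(baseline_tpm=10_000)
results = [breaker.allow(2_000, now=t) for t in range(20)]  # one call per second
print(results.count(True))  # 15
```

In this run the first 15 calls fit within the limit and every later call is refused until older calls age out of the window.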
Step 5 — Add a budget check in your agent loop code
The most robust protection is a pre-flight budget check before each agent step: call it at the top of your agent's step-execution function and stop the run when the check fails.
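A minimal sketch of such a pre-flight check follows. How you read the remaining budget depends on your setup, so `get_remaining_usd` is a hypothetical callable you supply; it is not a documented Cognocient API. `budget_ok`, `BudgetExceeded`, `execute_step`, and the per-step cost estimate are likewise illustrative names and values.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the remaining budget does not cover the next step."""

def budget_ok(get_remaining_usd, estimated_step_cost_usd):
    """Return True if the remaining budget covers the next step.

    `get_remaining_usd` is a hypothetical zero-argument callable
    returning the remaining budget in dollars.
    """
    return get_remaining_usd() >= estimated_step_cost_usd

def execute_step(step_fn, get_remaining_usd, estimated_step_cost_usd=0.05):
    """Run one agent step only after the pre-flight check passes."""
    if not budget_ok(get_remaining_usd, estimated_step_cost_usd):
        raise BudgetExceeded("budget exhausted; refusing to start another step")
    return step_fn()

# Example with an in-memory budget standing in for a real lookup.
remaining = {"usd": 0.12}

def fake_step():
    remaining["usd"] -= 0.05  # pretend each step costs five cents
    return "ok"

steps_run = 0
try:
    while True:
        execute_step(fake_step, lambda: remaining["usd"])
        steps_run += 1
except BudgetExceeded:
    print(f"stopped after {steps_run} steps")  # stopped after 2 steps
```

Raising an exception rather than returning a flag makes it harder for calling code to ignore the failed check and keep looping.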
Step 6 — Review in the dashboard after
Once the loop is stopped, go to Dashboard → Feature Intelligence and filter to the affected feature. You'll see the exact spike in the cost-over-time chart, with the loop visible as a vertical cost cliff. This view also shows your average cost per call before and during the loop — useful for estimating the total impact.
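The impact estimate is simple arithmetic on those two averages. A small sketch, with hypothetical function and parameter names, that subtracts what normal traffic would have cost over the same period from what the loop actually cost:

```python
def loop_impact_usd(calls_in_loop, avg_cost_during_usd,
                    expected_calls=0, avg_cost_before_usd=0.0):
    """Estimate excess spend: loop cost minus the cost of expected normal traffic."""
    actual = calls_in_loop * avg_cost_during_usd
    expected = expected_calls * avg_cost_before_usd
    return actual - expected

# Figures implied by the introduction: 400 calls totaling $200 is $0.50 per call.
print(loop_impact_usd(400, 0.50))  # 200.0
```

Read the call count and per-call averages off the chart; the values here only mirror the example at the top of this article.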
Preventing loops from the start
For any new agent workflow, apply these defaults before it goes to production:
- Per-session budget — Cap the cost of a single run (e.g., $0.50 per document processing job).
- Feature-level Block budget — Hard limit for the feature per month, not just Alert.
- Budget pre-check in code — The `budget_ok()` function above in every agent loop.
- Velocity circuit breaker — Set at 3× baseline, action = Block.
See hierarchical budgets for how to set up all four levels together.