How does the Cognocient proxy architecture work?
How Cognocient works — proxy architecture, data flow, and what each dashboard section is for. The mental model you need before diving into individual features.
Cognocient sits transparently between your application and your AI provider. Every request passes through in 10–30ms, with budget checking, caching, attribution tagging, and metadata logging happening in-path — before the response returns to your app.
The core idea
Cognocient sits between your application and the AI provider. One URL change is all it takes:
Every request passes through in 10–30ms overhead. Cognocient records metadata (model, tokens, cost, latency, your attribution headers), enforces any budget rules, and forwards the request. Your application code sees zero difference — same response format, same streaming behaviour, same error codes.
What Cognocient does on each request
- Budget check — Redis lookup (sub-millisecond). Over the limit? Block or degrade based on your configuration.
- Cache lookup — pgvector similarity search. Cache hit? Return in under 10ms at $0 cost.
- Routing rule check — Should this call be redirected to a cheaper model?
- Forward to provider — With your real provider key, which Cognocient decrypted in memory.
- Log metadata — Model, tokens, cost, latency, attribution headers. Never prompt content.
- Return response — Transparently forwarded to your app.
Dashboard sections
| Page | What it's for | Where to find it |
|---|---|---|
| Engineering Dashboard | Daily operations — cost by feature, waste %, maturity score, ROI panel | /dashboard |
| Live Calls | Real-time call feed — verify attribution, debug cost spikes | /calls |
| Budgets | Create and manage spending limits at any scope | /budgets |
| Waste Detection | Four waste categories broken down by feature | /waste |
| AI Advisor | One-click cost reduction recommendations | /dashboard/recommendations |
| Anomalies | Statistical cost spikes with root-cause analysis | /dashboard/anomalies |
| Feature Intelligence | Per-feature ROI, waste %, and efficiency score | /dashboard/feature-intelligence |
| Sessions / Workstreams | Cost per conversation, JIRA story, or PR | /dashboard/workstreams |
| Cost Forecast | 30/60/90-day spend projections by feature | /forecast |
| Executive View | CFO-level dashboard with unit economics | /dashboard/executive |
| Reports | Board-ready PDF reports with AI narrative | /reports |
| Routing Rules | Automatic model downgrade rules | /routing-rules |
| Outcomes & ROI | Cost per business outcome (ticket, contract, etc.) | /outcomes |
The 2-minute morning check
The most effective teams scan five things each morning before standup:
- KPI strip — Waste % up from yesterday? Budget health below 30%? Any KPI in red is worth 30 seconds of investigation.
- Open anomalies — Zero means you're clear. Any anomaly has a root-cause hypothesis already prepared — read it and dismiss or escalate.
- Recommendations — Apply anything with >80% confidence. Each applied recommendation creates a routing rule automatically — no code change.
- Spend trend — Is the 30-day line flat or declining? An unexpected uptick that didn't trigger an anomaly alert is still worth a quick Live Calls drill-down.
- Budget health — Any budget below 20% remaining needs attention before it hits enforcement mode.
What you don't need to do
Cognocient does not require you to:
- Change your logging pipeline
- Modify your data warehouse
- Wrap individual SDK calls with metadata
- Migrate historical logs
- Install any new SDKs
The proxy handles all of this automatically. Your only code change is the base_url and api_key. See Quickstart for the 2-minute setup.
Zero-code overhead after setup
Attribution headers are optional add-ons, not requirements. You get cost, model, tokens, and latency tracking on day one — before you add a single header.
Next steps: Quickstart · Attribution Headers · Dashboard Walkthrough
Related articles