How does the Cognocient proxy architecture work?

How Cognocient works — proxy architecture, data flow, and what each dashboard section is for. The mental model you need before diving into individual features.

Cognocient sits transparently between your application and your AI provider. Every request passes through in 10–30ms, with budget checking, caching, attribution tagging, and metadata logging happening in-path — before the response returns to your app.

The core idea

Cognocient sits between your application and the AI provider. One URL change is all it takes:

Your app  →  api.cognocient.com/v1  →  OpenAI / Anthropic / Gemini / ...

Every request passes through in 10–30ms overhead. Cognocient records metadata (model, tokens, cost, latency, your attribution headers), enforces any budget rules, and forwards the request. Your application code sees zero difference — same response format, same streaming behaviour, same error codes.

What Cognocient does on each request

Budget check — Redis lookup (sub-millisecond). Over the limit? Block or degrade based on your configuration.
Cache lookup — pgvector similarity search. Cache hit? Return in under 10ms at $0 cost.
Routing rule check — Should this call be redirected to a cheaper model?
Forward to provider — With your real provider key, which Cognocient decrypted in memory.
Log metadata — Model, tokens, cost, latency, attribution headers. Never prompt content.
Return response — Transparently forwarded to your app.

Dashboard sections

Page	What it's for	Where to find it
Engineering Dashboard	Daily operations — cost by feature, waste %, maturity score, ROI panel	`/dashboard`
Live Calls	Real-time call feed — verify attribution, debug cost spikes	`/calls`
Budgets	Create and manage spending limits at any scope	`/budgets`
Waste Detection	Four waste categories broken down by feature	`/waste`
AI Advisor	One-click cost reduction recommendations	`/dashboard/recommendations`
Anomalies	Statistical cost spikes with root-cause analysis	`/dashboard/anomalies`
Feature Intelligence	Per-feature ROI, waste %, and efficiency score	`/dashboard/feature-intelligence`
Sessions / Workstreams	Cost per conversation, JIRA story, or PR	`/dashboard/workstreams`
Cost Forecast	30/60/90-day spend projections by feature	`/forecast`
Executive View	CFO-level dashboard with unit economics	`/dashboard/executive`
Reports	Board-ready PDF reports with AI narrative	`/reports`
Routing Rules	Automatic model downgrade rules	`/routing-rules`
Outcomes & ROI	Cost per business outcome (ticket, contract, etc.)	`/outcomes`

The 2-minute morning check

The most effective teams scan five things each morning before standup:

KPI strip — Waste % up from yesterday? Budget health below 30%? Any KPI in red is worth 30 seconds of investigation.
Open anomalies — Zero means you're clear. Any anomaly has a root-cause hypothesis already prepared — read it and dismiss or escalate.
Recommendations — Apply anything with >80% confidence. Each applied recommendation creates a routing rule automatically — no code change.
Spend trend — Is the 30-day line flat or declining? An unexpected uptick that didn't trigger an anomaly alert is still worth a quick Live Calls drill-down.
Budget health — Any budget below 20% remaining needs attention before it hits enforcement mode.

What you don't need to do

Cognocient does not require you to:

Change your logging pipeline
Modify your data warehouse
Wrap individual SDK calls with metadata
Migrate historical logs
Install any new SDKs

The proxy handles all of this automatically. Your only code change is the base_url and api_key. See Quickstart for the 2-minute setup.

Zero-code overhead after setup

Attribution headers are optional add-ons, not requirements. You get cost, model, tokens, and latency tracking on day one — before you add a single header.

Next steps: Quickstart · Attribution Headers · Dashboard Walkthrough

The core idea

What Cognocient does on each request

Dashboard sections

The 2-minute morning check

What you don't need to do

On this page