Getting Started

How does the Cognocient proxy architecture work?

How Cognocient works — proxy architecture, data flow, and what each dashboard section is for. The mental model you need before diving into individual features.

Cognocient sits transparently between your application and your AI provider. Every request passes through in 10–30ms, with budget checking, caching, attribution tagging, and metadata logging happening in-path — before the response returns to your app.

The core idea

Cognocient sits between your application and the AI provider. One URL change is all it takes:

Your app  →  api.cognocient.com/v1  →  OpenAI / Anthropic / Gemini / ...

Every request passes through in 10–30ms overhead. Cognocient records metadata (model, tokens, cost, latency, your attribution headers), enforces any budget rules, and forwards the request. Your application code sees zero difference — same response format, same streaming behaviour, same error codes.

What Cognocient does on each request

  1. Budget check — Redis lookup (sub-millisecond). Over the limit? Block or degrade based on your configuration.
  2. Cache lookup — pgvector similarity search. Cache hit? Return in under 10ms at $0 cost.
  3. Routing rule check — Should this call be redirected to a cheaper model?
  4. Forward to provider — With your real provider key, which Cognocient decrypted in memory.
  5. Log metadata — Model, tokens, cost, latency, attribution headers. Never prompt content.
  6. Return response — Transparently forwarded to your app.

Dashboard sections

PageWhat it's forWhere to find it
Engineering DashboardDaily operations — cost by feature, waste %, maturity score, ROI panel/dashboard
Live CallsReal-time call feed — verify attribution, debug cost spikes/calls
BudgetsCreate and manage spending limits at any scope/budgets
Waste DetectionFour waste categories broken down by feature/waste
AI AdvisorOne-click cost reduction recommendations/dashboard/recommendations
AnomaliesStatistical cost spikes with root-cause analysis/dashboard/anomalies
Feature IntelligencePer-feature ROI, waste %, and efficiency score/dashboard/feature-intelligence
Sessions / WorkstreamsCost per conversation, JIRA story, or PR/dashboard/workstreams
Cost Forecast30/60/90-day spend projections by feature/forecast
Executive ViewCFO-level dashboard with unit economics/dashboard/executive
ReportsBoard-ready PDF reports with AI narrative/reports
Routing RulesAutomatic model downgrade rules/routing-rules
Outcomes & ROICost per business outcome (ticket, contract, etc.)/outcomes

The 2-minute morning check

The most effective teams scan five things each morning before standup:

  1. KPI strip — Waste % up from yesterday? Budget health below 30%? Any KPI in red is worth 30 seconds of investigation.
  2. Open anomalies — Zero means you're clear. Any anomaly has a root-cause hypothesis already prepared — read it and dismiss or escalate.
  3. Recommendations — Apply anything with >80% confidence. Each applied recommendation creates a routing rule automatically — no code change.
  4. Spend trend — Is the 30-day line flat or declining? An unexpected uptick that didn't trigger an anomaly alert is still worth a quick Live Calls drill-down.
  5. Budget health — Any budget below 20% remaining needs attention before it hits enforcement mode.

What you don't need to do

Cognocient does not require you to:

  • Change your logging pipeline
  • Modify your data warehouse
  • Wrap individual SDK calls with metadata
  • Migrate historical logs
  • Install any new SDKs

The proxy handles all of this automatically. Your only code change is the base_url and api_key. See Quickstart for the 2-minute setup.

Zero-code overhead after setup

Attribution headers are optional add-ons, not requirements. You get cost, model, tokens, and latency tracking on day one — before you add a single header.


Next steps: Quickstart · Attribution Headers · Dashboard Walkthrough

On this page