What is Cognocient?
Cognocient is an AI Spend Decision Intelligence platform. Real-time LLM cost attribution, pre-call budget enforcement, and CFO-ready board reports. Setup in 2 minutes.
Cognocient is an AI spend intelligence platform that proxies every API call your application makes to AI providers — OpenAI, Anthropic, Google Gemini, Mistral, Groq, Together AI, and more. Every request is observed, attributed to a feature or team, analysed for waste, and surfaced in real-time dashboards your CFO can actually use.
The integration is a single configuration change. No new SDKs, no new dependencies, no application code to rewrite.
The problem
58% of organizations describe their AI costs as a "black box" (Capgemini Research Institute, 2025). Engineering teams see token counts in Datadog. Finance teams get a consolidated line item on the credit card bill — weeks later. Nobody can answer:
- Which product feature is driving the cost spike?
- Is this spend waste, or is it generating revenue?
- Are we about to breach our AI budget before the month ends?
Cognocient closes all three gaps — automatically, in real time, without requiring your developers to instrument anything manually.
How Cognocient works
Your application code points at https://api.cognocient.com/v1 instead of the AI provider directly. Cognocient:
- Forwards the request to the original provider — same API surface, same models, same response format. Your application code sees zero difference.
- Records metadata — model, token counts, cost, latency, and your optional attribution headers. Prompt and response content are never stored.
- Runs analytics in-path — waste detection, budget enforcement, anomaly scoring, and cache hit analysis — before the response returns to your app.
- Surfaces insights — into the Cognocient dashboard, your BI tools via FOCUS 1.1 export, or your alerting pipeline via webhooks.
The proxy adds 10–30ms of overhead. Budget checks run in Redis (sub-millisecond). Semantic cache hits return in under 10ms — often faster than the provider would have responded.
One change unlocks everything
Change one line. That's it.
2-minute setup
No SDKs to install. No infrastructure to manage. If you already use the OpenAI SDK — or any SDK that accepts a base_url / baseURL — you are already compatible. The same URL works for OpenAI, Anthropic, Gemini, Mistral, Groq, and Together AI.
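As a minimal sketch, assuming the official OpenAI Python SDK (`pip install openai`), the one-line change could look like the following. The base URL and the `sk-cog-...` key format come from this page; the `COGNOCIENT_API_KEY` environment variable name and both helper functions are illustrative assumptions, not part of any Cognocient SDK.

```python
import os

# Proxy endpoint from this page; it replaces the provider's own base URL.
COGNOCIENT_BASE_URL = "https://api.cognocient.com/v1"


def client_kwargs(proxy_key=None):
    """Return the constructor arguments that route SDK traffic through the proxy.

    COGNOCIENT_API_KEY is a hypothetical environment variable name.
    """
    key = proxy_key or os.environ.get("COGNOCIENT_API_KEY", "")
    if not key:
        raise ValueError("a Cognocient proxy key (sk-cog-...) is required")
    return {"base_url": COGNOCIENT_BASE_URL, "api_key": key}


def make_client(proxy_key=None):
    """Build an OpenAI SDK client pointed at the proxy instead of the provider."""
    # Imported lazily so client_kwargs() can be used without the SDK installed.
    from openai import OpenAI

    return OpenAI(**client_kwargs(proxy_key))
```

With a client built this way, existing calls such as `client.chat.completions.create(model="gpt-4o-mini", messages=[...])` should not need to change.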
Key capabilities
Cognocient is built around a set of core capabilities. Each addresses a different part of the AI spend visibility problem.
| Capability | What you get |
|---|---|
| Spend Attribution | Tag every AI call by feature, team, session, and GL account using HTTP headers. Slice and dice costs any way finance needs. |
| Workstream Attribution | Tag calls with X-Cost-Session: JIRA-1234 or X-Cost-Session: pr-456 to see AI cost per story, PR, or agent task run — down to the cent. |
| Waste Detection | Five waste categories detected automatically: over-sized models, redundant calls, missed cache opportunities, anomalous usage spikes, and bloated context windows. |
| AI Investment ROI | Live dashboard panel showing investment vs. waste split, efficiency score (0–100), and a one-sentence board summary generated from your actual spend data. |
| Budget Enforcement | Hard budget limits enforced at the proxy — before the charge reaches your provider bill. Hierarchical: run → feature → department → org. The tightest limit wins. |
| max_tokens Clamping | Budget reservations are enforced physically, not just logically. Forwarded max_tokens is clamped to what the remaining budget can actually afford, eliminating mid-stream overshoot. |
| AI Optimization | Semantic caching via pgvector HNSW, prompt-cache routing, and AI-generated recommendations that right-size your spend with a single click. |
| Agent & MCP Attribution | Full cost tree across multi-agent workflows: which agent called which tool, how much each step cost, and which team or feature was responsible. |
| Reports & Exports | Board-ready PDF executive reports with 3 narrative tones (Board/CFO, Technical Finance, Engineering). FOCUS 1.1 CSV export. Scheduled monthly delivery to finance teams. |
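To illustrate workstream attribution, here is a small sketch that attaches the `X-Cost-Session` header from the table above to a single call. It assumes an OpenAI Python SDK client, whose `create()` method accepts an `extra_headers` argument. The helper names are hypothetical, and other attribution header names are not shown because this page does not list them.

```python
def session_headers(workstream_id):
    """Build the workstream attribution header for one AI call."""
    value = workstream_id.strip()
    # Reject empty or multi-line values, which are not valid HTTP header values.
    if not value or "\r" in value or "\n" in value:
        raise ValueError("workstream id must be a non-empty single-line string")
    return {"X-Cost-Session": value}


def tagged_completion(client, workstream_id, **request):
    """Send a chat completion with the attribution header attached.

    `client` is assumed to be an OpenAI SDK client configured to use the proxy.
    """
    return client.chat.completions.create(
        extra_headers=session_headers(workstream_id), **request
    )
```

For example, `tagged_completion(client, "JIRA-1234", model="gpt-4o-mini", messages=[...])` would tag that call with the story it belongs to.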
Finding features in the dashboard
Each capability maps to a specific dashboard page. Here's where to look:
| Capability | Dashboard page | How to access |
|---|---|---|
| Spend Attribution | Engineering Dashboard | Spend by Department chart (bottom left) |
| Workstream Attribution | Workstreams | /dashboard/workstreams |
| Waste Detection | Executive Overview | Recoverable Waste widget (middle section) |
| Waste Detection | AI Insights | /dashboard/insights |
| AI Investment ROI | Engineering Dashboard | AI Investment ROI panel (below Maturity Score) |
| Budget Enforcement | Budgets | /budgets |
| Budget Enforcement | Engineering Dashboard | Health bar → "Budget Usage %" |
| Budget Status API | — | GET /api/budgets/status for orchestration layers |
| AI Optimization | Recommendations | /dashboard/recommendations |
| Agent Attribution | Engineering Dashboard | Agent Runs table |
| Reports & Exports | Reports | /reports → Generate Report or Scheduled Delivery tab |
Read the full Dashboard Walkthrough →
Who Cognocient is for
Engineering teams use Cognocient to understand which features drive AI costs, catch runaway usage before it hits the bill, right-size models with one-click recommendations, and debug anomalies with per-request attribution trails.
FinOps and finance teams use Cognocient to allocate AI spend to the right cost centres, enforce monthly budgets in real time, produce chargeback reports for internal GL accounts, and export FOCUS-compliant data into existing cloud cost management platforms.
CTOs and CFOs use Cognocient's Executive View to answer the board question every quarter: "We spent $X on AI last month — what did we actually get for it?" Unit economics (cost per ticket resolved, cost per report generated, cost per sale influenced) make the answer concrete.
10-day free trial — no credit card required
Full platform access. All supported providers included. Your first attribution dashboard in under 5 minutes. Get started for free
Supported AI providers
Cognocient proxies all major providers through a single base_url configuration. No separate keys or SDK changes per provider.
| Provider | Models | Notes |
|---|---|---|
| OpenAI | GPT-4o, o1, o3, GPT-4o-mini | All models including latest releases |
| Anthropic | Claude 3.5, Claude 4 family | claude-opus-4, claude-sonnet-4, claude-haiku-4 |
| Google Gemini | Gemini 1.5, 2.0 Flash, 2.5 Pro | Via OpenAI-compatible endpoint |
| Mistral | Mistral Large, Nemo, Codestral | All Mistral models |
| Groq | Llama 3.1, Mixtral, Gemma | Ultra-fast inference |
| Together AI | 100+ open-source models | Llama, DeepSeek, Qwen, and more |
See Supported Providers for SDK-specific configuration examples for each provider.
Security and privacy
Cognocient is built on a metadata-only logging principle:
- Prompt and response content is never stored. We log model, token counts, cost, latency, and your attribution headers — nothing else.
- Provider API keys are encrypted at rest using Fernet symmetric encryption. Keys are never logged or exposed in API responses.
- All traffic is encrypted in transit (TLS 1.2+).
- SOC 2 Type II audit is in progress. See Security & Privacy for the full control list.
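The metadata-only principle can be pictured with a short sketch. This is an illustration of the idea, not Cognocient's actual schema: the record fields mirror the list above, the input is assumed to be an OpenAI-style response dictionary, and treating every `X-Cost-` header as an attribution header is an assumption (only `X-Cost-Session` is named on this page).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CallMetadata:
    """Hypothetical log record: metadata only, no prompt or response text."""
    model: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    latency_ms: float
    attribution: dict = field(default_factory=dict)


def to_metadata(response, cost_usd, latency_ms, request_headers):
    """Keep only metadata from an OpenAI-style response dict; drop all content."""
    usage = response.get("usage", {})
    # Assumption: attribution headers share the "X-Cost-" prefix.
    attribution = {
        name: value
        for name, value in request_headers.items()
        if name.lower().startswith("x-cost-")
    }
    # The "choices" field (the model's output) is never read or copied.
    return CallMetadata(
        model=response.get("model", "unknown"),
        prompt_tokens=int(usage.get("prompt_tokens", 0)),
        completion_tokens=int(usage.get("completion_tokens", 0)),
        cost_usd=cost_usd,
        latency_ms=latency_ms,
        attribution=attribution,
    )
```

Because the record type has no field for message content, prompt and response text cannot end up in it by accident, and credentials such as the `Authorization` header are filtered out along with every other non-attribution header.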
Browse the documentation
| Section | What it covers |
|---|---|
| Quickstart | Up and running in 2 minutes. First attribution dashboard in under 5 minutes. |
| Spend Attribution | HTTP headers, session tracking, GL account mapping, and chargeback reports. |
| Cost Intelligence | Automatic waste detection, investment classification, and cost-per-outcome metrics. |
| AI Optimization | Semantic caching, prompt-cache routing, batch API routing, and one-click recommendations. |
| Agent Workflows | MCP tool call attribution, A2A handoffs, and full workflow cost trees. |
| Budget & Control | Hard budget limits enforced before charges reach your bill. |
| Reports & Exports | Board-ready PDFs and FOCUS 1.1 export for major cost management platforms. |
| Executive View | CFO-level dashboards with unit economics and strategic spend insights. |
| Security & Privacy | Full security control list, encryption details, and compliance status. |
How-to guides
Step-by-step walkthroughs for the most common tasks. Each is self-contained and takes 30 minutes or less.
| Guide | Time |
|---|---|
| Tag your first AI call | 5 min — add 2 headers and see per-feature spend immediately |
| Set a monthly spending limit | 5 min — hard budget enforced at the proxy |
| Cut your AI bill with one-click recommendations | 30 min — AI Advisor finds savings, one click applies them |
| Get Slack alerts on spend spikes | 10 min — anomaly and budget alerts in your Slack channel |
| Track cost per agent run | 15 min — per-execution cost breakdown and per-run budget |
| Map spend to GL accounts | 10 min — finance-ready chargeback reports with GL codes |
| Set up hierarchical budgets | 15 min — enforce limits at every level simultaneously |
| Debug runaway agent loops | — find, stop, and prevent looping agents |
| Prepare an AI ROI board report | 30 min — cost per outcome, efficiency trend, board-ready PDF |
| Enable semantic caching | 5 min — eliminate duplicate calls, 25–35% bill reduction |
| Track cost per business outcome | 15 min — cost per ticket, contract, or conversion with one header |
| Set up MCP agent attribution | 15 min — full cost tree for Claude + MCP tool call workflows |
| Apply routing rules to auto-switch models | 10 min — no code changes, savings apply immediately |
| Schedule monthly AI spend reports | 5 min — automated PDF delivery to your finance team |
| Export AI spend to your FinOps platform | 10 min — FOCUS 1.1 into Apptio, CloudZero, or Spot |
| Migrate from LiteLLM, Langfuse, or Helicone | 15 min — side-by-side comparison and first-5-minutes checklist |
Frequently asked questions
How long does integration take?
Two minutes. Change your OpenAI base_url to https://api.cognocient.com/v1 and replace your API key with your Cognocient proxy key (sk-cog-...). All existing SDK calls continue to work unchanged: same method signatures, same response objects, same streaming behaviour.
Does Cognocient add latency to my API calls?
The proxy adds 10–30ms of overhead per call. Budget enforcement runs in Redis before the call is forwarded and takes under a millisecond, so it contributes a negligible share of that overhead. Semantic caching reduces latency for cache-hit requests: results return in under 10ms, versus seconds for a live API call. For most production applications, the overhead is imperceptible.
Is my data secure?
Cognocient never stores prompt or response content. We log metadata only: model, token counts, cost, latency, and your attribution headers. Provider API keys are encrypted at rest using Fernet symmetric encryption. All traffic is encrypted in transit (TLS 1.2+). SOC 2 Type II audit is in progress. See Security & Privacy for the full control list.
What happens when my trial ends?
After 10 days, API calls return a 402 Payment Required error. No data is deleted at expiry: your attribution history is retained for 30 days after the trial ends. Upgrade to any paid plan within that window to resume immediately.
Can I use Cognocient with Anthropic, Google, or Mistral?
Yes. Cognocient proxies OpenAI, Anthropic, Google Gemini, Mistral, Groq, and Together AI. Use the same base_url for all providers. See Supported Providers for SDK-specific configuration examples.
How does budget enforcement actually work?
Cognocient checks your configured budget in Redis before every API call reaches the provider. If the budget is exceeded, the call is either blocked (returns 429), degraded (auto-switches to a cheaper model), or allowed with a webhook alert — your choice per budget rule. This is pre-call enforcement, not a billing alert 24 hours after the fact.
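The pre-call decision can be sketched as follows. This is a simplified illustration under stated assumptions, not Cognocient's implementation: "tightest limit wins" is read as "the budget with the least remaining headroom decides", amounts are integer micro-dollars to avoid floating-point rounding, and the `Budget` class, `check_call` function, and their parameters are hypothetical names. It also shows the max_tokens clamping idea from the capabilities table.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    scope: str                 # "run", "feature", "department", or "org"
    limit_micro_usd: int
    spent_micro_usd: int
    on_exceed: str = "block"   # "block", "degrade", or "alert"

    @property
    def remaining(self):
        return self.limit_micro_usd - self.spent_micro_usd


def check_call(budgets, input_cost, output_price_per_token, requested_max_tokens):
    """Decide what happens to one call before it is forwarded.

    Returns (action, max_tokens). action is "forward", "block" (the proxy
    would answer 429), "degrade" (switch to a cheaper model), or "alert"
    (forward the call and fire a webhook).
    """
    if not budgets:
        return "forward", requested_max_tokens
    # The budget with the least headroom governs the whole hierarchy.
    tightest = min(budgets, key=lambda b: b.remaining)
    headroom = tightest.remaining - input_cost
    affordable_tokens = max(headroom, 0) // output_price_per_token
    if affordable_tokens < 1:
        if tightest.on_exceed == "block":
            return "block", 0
        return tightest.on_exceed, requested_max_tokens
    # Clamp max_tokens so the response cannot overshoot the remaining budget.
    return "forward", min(requested_max_tokens, affordable_tokens)
```

For example, with $0.25 left on a run budget, an estimated input cost of $0.05, and output priced at 10 micro-dollars per token, a request for 50,000 output tokens would be forwarded with `max_tokens` clamped to 20,000.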
What is the "Investment vs. Waste" classification?
Every API call is classified as "Investment" (generating revenue, serving a user, improving a product) or "Waste" (over-engineered, redundant, inefficient, or purely exploratory) using a 3-tier heuristic engine. You can override classifications manually or train the classifier with feedback signals. The AI Investment ROI panel on the Engineering Dashboard shows this split live — investment protected, recoverable waste, efficiency score, and a one-sentence board summary. See Cost Intelligence for the full classification methodology.
Do I need to change anything in my Anthropic SDK calls?
Only the client configuration. The Anthropic SDK uses a different base URL option from the OpenAI SDK. See the Quickstart for the exact SDK configuration for Anthropic, Gemini, Mistral, and Groq. The pattern is the same: one base URL change, everything else stays identical.
Ready to start? Get started for free