What is Cognocient?

Cognocient is an AI Spend Decision Intelligence platform. Real-time LLM cost attribution, pre-call budget enforcement, and CFO-ready board reports. Setup in 2 minutes.

Cognocient is an AI spend intelligence platform that proxies every API call your application makes to AI providers — OpenAI, Anthropic, Google Gemini, Mistral, Groq, Together AI, and more. Every request is observed, attributed to a feature or team, analysed for waste, and surfaced in real-time dashboards your CFO can actually use.

The integration is a single configuration change. No new SDKs, no new dependencies, no application code to rewrite.

The problem

58% of organizations describe their AI costs as a "black box" (Capgemini Research Institute, 2025). Engineering teams see token counts in Datadog. Finance teams get a consolidated line item on the credit card bill — weeks later. Nobody can answer:

Which product feature is driving the cost spike?
Is this spend waste, or is it generating revenue?
Are we about to breach our AI budget before the month ends?

Cognocient closes all three gaps — automatically, in real time, without requiring your developers to instrument anything manually.

How Cognocient works

Your application code points at https://api.cognocient.com/v1 instead of the AI provider directly. Cognocient:

Forwards the request to the original provider — same API surface, same models, same response format. Your application code sees zero difference.
Records metadata — model, token counts, cost, latency, and your optional attribution headers. Prompt and response content are never stored.
Runs analytics in-path — waste detection, budget enforcement, anomaly scoring, and cache hit analysis — before the response returns to your app.
Surfaces insights — into the Cognocient dashboard, your BI tools via FOCUS 1.1 export, or your alerting pipeline via webhooks.

┌─────────────────────────────────────────────────────────────────┐
│                       YOUR APPLICATION                          │
│  client = OpenAI(api_key="sk-cog-...",                          │
│                  base_url="https://api.cognocient.com/v1")  ←─  │
│                     the only change                             │
└───────────────────────────┬─────────────────────────────────────┘
                            │  POST /v1/chat/completions
                            │  X-Cost-Feature: chatbot
                            │  X-Cost-Department: engineering
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                    COGNOCIENT PROXY                             │
│                                                                 │
│  ① Budget check (Redis, sub-ms) ── over limit? → 429 / degrade │
│  ② Cache lookup (pgvector HNSW) ── hit? → return in <10ms      │
│  ③ max_tokens clamping ─────────── cap to remaining budget     │
│  ④ Attribution tagging ─────────── feature, dept, session      │
│  ⑤ Forward to provider ─────────── with real provider key      │
│  ⑥ Log metadata ────────────────── cost, tokens, latency       │
│                                                                 │
│  10–30ms overhead  ·  Prompt content never stored              │
└───────────────────────────┬─────────────────────────────────────┘
                            │
               ┌────────────┼────────────┐
               ▼            ▼            ▼
          OpenAI       Anthropic      Gemini / Mistral
          GPT-4o      Claude 4       / Groq / Together

The proxy adds 10–30ms of overhead. Budget checks run in Redis (sub-millisecond). Semantic cache hits return in under 10ms — often faster than the provider would have responded.
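The request shown in the diagram can be reproduced without any SDK. The following is a minimal sketch using only Python's standard library: it builds (but does not send) the same POST with the attribution headers from the diagram. The helper name `build_chat_request` and the placeholder key are illustrative, and the `Authorization: Bearer` header is an assumption that mirrors what the OpenAI SDK sends for its `api_key`.

```python
import json
import urllib.request

COGNOCIENT_BASE_URL = "https://api.cognocient.com/v1"  # base URL from this page


def build_chat_request(api_key, feature, department, session=None):
    """Build (but do not send) a chat completion request routed through the proxy.

    Hypothetical helper for illustration; the X-Cost-* header names come from this page.
    """
    body = {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello"}],
    }
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumption: same scheme the OpenAI SDK uses
        "Content-Type": "application/json",
        "X-Cost-Feature": feature,
        "X-Cost-Department": department,
    }
    if session is not None:
        headers["X-Cost-Session"] = session  # e.g. a ticket or PR identifier
    return urllib.request.Request(
        f"{COGNOCIENT_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_chat_request("sk-cog-YOUR-KEY", "chatbot", "engineering")
# urllib.request.urlopen(request) would send it; omitted here on purpose.
```

If you use the OpenAI Python SDK instead, the same headers can be passed per call with `extra_headers` or once on the client with `default_headers`.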

One change unlocks everything

Point your client at the Cognocient base URL and use your Cognocient proxy key. That's it.

from openai import OpenAI
 
client = OpenAI(
    api_key="sk-cog-YOUR-KEY",                    # Your Cognocient proxy key
    base_url="https://api.cognocient.com/v1"       # ← the only change
)
 
# All existing code works unchanged ✓
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}]
)

import OpenAI from 'openai';
 
const openai = new OpenAI({
  apiKey: 'sk-cog-YOUR-KEY',
  baseURL: 'https://api.cognocient.com/v1',        // ← the only change
});
 
// All existing code works unchanged ✓
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});

2-minute setup

No SDKs to install. No infrastructure to manage. If you already use the OpenAI SDK — or any SDK that accepts a base_url / baseURL — you are already compatible. The same URL works for OpenAI, Anthropic, Gemini, Mistral, Groq, and Together AI.

Key capabilities

Cognocient's capabilities are summarized in the table below. Each addresses a different part of the AI spend visibility problem.

Capability	What you get
Spend Attribution	Tag every AI call by feature, team, session, and GL account using HTTP headers. Slice and dice costs any way finance needs.
Workstream Attribution	Tag calls with `X-Cost-Session: JIRA-1234` or `X-Cost-Session: pr-456` to see AI cost per story, PR, or agent task run — down to the cent.
Waste Detection	Five waste categories detected automatically: over-sized models, redundant calls, missed cache opportunities, anomalous usage spikes, and bloated context windows.
AI Investment ROI	Live dashboard panel showing investment vs. waste split, efficiency score (0–100), and a one-sentence board summary generated from your actual spend data.
Budget Enforcement	Hard budget limits enforced at the proxy — before the charge reaches your provider bill. Hierarchical: run → feature → department → org. The tightest limit wins.
max_tokens Clamping	Budget reservations are enforced physically, not just logically. Forwarded `max_tokens` is clamped to what the remaining budget can actually afford, eliminating mid-stream overshoot.
AI Optimization	Semantic caching via pgvector HNSW, prompt-cache routing, and AI-generated recommendations that right-size your spend with a single click.
Agent & MCP Attribution	Full cost tree across multi-agent workflows: which agent called which tool, how much each step cost, and which team or feature was responsible.
Reports & Exports	Board-ready PDF executive reports with 3 narrative tones (Board/CFO, Technical Finance, Engineering). FOCUS 1.1 CSV export. Scheduled monthly delivery to finance teams.
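To make "the tightest limit wins" and max_tokens clamping concrete, here is a minimal sketch of the arithmetic. It is not Cognocient's implementation: the function names, the use of micro-dollars (millionths of a USD, to keep the arithmetic exact), and the choice to ignore input-token cost are all assumptions made for illustration.

```python
def tightest_remaining(remaining_by_level):
    """Return the (level, remaining) pair with the least budget left."""
    level = min(remaining_by_level, key=remaining_by_level.get)
    return level, remaining_by_level[level]


def clamp_max_tokens(requested_max_tokens, remaining_micro_usd, micro_usd_per_output_token):
    """Cap max_tokens to what the remaining budget can pay for.

    Works in integer micro-dollars and ignores input-token cost
    (both are simplifying assumptions of this sketch).
    """
    if micro_usd_per_output_token <= 0:
        raise ValueError("micro_usd_per_output_token must be positive")
    affordable = max(remaining_micro_usd, 0) // micro_usd_per_output_token
    return min(requested_max_tokens, affordable)


# Illustrative numbers only; not real prices or Cognocient defaults.
remaining = {
    "run": 20_000,             # $0.02 left on this run
    "feature": 1_500_000,      # $1.50
    "department": 40_000_000,  # $40
    "org": 900_000_000,        # $900
}
level, budget_left = tightest_remaining(remaining)
clamped = clamp_max_tokens(4096, budget_left, micro_usd_per_output_token=10)
```

With these numbers the run-level budget is the tightest, so a request for 4096 output tokens is clamped to the 2000 tokens that the remaining $0.02 can cover at the assumed price.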

Finding features in the dashboard

Each capability maps to a specific dashboard page. Here's where to look:

Capability	Dashboard page	How to access
Spend Attribution	Engineering Dashboard	Spend by Department chart (bottom left)
Workstream Attribution	Workstreams	`/dashboard/workstreams`
Waste Detection	Executive Overview	Recoverable Waste widget (middle section)
Waste Detection	AI Insights	`/dashboard/insights`
AI Investment ROI	Engineering Dashboard	AI Investment ROI panel (below Maturity Score)
Budget Enforcement	Budgets	`/budgets`
Budget Enforcement	Engineering Dashboard	Health bar → "Budget Usage %"
Budget Status API	—	`GET /api/budgets/status` for orchestration layers
AI Optimization	Recommendations	`/dashboard/recommendations`
Agent & MCP Attribution	Engineering Dashboard	Agent Runs table
Reports & Exports	Reports	`/reports` → Generate Report or Scheduled Delivery tab

Read the full Dashboard Walkthrough →

Who Cognocient is for

Engineering teams use Cognocient to understand which features drive AI costs, catch runaway usage before it hits the bill, right-size models with one-click recommendations, and debug anomalies with per-request attribution trails.

FinOps and finance teams use Cognocient to allocate AI spend to the right cost centres, enforce monthly budgets in real time, produce chargeback reports for internal GL accounts, and export FOCUS-compliant data into existing cloud cost management platforms.

CTOs and CFOs use Cognocient's Executive View to answer the board question every quarter: "We spent $X on AI last month — what did we actually get for it?" Unit economics (cost per ticket resolved, cost per report generated, cost per sale influenced) make the answer concrete.
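The unit economics mentioned above reduce to simple division: total AI spend for a feature divided by the number of outcomes it produced. The sketch below shows that calculation on made-up figures. The function name, record shape, and feature names are hypothetical and do not reflect Cognocient's export format.

```python
from collections import defaultdict


def cost_per_outcome(cost_records, outcome_counts):
    """Divide each feature's total AI spend by the outcomes it produced.

    cost_records: iterable of (feature, cost_usd) pairs.
    outcome_counts: mapping of feature -> number of outcomes (e.g. tickets resolved).
    Features with no recorded outcomes or no spend are omitted.
    """
    totals = defaultdict(float)
    for feature, cost_usd in cost_records:
        totals[feature] += cost_usd
    return {
        feature: totals[feature] / count
        for feature, count in outcome_counts.items()
        if count > 0 and feature in totals
    }


# Illustrative figures only.
records = [("support-bot", 12.0), ("support-bot", 8.0), ("report-gen", 5.0)]
outcomes = {"support-bot": 40, "report-gen": 10}
unit_costs = cost_per_outcome(records, outcomes)
```

Here `support-bot` spent $20 across 40 resolved tickets and `report-gen` spent $5 across 10 reports, so both come out at $0.50 per outcome.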

10-day free trial — no credit card required

Full platform access. All supported providers included. Your first attribution dashboard in under 5 minutes. Get started for free

Supported AI providers

Cognocient proxies all major providers through a single base_url configuration. No separate keys or SDK changes per provider.

Provider	Models	Notes
OpenAI	GPT-4o, o1, o3, GPT-4o-mini	All models including latest releases
Anthropic	Claude 3.5, Claude 4 family	claude-opus-4, claude-sonnet-4, claude-haiku-4
Google Gemini	Gemini 1.5, 2.0 Flash, 2.5 Pro	Via OpenAI-compatible endpoint
Mistral	Mistral Large, Nemo, Codestral	All Mistral models
Groq	Llama 3.1, Mixtral, Gemma	Ultra-fast inference
Together AI	100+ open-source models	Llama, DeepSeek, Qwen, and more

See Supported Providers for SDK-specific configuration examples for each provider.

Security and privacy

Cognocient is built on a metadata-only logging principle:

Prompt and response content is never stored. We log model, token counts, cost, latency, and your attribution headers — nothing else.
Provider API keys are encrypted at rest using Fernet symmetric encryption. Keys are never logged or exposed in API responses.
All traffic is encrypted in transit (TLS 1.2+).
SOC 2 Type II audit is in progress. See Security & Privacy for the full control list.

Browse the documentation

Section	What it covers
Quickstart	Up and running in 2 minutes. First attribution dashboard in under 5 minutes.
Spend Attribution	HTTP headers, session tracking, GL account mapping, and chargeback reports.
Cost Intelligence	Automatic waste detection, investment classification, and cost-per-outcome metrics.
AI Optimization	Semantic caching, prompt-cache routing, batch API routing, and one-click recommendations.
Agent Workflows	MCP tool call attribution, A2A handoffs, and full workflow cost trees.
Budget & Control	Hard budget limits enforced before charges reach your bill.
Reports & Exports	Board-ready PDFs and FOCUS 1.1 export for major cost management platforms.
Executive View	CFO-level dashboards with unit economics and strategic spend insights.
Security & Privacy	Full security control list, encryption details, and compliance status.

How-to guides

Step-by-step walkthroughs for the most common tasks. Each is self-contained and takes 30 minutes or less.

Guide	Time and outcome
Tag your first AI call	5 min — add 2 headers and see per-feature spend immediately
Set a monthly spending limit	5 min — hard budget enforced at the proxy
Cut your AI bill with one-click recommendations	30 min — AI Advisor finds savings, one click applies them
Get Slack alerts on spend spikes	10 min — anomaly and budget alerts in your Slack channel
Track cost per agent run	15 min — per-execution cost breakdown and per-run budget
Map spend to GL accounts	10 min — finance-ready chargeback reports with GL codes
Set up hierarchical budgets	15 min — enforce limits at every level simultaneously
Debug runaway agent loops	— find, stop, and prevent looping agents
Prepare an AI ROI board report	30 min — cost per outcome, efficiency trend, board-ready PDF
Enable semantic caching	5 min — eliminate duplicate calls, 25–35% bill reduction
Track cost per business outcome	15 min — cost per ticket, contract, or conversion with one header
Set up MCP agent attribution	15 min — full cost tree for Claude + MCP tool call workflows
Apply routing rules to auto-switch models	10 min — no code changes, savings apply immediately
Schedule monthly AI spend reports	5 min — automated PDF delivery to your finance team
Export AI spend to your FinOps platform	10 min — FOCUS 1.1 into Apptio, CloudZero, or Spot
Migrate from LiteLLM, Langfuse, or Helicone	15 min — side-by-side comparison and first-5-minutes checklist

Frequently asked questions

How long does integration take?

Two minutes. Change your OpenAI base_url to https://api.cognocient.com/v1 and replace your API key with your Cognocient proxy key (sk-cog-...). All existing SDK calls continue to work unchanged: same method signatures, same response objects, same streaming behaviour.

Does Cognocient add latency to my API calls?

The proxy adds 10–30ms of overhead per call. The budget check runs in Redis and takes under a millisecond, so it accounts for a negligible share of that overhead. Semantic caching reduces latency for cache-hit requests: results return in under 10ms versus seconds for a live API call. For most production applications, the overhead is imperceptible.
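To illustrate why a semantic cache hit can skip the provider entirely, here is a toy in-memory version. It is a sketch under stated assumptions, not Cognocient's cache: it uses a brute-force linear scan with cosine similarity instead of a pgvector HNSW index, the 0.95 threshold is arbitrary, and the three-dimensional embeddings are hand-written stand-ins for real ones.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Toy cache keyed by embedding similarity (linear scan, no index)."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self._entries = []  # list of (embedding, response)

    def put(self, embedding, response):
        self._entries.append((list(embedding), response))

    def get(self, embedding):
        """Return the most similar cached response at or above the threshold, else None."""
        best_score, best_response = 0.0, None
        for cached_embedding, response in self._entries:
            score = cosine_similarity(embedding, cached_embedding)
            if score >= self.threshold and score > best_score:
                best_score, best_response = score, response
        return best_response


cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0, 0.0], "cached answer")
hit = cache.get([0.99, 0.05, 0.0])  # near-duplicate prompt embedding
miss = cache.get([0.0, 1.0, 0.0])   # unrelated prompt embedding
```

A hit returns the stored response without a provider round trip, which is where the latency saving comes from. A miss returns `None`, and the call would be forwarded as usual.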

Is my data secure?

Cognocient never stores prompt or response content. We log metadata only: model, token counts, cost, latency, and your attribution headers. Provider API keys are encrypted at rest using Fernet symmetric encryption. All traffic is encrypted in transit (TLS 1.2+). SOC 2 Type II audit is in progress. See Security & Privacy for the full control list.

What happens when my trial ends?

After 10 days, API calls return a 402 Payment Required error. No data is deleted at expiry: your attribution history is retained for 30 days afterwards. Upgrade to any paid plan within that window to resume immediately.
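On the client side, it can help to treat the proxy's status codes explicitly rather than retrying blindly. The helper below is a hypothetical sketch: only the meanings of 402 (trial expired) and 429 (budget block) come from this page, and a 429 may also be an ordinary provider rate limit passed through, so the returned text stays deliberately general.

```python
def describe_proxy_status(status_code):
    """Map an HTTP status from the proxy to a suggested client action.

    Hypothetical helper; only the 402 and 429 meanings come from this page.
    """
    if status_code == 402:
        return "trial_expired: upgrade to a paid plan to resume"
    if status_code == 429:
        return "blocked: budget exceeded or rate limited; do not retry immediately"
    if 200 <= status_code < 300:
        return "ok"
    return "error: handle as you would a direct provider error"


messages = [describe_proxy_status(code) for code in (200, 402, 429, 500)]
```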

Can I use Cognocient with Anthropic, Google, or Mistral?

Yes. Cognocient proxies OpenAI, Anthropic, Google Gemini, Mistral, Groq, and Together AI. Use the same base_url for all providers. See Supported Providers for SDK-specific configuration examples.

How does budget enforcement actually work?

Cognocient checks your configured budget in Redis before every API call reaches the provider. If the budget is exceeded, the call is either blocked (returns 429), degraded (auto-switches to a cheaper model), or allowed with a webhook alert — your choice per budget rule. This is pre-call enforcement, not a billing alert 24 hours after the fact.
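The three outcomes described above (block, degrade, or allow with an alert) can be sketched as a small decision function. This is an illustration, not Cognocient's rule engine: the `BudgetRule` fields, the action names, the fallback model, and the choice to treat spend equal to the limit as exceeded are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class BudgetRule:
    limit_usd: float
    on_exceed: str  # "block", "degrade", or "alert" (names are this sketch's own)
    fallback_model: str = "gpt-4o-mini"  # hypothetical cheaper model for "degrade"


def enforce(rule, spent_usd, requested_model):
    """Decide what happens to a call before it is forwarded.

    Returns (http_status_or_None, model_to_forward_or_None, send_alert).
    Spend at or above the limit counts as exceeded (an assumption of this sketch).
    """
    if spent_usd < rule.limit_usd:
        return None, requested_model, False
    if rule.on_exceed == "block":
        return 429, None, False
    if rule.on_exceed == "degrade":
        return None, rule.fallback_model, False
    if rule.on_exceed == "alert":
        return None, requested_model, True
    raise ValueError(f"unknown on_exceed action: {rule.on_exceed!r}")


decision = enforce(
    BudgetRule(limit_usd=100.0, on_exceed="degrade"),
    spent_usd=120.0,
    requested_model="gpt-4o",
)
```

In this example the budget is already exceeded and the rule says "degrade", so the call would be forwarded to the fallback model instead of being rejected.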

What is the "Investment vs. Waste" classification?

Every API call is classified as "Investment" (generating revenue, serving a user, improving a product) or "Waste" (over-engineered, redundant, inefficient, or purely exploratory) using a 3-tier heuristic engine. You can override classifications manually or train the classifier with feedback signals. The AI Investment ROI panel on the Engineering Dashboard shows this split live — investment protected, recoverable waste, efficiency score, and a one-sentence board summary. See Cost Intelligence for the full classification methodology.

Do I need to change anything in my Anthropic SDK calls?

Yes — the Anthropic SDK uses a different base URL option. See the Quickstart for the exact SDK configuration for Anthropic, Gemini, Mistral, and Groq. The pattern is the same: one base URL change, everything else stays identical.

Ready to start? Get started for free
