What is Cognocient?

Cognocient is an AI Spend Decision Intelligence platform. Real-time LLM cost attribution, pre-call budget enforcement, and CFO-ready board reports. Set up in 2 minutes.

Cognocient is an AI spend intelligence platform that proxies every API call your application makes to AI providers — OpenAI, Anthropic, Google Gemini, Mistral, Groq, Together AI, and more. Every request is observed, attributed to a feature or team, analysed for waste, and surfaced in real-time dashboards your CFO can actually use.

The integration is a single configuration change. No new SDKs, no new dependencies, no application code to rewrite.

The problem

58% of organizations describe their AI costs as a "black box" (Capgemini Research Institute, 2025). Engineering teams see token counts in Datadog. Finance teams get a consolidated line item on the credit card bill — weeks later. Nobody can answer:

  • Which product feature is driving the cost spike?
  • Is this spend waste, or is it generating revenue?
  • Are we about to breach our AI budget before the month ends?

Cognocient closes all three gaps — automatically, in real time, without requiring your developers to instrument anything manually.

How Cognocient works

Your application code points at https://api.cognocient.com/v1 instead of the AI provider directly. Cognocient:

  1. Forwards the request to the original provider — same API surface, same models, same response format. Your application code sees zero difference.
  2. Records metadata — model, token counts, cost, latency, and your optional attribution headers. Prompt and response content are never stored.
  3. Runs analytics in the request path: budget enforcement and the cache lookup happen before the request is forwarded, and waste detection, anomaly scoring, and cache hit analysis run before the response returns to your app.
  4. Surfaces insights — into the Cognocient dashboard, your BI tools via FOCUS 1.1 export, or your alerting pipeline via webhooks.
```
┌─────────────────────────────────────────────────────────────────┐
│                       YOUR APPLICATION                          │
│  client = OpenAI(api_key="sk-cog-...",                          │
│                  base_url="https://api.cognocient.com/v1")  ←─  │
│                     the only change                             │
└───────────────────────────┬─────────────────────────────────────┘
                            │  POST /v1/chat/completions
                            │  X-Cost-Feature: chatbot
                            │  X-Cost-Department: engineering
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                    COGNOCIENT PROXY                             │
│                                                                 │
│  ① Budget check (Redis, sub-ms) ── over limit? → 429 / degrade │
│  ② Cache lookup (pgvector HNSW) ── hit? → return in <10ms      │
│  ③ max_tokens clamping ─────────── cap to remaining budget     │
│  ④ Attribution tagging ─────────── feature, dept, session      │
│  ⑤ Forward to provider ─────────── with real provider key      │
│  ⑥ Log metadata ────────────────── cost, tokens, latency       │
│                                                                 │
│  10–30ms overhead  ·  Prompt content never stored              │
└───────────────────────────┬─────────────────────────────────────┘
                            │
               ┌────────────┼────────────┐
               ▼            ▼            ▼
          OpenAI       Anthropic      Gemini / Mistral
          GPT-4o      Claude 4       / Groq / Together
```

The proxy adds 10–30ms of overhead. Budget checks run in Redis (sub-millisecond). Semantic cache hits return in under 10ms — often faster than the provider would have responded.
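To make the order of the six proxy stages concrete, here is a minimal in-process sketch. It is an illustration under stated assumptions, not Cognocient's implementation: a Python dict stands in for Redis, an exact-match dict stands in for the pgvector HNSW semantic cache, requests are simplified to a hypothetical `prompt` field, the flat per-token price is made up, and `forward_to_provider` is a hypothetical stub for the upstream call.

```python
import time
from dataclasses import dataclass, field

# Hypothetical flat price: 10 micro-USD per output token ($10 per 1M tokens).
PRICE_MICROUSD_PER_TOKEN = 10

@dataclass
class ProxyState:
    budget_microusd: dict = field(default_factory=dict)  # stands in for Redis
    cache: dict = field(default_factory=dict)            # stands in for the semantic cache
    log: list = field(default_factory=list)              # metadata-only log

def forward_to_provider(request: dict) -> dict:
    """Hypothetical stub for the upstream call; pretends the model writes up to 50 tokens."""
    return {"content": "ok", "output_tokens": min(request["max_tokens"], 50)}

def handle(state: ProxyState, request: dict, headers: dict) -> tuple:
    start = time.perf_counter()
    feature = headers.get("X-Cost-Feature", "untagged")  # attribution tag used in step 4

    # Step 1: budget check before anything is forwarded.
    remaining = state.budget_microusd.get(feature, 0)
    if remaining <= 0:
        return 429, {"error": "budget exceeded"}

    # Step 2: cache lookup (exact match here; the real cache matches by similarity).
    key = (request["model"], request["prompt"])
    if key in state.cache:
        return 200, state.cache[key]  # logging of cache hits omitted for brevity

    # Step 3: clamp max_tokens to what the remaining budget can pay for.
    affordable = remaining // PRICE_MICROUSD_PER_TOKEN
    request = {**request, "max_tokens": min(request["max_tokens"], affordable)}

    # Steps 4-5: forward the tagged request (the provider-key swap is not modelled).
    response = forward_to_provider(request)
    cost = response["output_tokens"] * PRICE_MICROUSD_PER_TOKEN
    state.budget_microusd[feature] = remaining - cost
    state.cache[key] = response

    # Step 6: log metadata only; prompt and response text are never copied.
    state.log.append({
        "model": request["model"],
        "feature": feature,
        "output_tokens": response["output_tokens"],
        "cost_microusd": cost,
        "latency_ms": (time.perf_counter() - start) * 1000,
    })
    return 200, response
```

Note that in this ordering a cached answer is still subject to the budget check, because step 1 runs before step 2, matching the order in the diagram.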

One change unlocks everything

Swap in your Cognocient proxy key and point the base URL at the proxy. That's it.

Python

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-cog-YOUR-KEY",                 # your Cognocient proxy key
    base_url="https://api.cognocient.com/v1",  # ← route calls through the proxy
)

# All existing code works unchanged ✓
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```

Node.js

```javascript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'sk-cog-YOUR-KEY',                 // your Cognocient proxy key
  baseURL: 'https://api.cognocient.com/v1',  // ← route calls through the proxy
});

// All existing code works unchanged ✓
// (top-level await requires an ES module, e.g. a .mjs file)
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});
```

2-minute setup

No SDKs to install. No infrastructure to manage. If you already use the OpenAI SDK — or any SDK that accepts a base_url / baseURL — you are already compatible. The same URL works for OpenAI, Anthropic, Gemini, Mistral, Groq, and Together AI.
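Because the proxy speaks the OpenAI HTTP API, any HTTP client works, not only the SDKs. The sketch below uses only the Python standard library to build (but not send) the request shown in the diagram, with attribution headers attached. It assumes the proxy key travels in a standard `Authorization: Bearer` header, as the OpenAI SDK sends it; `build_chat_request` is a hypothetical helper and the header values are placeholders.

```python
import json
import urllib.request
from typing import Optional

PROXY_URL = "https://api.cognocient.com/v1/chat/completions"

def build_chat_request(api_key: str, feature: str, department: str,
                       session: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a chat request tagged for cost attribution."""
    body = {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed: same scheme the OpenAI SDK uses
        "Content-Type": "application/json",
        "X-Cost-Feature": feature,
        "X-Cost-Department": department,
    }
    if session is not None:
        headers["X-Cost-Session"] = session  # e.g. a story, PR, or agent run id
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )

req = build_chat_request("sk-cog-YOUR-KEY", "chatbot", "engineering", session="JIRA-1234")
# urllib.request.urlopen(req) would send it to the proxy.
```

urllib normalises header-name capitalisation (for example to `X-cost-feature`), which is harmless because HTTP header names are case-insensitive. With the OpenAI Python SDK, the same headers can be set once through the client's `default_headers` argument or per call through `extra_headers`.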

Key capabilities

Cognocient's capabilities are summarised below. Each addresses a different part of the AI spend visibility problem.

| Capability | What you get |
|---|---|
| Spend Attribution | Tag every AI call by feature, team, session, and GL account using HTTP headers. Slice and dice costs any way finance needs. |
| Workstream Attribution | Tag calls with X-Cost-Session: JIRA-1234 or X-Cost-Session: pr-456 to see AI cost per story, PR, or agent task run, down to the cent. |
| Waste Detection | Five waste categories detected automatically: oversized models, redundant calls, missed cache opportunities, anomalous usage spikes, and bloated context windows. |
| AI Investment ROI | Live dashboard panel showing the investment vs. waste split, an efficiency score (0–100), and a one-sentence board summary generated from your actual spend data. |
| Budget Enforcement | Hard budget limits enforced at the proxy, before the charge reaches your provider bill. Hierarchical: run → feature → department → org. The tightest limit wins. |
| max_tokens Clamping | Budget reservations are enforced on the request itself, not just in accounting: the forwarded max_tokens is clamped to what the remaining budget can afford, so a response cannot overshoot the budget mid-stream. |
| AI Optimization | Semantic caching via pgvector HNSW, prompt-cache routing, and AI-generated recommendations that right-size your spend with a single click. |
| Agent & MCP Attribution | Full cost tree across multi-agent workflows: which agent called which tool, how much each step cost, and which team or feature was responsible. |
| Reports & Exports | Board-ready PDF executive reports in 3 narrative tones (Board/CFO, Technical Finance, Engineering). FOCUS 1.1 CSV export. Scheduled monthly delivery to finance teams. |
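The max_tokens clamping arithmetic can be sketched as follows. This is a minimal illustration, not Cognocient's exact formula: it assumes the input cost is known before forwarding, treats the remaining budget as the only constraint, and uses example per-million-token prices that you should replace with your provider's current rates. `clamp_max_tokens` is a hypothetical helper; `Decimal` keeps currency math free of floating-point rounding.

```python
from decimal import Decimal

def clamp_max_tokens(requested: int, remaining_usd: Decimal, input_tokens: int,
                     input_price_per_m: Decimal, output_price_per_m: Decimal) -> int:
    """Cap max_tokens so input plus worst-case output fits the remaining budget."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    left_for_output = remaining_usd - input_cost
    if left_for_output <= 0:
        # Nothing left for output; the budget rule decides whether to block, degrade, or alert.
        return 0
    affordable = int(left_for_output * 1_000_000 // output_price_per_m)
    return min(requested, affordable)

# Example prices (assumed): $2.50 per 1M input tokens, $10.00 per 1M output tokens.
print(clamp_max_tokens(8000, Decimal("0.05"), 2000, Decimal("2.50"), Decimal("10.00")))  # prints 4500
```

With these example prices, a $0.05 remaining budget minus $0.005 for a 2,000-token prompt leaves $0.045, enough for at most 4,500 output tokens, so a request asking for max_tokens=8000 would be forwarded with 4,500.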

Finding features in the dashboard

Each capability maps to a specific dashboard page. Here's where to look:

| Capability | Dashboard page | How to access |
|---|---|---|
| Spend Attribution | Engineering Dashboard | Spend by Department chart (bottom left) |
| Workstream Attribution | Workstreams | /dashboard/workstreams |
| Waste Detection | Executive Overview | Recoverable Waste widget (middle section) |
| Waste Detection | AI Insights | /dashboard/insights |
| AI Investment ROI | Engineering Dashboard | AI Investment ROI panel (below Maturity Score) |
| Budget Enforcement | Budgets | /budgets |
| Budget Enforcement | Engineering Dashboard | Health bar → "Budget Usage %" |
| Budget Status API | (API only) | GET /api/budgets/status, for orchestration layers |
| AI Optimization | Recommendations | /dashboard/recommendations |
| Agent Attribution | Engineering Dashboard | Agent Runs table |
| Reports & Exports | Reports | /reports → Generate Report or Scheduled Delivery tab |
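An orchestration layer might consult GET /api/budgets/status before starting an expensive agent run. The response schema is not documented on this page, so the host, the Bearer auth scheme, and the field names below (`budgets`, `scope`, `limit_usd`, `spent_usd`) are hypothetical placeholders; check the API reference for the real shape. What the sketch is meant to show is the decision logic.

```python
import json
import urllib.request

STATUS_URL = "https://api.cognocient.com/api/budgets/status"  # host is an assumption

def fetch_status(api_key: str) -> dict:
    """Fetch budget status (hypothetical auth scheme and response shape)."""
    req = urllib.request.Request(STATUS_URL, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)

def can_start_run(status: dict, estimated_cost_usd: float) -> bool:
    """Allow a run only if every budget scope has room for the estimated cost."""
    for budget in status["budgets"]:
        remaining = budget["limit_usd"] - budget["spent_usd"]
        if estimated_cost_usd > remaining:
            return False
    return True

example_status = {"budgets": [
    {"scope": "feature:chatbot", "limit_usd": 500.0, "spent_usd": 480.0},
    {"scope": "department:engineering", "limit_usd": 5000.0, "spent_usd": 1200.0},
]}
print(can_start_run(example_status, estimated_cost_usd=25.0))  # False: chatbot has $20 left
```

Checking every scope, rather than only the org total, mirrors the hierarchical rule that the tightest limit wins.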

Read the full Dashboard Walkthrough →

Who Cognocient is for

Engineering teams use Cognocient to understand which features drive AI costs, catch runaway usage before it hits the bill, right-size models with one-click recommendations, and debug anomalies with per-request attribution trails.

FinOps and finance teams use Cognocient to allocate AI spend to the right cost centres, enforce monthly budgets in real time, produce chargeback reports for internal GL accounts, and export FOCUS-compliant data into existing cloud cost management platforms.

CTOs and CFOs use Cognocient's Executive View to answer the board question every quarter: "We spent $X on AI last month — what did we actually get for it?" Unit economics (cost per ticket resolved, cost per report generated, cost per sale influenced) make the answer concrete.
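Unit economics reduce to attributed spend divided by outcome count. A minimal sketch, assuming you have exported per-call metadata (for example via the FOCUS export) as records carrying a feature tag and a cost, and that outcome counts such as tickets resolved come from your own systems; `cost_per_outcome`, the record fields, and the numbers are illustrative.

```python
from collections import defaultdict
from decimal import Decimal

def cost_per_outcome(calls: list[dict], outcomes: dict[str, int]) -> dict[str, Decimal]:
    """Sum attributed spend per feature, then divide by that feature's outcome count."""
    spend = defaultdict(Decimal)  # Decimal() starts each feature at 0
    for call in calls:
        spend[call["feature"]] += Decimal(call["cost_usd"])
    return {feature: spend[feature] / count
            for feature, count in outcomes.items() if count > 0}

calls = [
    {"feature": "support-bot", "cost_usd": "0.42"},
    {"feature": "support-bot", "cost_usd": "0.18"},
    {"feature": "report-gen", "cost_usd": "1.25"},
]
# Outcomes: 3 tickets resolved by the support bot, 1 report generated.
print(cost_per_outcome(calls, {"support-bot": 3, "report-gen": 1}))
```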

10-day free trial — no credit card required

Full platform access. All supported providers. Your first attribution dashboard in under 5 minutes.

Supported AI providers

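Cognocient proxies all major providers through a single base URL and your Cognocient proxy key. You do not need a separate key per provider, and your calls stay the same; each SDK only needs its base URL pointed at Cognocient.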

| Provider | Models | Notes |
|---|---|---|
| OpenAI | GPT-4o, o1, o3, GPT-4o-mini | All models, including latest releases |
| Anthropic | Claude 3.5, Claude 4 family | claude-opus-4, claude-sonnet-4, claude-haiku-4 |
| Google Gemini | Gemini 1.5, 2.0 Flash, 2.5 Pro | Via OpenAI-compatible endpoint |
| Mistral | Mistral Large, Nemo, Codestral | All Mistral models |
| Groq | Llama 3.1, Mixtral, Gemma | Ultra-fast inference |
| Together AI | 100+ open-source models | Llama, DeepSeek, Qwen, and more |

See Supported Providers for SDK-specific configuration examples for each provider.

Security and privacy

Cognocient is built on a metadata-only logging principle:

  • Prompt and response content is never stored. We log model, token counts, cost, latency, and your attribution headers — nothing else.
  • Provider API keys are encrypted at rest using Fernet symmetric encryption. Keys are never logged or exposed in API responses.
  • All traffic is encrypted in transit (TLS 1.2+).
  • SOC 2 Type II audit is in progress. See Security & Privacy for the full control list.
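The metadata-only principle can be illustrated by what a log record contains. The sketch below is hypothetical: the `CallRecord` fields and the choice to keep only X-Cost-* headers are assumptions, not Cognocient's schema. It builds a record from a request and response while deliberately never reading message content or credentials.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class CallRecord:
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_ms: float
    attribution: dict

def to_record(request: dict, headers: dict, usage: dict,
              cost_usd: float, latency_ms: float) -> CallRecord:
    # Keep only attribution headers; Authorization and message content are never copied.
    attribution = {k: v for k, v in headers.items() if k.lower().startswith("x-cost-")}
    return CallRecord(
        model=request["model"],
        input_tokens=usage["prompt_tokens"],       # OpenAI-style usage field names
        output_tokens=usage["completion_tokens"],
        cost_usd=cost_usd,
        latency_ms=latency_ms,
        attribution=attribution,
    )
```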

Browse the documentation

| Section | What it covers |
|---|---|
| Quickstart | Up and running in 2 minutes. First attribution dashboard in under 5 minutes. |
| Spend Attribution | HTTP headers, session tracking, GL account mapping, and chargeback reports. |
| Cost Intelligence | Automatic waste detection, investment classification, and cost-per-outcome metrics. |
| AI Optimization | Semantic caching, prompt-cache routing, batch API routing, and one-click recommendations. |
| Agent Workflows | MCP tool call attribution, A2A handoffs, and full workflow cost trees. |
| Budget & Control | Hard budget limits enforced before charges reach your bill. |
| Reports & Exports | Board-ready PDFs and FOCUS 1.1 export for major cost management platforms. |
| Executive View | CFO-level dashboards with unit economics and strategic spend insights. |
| Security & Privacy | Full security control list, encryption details, and compliance status. |

How-to guides

Step-by-step walkthroughs for the most common tasks. Each is self-contained and takes 30 minutes or less.

| Guide | Time | What you get |
|---|---|---|
| Tag your first AI call | 5 min | Add 2 headers and see per-feature spend immediately |
| Set a monthly spending limit | 5 min | Hard budget enforced at the proxy |
| Cut your AI bill with one-click recommendations | 30 min | AI Advisor finds savings, one click applies them |
| Get Slack alerts on spend spikes | 10 min | Anomaly and budget alerts in your Slack channel |
| Track cost per agent run | 15 min | Per-execution cost breakdown and per-run budget |
| Map spend to GL accounts | 10 min | Finance-ready chargeback reports with GL codes |
| Set up hierarchical budgets | 15 min | Enforce limits at every level simultaneously |
| Debug runaway agent loops | | Find, stop, and prevent looping agents |
| Prepare an AI ROI board report | 30 min | Cost per outcome, efficiency trend, board-ready PDF |
| Enable semantic caching | 5 min | Eliminate duplicate calls, 25–35% bill reduction |
| Track cost per business outcome | 15 min | Cost per ticket, contract, or conversion with one header |
| Set up MCP agent attribution | 15 min | Full cost tree for Claude + MCP tool call workflows |
| Apply routing rules to auto-switch models | 10 min | No code changes, savings apply immediately |
| Schedule monthly AI spend reports | 5 min | Automated PDF delivery to your finance team |
| Export AI spend to your FinOps platform | 10 min | FOCUS 1.1 into Apptio, CloudZero, or Spot |
| Migrate from LiteLLM, Langfuse, or Helicone | 15 min | Side-by-side comparison and first-5-minutes checklist |

Frequently asked questions

How long does integration take?

Two minutes. Change your OpenAI base_url to https://api.cognocient.com/v1 and replace your provider API key with your Cognocient proxy key (sk-cog-...). All existing SDK calls continue to work unchanged: same method signatures, same response objects, same streaming behaviour.

Does Cognocient add latency to my API calls?

The proxy adds 10–30ms of overhead per call. Budget checks do sit in the request path, but they run in Redis and take well under a millisecond of that. Semantic caching reduces latency for cache-hit requests: results return in under 10ms, versus seconds for a typical live API call. For most production applications, the overhead is imperceptible.
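Whether the proxy is a net latency cost or a net gain depends on your cache hit rate. As a rough model, with every number below an illustrative assumption rather than a measurement, expected latency is hit_rate × cache_latency + (1 − hit_rate) × (provider_latency + proxy_overhead). Both functions are hypothetical helpers.

```python
def expected_latency_ms(hit_rate: float, cache_ms: float,
                        provider_ms: float, overhead_ms: float) -> float:
    """Expected per-call latency through the proxy, mixing cache hits and misses."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Hits are served from cache; misses pay provider latency plus proxy overhead.
    return hit_rate * cache_ms + (1.0 - hit_rate) * (provider_ms + overhead_ms)

def break_even_hit_rate(cache_ms: float, provider_ms: float, overhead_ms: float) -> float:
    """Hit rate at which going through the proxy is as fast as calling the provider directly."""
    return overhead_ms / (provider_ms + overhead_ms - cache_ms)

# Illustrative numbers only: 1.2 s provider latency, 20 ms overhead, 10 ms cache hits.
print(round(expected_latency_ms(0.3, 10, 1200, 20)))
print(f"{break_even_hit_rate(10, 1200, 20):.1%}")
```

Under these assumptions a 30% hit rate brings the average down from 1,200 ms to about 857 ms, and the proxy breaks even at a hit rate of 20 / 1210, roughly 1.7%. At a 0% hit rate it simply adds its overhead.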

Is my data secure?

Cognocient never stores prompt or response content. We log metadata only: model, token counts, cost, latency, and your attribution headers. Provider API keys are encrypted at rest using Fernet symmetric encryption. All traffic is encrypted in transit (TLS 1.2+). SOC 2 Type II audit is in progress. See Security & Privacy for the full control list.

What happens when my trial ends?

After 10 days, API calls return a 402 Payment Required error. Your data is not deleted when the trial expires: attribution history is retained for 30 days after expiry. Upgrade to any paid plan to resume immediately.

Can I use Cognocient with Anthropic, Google, or Mistral?

Yes. Cognocient proxies OpenAI, Anthropic, Google Gemini, Mistral, Groq, and Together AI. Use the same base_url for all providers. See Supported Providers for SDK-specific configuration examples.

How does budget enforcement actually work?

Cognocient checks your configured budget in Redis before every API call reaches the provider. If the budget is exceeded, the call is either blocked (returns 429), degraded (auto-switches to a cheaper model), or allowed with a webhook alert — your choice per budget rule. This is pre-call enforcement, not a billing alert 24 hours after the fact.
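The rule logic can be sketched in memory. This illustrates one reading of "the tightest limit wins" under stated assumptions, not Cognocient's engine: a list stands in for Redis, the `Budget` and `Decision` types and the gpt-4o-mini fallback are hypothetical, and a real implementation would reserve spend atomically so concurrent requests cannot both pass the check.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Budget:
    scope: str        # e.g. "run", "feature", "department", "org"
    limit_usd: float
    spent_usd: float
    action: str       # "block", "degrade", or "alert"

@dataclass
class Decision:
    status: int
    model: str
    alert: Optional[str] = None

def enforce(budgets: list[Budget], model: str, estimated_cost_usd: float,
            fallback_model: str = "gpt-4o-mini") -> Decision:
    # Every level this request would push over its limit.
    breached = [b for b in budgets if b.spent_usd + estimated_cost_usd > b.limit_usd]
    if not breached:
        return Decision(status=200, model=model)
    # Tightest limit wins: the breached level with the least headroom decides the action.
    tightest = min(breached, key=lambda b: b.limit_usd - b.spent_usd)
    if tightest.action == "block":
        return Decision(status=429, model=model)
    if tightest.action == "degrade":
        return Decision(status=200, model=fallback_model)
    return Decision(status=200, model=model, alert=f"{tightest.scope} budget exceeded")
```

For example, a feature budget with $1 of headroom set to "degrade" would switch a $2 request to the fallback model even if the org budget still has plenty of room.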

What is the "Investment vs. Waste" classification?

Every API call is classified as "Investment" (generating revenue, serving a user, improving a product) or "Waste" (over-engineered, redundant, inefficient, or purely exploratory) using a 3-tier heuristic engine. You can override classifications manually or train the classifier with feedback signals. The AI Investment ROI panel on the Engineering Dashboard shows this split live — investment protected, recoverable waste, efficiency score, and a one-sentence board summary. See Cost Intelligence for the full classification methodology.

Do I need to change anything in my Anthropic SDK calls?

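Only the client setup, not the calls themselves. The Anthropic SDK takes its base URL through its own client option, separate from the OpenAI SDK. See the Quickstart for the exact SDK configuration for Anthropic, Gemini, Mistral, and Groq. The pattern is the same: point the base URL at Cognocient, use your Cognocient proxy key, and leave every call unchanged.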

