What is Context Tax?
Context tax is the recurring cost of sending a large, mostly static system prompt with every API call. Learn how prompt caching reduces it.
Context tax is the token cost you pay again on every API call for a large, mostly unchanging system prompt. When a feature's system prompt makes up 80-90% of its total input tokens on every request, that static portion acts as a "tax": the same cost paid over and over for information that rarely changes.
Why context tax adds up
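Input tokens are billed on every request, so the cost of a static prompt scales with call volume rather than with how much new information each call carries. RAG-based features are especially prone to context tax: sending large reference documents or instructions with every query inflates the token count per call, even when the user's actual question is short.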
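A back-of-the-envelope sketch can make the accumulation concrete. Every figure below (token counts, call volume, price per million input tokens) is a hypothetical value chosen for illustration, and `static_prompt_cost` is an illustrative helper, not part of any product or API.

```python
def static_prompt_cost(static_tokens, variable_tokens, calls, price_per_million_input):
    """Return (total_input_cost, static_cost, static_share) for one feature.

    All inputs are assumptions supplied by the caller; the price is per
    one million input tokens, in whatever currency you bill in.
    """
    tokens_per_call = static_tokens + variable_tokens
    total_cost = tokens_per_call * calls * price_per_million_input / 1_000_000
    static_cost = static_tokens * calls * price_per_million_input / 1_000_000
    static_share = static_tokens / tokens_per_call
    return total_cost, static_cost, static_share


# Hypothetical feature: 4,500 static tokens, 500 variable tokens,
# 100,000 calls per month, 3.00 per million input tokens.
total, static, share = static_prompt_cost(
    static_tokens=4_500,
    variable_tokens=500,
    calls=100_000,
    price_per_million_input=3.00,
)
print(f"total input cost: {total:.2f}")
print(f"static portion:   {static:.2f} ({share:.0%} of input tokens)")
```

With these assumed numbers, nine out of every ten input tokens billed are the same static prompt repeated on each call.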
How to detect context tax
Measure the ratio of static tokens (identical on every call) to variable tokens (the part that actually changes, such as the user's question). A high static share combined with low variance in input token count across many calls to the same feature is the signal.
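The check described above could be sketched roughly as follows. This assumes you already log, per call, how many input tokens came from the static prompt and how many from the variable part. The function name, the log format, and both thresholds are hypothetical: the 0.8 share threshold mirrors the 80-90% figure mentioned earlier, and the 0.1 variation threshold is an arbitrary starting point.

```python
from statistics import mean, pstdev


def context_tax_report(calls, share_threshold=0.8, cv_threshold=0.1):
    """Summarise static share and input-size variation for one feature.

    calls: list of (static_tokens, variable_tokens) tuples, one per API call.
    Thresholds are illustrative assumptions, not established cut-offs.
    """
    if not calls:
        raise ValueError("need at least one call to analyse")

    totals = [static + variable for static, variable in calls]
    static_share = sum(static for static, _ in calls) / sum(totals)

    # Coefficient of variation: standard deviation relative to the mean,
    # so the measure is comparable across features with different prompt sizes.
    cv = pstdev(totals) / mean(totals)

    return {
        "static_share": static_share,
        "input_token_cv": cv,
        "flagged": static_share >= share_threshold and cv <= cv_threshold,
    }


# Hypothetical log for one feature: a 4,500-token static prompt
# plus user questions of varying length.
sample_calls = [(4500, 420), (4500, 510), (4500, 380), (4500, 610)]
report = context_tax_report(sample_calls)
print(report)
```

Using the coefficient of variation rather than raw variance is a design choice made here so that one threshold can be applied across features of different sizes.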
How Cognocient detects and fixes context tax
Cognocient's Context Tax Analyser calculates this ratio automatically per feature and quantifies the saving available from enabling prompt caching on the static portion, often a 60-80% reduction on that portion of input cost.
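To see where a reduction in that range could come from, here is a simplified estimate. It is an illustration only, not Cognocient's actual calculation. The cache hit rate and the cached-read price multiplier are hypothetical inputs; real values depend on your provider and traffic pattern, and the sketch ignores any surcharge a provider may apply when writing to the cache.

```python
def estimated_caching_saving(static_cost, cache_hit_rate, cached_price_multiplier):
    """Estimate the saving on the static portion of input cost.

    static_cost: current spend on static tokens (any currency).
    cache_hit_rate: assumed fraction of calls served from cache (0.0 to 1.0).
    cached_price_multiplier: assumed cached-read price as a fraction of the
        normal input price (0.0 to 1.0).
    Returns (saving, saving_fraction_of_static_cost).
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    if not 0.0 <= cached_price_multiplier <= 1.0:
        raise ValueError("cached_price_multiplier must be between 0 and 1")

    # Only cache hits are discounted; misses still pay the full input price.
    saving = static_cost * cache_hit_rate * (1.0 - cached_price_multiplier)
    fraction = saving / static_cost if static_cost else 0.0
    return saving, fraction


# Hypothetical inputs: 1,350 spent on static tokens, 90% of calls hit the
# cache, cached reads billed at 25% of the normal input price.
saving, fraction = estimated_caching_saving(
    static_cost=1350.0,
    cache_hit_rate=0.9,
    cached_price_multiplier=0.25,
)
print(f"estimated saving: {saving:.2f} ({fraction:.1%} of static cost)")
```

Under these assumptions the static portion costs roughly two thirds less. A lower hit rate or a smaller cached-read discount shrinks the saving proportionally.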