TokenMinMax compresses every LLM request automatically, retaining 95% of context with zero quality loss. Short conversations save 40%; long sessions save up to 70%. Works with Anthropic, OpenAI, Claude Code, and Cursor. No SDK changes required.
Every request passes through a sequential pipeline of tiers before reaching your LLM provider. The pipeline always fails open: if any tier fails, the request proceeds unaffected.
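To make the fail-open behavior concrete, here is a minimal sketch of a sequential pipeline in which a failing tier is skipped. It is an illustration only, not TokenMinMax's implementation: the function `run_fail_open` and the three example tiers are hypothetical names, and it assumes "proceeds unaffected" means the request continues in the state it had before the failing tier.

```python
from typing import Callable, Dict, List

Request = Dict[str, object]
Tier = Callable[[Request], Request]


def run_fail_open(request: Request, tiers: List[Tier]) -> Request:
    """Apply each tier in order; a tier that raises is skipped (fail open)."""
    current = request
    for tier in tiers:
        try:
            current = tier(current)
        except Exception:
            # Fail open: keep the request as it was before this tier ran.
            continue
    return current


# Hypothetical tiers, for illustration only.
def trim_whitespace(req: Request) -> Request:
    # Collapse runs of whitespace inside each message.
    messages = [" ".join(str(m).split()) for m in req["messages"]]
    return {**req, "messages": messages}


def broken_tier(req: Request) -> Request:
    # Simulates a tier that is down.
    raise RuntimeError("tier unavailable")


def drop_empty(req: Request) -> Request:
    # Remove messages that are empty after trimming.
    return {**req, "messages": [m for m in req["messages"] if m]}
```

Calling `run_fail_open(request, [trim_whitespace, broken_tier, drop_empty])` still applies the first and third tiers; the failure in the middle only means that one step is skipped.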
No SDK changes, no code refactors. Point your client at TokenMinMax and every request is automatically compressed.
# Your existing code
client = Anthropic()
# With TokenMinMax
client = Anthropic(base_url="https://www.tokenminmax.com/anthropic", default_headers={"x-tokenminmax-key": "tmm-yourkey"})
# Your existing code
client = OpenAI()
# With TokenMinMax
client = OpenAI(base_url="https://www.tokenminmax.com/openai", default_headers={"x-tokenminmax-key": "tmm-yourkey"})
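Both snippets differ only in the URL path, so the client settings can be built in one place and the key kept out of source code. The following is a sketch under stated assumptions: the helper `tokenminmax_kwargs` and the environment variable `TOKENMINMAX_KEY` are hypothetical names chosen for this example, while the base URL and the `x-tokenminmax-key` header come from the snippets above.

```python
import os
from typing import Dict

TOKENMINMAX_BASE = "https://www.tokenminmax.com"
SUPPORTED_PROVIDERS = ("anthropic", "openai")


def tokenminmax_kwargs(provider: str, key_env: str = "TOKENMINMAX_KEY") -> Dict[str, object]:
    """Build constructor keyword arguments that point an SDK client at TokenMinMax.

    `key_env` is an assumed environment variable name, not an official one.
    """
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    key = os.environ.get(key_env)
    if not key:
        raise RuntimeError(f"set {key_env} to your TokenMinMax key")
    return {
        "base_url": f"{TOKENMINMAX_BASE}/{provider}",
        "default_headers": {"x-tokenminmax-key": key},
    }
```

With the SDK installed, the result would be passed straight to the constructor, for example `Anthropic(**tokenminmax_kwargs("anthropic"))` or `OpenAI(**tokenminmax_kwargs("openai"))`.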
Tiers 1 and 2 are free on all plans. Tier 3 is included from Starter. Tier 4 (session memory) is included from Growth. Extra credits are available on Pro.