Many other providers offer prompt caching with discounted pricing for cache hits, either implicitly (e.g., OpenAI, DeepSeek, Google, DeepInfra, NovitaAI, Fireworks) or explicitly (most notably Anthropic).
This capability can significantly reduce costs in agentic workflows, where a single session often re-sends the same context repeatedly (for example, when the model performs multiple tool calls in sequence and the shared conversation/context is included each time).
Today, Nebius Token Factory is at a cost disadvantage in these repeated, input-token-heavy scenarios compared to providers that support prompt caching and pass the savings through to customers.
Please add support for prompt caching (implicit or explicit), including discounted pricing for cached prompt tokens, to improve cost-efficiency for agentic and tool-using applications.
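For illustration, here is a sketch of what an explicit-caching request could look like, modeled on Anthropic's documented `cache_control` marker on a content block. The model name and the exact schema Nebius Token Factory would adopt are hypothetical; only the `cache_control` shape follows Anthropic's existing API.

```python
# Sketch of an Anthropic-style explicit prompt-caching request payload.
# The model name is a placeholder; the schema Nebius would adopt is an
# open question -- only `cache_control` mirrors Anthropic's documented shape.
shared_context = "...large tool schemas and prior conversation turns..."

payload = {
    "model": "example-model",  # hypothetical model identifier
    "max_tokens": 512,
    "system": [
        {
            "type": "text",
            "text": shared_context,
            # Mark the stable prefix as cacheable: later calls in the same
            # agentic session that re-send this prefix would hit the cache
            # and be billed at the discounted cached-token rate.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Next tool-call step in the session..."}
    ],
}
```

An implicit scheme (as OpenAI does) would need no request changes at all; the provider detects repeated prefixes and reports the savings in the usage fields of the response.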
In Review
💡 Feature request
Billing
About 1 month ago

Lukas Kreussel