Support Implicit/Explicit Prompt Caching

Many other providers offer prompt caching with discounted pricing for cache hits, either implicitly (e.g., OpenAI, DeepSeek, Google, DeepInfra, NovitaAI, Fireworks) or explicitly (most notably Anthropic).

This capability can significantly reduce costs in agentic workflows, where a single session often re-sends the same context repeatedly (for example, when the model performs multiple tool calls in sequence and the shared conversation/context is included each time).
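To make the cost impact concrete, here is a small back-of-the-envelope sketch of an agent session that re-sends a shared prefix on every tool call. The prices and the cache discount below are made-up placeholders (not Nebius's or any provider's real rates); the point is only the shape of the savings when the shared prefix becomes a cache hit after the first call.

```python
# Hypothetical illustration of why prompt caching matters in agentic loops.
# PRICE_PER_TOKEN and CACHE_DISCOUNT are assumed placeholder values.

PRICE_PER_TOKEN = 2.00 / 1_000_000  # $2 per 1M input tokens (assumed)
CACHE_DISCOUNT = 0.5                # cached tokens billed at 50% (assumed)

def session_cost(prefix_tokens: int, per_call_tokens: int,
                 calls: int, cached: bool) -> float:
    """Cost of a session that re-sends a shared prefix on every call."""
    total = 0.0
    for i in range(calls):
        if cached and i > 0:
            # The shared prefix is a cache hit after the first call.
            total += prefix_tokens * PRICE_PER_TOKEN * CACHE_DISCOUNT
        else:
            total += prefix_tokens * PRICE_PER_TOKEN
        total += per_call_tokens * PRICE_PER_TOKEN  # fresh tokens, full price
    return total

# 10 tool calls sharing a 20k-token context, 500 new tokens per call:
no_cache = session_cost(20_000, 500, calls=10, cached=False)
with_cache = session_cost(20_000, 500, calls=10, cached=True)
```

With these placeholder numbers the cached session costs roughly 44% less, and the gap widens as the shared context grows relative to the per-call additions.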

Today, Nebius Token Factory is at a cost disadvantage in these repeated, input-token-heavy scenarios compared to providers that support prompt caching and pass the savings through to customers.

Please add support for prompt caching (implicit or explicit), including discounted pricing for cached prompt tokens, to improve cost-efficiency for agentic and tool-using applications.
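For the explicit variant, the request shape popularized by Anthropic's prompt caching is one possible model: the caller marks a cache breakpoint on the long shared block. The payload below follows Anthropic's field names purely as an illustration of what an explicit API could look like; the model name and text are placeholders, and nothing here is an existing Nebius Token Factory API.

```python
# Sketch of an explicit cache-control request, Anthropic-style.
# All concrete values are placeholders for illustration only.

payload = {
    "model": "example-model",  # placeholder, not a real model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "Long shared context: tool definitions, documents, ...",
            # Marks everything up to this block as cacheable.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Next step in the agent loop"}
    ],
}
```

An implicit scheme (as OpenAI or DeepSeek do) would need no request changes at all; the provider detects repeated prefixes server-side and reports cached-token counts in the usage field of the response.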


Status: In Review
Board: 💡 Feature request
Tags: Billing
Date: About 1 month ago
Author: Lukas Kreussel
