Support Implicit/Explicit Prompt Caching

Many other providers offer prompt caching with discounted pricing for cache hits, either implicitly (e.g., OpenAI, DeepSeek, Google, DeepInfra, NovitaAI, Fireworks) or explicitly (most notably Anthropic).

This capability can significantly reduce costs in agentic workflows, where a single session often re-sends the same context repeatedly (for example, when the model performs multiple tool calls in sequence and the shared conversation/context is included each time).
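To make the cost impact concrete, here is a small back-of-the-envelope sketch of an agent session that re-sends a shared prefix on every tool call. The prices and the cache discount below are made-up placeholders (not Nebius's or any provider's real rates); the point is only the shape of the savings when the shared prefix becomes a cache hit after the first call.

```python
# Hypothetical illustration of why prompt caching matters in agentic loops.
# PRICE_PER_TOKEN and CACHE_DISCOUNT are assumed placeholder values.

PRICE_PER_TOKEN = 2.00 / 1_000_000  # $2 per 1M input tokens (assumed)
CACHE_DISCOUNT = 0.5                # cached tokens billed at 50% (assumed)

def session_cost(prefix_tokens: int, per_call_tokens: int,
                 calls: int, cached: bool) -> float:
    """Cost of a session that re-sends a shared prefix on every call."""
    total = 0.0
    for i in range(calls):
        if cached and i > 0:
            # The shared prefix is a cache hit after the first call.
            total += prefix_tokens * PRICE_PER_TOKEN * CACHE_DISCOUNT
        else:
            total += prefix_tokens * PRICE_PER_TOKEN
        total += per_call_tokens * PRICE_PER_TOKEN  # fresh tokens, full price
    return total

# 10 tool calls sharing a 20k-token context, 500 new tokens per call:
no_cache = session_cost(20_000, 500, calls=10, cached=False)
with_cache = session_cost(20_000, 500, calls=10, cached=True)
```

With these placeholder numbers the cached session costs roughly 44% less, and the gap widens as the shared context grows relative to the per-call additions.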

Today, Nebius Token Factory is at a cost disadvantage in these repeated, input-token-heavy scenarios compared to providers that support prompt caching and pass the savings through to customers.

Please add support for prompt caching (implicit or explicit), including discounted pricing for cached prompt tokens, to improve cost-efficiency for agentic and tool-using applications.
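For the explicit variant, the request shape popularized by Anthropic's prompt caching is one possible model: the caller marks a cache breakpoint on the long shared block. The payload below follows Anthropic's field names purely as an illustration of what an explicit API could look like; the model name and text are placeholders, and nothing here is an existing Nebius Token Factory API.

```python
# Sketch of an explicit cache-control request, Anthropic-style.
# All concrete values are placeholders for illustration only.

payload = {
    "model": "example-model",  # placeholder, not a real model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "Long shared context: tool definitions, documents, ...",
            # Marks everything up to this block as cacheable.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Next step in the agent loop"}
    ],
}
```

An implicit scheme (as OpenAI or DeepSeek do) would need no request changes at all; the provider detects repeated prefixes server-side and reports cached-token counts in the usage field of the response.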


Status: In Review
Board: 💡 Feature request
Tags: Billing
Date: About 1 month ago
Author: Lukas Kreussel
