Larger FAST models with structured output (SO).

Currently, structured output (SO) via constrained decoding is only supported on large models (>80B), and those run at under 50 tokens/second. This is insanely slow for agentic architectures based on SGR (Schema-Guided Reasoning), where every step is bottlenecked by inference. Please provide at least one model that is fast, large, and good, with SO support.
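To illustrate what is being asked for: below is a minimal sketch of a single SGR-style step using constrained decoding through an OpenAI-compatible structured-output request. The model name is a placeholder for the requested fast large model, and the schema is a hypothetical example of one reasoning step; only the request payload is built here, no API call is made.

```python
import json

# Placeholder name -- the request is for a fast, large SO-capable model like this.
MODEL = "fast-large-model-with-so"

# Hypothetical JSON schema for one SGR step: the model must emit its
# reasoning and the next tool call in exactly this structure.
STEP_SCHEMA = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "next_tool": {"type": "string"},
        "arguments": {"type": "object"},
    },
    "required": ["reasoning", "next_tool", "arguments"],
    "additionalProperties": False,
}

def build_request(user_message: str) -> dict:
    """Build an OpenAI-compatible chat request with constrained decoding."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
        # Structured output: the server constrains token sampling so the
        # completion is guaranteed to match STEP_SCHEMA.
        "response_format": {
            "type": "json_schema",
            "json_schema": {
                "name": "sgr_step",
                "schema": STEP_SCHEMA,
                "strict": True,
            },
        },
    }

payload = build_request("Plan the next step.")
print(json.dumps(payload, indent=2))
```

In an SGR agent, a request like this is issued at every step of the loop, which is why token throughput under 50 tps compounds into unacceptable end-to-end latency.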


Status: In Review
Board: 💡 Feature request
Date: 19 days ago
Author: Ivan Matveev
