Currently, structured output (SO) via constrained decoding is supported only on large models (>80B), and those run at under 50 tokens/s. That is far too slow for agentic architectures built on SGR, and inference keeps getting slower. Please provide at least one fast, large, high-quality model with SO support.
In Review
💡 Feature request
19 days ago

Ivan Matveev