Template <think> tag issue

#2
by marutichintan - opened

Not getting the opening <think> tag, only the closing </think>:

```
==========
The user has simply said "Hello". This is a greeting, so I should respond in a friendly and welcoming manner. I'll introduce myself briefly and offer to help.
</think>

Hello! I'm MiniMax, an AI assistant built by MiniMax. How can I help you today?
==========
Prompt: 40 tokens, 44.115 tokens-per-sec
Generation: 57 tokens, 53.484 tokens-per-sec
Peak memory: 128.776 GB
```
MLX Community org

There is a discussion on this topic on Reddit:
https://www.reddit.com/r/LocalLLaMA/comments/1q1gps8/minimax_m21_think_tag_and_interleaved_thinking/

One of the llama.cpp developers mentions, "MiniMax works perfectly fine and is in fact not the only model with this behavior ..." and adds, "You should not expect the inference engine to output raw tokens. The role of the inference engine is to parse the reasoning section accordingly, including splitting it into reasoning_content and content."
It seems that some UI apps might be handling this the wrong way.
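For illustration, here is a minimal sketch of that splitting, assuming the chat template pre-fills the opening <think> tag so only the closing </think> appears in the raw output. The helper name `split_reasoning` is hypothetical and not the llama.cpp or mlx-openai-server implementation:

```python
# Minimal sketch: when the chat template pre-fills the opening <think> tag,
# the raw generation contains only the closing </think>, so everything before
# that tag is reasoning and everything after it is the user-facing answer.

def split_reasoning(raw: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split raw model output into (reasoning_content, content)."""
    head, sep, tail = raw.partition(close_tag)
    if not sep:
        # No closing tag at all: treat the whole output as content.
        return "", raw.strip()
    return head.strip(), tail.strip()

raw_output = (
    'The user has simply said "Hello". This is a greeting, so I should '
    "respond in a friendly and welcoming manner.\n</think>\n\n"
    "Hello! I'm MiniMax, an AI assistant built by MiniMax."
)
reasoning_content, content = split_reasoning(raw_output)
print(reasoning_content)  # hidden reasoning
print(content)            # user-facing reply
```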

You might want to try my framework, mlx-openai-server, which can help handle this use case, @marutichintan. Please feel free to take a look, and I’d be happy to hear any feedback you have.
