Template <think> tag issue

#2
by marutichintan - opened

Not getting the opening <think> tag, only the closing </think>:

```
==========
The user has simply said "Hello". This is a greeting, so I should respond in a friendly and welcoming manner. I'll introduce myself briefly and offer to help.
</think>

Hello! I'm MiniMax, an AI assistant built by MiniMax. How can I help you today?
==========
Prompt: 40 tokens, 44.115 tokens-per-sec
Generation: 57 tokens, 53.484 tokens-per-sec
Peak memory: 128.776 GB
```
MLX Community org

There is a discussion on this topic on Reddit:
https://www.reddit.com/r/LocalLLaMA/comments/1q1gps8/minimax_m21_think_tag_and_interleaved_thinking/

One of the llama.cpp developers mentions, "MiniMax works perfectly fine and is in fact not the only model with this behavior ..." and adds, "You should not expect the inference engine to output raw tokens. The role of the inference engine is to parse the reasoning section accordingly, including splitting it into reasoning_content and content."
It seems that some UI apps might be handling this the wrong way.
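For illustration, here is a minimal sketch of that splitting, assuming the chat template pre-fills the opening <think> tag so only the closing </think> appears in the raw output. The helper name `split_reasoning` is hypothetical and not the llama.cpp or mlx-openai-server implementation:

```python
# Minimal sketch: when the chat template pre-fills the opening <think> tag,
# the raw generation contains only the closing </think>, so everything before
# that tag is reasoning and everything after it is the user-facing answer.

def split_reasoning(raw: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split raw model output into (reasoning_content, content)."""
    head, sep, tail = raw.partition(close_tag)
    if not sep:
        # No closing tag at all: treat the whole output as content.
        return "", raw.strip()
    return head.strip(), tail.strip()

raw_output = (
    'The user has simply said "Hello". This is a greeting, so I should '
    "respond in a friendly and welcoming manner.\n</think>\n\n"
    "Hello! I'm MiniMax, an AI assistant built by MiniMax."
)
reasoning_content, content = split_reasoning(raw_output)
print(reasoning_content)  # hidden reasoning
print(content)            # user-facing reply
```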

You might want to try my framework, mlx-openai-server, which can help handle this use case, @marutichintan. Please feel free to take a look, and I’d be happy to hear any feedback you have.
