The chat template might be incorrect: the `<|tool_call_start|>` part is missing from chat_template.jinja
We can see the `<|tool_call_start|>` ... `<|tool_call_end|>` tokens in the tool-usage example on the model card:
<|startoftext|><|im_start|>system
List of tools: <|tool_list_start|>[{"name": "get_candidate_status", "description": "Retrieves the current status of a candidate in the recruitment process", "parameters": {"type": "object", "properties": {"candidate_id": {"type": "string", "description": "Unique identifier for the candidate"}}, "required": ["candidate_id"]}}]<|tool_list_end|><|im_end|>
<|im_start|>user
What is the current status of candidate ID 12345?<|im_end|>
<|im_start|>assistant
<|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>Checking the current status of candidate ID 12345.<|im_end|>
<|im_start|>tool
<|tool_response_start|>{"candidate_id": "12345", "status": "Interview Scheduled", "position": "Clinical Research Associate", "date": "2023-11-20"}<|tool_response_end|><|im_end|>
<|im_start|>assistant
The candidate with ID 12345 is currently in the "Interview Scheduled" stage for the position of Clinical Research Associate, with an interview date set for 2023-11-20.<|im_end|>
However, there is currently no `<|tool_call_start|>` part in the chat_template.jinja or tokenizer_config files.
Am I missing something here?
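For context, this is roughly how I checked (a minimal sketch assuming a recent transformers version; `MODEL_ID` is a placeholder for this repo):

```python
from transformers import AutoTokenizer

# "MODEL_ID" is a placeholder for this repo's checkpoint id.
tok = AutoTokenizer.from_pretrained("MODEL_ID")

# The tool-use tokens shown in the model card example.
for t in ["<|tool_list_start|>", "<|tool_list_end|>",
          "<|tool_call_start|>", "<|tool_call_end|>",
          "<|tool_response_start|>", "<|tool_response_end|>"]:
    print(t, "->", tok.convert_tokens_to_ids(t))

# Does the chat template itself ever emit <|tool_call_start|>?
print("<|tool_call_start|>" in (tok.chat_template or ""))
```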
Hey, the model has been trained to output these tokens itself, which is why they're not part of the chat template. Did you have an issue with the model's answers for function calling?
I have: it repeats tool calls forever. Is it just me, or does anyone else have the same issue?
I recommend using these generation parameters:
`temperature=0.3`, `min_p=0.15`, `repetition_penalty=1.05`
If you still have this issue, you can increase `repetition_penalty` to a value in the 1.1-1.5 range.
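For example, with a recent transformers release (`min_p` requires a fairly new version), the parameters can be wired up like this. This is only a sketch, and `MODEL_ID` is a placeholder for the checkpoint you're running:

```python
from transformers import pipeline

# Sketch only; "MODEL_ID" is a placeholder for the checkpoint you're running.
generator = pipeline("text-generation", model="MODEL_ID")

messages = [
    {"role": "user", "content": "What is the current status of candidate ID 12345?"},
]

out = generator(
    messages,
    max_new_tokens=256,
    do_sample=True,           # sampling must be enabled for temperature/min_p to apply
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,  # bump towards 1.1-1.5 if calls still repeat
)
print(out[0]["generated_text"][-1])  # the newly generated assistant turn
```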
I'm talking about function calling SFT.
In that case, we need these special tokens in the chat template if the models were trained with them during post-training.
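Until then, one possible workaround is to render the tool turns manually when building the SFT text. This is just a sketch following the model-card format above; the helper names are made up for illustration:

```python
import json

# Workaround sketch: build tool-related turns by hand in the model-card format,
# since the current chat template does not emit the tool-call tokens.
def render_tool_call_turn(call_str: str, content: str) -> str:
    return (
        "<|im_start|>assistant\n"
        f"<|tool_call_start|>[{call_str}]<|tool_call_end|>{content}<|im_end|>\n"
    )

def render_tool_response_turn(response: dict) -> str:
    return (
        "<|im_start|>tool\n"
        f"<|tool_response_start|>{json.dumps(response)}<|tool_response_end|><|im_end|>\n"
    )

# Reproduces the assistant turn from the model card example.
print(render_tool_call_turn(
    'get_candidate_status(candidate_id="12345")',
    "Checking the current status of candidate ID 12345.",
))
```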
Hi @mlabonne, do you have any plans to update the chat template to include the tool-use tokens?
I think adding them would make it easier for Hugging Face users to fine-tune your models on tool-use datasets, since many of them expect chat templates to work seamlessly for SFT: https://huggingface.co/docs/transformers/en/chat_templating
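For context, the typical flow from those docs looks roughly like this (a sketch with a placeholder model id, using the standard HF `tool_calls` message schema). This is the step where the template would need to emit the `<|tool_call_start|>` ... `<|tool_call_end|>` wrapping the model was trained on; with the current template the tool-call turn may not be rendered that way:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("MODEL_ID")  # placeholder id

# Tool definition taken from the model card example.
tools = [{
    "name": "get_candidate_status",
    "description": "Retrieves the current status of a candidate in the recruitment process",
    "parameters": {
        "type": "object",
        "properties": {"candidate_id": {"type": "string",
                                        "description": "Unique identifier for the candidate"}},
        "required": ["candidate_id"],
    },
}]

messages = [
    {"role": "user", "content": "What is the current status of candidate ID 12345?"},
    {"role": "assistant",
     "content": "Checking the current status of candidate ID 12345.",
     "tool_calls": [{"type": "function",
                     "function": {"name": "get_candidate_status",
                                  "arguments": {"candidate_id": "12345"}}}]},
    {"role": "tool",
     "content": '{"candidate_id": "12345", "status": "Interview Scheduled"}'},
    {"role": "assistant",
     "content": "The candidate with ID 12345 is currently in the Interview Scheduled stage."},
]

# For SFT, users typically render whole conversations to text and train on that,
# so the tool-call turn needs to come out of the template already wrapped in the
# special tokens the model expects.
text = tok.apply_chat_template(messages, tools=tools, tokenize=False)
print(text)
```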