Transformers v5 support
Depends on https://github.com/huggingface/transformers/pull/42028 and requires the latest transformers version (installed from main).
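For reference, installing transformers from main can be done straight from the GitHub repository:

$ pip install git+https://github.com/huggingface/transformers.git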
Usage example:
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-M2.1",
    device_map="auto",
    revision="refs/pr/15",
)
tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2.1", revision="refs/pr/15")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]

# return_dict=True is needed so the result can be unpacked into model.generate()
model_inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", return_dict=True, add_generation_prompt=True
).to("cuda")

generated_ids = model.generate(**model_inputs, max_new_tokens=100)
response = tokenizer.batch_decode(generated_ids)[0]
print(response)
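If you only want the newly generated text rather than the full transcript, you can slice off the prompt tokens before decoding (a small optional addition, not part of the example above):

# Decode only the tokens generated after the prompt
prompt_len = model_inputs["input_ids"].shape[1]
response = tokenizer.batch_decode(generated_ids[:, prompt_len:], skip_special_tokens=True)[0]
print(response)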
SGLang and vLLM still use transformers v4, so changing the tokenizer_class from GPT2Tokenizer to TokenizersBackend (shown below) breaks vLLM/SGLang inference.
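The change in question is a single field in tokenizer_config.json (other keys omitted):

{
  "tokenizer_class": "TokenizersBackend"
}

With that value, SGLang fails at startup: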
[2026-01-09 07:37:53 TP0] Scheduler hit an exception: Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 2932, in run_scheduler_process
scheduler = Scheduler(
^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 330, in __init__
self.init_tokenizer()
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 443, in init_tokenizer
self.tokenizer = get_tokenizer(
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/utils/hf_transformers_utils.py", line 496, in get_tokenizer
raise e
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/utils/hf_transformers_utils.py", line 461, in get_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py", line 1137, in from_pretrained
raise ValueError(
ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.
Perhaps PreTrainedTokenizerFast is a more compatible option?
Let me check if we can simply revert to GPT2Tokenizer. https://github.com/huggingface/transformers/pull/42894 should enable TokenizersBackend by default when it isn't specified in the auto mapping.
Yes, it works and still loads the tokenizers backend. That should work for both v4 (GPT2Tokenizer) and v5 (TokenizersBackend), then.
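A quick sanity check of which tokenizer class each transformers version resolves to (the expected names in the comment are my assumption, not captured output):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2.1", revision="refs/pr/15")
# Expected: a GPT2 fast tokenizer under transformers v4, TokenizersBackend under v5
print(type(tok).__name__)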
Updating the other PR as well.
This would be useful for running it in MLX as well, so I hope it lands!
Removing these files would be a breaking change for now, as vLLM and SGLang still rely on transformers v4 for inference.
A better way is to keep the v5 implementation in a separate branch until those engines officially transition to v5. What do you think?
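For example, v5 users could then opt in explicitly by pinning a revision; the branch name "v5" below is hypothetical:

from transformers import AutoModelForCausalLM

# "v5" is a hypothetical branch holding the transformers-v5 files
model = AutoModelForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2.1", revision="v5")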
$ python3 -m sglang.launch_server \
--model-path /model \
--tp-size 4 \
--tool-call-parser minimax-m2 \
--trust-remote-code \
--host 0.0.0.0 --port 8000 \
--reasoning-parser minimax-append-think \
--disable-radix-cache \
--mem-fraction-static 0.7
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1360, in from_pretrained
config_class = CONFIG_MAPPING[config_dict["model_type"]]
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1048, in __getitem__
raise KeyError(key)
KeyError: 'minimax_m2'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/usr/local/lib/python3.12/dist-packages/sglang/launch_server.py", line 29, in <module>
server_args = prepare_server_args(sys.argv[1:])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/server_args.py", line 5061, in prepare_server_args
return ServerArgs.from_cli_args(raw_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/server_args.py", line 4563, in from_cli_args
return cls(**{attr: getattr(args, attr) for attr in attrs})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 316, in __init__
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/server_args.py", line 679, in __post_init__
self._handle_model_specific_adjustments()
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/server_args.py", line 1048, in _handle_model_specific_adjustments
hf_config = self.get_model_config().hf_config
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/server_args.py", line 4577, in get_model_config
self.model_config = ModelConfig.from_server_args(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/configs/model_config.py", line 241, in from_server_args
return ModelConfig(
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/configs/model_config.py", line 126, in __init__
self.hf_config = get_config(
^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/utils/common.py", line 3164, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/utils/hf_transformers_utils.py", line 278, in get_config
raise e
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/utils/hf_transformers_utils.py", line 273, in get_config
config = AutoConfig.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1362, in from_pretrained
raise ValueError(
ValueError: The checkpoint you are trying to load has model type `minimax_m2` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
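This just means the installed transformers build predates the minimax_m2 registration. One way to check, using the same mapping the traceback goes through (a small diagnostic sketch):

import transformers
from transformers.models.auto.configuration_auto import CONFIG_MAPPING

print(transformers.__version__)
# False on builds that don't yet register the minimax_m2 architecture
print("minimax_m2" in CONFIG_MAPPING)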
> A better way is to keep the v5 implementation in a separate branch until those engines officially transition to v5. What do you think?
Gotcha, yes, no worries. We will release v5 very soon; I'll ping you again then to keep you in the loop 🤗