Transformers v5 support

#15 opened by AntonV (HF Staff)

Depends on https://github.com/huggingface/transformers/pull/42028 and requires the latest transformers version (installed from main)
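A source build that includes that PR can be installed with the usual pip-from-git invocation (the exact commit you need depends on when the PR lands on main):

$ pip install git+https://github.com/huggingface/transformers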

Usage example:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-M2.1",
    device_map="auto",
    revision="refs/pr/15",
)

tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2.1", revision="refs/pr/15")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# return_dict=True lets the BatchEncoding be unpacked into generate() below
model_inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", return_dict=True, add_generation_prompt=True
).to("cuda")

generated_ids = model.generate(**model_inputs, max_new_tokens=100)

response = tokenizer.batch_decode(generated_ids)[0]

print(response)
AntonV changed pull request status to open
MiniMax org

SGLang and vLLM still use transformers v4. Changing the tokenizer_class from GPT2Tokenizer to TokenizersBackend will break vLLM/SGLang inference.

#14 has the same problem. @awni
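A quick sanity check for a serving environment, as a minimal sketch (the traceback below comes from exactly such a v4 image):

import transformers

# SGLang/vLLM images currently pin a 4.x release, where the v5-only
# TokenizersBackend class name cannot be resolved.
print(transformers.__version__)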

[2026-01-09 07:37:53 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 2932, in run_scheduler_process
    scheduler = Scheduler(
                ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 330, in __init__
    self.init_tokenizer()
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 443, in init_tokenizer
    self.tokenizer = get_tokenizer(
                     ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/utils/hf_transformers_utils.py", line 496, in get_tokenizer
    raise e
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/utils/hf_transformers_utils.py", line 461, in get_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py", line 1137, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.

Perhaps PreTrainedTokenizerFast is a more compatible option?
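A minimal sketch of that option, assuming the repo's tokenizer.json loads as-is; PreTrainedTokenizerFast exists in both v4 and v5:

from transformers import PreTrainedTokenizerFast

# Available under both transformers v4 and v5, so it would not break
# engines that still pin v4.
tok = PreTrainedTokenizerFast.from_pretrained("MiniMaxAI/MiniMax-M2.1", revision="refs/pr/15")
print(tok("Hello")["input_ids"])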

Let me check if we can simply revert back to GPT2Tokenizer; https://github.com/huggingface/transformers/pull/42894 should enable TokenizersBackend by default when we don't specify it in the auto mapping.

Yes, it works and still loads the tokenizers backend. That should then work for both v4 (GPT2Tokenizer) and v5 (TokenizersBackend).
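A quick way to verify which class resolves on either side (a hedged sketch; under v4 the auto mapping may hand back the fast GPT-2 variant):

from transformers import AutoTokenizer

# With tokenizer_class reverted to "GPT2Tokenizer", v4 resolves the GPT-2
# tokenizer, while v5 falls back to TokenizersBackend by default per
# huggingface/transformers#42894.
tok = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2.1", revision="refs/pr/15")
print(type(tok).__name__)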

Updating the other PR as well

@rogeryoungh bumping this so it doesn't get lost

MiniMax org

This is useful for running it in MLX as well, so I hope it lands!

Removing these files would be a breaking change for now, as vLLM and SGLang still rely on transformers v4 for inference.

A better approach would be to keep the v5 implementation on a separate branch until those engines officially transition to v5. What do you think?

$ python3 -m sglang.launch_server \
    --model-path /model \
    --tp-size 4 \
    --tool-call-parser minimax-m2 \
    --trust-remote-code \
    --host 0.0.0.0 --port 8000 \
    --reasoning-parser minimax-append-think \
    --disable-radix-cache \
    --mem-fraction-static 0.7

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1360, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
                   ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1048, in __getitem__
    raise KeyError(key)
KeyError: 'minimax_m2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.12/dist-packages/sglang/launch_server.py", line 29, in <module>
    server_args = prepare_server_args(sys.argv[1:])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/server_args.py", line 5061, in prepare_server_args
    return ServerArgs.from_cli_args(raw_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/server_args.py", line 4563, in from_cli_args
    return cls(**{attr: getattr(args, attr) for attr in attrs})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 316, in __init__
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/server_args.py", line 679, in __post_init__
    self._handle_model_specific_adjustments()
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/server_args.py", line 1048, in _handle_model_specific_adjustments
    hf_config = self.get_model_config().hf_config
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/server_args.py", line 4577, in get_model_config
    self.model_config = ModelConfig.from_server_args(self)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/configs/model_config.py", line 241, in from_server_args
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/configs/model_config.py", line 126, in __init__
    self.hf_config = get_config(
                     ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/utils/common.py", line 3164, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/utils/hf_transformers_utils.py", line 278, in get_config
    raise e
  File "/usr/local/lib/python3.12/dist-packages/sglang/srt/utils/hf_transformers_utils.py", line 273, in get_config
    config = AutoConfig.from_pretrained(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1362, in from_pretrained
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `minimax_m2` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
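The failure can be reproduced with the same lookup the traceback shows, as a minimal sketch:

from transformers.models.auto.configuration_auto import CONFIG_MAPPING

# transformers v4 ships no native `minimax_m2` entry, so AutoConfig only
# works while the remote-code config files remain in the repo.
try:
    CONFIG_MAPPING["minimax_m2"]
    print("minimax_m2 is supported natively")
except KeyError:
    print("minimax_m2 unknown to this transformers version; keep the remote-code files")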

> A better approach would be to keep the v5 implementation on a separate branch until those engines officially transition to v5. What do you think?

Gotcha, no worries. We will release v5 very soon; I'll ping you again then to keep you in the loop 🤗
