Transformers v5 support
Depends on https://github.com/huggingface/transformers/pull/42028 and requires the latest transformers version (installed from main).
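For reference, installing transformers from main can be done straight from the GitHub repository:

$ pip install git+https://github.com/huggingface/transformers.git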
Usage example:
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-M2.1",
    device_map="auto",
    revision="refs/pr/15",
)
tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2.1", revision="refs/pr/15")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]

# return_dict=True is needed so the result can be unpacked into model.generate()
model_inputs = tokenizer.apply_chat_template(
    messages, return_tensors="pt", return_dict=True, add_generation_prompt=True
).to("cuda")

generated_ids = model.generate(**model_inputs, max_new_tokens=100)
response = tokenizer.batch_decode(generated_ids)[0]
print(response)
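If you only want the newly generated text rather than the full transcript, you can slice off the prompt tokens before decoding (a small optional addition, not part of the example above):

# Decode only the tokens generated after the prompt
prompt_len = model_inputs["input_ids"].shape[1]
response = tokenizer.batch_decode(generated_ids[:, prompt_len:], skip_special_tokens=True)[0]
print(response)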
SGLang and vLLM still use transformers v4, so changing the tokenizer_class from GPT2Tokenizer to TokenizersBackend (shown below) breaks vLLM/SGLang inference.
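The change in question is a single field in tokenizer_config.json (other keys omitted):

{
  "tokenizer_class": "TokenizersBackend"
}

With that value, SGLang fails at startup: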
[2026-01-09 07:37:53 TP0] Scheduler hit an exception: Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 2932, in run_scheduler_process
scheduler = Scheduler(
^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 330, in __init__
self.init_tokenizer()
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/managers/scheduler.py", line 443, in init_tokenizer
self.tokenizer = get_tokenizer(
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/utils/hf_transformers_utils.py", line 496, in get_tokenizer
raise e
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/utils/hf_transformers_utils.py", line 461, in get_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py", line 1137, in from_pretrained
raise ValueError(
ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.
Perhaps PreTrainedTokenizerFast is a more compatible option?
Let me check if we can simply revert to GPT2Tokenizer. https://github.com/huggingface/transformers/pull/42894 should enable TokenizersBackend by default when it isn't specified in the auto mapping.
Yes, it works and still loads the tokenizers backend. That should work for both v4 (GPT2Tokenizer) and v5 (TokenizersBackend), then.
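A quick sanity check of which tokenizer class each transformers version resolves to (the expected names in the comment are my assumption, not captured output):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-M2.1", revision="refs/pr/15")
# Expected: a GPT2 fast tokenizer under transformers v4, TokenizersBackend under v5
print(type(tok).__name__)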
Updating the other PR as well.
This would be useful for running it in MLX as well, so I hope it lands!
Removing these files would be a breaking change for now, as vLLM and SGLang still rely on transformers v4 for inference.
A better way is to keep the v5 implementation in a separate branch until those engines officially transition to v5. What do you think?
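For example, v5 users could then opt in explicitly by pinning a revision; the branch name "v5" below is hypothetical:

from transformers import AutoModelForCausalLM

# "v5" is a hypothetical branch holding the transformers-v5 files
model = AutoModelForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M2.1", revision="v5")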
$ python3 -m sglang.launch_server \
--model-path /model \
--tp-size 4 \
--tool-call-parser minimax-m2 \
--trust-remote-code \
--host 0.0.0.0 --port 8000 \
--reasoning-parser minimax-append-think \
--disable-radix-cache \
--mem-fraction-static 0.7
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1360, in from_pretrained
config_class = CONFIG_MAPPING[config_dict["model_type"]]
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1048, in __getitem__
raise KeyError(key)
KeyError: 'minimax_m2'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/usr/local/lib/python3.12/dist-packages/sglang/launch_server.py", line 29, in <module>
server_args = prepare_server_args(sys.argv[1:])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/server_args.py", line 5061, in prepare_server_args
return ServerArgs.from_cli_args(raw_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/server_args.py", line 4563, in from_cli_args
return cls(**{attr: getattr(args, attr) for attr in attrs})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 316, in __init__
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/server_args.py", line 679, in __post_init__
self._handle_model_specific_adjustments()
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/server_args.py", line 1048, in _handle_model_specific_adjustments
hf_config = self.get_model_config().hf_config
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/server_args.py", line 4577, in get_model_config
self.model_config = ModelConfig.from_server_args(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/configs/model_config.py", line 241, in from_server_args
return ModelConfig(
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/configs/model_config.py", line 126, in __init__
self.hf_config = get_config(
^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/utils/common.py", line 3164, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/utils/hf_transformers_utils.py", line 278, in get_config
raise e
File "/usr/local/lib/python3.12/dist-packages/sglang/srt/utils/hf_transformers_utils.py", line 273, in get_config
config = AutoConfig.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py", line 1362, in from_pretrained
raise ValueError(
ValueError: The checkpoint you are trying to load has model type `minimax_m2` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
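This just means the installed transformers build predates the minimax_m2 registration. One way to check, using the same mapping the traceback goes through (a small diagnostic sketch):

import transformers
from transformers.models.auto.configuration_auto import CONFIG_MAPPING

print(transformers.__version__)
# False on builds that don't yet register the minimax_m2 architecture
print("minimax_m2" in CONFIG_MAPPING)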
> A better way is to keep the v5 implementation in a separate branch until those engines officially transition to v5. What do you think?
Gotcha, yes, no worries. We will release v5 very soon; I'll ping you again then to keep you in the loop 🤗