
HybridMambaAttentionDynamicCache is not valid?

#14
by GentleLiu - opened

Have you tried running inference with `HybridMambaAttentionDynamicCache`? It seems to be invalid. However, if I don't use `cache_params`, inference is very slow.

I was facing issues on transformers v4.55.4: the following generation call wouldn't work and would throw an error about `key_cache` not having a setter:

outputs = model.generate(
    tokenized_chat,
    max_new_tokens=32,
    eos_token_id=tokenizer.eos_token_id
)

Had to add the following fields manually to get it working:

class HybridMambaAttentionDynamicCache(DynamicCache):
    ...

    # Added these:
    key_cache = None
    value_cache = None
    is_compileable = False
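For context on why this patch works: in recent transformers versions, `DynamicCache` exposes `key_cache` as a read-only property, so assigning to it raises an `AttributeError`, while a plain class attribute defined in a subclass replaces the inherited property, making instance assignment possible again. A minimal sketch of that Python mechanism, using hypothetical stand-in classes (not the actual transformers classes):

```python
class Cache:
    """Stand-in for a cache class that exposes a read-only property."""

    def __init__(self):
        self._key_cache = []

    @property
    def key_cache(self):
        # Property with no setter: reads work, writes raise AttributeError.
        return self._key_cache


class PatchedCache(Cache):
    # A plain class attribute shadows the inherited property entirely,
    # so normal instance attribute assignment works again.
    key_cache = None


base = Cache()
try:
    base.key_cache = [1, 2]  # no setter defined on the property
except AttributeError:
    print("AttributeError: property has no setter")

patched = PatchedCache()
patched.key_cache = [1, 2]  # plain attribute assignment succeeds
print(patched.key_cache)    # → [1, 2]
```

This is the same reason the fields above have to be declared at class level in the subclass rather than assigned on an instance.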

But I do see this warning, and generation is pretty slow:

WARNING:transformers_modules.nvidia.NVIDIA-Nemotron-Nano-9B-v2.dc376c20a64208fc2cb4667e00af485eeced8ae4.modeling_nemotron_h:NemotronH requires an initialized `NemotronHHybridDynamicCache` to return a cache. None was provided, so no cache will be returned.


Indeed, when you run the inference code with PyTorch, the cache mechanism is not used, so generation is quite slow.
