
HybridMambaAttentionDynamicCache is not valid?

#14
by GentleLiu - opened

Have you tried running inference with `HybridMambaAttentionDynamicCache`? It seems to be invalid. However, if I don't use `cache_params`, inference is very slow.

I was facing issues on transformers v4.55.4: the following generation call wouldn't work and would throw an error about `key_cache` not having a setter:

outputs = model.generate(
    tokenized_chat,
    max_new_tokens=32,
    eos_token_id=tokenizer.eos_token_id
)

Had to add the following fields manually to get it working:

class HybridMambaAttentionDynamicCache(DynamicCache):
    ...

    # Added these:
    key_cache = None
    value_cache = None
    is_compileable = False
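For context on why this patch works: in recent transformers versions, `DynamicCache` exposes `key_cache` as a read-only property, so assigning to it raises an `AttributeError`, while a plain class attribute defined in a subclass replaces the inherited property, making instance assignment possible again. A minimal sketch of that Python mechanism, using hypothetical stand-in classes (not the actual transformers classes):

```python
class Cache:
    """Stand-in for a cache class that exposes a read-only property."""

    def __init__(self):
        self._key_cache = []

    @property
    def key_cache(self):
        # Property with no setter: reads work, writes raise AttributeError.
        return self._key_cache


class PatchedCache(Cache):
    # A plain class attribute shadows the inherited property entirely,
    # so normal instance attribute assignment works again.
    key_cache = None


base = Cache()
try:
    base.key_cache = [1, 2]  # no setter defined on the property
except AttributeError:
    print("AttributeError: property has no setter")

patched = PatchedCache()
patched.key_cache = [1, 2]  # plain attribute assignment succeeds
print(patched.key_cache)    # → [1, 2]
```

This is the same reason the fields above have to be declared at class level in the subclass rather than assigned on an instance.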

But I do see this warning, and generation is pretty slow:

WARNING:transformers_modules.nvidia.NVIDIA-Nemotron-Nano-9B-v2.dc376c20a64208fc2cb4667e00af485eeced8ae4.modeling_nemotron_h:NemotronH requires an initialized `NemotronHHybridDynamicCache` to return a cache. None was provided, so no cache will be returned.


Indeed, when you run the inference code with PyTorch, the cache mechanism is not used, so generation is quite slow.
