Vietnamese Tokenizer

This repository contains a byte-level BPE tokenizer trained from scratch on Vietnamese text. It is intended for pretraining decoder-only language models.
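To make the byte-level part concrete, here is a minimal sketch of what such a tokenizer starts from: the UTF-8 bytes of the text, on top of which BPE repeatedly merges frequent adjacent pairs. This is a toy illustration only, not the procedure used to train this repository's tokenizer. The helper names (`utf8_bytes`, `most_frequent_pair`, `merge_pair`) and the new symbol id 256 are hypothetical and chosen for this example.

```python
from collections import Counter
from typing import List, Optional, Tuple


def utf8_bytes(text: str) -> List[int]:
    """Return the UTF-8 byte values of `text` (the base symbols of byte-level BPE)."""
    return list(text.encode("utf-8"))


def most_frequent_pair(symbols: List[int]) -> Optional[Tuple[int, int]]:
    """Return the most frequent adjacent pair, or None if there are no pairs."""
    pairs = Counter(zip(symbols, symbols[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(symbols: List[int], pair: Tuple[int, int], new_symbol: int) -> List[int]:
    """Replace each non-overlapping occurrence of `pair` with `new_symbol`."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(new_symbol)
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


if __name__ == "__main__":
    # "Việt" written with an escape so the source file's normalization does not matter.
    word = "Vi\u1ec7t"
    symbols = utf8_bytes(word)
    # The letter with diacritics takes three bytes, so 4 characters become 6 symbols.
    print(len(word), len(symbols))

    # One toy merge step over a small repeated sample (assumption: 256 is the
    # first id after the 256 possible byte values).
    sample = utf8_bytes(word * 3)
    pair = most_frequent_pair(sample)
    print(pair, merge_pair(sample, pair, 256))
```

Because Vietnamese letters with diacritics occupy several bytes each, learning merges on Vietnamese text lets frequent multi-byte letters and syllables collapse into single tokens, which is the usual motivation for training a language-specific vocabulary.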


🚀 Usage

Load tokenizer

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "tranhuyHoang/mini_VN_decoder_tokenizer",
    use_fast=True
)