Longformer: The Long-Document Transformer
Paper: arXiv:2004.05150
This model was initialized from vinai/bartpho-word-base and converted to AllenAI's Longformer Encoder-Decoder (LED) architecture, following Longformer: The Long-Document Transformer.
To extend the maximum input length to 16K tokens, bartpho-word-base's position-embedding matrix was simply copied 16 times.
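The copy trick above can be sketched as follows. This is a minimal illustration in PyTorch, assuming a plain `nn.Embedding` position table; the helper name and the embedding sizes are illustrative assumptions, not the actual conversion script:

```python
import torch
import torch.nn as nn

def extend_position_embeddings(old_emb: nn.Embedding, factor: int) -> nn.Embedding:
    # Tile the learned position-embedding matrix `factor` times so the model
    # accepts `factor`-times-longer inputs. This is a plain copy, no retraining:
    # positions beyond the original range reuse the original learned vectors.
    old_len, dim = old_emb.weight.shape
    new_emb = nn.Embedding(old_len * factor, dim)
    with torch.no_grad():
        new_emb.weight.copy_(old_emb.weight.repeat(factor, 1))
    return new_emb

# Hypothetical sizes for illustration: a 1024-position, 768-dim table
# copied 16 times yields 16K positions.
base = nn.Embedding(1024, 768)
extended = extend_position_embeddings(base, 16)
print(extended.weight.shape)  # torch.Size([16384, 768])
```

The copied embeddings give the model a usable starting point for long inputs, which fine-tuning can then adapt.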
This model is especially well suited to long-range summarization and question answering.
This notebook shows how an LED model can be fine-tuned effectively on a downstream task.