Update README.md
<img src="seal_logo.png" width="200" />
</p>

# SeaLLMs - Large Language Models for Southeast Asia

<p align="center">
🤗 <a href="https://huggingface.co/spaces/SeaLLMs/SeaLLM-chat-13b-demo">Hugging Face DEMO</a>
</p>

We introduce SeaLLM - a family of language models optimized for Southeast Asian (SEA) languages. The SeaLLM-base models (to be released) were pre-trained from [Llama-2](https://huggingface.co/meta-llama/Llama-2-13b-hf) on a tailored, publicly available dataset comprising mainly Vietnamese 🇻🇳, Indonesian 🇮🇩 and Thai 🇹🇭 texts, along with English 🇬🇧 and Chinese 🇨🇳 texts. Pre-training proceeds in multiple stages with dynamic data control to preserve the original knowledge base of Llama-2 while gaining new abilities in SEA languages.

The [SeaLLM-chat](https://huggingface.co/spaces/SeaLLMs/SeaLLM-chat-13b-demo) model underwent supervised fine-tuning (SFT) on a mix of public instruction data (e.g. [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca)) and a small, internally collected set of natural queries from SEA native speakers, which **adapts the model to the local cultural norms, customs, styles and laws in these regions**, as well as with other SFT enhancement techniques (to be revealed later).

Our customized SFT process helps enhance our models' ability to understand, respond to and serve communities whose languages are often neglected by previous [English-dominant LLMs](https://arxiv.org/abs/2307.09288), while outperforming existing polyglot LLMs such as [BLOOM](https://arxiv.org/abs/2211.05100) and [PolyLM](https://arxiv.org/pdf/2307.06018.pdf).

Our [first released SeaLLM](https://huggingface.co/spaces/SeaLLMs/SeaLLM-chat-13b-demo) supports Vietnamese 🇻🇳, Indonesian 🇮🇩 and Thai 🇹🇭. Future versions will endeavor to cover all languages spoken in Southeast Asia.

<blockquote style="color:red">

If you find our project useful, hope you can star our repo and cite our work as follows:

```
@article{damonlpsg2023seallm,
  author = {???},
  title = {SeaLLMs - Large Language Models for Southeast Asia},
  year = 2023,
}
```