Update README.md
# Intermediate Checkpoints Release

For the first time among Korean-targeted LLMs, we’re releasing **intermediate checkpoints** from the Tri family—**0.5B**, **1.9B**, **7B**, and **70B**—to advance research on LLM training dynamics. We release checkpoints at regular step intervals—**≈20B tokens (0.5B), ≈40B (1.9B), and ≈160B (7B & 70B)**—enabling consistent analysis across scales. Each step’s release is distinguished by its **branch name** (see the sketch after the checkpoint list below).

We’re also sharing the **0.5B** and **1.9B** runs—originally produced for system bring-up but now available as valuable artifacts for analyzing training behavior at smaller scales.

You can browse all intermediate checkpoints here:

- **Tri-0.5B** → [https://huggingface.co/trillionlabs/0.5B-Intermediate-Checkpoints](https://huggingface.co/trillionlabs/0.5B-Intermediate-Checkpoints)
- **Tri-1.9B** → [https://huggingface.co/trillionlabs/1.9B-Intermediate-Checkpoints](https://huggingface.co/trillionlabs/1.9B-Intermediate-Checkpoints)
- **Tri-7B** → [https://huggingface.co/trillionlabs/Tri-7B-Intermediate-Checkpoints](https://huggingface.co/trillionlabs/Tri-7B-Intermediate-Checkpoints)
- **Tri-70B (SFT Preview)** → [https://huggingface.co/trillionlabs/Tri-70B-Intermediate-Checkpoints](https://huggingface.co/trillionlabs/Tri-70B-Intermediate-Checkpoints)
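
Because each training step is published as a branch, you can enumerate the available checkpoints programmatically. Here is a minimal sketch using the `huggingface_hub` client (the 7B repository is used as an example; the same pattern applies to the other repositories above):

```python
from huggingface_hub import list_repo_refs

# Every intermediate checkpoint is a branch of the repository, so listing
# the repo's refs enumerates the released training steps.
refs = list_repo_refs("trillionlabs/Tri-7B-Intermediate-Checkpoints")
for branch in sorted(ref.name for ref in refs.branches):
    print(branch)  # "main" plus step-named branches such as "0000020000"
```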

Feel free to check out the full Tri-series collection here:

- https://huggingface.co/collections/trillionlabs/tri-series-687fa9ff7eb23e8ba847ef93

Dive into the full details—including training configuration and loss curves—on our [blog](BLOG LINK).

# Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Each checkpoint lives on a branch named after its training step;
# pass that branch name as `revision` to load the snapshot you want.
INTERMEDIATE_STEP = "0000020000"
model = AutoModelForCausalLM.from_pretrained('trillionlabs/Tri-70B-Intermediate-Checkpoints', revision=INTERMEDIATE_STEP, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('trillionlabs/Tri-70B-Intermediate-Checkpoints', revision=INTERMEDIATE_STEP, trust_remote_code=True)

...
```
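
If you want to quickly sanity-check a loaded checkpoint, a minimal generation sketch follows; the prompt and generation settings are illustrative assumptions, not part of the release notes.

```python
import torch

# Greedy decoding on a short Korean prompt ("The capital of South Korea is")
# as a smoke test for the loaded checkpoint.
inputs = tokenizer("대한민국의 수도는", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```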