wayyresearch/aetheris (Wayy Research Co.)
Tags: Text Generation, PyTorch, 65 languages, mamba, ssm, state-space-model, mixture-of-experts, multilingual, distillation, knowledge-distillation, aya, hybrid-architecture, wayy-research
arXiv: 2312.00752
License: apache-2.0
Files and versions (branch: main)
1 contributor. History: 59 commits.
Latest commit: rcgalbo, "Stage 3 SFT best (step 3550, loss 3.6461) - pruned 80K vocab", 9ef2709 (verified), 2 days ago
aetheris
  Last commit: Sync latest aetheris source code (4 days ago)

tokenizer
  Last commit: Add Aya tokenizer files (avoid gated repo dependency) (4 days ago)

.gitattributes (1.58 kB)
  Scan: Safe
  Last commit: Add Aya tokenizer files (avoid gated repo dependency) (4 days ago)

README.md (9.19 kB)
  Scan: Safe
  Last commit: Update model card with full architecture and training details (4 days ago)

config.yaml (316 Bytes)
  Scan: Safe
  Last commit: Full vocab config for SFT model (4 days ago)

pytorch_model.pt (2.15 GB, pickle, xet)
  Detected pickle imports (3): collections.OrderedDict, torch.FloatStorage, torch._utils._rebuild_tensor_v2
  Last commit: Stage 3 SFT best (step 3550, loss 3.6461) - pruned 80K vocab (2 days ago)

stage1_checkpoint.pt (1.64 GB, pickle, xet)
  Scan: Suspicious
  Detected pickle imports (4): torch.BFloat16Storage, collections.OrderedDict, torch._utils._rebuild_tensor_v2, torch.FloatStorage
  Last commit: Stage 1 checkpoint: [Step 50/20000] loss=7.7500 (8 days ago)

stage1_metadata.json (414 Bytes)
  Scan: Safe
  Last commit: Stage 1 checkpoint: [Step 50/20000] loss=7.7500 (8 days ago)

stage2_best.pt (1.44 GB, pickle, xet)
  Scan: Safe
  Detected pickle imports (3): collections.OrderedDict, torch.BFloat16Storage, torch._utils._rebuild_tensor_v2
  Last commit: Upload final Stage 2 best checkpoint (loss=2.7305, 20K steps) (7 days ago)

stage2_checkpoint.pt (1.44 GB, pickle, xet)
  Scan: Suspicious
  Detected pickle imports (3): collections.OrderedDict, torch._utils._rebuild_tensor_v2, torch.BFloat16Storage
  Last commit: Stage 2 checkpoint: [Step 18500/20000] loss=3.1250 (7 days ago)

stage2_final.pt (1.44 GB, pickle, xet)
  Scan: Safe
  Detected pickle imports (3): collections.OrderedDict, torch._utils._rebuild_tensor_v2, torch.BFloat16Storage
  Last commit: Upload Stage 2 final checkpoint (step 20000) (7 days ago)

stage2_metadata.json (263 Bytes)
  Scan: Safe
  Last commit: Update Stage 2 metadata: COMPLETE, best loss=2.7305 (7 days ago)

student_config.yaml (668 Bytes)
  Scan: Safe
  Last commit: Stage 1 initial: step 1000, loss=0.29, cka=0.60 (8 days ago)

training_config.yaml (2.74 kB)
  Scan: Safe
  Last commit: Stage 1 initial: step 1000, loss=0.29, cka=0.60 (8 days ago)
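
The pickle import lists shown for the .pt files above name the globals that each pickle stream would import when loaded. As a rough illustration of how such a list can be produced without unpickling anything, here is a minimal sketch using only the Python standard library. The helper name `list_pickle_globals` is hypothetical, this is not the scanner the hosting site runs, and the STACK_GLOBAL handling is a simplifying assumption that ignores strings fetched back from the pickle memo.

```python
import pickle
import pickletools
from collections import OrderedDict


def list_pickle_globals(data: bytes) -> set:
    """Return 'module.name' for each global referenced by a pickle stream.

    Hypothetical helper for illustration. The stream is only parsed with
    pickletools.genops, never unpickled, so no code in it is executed.
    """
    found = set()
    recent_strings = []  # string operands seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Protocols 0-3: the operand is "module name", space-separated.
            module, name = arg.split(" ", 1)
            found.add(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: module and name are taken from the stack.
            # Assumption: they were the two most recently pushed strings;
            # memo lookups (BINGET etc.) are not tracked in this sketch.
            if len(recent_strings) >= 2:
                found.add(f"{recent_strings[-2]}.{recent_strings[-1]}")
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found


if __name__ == "__main__":
    # A small in-memory pickle stands in for a checkpoint's pickle stream.
    payload = pickle.dumps(OrderedDict(a=1), protocol=2)
    print(sorted(list_pickle_globals(payload)))
```

A checkpoint written by a recent PyTorch version is typically a zip archive whose pickle stream lives in an entry named `data.pkl`, so in practice the bytes passed to such a helper would first be read out of the archive (for example with `zipfile`). That detail is an assumption about the files listed here, not something the listing states.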