---
base_model:
- marcuscedricridia/Springer1.1-32B-Qwen2.5-Extras
- marcuscedricridia/Springer1.1-32B-Qwen2.5-Reasoning
- marcuscedricridia/Springer-32B-Restore
- marcuscedricridia/Springer1.1-32B-Qwen2.5-Coder
- marcuscedricridia/Springer1.1-32B-Qwen2.5-RP
library_name: transformers
tags:
- mergekit
- merge
- dense
license: apache-2.0
language:
- en
- zh
---
This is a passthrough merge experiment weighing in at ~158B parameters (roughly 160B). We merged all 64 layers from each of the five source models, with no layer picking and full overlap. It's rough, unfiltered, and definitely experimental; this version is meant to test the concept.
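For reference, a full-overlap passthrough stack of this shape can be expressed in a mergekit config roughly like the sketch below. This is illustrative only, not the published recipe: the source order and dtype are assumptions.

```yaml
# Illustrative mergekit passthrough config (not the exact recipe used for this merge).
# Each slice copies all 64 transformer layers of one 32B source model,
# so the stacked result is about five times the depth of a single model.
slices:
  - sources:
      - model: marcuscedricridia/Springer1.1-32B-Qwen2.5-Extras
        layer_range: [0, 64]
  - sources:
      - model: marcuscedricridia/Springer1.1-32B-Qwen2.5-Reasoning
        layer_range: [0, 64]
  - sources:
      - model: marcuscedricridia/Springer-32B-Restore
        layer_range: [0, 64]
  - sources:
      - model: marcuscedricridia/Springer1.1-32B-Qwen2.5-Coder
        layer_range: [0, 64]
  - sources:
      - model: marcuscedricridia/Springer1.1-32B-Qwen2.5-RP
        layer_range: [0, 64]
merge_method: passthrough
dtype: bfloat16
```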
Goal? MoE-level performance without being a MoE.
Does it work? 🤷‍♂️ We're finding out.
Try it. Break it. Let us know.
We don't recommend actually using this model. It's huge and needs more serious hardware than we can run ourselves. If you must try it, use the cloud.