---
base_model:
  - marcuscedricridia/Springer1.1-32B-Qwen2.5-Extras
  - marcuscedricridia/Springer1.1-32B-Qwen2.5-Reasoning
  - marcuscedricridia/Springer-32B-Restore
  - marcuscedricridia/Springer1.1-32B-Qwen2.5-Coder
  - marcuscedricridia/Springer1.1-32B-Qwen2.5-RP
library_name: transformers
tags:
  - mergekit
  - merge
  - dense
license: apache-2.0
language:
  - en
  - zh
---

This is a passthrough merge experiment weighing in at ~158B parameters (rounded to 160B in the name). We stacked all 64 layers from each of the five base models listed above: no layer picking, full overlap. It's rough, unfiltered, and definitely experimental. This version exists to test the concept.
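
For reference, here's a minimal sketch of what a full-overlap passthrough recipe can look like, written as a short Python script that emits a mergekit config. The donor order, dtype, and file names are assumptions for illustration, not the exact recipe behind this checkpoint.

```python
# Sketch only: builds a full-overlap passthrough config for mergekit.
# Donor order, dtype, and file names are assumptions, not the exact recipe.
import yaml

donors = [
    "marcuscedricridia/Springer1.1-32B-Qwen2.5-Reasoning",
    "marcuscedricridia/Springer1.1-32B-Qwen2.5-Coder",
    "marcuscedricridia/Springer1.1-32B-Qwen2.5-RP",
    "marcuscedricridia/Springer1.1-32B-Qwen2.5-Extras",
    "marcuscedricridia/Springer-32B-Restore",
]

config = {
    "merge_method": "passthrough",  # stack layers as-is, no weight averaging
    "dtype": "bfloat16",
    # one slice per donor, each contributing its full 64-layer range
    "slices": [
        {"sources": [{"model": m, "layer_range": [0, 64]}]} for m in donors
    ],
}

with open("notquiteamoe.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# Then run: mergekit-yaml notquiteamoe.yaml ./160B-NotQuiteAMoE
```

Because every 64-layer stack is appended end to end, the result is one very deep dense decoder (~320 layers) rather than a routed mixture of experts.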

Goal? MoE-level performance without being a MoE.

Does it work? 🤷‍♂️ We're finding out.

Try it. Break it. Let us know.

We don't recommend using this model. It's huge and needs serious hardware, more than we can run ourselves. If you must try it, use the cloud.
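
If you do go the cloud route, here's a minimal loading sketch with 🤗 Transformers. The repo id is an assumption based on this card's name, and `device_map="auto"` (which needs `accelerate` installed) shards the weights across every visible GPU.

```python
# Sketch only: load the merged model on a multi-GPU machine.
# The repo id below is an assumption based on this card's name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "marcuscedricridia/160B-NotQuiteAMoE"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # shard layers across all visible GPUs (needs accelerate)
)

prompt = "Explain the difference between a dense merge and a MoE in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

In bf16, ~158B parameters comes to roughly 320 GB of weights before activations, so budget for a multi-GPU node.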