# Bruno Swarm Models

Seven abliterated Qwen2.5-Coder models for multi-agent software development with CrewAI + Ollama.

Created with Bruno, a tool for neural behavior modification via contrastive activation analysis and orthogonalization.

## Models

| Model | Base | Size | Role |
|-------|------|------|------|
| `orchestrator-14b-f16.gguf` | Qwen2.5-Coder-14B-Instruct | 28 GB | Senior Architect / Project Manager |
| `frontend-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | React / TypeScript / Tailwind |
| `backend-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | FastAPI / PostgreSQL / async |
| `test-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | pytest / coverage / edge cases |
| `security-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | OWASP / vulnerability assessment |
| `docs-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | API docs / README / guides |
| `devops-3b-f16.gguf` | Qwen2.5-Coder-3B-Instruct | 5.8 GB | Docker / CI-CD / IaC |

**Total:** ~63 GB (all F16-precision GGUF)

## Abliteration Details

Each model was independently abliterated with Bruno to reduce refusal behavior while preserving coding capability. The six specialists share the same base model (Qwen2.5-Coder-3B-Instruct) but carry abliteration weights from separate, per-domain optimization runs. A rough sketch of the technique follows the metrics below.

**Orchestrator (14B):**

- KL divergence from base: 0.47
- Refusals: 63/67 test prompts answered (~6% residual refusal rate)
- Optuna trials: 50

**Specialists (3B):**

- Each independently optimized for its domain
- All retain full coding capability
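Bruno's exact pipeline isn't reproduced in this repo, but the underlying technique is easy to sketch. Below is a minimal, illustrative version of directional ablation; the function names, prompt-set shapes, and single-layer treatment are assumptions for illustration, not Bruno's actual code:

```python
# Illustrative sketch of abliteration, not Bruno's implementation:
# 1) contrastive activation analysis: estimate a "refusal direction" as the
#    normalized difference of mean activations on refused vs. answered prompts;
# 2) orthogonalization: project that direction out of a weight matrix so the
#    model can no longer write activations along it.
import torch

def refusal_direction(refused_acts: torch.Tensor,
                      answered_acts: torch.Tensor) -> torch.Tensor:
    # Both tensors: (n_prompts, d_model) residual-stream activations at one layer.
    direction = refused_acts.mean(dim=0) - answered_acts.mean(dim=0)
    return direction / direction.norm()

def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    # weight: (d_model, d_in) output-projection matrix. Returns (I - r r^T) W,
    # i.e. W with the refusal direction removed from its output space.
    r = direction.unsqueeze(1)          # (d_model, 1)
    return weight - r @ (r.T @ weight)
```

Bruno layers an Optuna search on top of this (which layers to ablate and how strongly), which is where the trial count above comes from.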

## Quick Start

### 1. Download models and Modelfiles

```bash
# Install git-lfs
git lfs install

# Clone (63 GB download)
git clone https://huggingface.co/rawcell/bruno-swarm-models
cd bruno-swarm-models
```

### 2. Import into Ollama

Update the `FROM` path in each Modelfile to point at your local GGUF file.
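For example, the backend Modelfile's first line might become (hypothetical path; use wherever you cloned the repo):

```
FROM /absolute/path/to/bruno-swarm-models/backend-3b-f16.gguf
```

Then: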

```bash
# Import each model
ollama create orchestrator -f modelfiles/Modelfile.orchestrator
ollama create frontend -f modelfiles/Modelfile.frontend
ollama create backend -f modelfiles/Modelfile.backend
ollama create test -f modelfiles/Modelfile.test
ollama create security -f modelfiles/Modelfile.security
ollama create docs -f modelfiles/Modelfile.docs
ollama create devops -f modelfiles/Modelfile.devops
```
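After import, `ollama list` should show all seven models.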

### 3. Run with the bruno-swarm CLI

```bash
pip install "bruno-ai[swarm]"
bruno-swarm run --task "Build a REST API with authentication"
```

Or use flat mode to select specific specialists:

```bash
bruno-swarm run --task "Write unit tests for auth module" --flat --agents test,security
```
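If you'd rather wire the agents up yourself than use the CLI, the models also work with CrewAI directly through Ollama. A minimal sketch; the agent roles, prompts, and exact CrewAI API surface shown here are assumptions, so check them against your installed CrewAI version:

```python
from crewai import Agent, Crew, Process, Task, LLM

# Point CrewAI's LLM wrapper at the local Ollama server; model names match
# the `ollama create` names above.
orchestrator_llm = LLM(model="ollama/orchestrator", base_url="http://localhost:11434")
backend_llm = LLM(model="ollama/backend", base_url="http://localhost:11434")

backend_dev = Agent(
    role="Backend Developer",
    goal="Implement FastAPI endpoints with async PostgreSQL access",
    backstory="Specialist agent backed by backend-3b-f16.gguf",
    llm=backend_llm,
)

api_task = Task(
    description="Build a REST API with authentication",
    expected_output="FastAPI application code with JWT-based auth",
    agent=backend_dev,
)

crew = Crew(
    agents=[backend_dev],
    tasks=[api_task],
    process=Process.hierarchical,   # the 14B orchestrator manages the work
    manager_llm=orchestrator_llm,
)
print(crew.kickoff())
```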

## Ollama Configuration

For multi-model operation, set these environment variables before starting Ollama:

```bash
export OLLAMA_MAX_LOADED_MODELS=3
export OLLAMA_KEEP_ALIVE=30m
```
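These let Ollama hold up to three models in memory at once (the 28 GB orchestrator plus two 3B specialists fits in roughly 40 GB) and keep each model loaded for 30 minutes after its last request, so agents don't pay a reload penalty on every turn.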

## Hardware Requirements

- **Full swarm (hierarchical):** 40+ GB VRAM (28 GB orchestrator plus one 5.8 GB specialist at a time)
- **Specialists only (flat):** 8+ GB VRAM (one 3B model at a time)
- **All models loaded:** 63 GB VRAM (A100 80 GB or similar)

## Modelfiles

The `modelfiles/` directory contains an Ollama Modelfile for each model, with tuned parameters (see the example after this list):

- `num_ctx 8192` (required to fit CrewAI system prompts)
- `num_predict 2048` for specialists, `4096` for the orchestrator
- `temperature 0.7`, `top_p 0.9`, `top_k 40`
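For reference, a specialist Modelfile looks roughly like this (the path and SYSTEM prompt are illustrative, not the ones shipped in the repo):

```
FROM /absolute/path/to/bruno-swarm-models/backend-3b-f16.gguf

PARAMETER num_ctx 8192
PARAMETER num_predict 2048
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40

SYSTEM """You are a backend specialist. You write FastAPI services with async PostgreSQL access."""
```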

## License

Apache 2.0 (same as base Qwen2.5-Coder models)
