revit-coder-14b
A fine-tuned Qwen3-14B specialized in Revit API code generation, IFC reasoning, and BIM development patterns.
An experiment in domain-specific fine-tuning demonstrating that focused training on 177,127 Revit/BIM examples can produce a specialized model for $48 in 8 hours on a single GPU. The model was validated on a 40-question Revit C# benchmark, showing competitive performance with frontier models on domain-specific tasks.
Key insight: This demonstrates the value of domain-specific fine-tuning for specialized use cases rather than claiming superiority over general-purpose frontier models.
GitHub: schauh11/revit-coder-14b - Benchmark suite, training scripts, and full results.
Training
Training Configuration
| Spec | Value |
|---|---|
| Base Model | Qwen3-14B-Instruct |
| Method | QLoRA (rank 64, alpha 128) |
| Framework | Unsloth + HuggingFace TRL |
| Training Data | 159,414 examples (90%) |
| Validation Data | 8,856 examples (5%) |
| Test Data | 8,857 examples (5%) |
| Epochs | 3 |
| Sequence Length | 4096 tokens |
| Batch Size | 16 effective (4 x 4 gradient accumulation) |
| Learning Rate | 2e-4 (cosine schedule) |
| Warmup Steps | 200 |
| Optimizer | AdamW 8-bit |
| Weight Decay | 0.01 |
| GPU | NVIDIA B200 192GB |
| Training Time | ~8 hours |
| Training Cost | ~$48 |
| LoRA Dropout | 0 (required for Unsloth) |
| Early Stopping Patience | 3 epochs |
| Random Seed | 42 |
| Packing | Enabled |
Why these hyperparameters?
- QLoRA rank 64: Balances expressiveness with efficiency for domain-specific patterns
- Packing enabled: Maximizes GPU utilization on variable-length sequences
- Cosine schedule + warmup: Stable learning on technical documentation
- Low dropout: Unsloth fast patching requires 0 dropout for optimized training
- 4096 context: Covers typical Revit API code examples with context
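The table above can be condensed into a plain config dict for reference (illustrative only — key names are descriptive, not the exact keyword arguments of any training framework; the real training script is in the GitHub repo):

```python
# Illustrative summary of the training configuration table above.
train_config = {
    "base_model": "Qwen3-14B-Instruct",
    "lora_rank": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.0,              # Unsloth's fast patching requires 0
    "per_device_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "lr_schedule": "cosine",
    "warmup_steps": 200,
    "weight_decay": 0.01,
    "max_seq_length": 4096,
    "epochs": 3,
    "seed": 42,
    "packing": True,
}

# Effective batch size = per-device batch x gradient accumulation steps
effective_batch = (train_config["per_device_batch_size"]
                   * train_config["gradient_accumulation_steps"])
print(effective_batch)  # 16
```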
Dataset Splits
| Split | Examples | Percentage | Purpose |
|---|---|---|---|
| Train | 159,414 | 90% | Model training |
| Validation | 8,856 | 5% | Hyperparameter tuning & early stopping |
| Test | 8,857 | 5% | Final evaluation (not used in this benchmark) |
Split strategy: Stratified sampling by domain to maintain proportional representation across all 6 BIM domains. Random seed: 42.
Note: The 40-question benchmark is separate from the training data; it tests zero-shot generalization on new Revit API questions not seen during training.
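The seeded, per-domain 90/5/5 split described above can be sketched as follows (illustrative; the `"domain"` field name is an assumption, not the project's actual schema):

```python
import random
from collections import defaultdict

def stratified_split(records, seed=42, train_frac=0.90, val_frac=0.05):
    """Split records 90/5/5 within each domain so every split keeps the domain mix."""
    by_domain = defaultdict(list)
    for r in records:
        by_domain[r["domain"]].append(r)  # "domain" is an assumed field name

    rng = random.Random(seed)             # fixed seed for reproducibility
    train, val, test = [], [], []
    for domain, items in by_domain.items():
        rng.shuffle(items)
        n_train = int(len(items) * train_frac)
        n_val = int(len(items) * val_frac)
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]
    return train, val, test

# Toy example: two domains of 100 records each
records = [{"domain": d, "id": i}
           for d in ("revit_csharp", "ifc_reasoning") for i in range(100)]
train, val, test = stratified_split(records)
print(len(train), len(val), len(test))  # 180 10 10
```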
Training Data Distribution
| Domain | Records | % | Description |
|---|---|---|---|
| revit_csharp | 143,060 | 72.7% | Revit API C# code from docs, examples, references |
| ifc_reasoning | 44,571 | 22.6% | IFC topology, spatial hierarchies, BIM reasoning |
| aps_schema | 4,980 | 2.5% | APS/Forge cloud API patterns |
| revit_patterns | 3,758 | 1.9% | Development patterns (IUpdater, events, filters) |
| revit_python | 285 | 0.1% | pyRevit Python automation |
| mcp_tools | 149 | 0.1% | MCP tool definitions for AI-BIM integration |
Why this distribution?
- 72.7% revit_csharp: Reflects the primary use case, since Revit plugin development is predominantly C#/.NET
- 22.6% ifc_reasoning: BIM data exchange and interoperability are core to AEC workflows
- Domain-tagged system prompts: Each domain uses specialized prompts to activate appropriate model behaviors
Data format: ChatML with domain-specific system prompts. Each record includes <|im_start|>system, <|im_start|>user, <|im_start|>assistant sections.
Sources: Revit API Docs 2025/2026, Revit SDK code examples, IFC/BIM specifications, Autodesk forums, APS SDK documentation.
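The ChatML layout described above can be sketched as a small formatting helper (the prompt and message text below are placeholders, not actual training records):

```python
def to_chatml(system_prompt, user_msg, assistant_msg):
    """Render one training record in ChatML with a domain-tagged system prompt."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant_msg}<|im_end|>\n"
    )

# Placeholder content for illustration only
record = to_chatml(
    "You are a Revit API expert specialized in C# and .NET 8.",
    "How do I start a Transaction?",
    "Use the Transaction class inside a using block...",
)
print(record)
```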
Environmental Impact
| Metric | Value |
|---|---|
| Hardware | 1x NVIDIA B200 192GB |
| Training time | ~8 hours |
| Cloud cost | ~$48 (RunPod) |
| CO2 estimate | ~2.4 kg (based on US grid average) |
Key takeaway: Domain-specific fine-tuning achieved competitive performance with <3% of the compute required to train frontier models from scratch.
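The CO2 figure is consistent with a back-of-the-envelope estimate; the power draw and grid intensity below are assumed round numbers for illustration, not measured values:

```python
# Assumed values: sustained average draw for one GPU and US grid carbon intensity
hours = 8
power_kw = 0.7            # assumed average draw under training load
grid_kg_per_kwh = 0.43    # assumed US grid average

co2_kg = hours * power_kw * grid_kg_per_kwh
print(round(co2_kg, 1))  # 2.4
```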
Intended Use
Primary: Domain-specialized Revit API code generation. This is an experiment demonstrating that domain-specific fine-tuning can achieve competitive results with significantly less compute than training frontier models from scratch.
Capabilities:
- Generate correct Revit C# code (FilteredElementCollector, Transaction patterns, BuiltInParameter)
- Validate Revit API usage (catch missing Transactions, null checks, type filter issues)
- Reason about IFC spatial hierarchies and property sets
- Produce Revit development patterns (IExternalEventHandler, IUpdater, ISelectionFilter)
Limitations:
- Optimized for Revit 2025/2026 (.NET 8) API, may not cover older API versions
- Strongest on revit_csharp domain; weaker on IFC STEP format generation
- Best results under 800 tokens; quality may degrade on very long outputs
- Not a general-purpose coding model; use frontier models for non-Revit tasks
- The benchmark comparison is asymmetric (fine-tuned vs. zero-shot); Claude with proper system prompts may perform differently
Benchmark Results
40-question Revit C# benchmark - pure code generation focused on practical API usage:
| Model | Avg Score | Questions Scored Higher | Parameters | Inference |
|---|---|---|---|---|
| revit-coder-14b | 0.800 | 25 of 40 | 14B | Local (Ollama) |
| Claude Opus 4.6 | 0.793 | 15 of 40 | ~100B+ | API |
Note: This comparison shows a fine-tuned specialist vs. a zero-shot generalist. The fine-tuned model naturally has advantages on this specific benchmark. In production with proper prompting and examples, Claude may outperform on complex tasks.
By Difficulty
| Difficulty | Count | revit-coder-14b | Claude Opus 4.6 | Notes |
|---|---|---|---|---|
| Easy | 9 | 0.800 | 0.796 | Similar performance |
| Medium | 19 | 0.839 | 0.801 | Fine-tuned model shows strength on practical patterns |
| Hard | 12 | 0.736 | 0.779 | Claude shows strength on complex multi-class problems |
All 40 questions and both models' full responses are published in BENCHMARK_FULL.md.
Benchmark Methodology
Data Independence: The 40 benchmark questions were held out from training data to ensure fair evaluation.
Automated Scoring: Each response is scored on three axes:
- Signal Presence (40%): Fraction of expected domain keywords found (e.g., FilteredElementCollector, Transaction, IfcRelAggregates)
- Code Quality (30%): Domain-specific structural checks (namespaces, class structure, API patterns)
- Completeness (30%): Response length, code block formatting, error-free output
Composite = 0.4 × signal + 0.3 × quality + 0.3 × completeness
Important: No reference answers or human evaluation were used. Scores reflect structural patterns, not compilation or execution. This is automated evaluation only.
Asymmetric Comparison: The fine-tuned model received domain training; Claude did not. This tests whether domain-specific fine-tuning provides value, not which model is "better."
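The composite formula can be sketched as follows (the keyword lists and structural checks here are simplified stand-ins; the real scorer lives in the GitHub repo):

```python
def score_response(response, expected_keywords):
    """Simplified three-axis scorer following the 0.4/0.3/0.3 weighting above."""
    # Signal presence: fraction of expected domain keywords found in the response
    signal = sum(kw in response for kw in expected_keywords) / len(expected_keywords)
    # Code quality: stand-in structural checks (real checks are domain-specific)
    checks = ["using ", "class ", "namespace "]
    quality = sum(c in response for c in checks) / len(checks)
    # Completeness: stand-in length-based proxy for formatting/length checks
    completeness = min(len(response) / 400, 1.0)
    return 0.4 * signal + 0.3 * quality + 0.3 * completeness

# Hypothetical response snippet for illustration
resp = ("using Autodesk.Revit.DB;\n"
        "var walls = new FilteredElementCollector(doc)"
        ".OfClass(typeof(Wall)).ToElements();")
print(round(score_response(resp, ["FilteredElementCollector", "Transaction"]), 3))
```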
Usage
Ollama (Recommended)
```shell
# Run the model (pulls it first if not already present locally)
ollama run revit-coder-14b-f16

# Query
ollama run revit-coder-14b-f16 "Write C# code to collect all walls and group by type name"
```
Python (transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "schauh11/revit-coder-14b"  # HuggingFace repo
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": "You are a Revit API expert specialized in C# and .NET 8."},
    {"role": "user", "content": "Write code to get all rooms and their areas."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect; greedy decoding ignores it
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Unsloth (for inference with LoRA adapter)
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/revit-coder-14b-lora",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
```
Citation
```bibtex
@misc{revit-coder-14b-2026,
  title={revit-coder-14b: Domain-Specialized Code Generation for Revit API},
  author={Sanjay Chauhan},
  year={2026},
  url={https://huggingface.co/schauh11/revit-coder-14b}
}
```
License
Apache 2.0, same as the base Qwen3-14B model.