revit-coder-14b


A fine-tuned Qwen3-14B specialized in Revit API code generation, IFC reasoning, and BIM development patterns.

An experiment in domain-specific fine-tuning demonstrating that focused training on 177,127 Revit/BIM examples can produce a specialized model for $48 in 8 hours on a single GPU. The model was validated on a 40-question Revit C# benchmark, showing competitive performance with frontier models on domain-specific tasks.

Key insight: This demonstrates the value of domain-specific fine-tuning for specialized use cases rather than claiming superiority over general-purpose frontier models.

GitHub: schauh11/revit-coder-14b - Benchmark suite, training scripts, and full results.

Training

Training Pipeline

Training Configuration

| Spec | Value |
|------|-------|
| Base Model | Qwen3-14B-Instruct |
| Method | QLoRA (rank 64, alpha 128) |
| Framework | Unsloth + HuggingFace TRL |
| Training Data | 159,414 examples (90%) |
| Validation Data | 8,856 examples (5%) |
| Test Data | 8,857 examples (5%) |
| Epochs | 3 |
| Sequence Length | 4096 tokens |
| Batch Size | 16 effective (4 x 4 gradient accumulation) |
| Learning Rate | 2e-4 (cosine schedule) |
| Warmup Steps | 200 |
| Optimizer | AdamW 8-bit |
| Weight Decay | 0.01 |
| GPU | NVIDIA B200 192GB |
| Training Time | ~8 hours |
| Training Cost | ~$48 |
| LoRA Dropout | 0 (required for Unsloth) |
| Early Stopping | Patience 3 epochs |
| Random Seed | 42 |
| Packing | Enabled |

Why these hyperparameters?

  • QLoRA rank 64: Balances expressiveness with efficiency for domain-specific patterns
  • Packing enabled: Maximizes GPU utilization on variable-length sequences
  • Cosine schedule + warmup: Stable learning on technical documentation
  • Low dropout: Unsloth fast patching requires 0 dropout for optimized training
  • 4096 context: Covers typical Revit API code examples with context
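The settings above can be collected into a single configuration sketch. This is an illustrative dict whose key names mirror common Unsloth/TRL argument names, not the actual training script; it also makes the "16 effective" batch size arithmetic explicit:

```python
# Hyperparameters from the table above, as a plain dict.
# (A sketch mirroring common Unsloth/TRL argument names,
# not the exact training configuration used.)
config = {
    "lora_rank": 64,
    "lora_alpha": 128,            # alpha = 2 * rank, a common QLoRA heuristic
    "lora_dropout": 0.0,          # Unsloth's fast patching requires 0
    "num_train_epochs": 3,
    "max_seq_length": 4096,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 200,
    "optim": "adamw_8bit",
    "weight_decay": 0.01,
    "seed": 42,
    "packing": True,
}

# The "16 effective" batch size is per-device batch x gradient accumulation.
effective_batch = (config["per_device_train_batch_size"]
                   * config["gradient_accumulation_steps"])
print(effective_batch)  # 16
```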

Dataset Splits

| Split | Examples | Percentage | Purpose |
|-------|----------|------------|---------|
| Train | 159,414 | 90% | Model training |
| Validation | 8,856 | 5% | Hyperparameter tuning & early stopping |
| Test | 8,857 | 5% | Final evaluation (not used in this benchmark) |

Split strategy: Stratified sampling by domain to maintain proportional representation across all 6 BIM domains. Random seed: 42.

Note: The 40-question benchmark is separate from the training data—it tests zero-shot generalization on new Revit API questions not seen during training.
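The stratified 90/5/5 strategy can be sketched as follows. This is a simplified illustration with made-up domain counts, not the actual data pipeline; each domain is shuffled and split independently so every split keeps the overall domain proportions:

```python
import random

def stratified_split(records, seed=42, ratios=(0.90, 0.05, 0.05)):
    """Split records 90/5/5 per domain so each split keeps the
    overall domain proportions (a simplified sketch of the strategy)."""
    rng = random.Random(seed)
    by_domain = {}
    for rec in records:
        by_domain.setdefault(rec["domain"], []).append(rec)

    train, val, test = [], [], []
    for domain_records in by_domain.values():
        rng.shuffle(domain_records)
        n = len(domain_records)
        n_train = int(n * ratios[0])
        n_val = int(n * ratios[1])
        train += domain_records[:n_train]
        val += domain_records[n_train:n_train + n_val]
        test += domain_records[n_train + n_val:]
    return train, val, test

# Toy example with two domains (counts are illustrative only).
records = ([{"domain": "revit_csharp", "id": i} for i in range(1000)]
           + [{"domain": "ifc_reasoning", "id": i} for i in range(200)])
train, val, test = stratified_split(records)
print(len(train), len(val), len(test))  # 1080 60 60
```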

Training Data Distribution


| Domain | Records | % | Description |
|--------|---------|---|-------------|
| revit_csharp | 143,060 | 72.7% | Revit API C# code from docs, examples, references |
| ifc_reasoning | 44,571 | 22.6% | IFC topology, spatial hierarchies, BIM reasoning |
| aps_schema | 4,980 | 2.5% | APS/Forge cloud API patterns |
| revit_patterns | 3,758 | 1.9% | Development patterns (IUpdater, events, filters) |
| revit_python | 285 | 0.1% | pyRevit Python automation |
| mcp_tools | 149 | 0.1% | MCP tool definitions for AI-BIM integration |

Why this distribution?

  • 72.7% revit_csharp: Reflects the primary use case—Revit plugin development is predominantly C#/.NET
  • 22.6% ifc_reasoning: BIM data exchange and interoperability are core to AEC workflows
  • Domain-tagged system prompts: Each domain uses specialized prompts to activate appropriate model behaviors

Data format: ChatML with domain-specific system prompts. Each record includes `<|im_start|>system`, `<|im_start|>user`, and `<|im_start|>assistant` sections.
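A record in this format can be assembled like this. The prompt and answer text below are invented for illustration; only the ChatML delimiters come from the spec:

```python
def to_chatml(system, user, assistant):
    """Render one training record in ChatML, as described above."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>\n"
    )

# Hypothetical domain-tagged example for the revit_csharp domain.
record = to_chatml(
    "You are a Revit API expert specialized in C# and .NET 8.",
    "How do I collect all walls in the active document?",
    "Use a FilteredElementCollector with OfClass(typeof(Wall)) ...",
)
print(record.startswith("<|im_start|>system"))  # True
```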

Sources: Revit API Docs 2025/2026, Revit SDK code examples, IFC/BIM specifications, Autodesk forums, APS SDK documentation.

Environmental Impact

| Metric | Value |
|--------|-------|
| Hardware | 1x NVIDIA B200 192GB |
| Training time | ~8 hours |
| Cloud cost | ~$48 (RunPod) |
| CO2 estimate | ~2.4 kg (based on US grid average) |
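The CO2 figure is consistent with a simple back-of-envelope calculation. The power draw and emission factor below are assumptions chosen for illustration (roughly 750 W average board draw and ~0.4 kg CO2/kWh for the US grid), not measured values:

```python
avg_power_kw = 0.75       # assumed average GPU draw during training (kW)
hours = 8                 # training time from the table
grid_kg_per_kwh = 0.4     # rough US grid average emission factor

energy_kwh = avg_power_kw * hours                # 6.0 kWh
co2_kg = round(energy_kwh * grid_kg_per_kwh, 1)  # ~2.4 kg
print(co2_kg)  # 2.4
```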

Key takeaway: Domain-specific fine-tuning achieved competitive performance with <3% of the compute required to train frontier models from scratch.

Intended Use

Primary: Domain-specialized Revit API code generation. This is an experiment demonstrating that domain-specific fine-tuning can achieve competitive results with significantly less compute than training frontier models from scratch.

Capabilities:

  • Generate correct Revit C# code (FilteredElementCollector, Transaction patterns, BuiltInParameter)
  • Validate Revit API usage (catch missing Transactions, null checks, type filter issues)
  • Reason about IFC spatial hierarchies and property sets
  • Produce Revit development patterns (IExternalEventHandler, IUpdater, ISelectionFilter)

Limitations:

  • Optimized for the Revit 2025/2026 (.NET 8) API; may not cover older API versions
  • Strongest on revit_csharp domain; weaker on IFC STEP format generation
  • Best results under 800 tokens; quality may degrade on very long outputs
  • Not a general-purpose coding model; use frontier models for non-Revit tasks
  • The benchmark comparison is asymmetric (fine-tuned vs. zero-shot); Claude with proper system prompts may perform differently

Benchmark Results

40-question Revit C# benchmark - pure code generation focused on practical API usage:

| Model | Avg Score | Questions Scored Higher | Parameters | Inference |
|-------|-----------|-------------------------|------------|-----------|
| revit-coder-14b | 0.800 | 25 of 40 | 14B | Local (Ollama) |
| Claude Opus 4.6 | 0.793 | 15 of 40 | ~100B+ | API |

Note: This comparison shows a fine-tuned specialist vs. a zero-shot generalist. The fine-tuned model naturally has advantages on this specific benchmark. In production with proper prompting and examples, Claude may outperform on complex tasks.

By Difficulty

| Difficulty | Count | revit-coder-14b | Claude Opus 4.6 | Notes |
|------------|-------|-----------------|-----------------|-------|
| Easy | 9 | 0.800 | 0.796 | Similar performance |
| Medium | 19 | 0.839 | 0.801 | Fine-tuned model shows strength on practical patterns |
| Hard | 12 | 0.736 | 0.779 | Claude shows strength on complex multi-class problems |
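The headline averages follow from these per-difficulty scores as a question-weighted mean (matching 0.800 and 0.793 up to rounding):

```python
counts = {"easy": 9, "medium": 19, "hard": 12}  # 40 questions total

def overall(scores):
    """Question-weighted mean across difficulty buckets."""
    total = sum(counts[d] * s for d, s in scores.items())
    return total / sum(counts.values())

revit_coder = overall({"easy": 0.800, "medium": 0.839, "hard": 0.736})
claude = overall({"easy": 0.796, "medium": 0.801, "hard": 0.779})
print(round(revit_coder, 3), round(claude, 3))  # 0.799 0.793
```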


All 40 questions and both models' full responses are published in BENCHMARK_FULL.md.

Benchmark Methodology

Data Independence: The 40 benchmark questions were held out from training data to ensure fair evaluation.

Automated Scoring: Each response is scored on three axes:

  • Signal Presence (40%): Fraction of expected domain keywords found (e.g., FilteredElementCollector, Transaction, IfcRelAggregates)
  • Code Quality (30%): Domain-specific structural checks (namespaces, class structure, API patterns)
  • Completeness (30%): Response length, code block formatting, error-free output

Composite = 0.4 × signal + 0.3 × quality + 0.3 × completeness
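A minimal sketch of this scorer is shown below. The keyword list and the pre-computed quality/completeness values are illustrative placeholders, not the actual benchmark code:

```python
def composite_score(response, expected_keywords, quality, completeness):
    """Composite = 0.4 * signal + 0.3 * quality + 0.3 * completeness.

    `signal` is the fraction of expected keywords found in the response;
    `quality` and `completeness` are assumed pre-computed in [0, 1].
    """
    hits = sum(1 for kw in expected_keywords if kw in response)
    signal = hits / len(expected_keywords)
    return 0.4 * signal + 0.3 * quality + 0.3 * completeness

# Hypothetical example: 2 of 3 expected keywords appear in the response.
resp = "using (Transaction t = new Transaction(doc)) { ... FilteredElementCollector ... }"
score = composite_score(
    resp,
    ["FilteredElementCollector", "Transaction", "WhereElementIsNotElementType"],
    quality=0.9,
    completeness=1.0,
)
print(round(score, 3))  # 0.837
```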

Important: No reference answers or human evaluation were used. Scores reflect structural patterns, not compilation or execution. This is automated evaluation only.

Asymmetric Comparison: The fine-tuned model received domain training; Claude did not. This tests whether domain-specific fine-tuning provides value, not which model is "better."

Usage

Ollama (Recommended)

```shell
# Pull or create the model, then run it
ollama run revit-coder-14b-f16

# Query
ollama run revit-coder-14b-f16 "Write C# code to collect all walls and group by type name"
```

Python (transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "schauh11/revit-coder-14b"  # HuggingFace repo
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": "You are a Revit API expert specialized in C# and .NET 8."},
    {"role": "user", "content": "Write code to get all rooms and their areas."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is needed for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Unsloth (for inference with LoRA adapter)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/revit-coder-14b-lora",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
```

Citation

```bibtex
@misc{revit-coder-14b-2026,
  title={revit-coder-14b: Domain-Specialized Code Generation for Revit API},
  author={Sanjay Chauhan},
  year={2026},
  url={https://huggingface.co/schauh11/revit-coder-14b}
}
```

License

Apache 2.0, same as the base Qwen3-14B model.
