revit-coder-14b
A fine-tuned Qwen3-14B specialized in Revit API code generation, IFC reasoning, and BIM development patterns.
An experiment in domain-specific fine-tuning demonstrating that focused training on 177,127 Revit/BIM examples can produce a specialized model for $48 in 8 hours on a single GPU. The model was validated on a 40-question Revit C# benchmark, showing competitive performance with frontier models on domain-specific tasks.
Key insight: This demonstrates the value of domain-specific fine-tuning for specialized use cases rather than claiming superiority over general-purpose frontier models.
GitHub: schauh11/revit-coder-14b - Benchmark suite, training scripts, and full results.
Training
Training Configuration
| Spec | Value |
|---|---|
| Base Model | Qwen3-14B-Instruct |
| Method | QLoRA (rank 64, alpha 128) |
| Framework | Unsloth + HuggingFace TRL |
| Training Data | 159,414 examples (90%) |
| Validation Data | 8,856 examples (5%) |
| Test Data | 8,857 examples (5%) |
| Epochs | 3 |
| Sequence Length | 4096 tokens |
| Batch Size | 16 effective (4 x 4 gradient accumulation) |
| Learning Rate | 2e-4 (cosine schedule) |
| Warmup Steps | 200 |
| Optimizer | AdamW 8-bit |
| Weight Decay | 0.01 |
| GPU | NVIDIA B200 192GB |
| Training Time | ~8 hours |
| Training Cost | ~$48 |
| LoRA Dropout | 0 (required for Unsloth) |
| Early Stopping Patience | 3 epochs |
| Random Seed | 42 |
| Packing | Enabled |
Why these hyperparameters?
- QLoRA rank 64: Balances expressiveness with efficiency for domain-specific patterns
- Packing enabled: Maximizes GPU utilization on variable-length sequences
- Cosine schedule + warmup: Stable learning on technical documentation
- Low dropout: Unsloth fast patching requires 0 dropout for optimized training
- 4096 context: Covers typical Revit API code examples with context
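The table above can be condensed into a plain config dict for reference (illustrative only — key names are descriptive, not the exact keyword arguments of any training framework; the real training script is in the GitHub repo):

```python
# Illustrative summary of the training configuration table above.
train_config = {
    "base_model": "Qwen3-14B-Instruct",
    "lora_rank": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.0,              # Unsloth's fast patching requires 0
    "per_device_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "lr_schedule": "cosine",
    "warmup_steps": 200,
    "weight_decay": 0.01,
    "max_seq_length": 4096,
    "epochs": 3,
    "seed": 42,
    "packing": True,
}

# Effective batch size = per-device batch x gradient accumulation steps
effective_batch = (train_config["per_device_batch_size"]
                   * train_config["gradient_accumulation_steps"])
print(effective_batch)  # 16
```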
Dataset Splits
| Split | Examples | Percentage | Purpose |
|---|---|---|---|
| Train | 159,414 | 90% | Model training |
| Validation | 8,856 | 5% | Hyperparameter tuning & early stopping |
| Test | 8,857 | 5% | Final evaluation (not used in this benchmark) |
Split strategy: Stratified sampling by domain to maintain proportional representation across all 6 BIM domains. Random seed: 42.
Note: The 40-question benchmark is separate from the training data; it tests zero-shot generalization on new Revit API questions not seen during training.
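The seeded, per-domain 90/5/5 split described above can be sketched as follows (illustrative; the `"domain"` field name is an assumption, not the project's actual schema):

```python
import random
from collections import defaultdict

def stratified_split(records, seed=42, train_frac=0.90, val_frac=0.05):
    """Split records 90/5/5 within each domain so every split keeps the domain mix."""
    by_domain = defaultdict(list)
    for r in records:
        by_domain[r["domain"]].append(r)  # "domain" is an assumed field name

    rng = random.Random(seed)             # fixed seed for reproducibility
    train, val, test = [], [], []
    for domain, items in by_domain.items():
        rng.shuffle(items)
        n_train = int(len(items) * train_frac)
        n_val = int(len(items) * val_frac)
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]
    return train, val, test

# Toy example: two domains of 100 records each
records = [{"domain": d, "id": i}
           for d in ("revit_csharp", "ifc_reasoning") for i in range(100)]
train, val, test = stratified_split(records)
print(len(train), len(val), len(test))  # 180 10 10
```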
Training Data Distribution
| Domain | Records | % | Description |
|---|---|---|---|
| revit_csharp | 143,060 | 72.7% | Revit API C# code from docs, examples, references |
| ifc_reasoning | 44,571 | 22.6% | IFC topology, spatial hierarchies, BIM reasoning |
| aps_schema | 4,980 | 2.5% | APS/Forge cloud API patterns |
| revit_patterns | 3,758 | 1.9% | Development patterns (IUpdater, events, filters) |
| revit_python | 285 | 0.1% | pyRevit Python automation |
| mcp_tools | 149 | 0.1% | MCP tool definitions for AI-BIM integration |
Why this distribution?
- 72.7% revit_csharp: Reflects the primary use case, since Revit plugin development is predominantly C#/.NET
- 22.6% ifc_reasoning: BIM data exchange and interoperability are core to AEC workflows
- Domain-tagged system prompts: Each domain uses specialized prompts to activate appropriate model behaviors
Data format: ChatML with domain-specific system prompts. Each record includes <|im_start|>system, <|im_start|>user, <|im_start|>assistant sections.
Sources: Revit API Docs 2025/2026, Revit SDK code examples, IFC/BIM specifications, Autodesk forums, APS SDK documentation.
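The ChatML layout described above can be sketched as a small formatting helper (the prompt and message text below are placeholders, not actual training records):

```python
def to_chatml(system_prompt, user_msg, assistant_msg):
    """Render one training record in ChatML with a domain-tagged system prompt."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant_msg}<|im_end|>\n"
    )

# Placeholder content for illustration only
record = to_chatml(
    "You are a Revit API expert specialized in C# and .NET 8.",
    "How do I start a Transaction?",
    "Use the Transaction class inside a using block...",
)
print(record)
```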
Environmental Impact
| Metric | Value |
|---|---|
| Hardware | 1x NVIDIA B200 192GB |
| Training time | ~8 hours |
| Cloud cost | ~$48 (RunPod) |
| CO2 estimate | ~2.4 kg (based on US grid average) |
Key takeaway: Domain-specific fine-tuning achieved competitive performance with <3% of the compute required to train frontier models from scratch.
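The CO2 figure is consistent with a back-of-the-envelope estimate; the power draw and grid intensity below are assumed round numbers for illustration, not measured values:

```python
# Assumed values: sustained average draw for one GPU and US grid carbon intensity
hours = 8
power_kw = 0.7            # assumed average draw under training load
grid_kg_per_kwh = 0.43    # assumed US grid average

co2_kg = hours * power_kw * grid_kg_per_kwh
print(round(co2_kg, 1))  # 2.4
```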
Intended Use
Primary: Domain-specialized Revit API code generation. This is an experiment demonstrating that domain-specific fine-tuning can achieve competitive results with significantly less compute than training frontier models from scratch.
Capabilities:
- Generate correct Revit C# code (FilteredElementCollector, Transaction patterns, BuiltInParameter)
- Validate Revit API usage (catch missing Transactions, null checks, type filter issues)
- Reason about IFC spatial hierarchies and property sets
- Produce Revit development patterns (IExternalEventHandler, IUpdater, ISelectionFilter)
Limitations:
- Optimized for Revit 2025/2026 (.NET 8) API, may not cover older API versions
- Strongest on revit_csharp domain; weaker on IFC STEP format generation
- Best results under 800 tokens; quality may degrade on very long outputs
- Not a general-purpose coding model; use frontier models for non-Revit tasks
- The benchmark comparison is asymmetric (fine-tuned vs. zero-shot); Claude with proper system prompts may perform differently
Benchmark Results
40-question Revit C# benchmark - pure code generation focused on practical API usage:
| Model | Avg Score | Questions Scored Higher | Parameters | Inference |
|---|---|---|---|---|
| revit-coder-14b | 0.800 | 25 of 40 | 14B | Local (Ollama) |
| Claude Opus 4.6 | 0.793 | 15 of 40 | ~100B+ | API |
Note: This comparison shows a fine-tuned specialist vs. a zero-shot generalist. The fine-tuned model naturally has advantages on this specific benchmark. In production with proper prompting and examples, Claude may outperform on complex tasks.
By Difficulty
| Difficulty | Count | revit-coder-14b | Claude Opus 4.6 | Notes |
|---|---|---|---|---|
| Easy | 9 | 0.800 | 0.796 | Similar performance |
| Medium | 19 | 0.839 | 0.801 | Fine-tuned model shows strength on practical patterns |
| Hard | 12 | 0.736 | 0.779 | Claude shows strength on complex multi-class problems |
All 40 questions and both models' full responses are published in BENCHMARK_FULL.md.
Benchmark Methodology
Data Independence: The 40 benchmark questions were held out from training data to ensure fair evaluation.
Automated Scoring: Each response is scored on three axes:
- Signal Presence (40%): Fraction of expected domain keywords found (e.g., FilteredElementCollector, Transaction, IfcRelAggregates)
- Code Quality (30%): Domain-specific structural checks (namespaces, class structure, API patterns)
- Completeness (30%): Response length, code block formatting, error-free output
Composite = 0.4 × signal + 0.3 × quality + 0.3 × completeness
Important: No reference answers or human evaluation were used. Scores reflect structural patterns, not compilation or execution. This is automated evaluation only.
Asymmetric Comparison: The fine-tuned model received domain training; Claude did not. This tests whether domain-specific fine-tuning provides value, not which model is "better."
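The composite formula can be sketched as follows (the keyword lists and structural checks here are simplified stand-ins; the real scorer lives in the GitHub repo):

```python
def score_response(response, expected_keywords):
    """Simplified three-axis scorer following the 0.4/0.3/0.3 weighting above."""
    # Signal presence: fraction of expected domain keywords found in the response
    signal = sum(kw in response for kw in expected_keywords) / len(expected_keywords)
    # Code quality: stand-in structural checks (real checks are domain-specific)
    checks = ["using ", "class ", "namespace "]
    quality = sum(c in response for c in checks) / len(checks)
    # Completeness: stand-in length-based proxy for formatting/length checks
    completeness = min(len(response) / 400, 1.0)
    return 0.4 * signal + 0.3 * quality + 0.3 * completeness

# Hypothetical response snippet for illustration
resp = ("using Autodesk.Revit.DB;\n"
        "var walls = new FilteredElementCollector(doc)"
        ".OfClass(typeof(Wall)).ToElements();")
print(round(score_response(resp, ["FilteredElementCollector", "Transaction"]), 3))
```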
Usage
Ollama (Recommended)
```shell
# Run the model (pulls it first if not already present locally)
ollama run revit-coder-14b-f16

# Query
ollama run revit-coder-14b-f16 "Write C# code to collect all walls and group by type name"
```
Python (transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "schauh11/revit-coder-14b"  # HuggingFace repo
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": "You are a Revit API expert specialized in C# and .NET 8."},
    {"role": "user", "content": "Write code to get all rooms and their areas."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is required for temperature to take effect; greedy decoding ignores it
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Unsloth (for inference with LoRA adapter)
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/revit-coder-14b-lora",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
```
Citation
```bibtex
@misc{revit-coder-14b-2026,
  title={revit-coder-14b: Domain-Specialized Code Generation for Revit API},
  author={Sanjay Chauhan},
  year={2026},
  url={https://huggingface.co/schauh11/revit-coder-14b}
}
```
License
Apache 2.0, same as the base Qwen3-14B model.