About the Model
This is a fine-tuned version of Qwen3-VL-8B-Instruct specialized for electronic schematic understanding.
The model is trained to read schematic images and extract exact component information as it appears in the diagram, rather than generating generic component categories. During fine-tuning, the model learns to map visual schematic elements to:
- Component identifiers and part numbers (e.g. ATMEGA328P-PU)
- Footprint and library names (e.g. 7.62MM-3P)
- Net and power labels (e.g. +5V, GND)
- Other visible schematic text and symbols
Unlike general-purpose vision-language models, this fine-tuned model is optimized to copy schematic labels verbatim, making it suitable for downstream tasks such as BOM generation, schematic analysis, CAD migration, and hardware documentation.
The model operates in a causal generation setting, taking a schematic image and a short instruction prompt, and producing structured text outputs such as component lists, YAML/JSON metadata, or raw schematic text.
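Because the instruction prompt in the usage example below requests a comma-separated list, the raw generation can be turned into structured metadata with a few lines of post-processing. The labels_to_bom helper below is a hypothetical sketch (it is not part of this repository) that converts such a list into a minimal JSON bill of materials by counting repeated labels:

import json
from collections import Counter

def labels_to_bom(raw_output: str) -> str:
    # Split the comma-separated generation, drop empty fragments, and count duplicates.
    labels = [label.strip() for label in raw_output.split(",") if label.strip()]
    counts = Counter(labels)
    bom = [{"label": label, "quantity": qty} for label, qty in counts.items()]
    return json.dumps({"components": bom}, indent=2)

print(labels_to_bom("ATMEGA328P-PU, +5V, GND, SERVO_A, SERVO_B"))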
Usage
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image
MODEL_ID = "kingabzpro/qwen3vl-open-schematics-lora" # change me
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
).eval()
def build_prompt(example):
    name = example.get("name") or "Unknown project"
    ftype = example.get("type") or "unknown format"
    return (
        f"Project: {name}\nFormat: {ftype}\n"
        "From the schematic image, extract all component labels and identifiers exactly as shown "
        "(part numbers, values, footprints, net labels like +5V/GND).\n"
        "Output only a comma-separated list. Do not generalize or add extra text."
    )
def run_inference(model_, example, max_new_tokens=256):
    prompt = build_prompt(example)
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": example["image"]},
            {"type": "text", "text": prompt},
        ],
    }]
    inputs = processor.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model_.device)
    with torch.inference_mode():
        out = model_.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    gen = out[0][inputs["input_ids"].shape[1]:]
    return processor.decode(gen, skip_special_tokens=True)
# ---- Small usage example ----
example = {
"name": "Arduino-like Board",
"type": "kicad",
"image": Image.open("schematic.png").convert("RGB"),
}
print(run_inference(model, example))
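The repository name suggests that this checkpoint is a LoRA adapter. If loading MODEL_ID directly with AutoModelForVision2Seq fails because the repo contains only adapter weights, a minimal sketch (assuming the adapter was trained on Qwen/Qwen3-VL-8B-Instruct, and building on the snippet above) is to load the base model first and attach the adapter with peft:

from peft import PeftModel

# Load the base model, then attach the LoRA adapter on top of it
# (only needed if the repo hosts adapter weights rather than merged weights).
base = AutoModelForVision2Seq.from_pretrained(
    "Qwen/Qwen3-VL-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, MODEL_ID).eval()

The rest of the example (processor, build_prompt, run_inference) stays unchanged.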
Results
This model is a fine-tuned version of Qwen3-VL-8B-Instruct trained specifically to understand electronic schematics and extract component information directly from schematic images.
Compared to the base model, the fine-tuned model is more focused on relevant schematic entities (components, nets, identifiers) instead of raw pin-level text.
Before (Base Qwen3-VL)
The base model extracts a large amount of schematic text but mixes in pin names, signals, and other low-level labels:
R1,10kΩ,PC6_RESET#,PC6_ADC0,PC6_ADC1,...,PB7_XTAL2,VCC,AVCC,AREF,GND,+5V,
C1,22pF,C2,100nF,C3,22pF,X1,16MHz,U1,ATMEGA328P-PU,...
After (Fine-tuned)
After fine-tuning (1 epoch, ~800 samples), the model outputs a cleaner, more task-focused list:
ATMEGA328P-PU, +5V, GND, R, C, C16MHz,
SERVO_A, SERVO_B, SERVO_C, SERVO_D, SERVO_E, SERVO_F
Target (Dataset)
The training target focuses on component identifiers, footprints, and net labels:
+5V, 7.62MM-3P, 7.62MM-3P_1, ..., ATMEGA328P-PU, ATMEGA328P-PU_1,
GND, MBB02070C1002FCT00, ..., Y5P102K2KV16CC0224_2
Even with a small dataset and a single training epoch, the fine-tuned model already shows improved semantic filtering toward schematic-level components, forming a strong base for further refinement with more data and stricter target alignment.