# USLaP Mistral v22

Universal Scientific Laws and Principles (USLaP): a fine-tuned Mistral-7B for scientific terminology validation against Qur'anic Arabic roots.
## Purpose
Detects and rejects contaminated scientific terminology (Persian, Greek, Latin) and provides Qur'anic alternatives.
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto", torch_dtype="auto"
)

# Attach the USLaP LoRA adapter
model = PeftModel.from_pretrained(base, "uslap/uslap-mistral-v22")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Mistral-Instruct prompt format
prompt = "[INST] What is the Arabic term for geometry? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
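The hard-coded `[INST] ... [/INST]` string can also be built with the tokenizer's chat template, which produces the same Mistral-Instruct prompt format. A minimal sketch (not part of the card's original example):

```python
# Build the same prompt via the tokenizer's chat template
messages = [{"role": "user", "content": "What is the Arabic term for geometry?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```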
## Expected Output

```
❌ REJECTED: "geometry" (Greek)
❌ REJECTED: "هَنْدَسَة" (handasa) – PERSIAN CONTAMINATION
✅ USE INSTEAD: عِلْم التَّقْدِير ('Ilm al-Taqdīr)
Root: ق د ر (q-d-r) – Qur'anic: 54:49
```
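If the reply is consumed programmatically, the suggested replacement can be pulled out of the `USE INSTEAD:` line. A minimal parsing sketch, assuming the model keeps the marker format shown above (the helper name and regex are illustrative, not part of the released model):

```python
import re
from typing import Optional

def extract_replacement(reply: str) -> Optional[str]:
    """Return the suggested Qur'anic-root term, or None if no USE INSTEAD line is present."""
    match = re.search(r"USE INSTEAD:\s*(.+)", reply)
    return match.group(1).strip() if match else None
```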
## Training
- Base: Mistral-7B-Instruct-v0.2
- Method: LoRA (r=32); see the configuration sketch below
- Dataset: 2,680 validated entries
- Final Loss: 0.069-0.116
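For reference, an r=32 LoRA setup in PEFT looks roughly like the sketch below. Only the rank is stated in this card, so `lora_alpha`, dropout, and the target modules are illustrative assumptions, not the actual training configuration.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                      # rank stated in the card
    lora_alpha=64,             # assumption, not stated in the card
    lora_dropout=0.05,         # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
```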
## Framework
- TRL: 0.27.2
- Transformers: 5.0.0
- PEFT: LoRA adapter
## License
Apache 2.0