USLaP Mistral v22

Universal Scientific Laws and Principles (USLaP): a fine-tuned Mistral-7B for validating scientific terminology against Qur'anic Arabic roots.

Purpose

Detects scientific terminology of Persian, Greek, or Latin origin, rejects it as contaminated, and proposes alternatives derived from Qur'anic Arabic roots.

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the USLaP LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto", torch_dtype="auto"
)
model = PeftModel.from_pretrained(base, "uslap/uslap-mistral-v22")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Mistral-Instruct expects prompts wrapped in [INST] ... [/INST] tags.
prompt = "[INST] What is the Arabic term for geometry? [/INST]"
# Use model.device rather than hard-coding "cuda": with device_map="auto"
# the model may land on CPU or be sharded across devices.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
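
Instead of hand-writing the [INST] tags, the same prompt can be built with the tokenizer's chat template, which emits that wrapper for Mistral-7B-Instruct. A minimal equivalent sketch:

messages = [{"role": "user", "content": "What is the Arabic term for geometry?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)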

Expected Output

โŒ REJECTED: "geometry" (Greek)
โŒ REJECTED: "ู‡ูŽู†ู’ุฏูŽุณูŽุฉ" (handasa) โ€” PERSIAN CONTAMINATION

โœ… USE INSTEAD: ุนูู„ู’ู… ุงู„ุชูŽู‘ู‚ู’ุฏููŠุฑ ('Ilm al-Taqdฤซr)
Root: ู‚ ุฏ ุฑ (q-d-r) โ€” Qur'anic: 54:49
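
If you consume these responses programmatically, the flagged terms can be pulled out with a small parser. This is a sketch that assumes the emoji-prefixed layout above stays stable across prompts; parse_validation is an illustrative helper, not something shipped with the model:

import re

def parse_validation(text: str) -> dict:
    # Collect every rejected term with its stated origin, e.g. ("geometry", "Greek").
    rejected = re.findall(r'❌ REJECTED: "?([^"(]+?)"? \(([^)]+)\)', text)
    # Pull the suggested alternative and its root line, if present.
    alt = re.search(r"✅ USE INSTEAD: (.+)", text)
    root = re.search(r"Root: (.+)", text)
    return {
        "rejected": [(t.strip(), origin.strip()) for t, origin in rejected],
        "alternative": alt.group(1).strip() if alt else None,
        "root": root.group(1).strip() if root else None,
    }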

Training

  • Base: Mistral-7B-Instruct-v0.2
  • Method: LoRA (r=32); a config sketch follows this list
  • Dataset: 2,680 validated entries
  • Final Loss: 0.069-0.116
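
The card states only the LoRA rank. The sketch below shows how such an adapter could be configured with peft; lora_alpha, lora_dropout, and target_modules are assumptions for illustration, not the card's recorded settings:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
config = LoraConfig(
    r=32,             # rank stated on this card
    lora_alpha=64,    # assumption: not stated on the card
    lora_dropout=0.05, # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption: a common choice for Mistral
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # sanity-check the trainable parameter count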

Framework

  • TRL: 0.27.2
  • Transformers: 5.0.0
  • PEFT: LoRA adapter
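
For serving without a runtime peft dependency, the adapter can be folded into the base weights. A minimal sketch, continuing from the Quick Start variables; the output directory name is illustrative:

# Merge the LoRA deltas into the base weights and save a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("uslap-mistral-v22-merged")
tokenizer.save_pretrained("uslap-mistral-v22-merged")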

License

Apache 2.0
