OctoMed-7B Digital Twin v1

A medical reasoning AI fine-tuned with GRPO (Group Relative Policy Optimization) for transparent clinical decision support. This model extends OctoMed's multimodal medical capabilities with enhanced reasoning chains.

Model Description

OctoMed-7B Digital Twin v1 is a 7-billion-parameter medical language model fine-tuned with reinforcement learning via GRPO (Group Relative Policy Optimization), using programmatic format and semantic rewards rather than human feedback. Built on top of OctoMed-7B, a state-of-the-art multimodal medical model, this variant specializes in:

  • Transparent Medical Reasoning: Uses <think>...</think> tags to show step-by-step clinical reasoning
  • Evidence-Based Responses: Trained to provide accurate, semantically grounded medical information
  • Clinical Decision Support: Assists both patients and healthcare professionals with medical queries
  • Multimodal Capabilities: Inherits OctoMed's vision-language understanding (image analysis requires base model)

Key Features

  • 🧠 Structured Reasoning: Explicit reasoning chains for medical transparency
  • 🎯 GRPO Training: Adaptive reward balancing for format (40%) and semantic accuracy (60%)
  • 💾 Parameter Efficient: LoRA adapters with rank 32 (~0.5% trainable parameters)
  • ⚡ 4-bit Quantization: Optimized for deployment on consumer hardware
  • 🏥 Medical Specialization: Fine-tuned on 500 medical reasoning examples

Model Architecture

| Component | Specification |
|---|---|
| Base Model | OctoMed/OctoMed-7B |
| Parameters | 7B (base) + 32M (LoRA adapters) |
| Context Length | 4096 tokens |
| Quantization | 4-bit NF4 |
| LoRA Rank | 32 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training Method | GRPO (Group Relative Policy Optimization) |
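
For reference, the adapter settings in the table correspond roughly to the following PEFT LoraConfig; the rank and target modules are taken from the table, while lora_alpha and lora_dropout are illustrative assumptions not documented in this card.

from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                 # LoRA rank from the table above
    lora_alpha=32,        # assumed; not stated in this card
    lora_dropout=0.0,     # assumed; not stated in this card
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)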

Training Details

Training Configuration

Training Steps: 200 (100 warmup steps)
Batch Size: 4 per device
Gradient Accumulation: 4 steps (effective batch = 16)
Learning Rate: 5e-5 with cosine scheduler
Optimizer: AdamW (8-bit)
Mixed Precision: BF16
Dataset: FreedomIntelligence/medical-o1-reasoning-SFT (500 examples)
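
For orientation, the hyperparameters above map onto TRL's GRPOConfig roughly as sketched below. The exact trainer wiring used for this model is not published, so treat the commented GRPOTrainer call as an assumption; note that TRL expects batch-wise reward callables, so the single-completion helpers sketched under "Reward Functions" would need thin wrappers.

from trl import GRPOConfig, GRPOTrainer

# Illustrative mapping of the listed hyperparameters onto TRL's GRPOConfig.
training_args = GRPOConfig(
    output_dir="octomed-grpo",
    max_steps=200,
    warmup_steps=100,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size 16
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    optim="adamw_8bit",
    bf16=True,
)

# Sketch only; the original training script is not published.
# trainer = GRPOTrainer(
#     model=model,                    # PEFT-wrapped 4-bit base model
#     args=training_args,
#     train_dataset=dataset,          # 500 medical reasoning examples
#     reward_funcs=[format_reward, semantic_reward],  # see "Reward Functions" below
# )
# trainer.train()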

Reward Functions

The model was trained using two complementary reward signals:

  1. Format Reward (40% final weight):

    • Encourages use of <think> reasoning tags
    • Rewards substantial reasoning (10+ words)
    • Scaled rewards for partial compliance
  2. Semantic Reward (60% final weight):

    • Cosine similarity to ground truth answers
    • Uses all-MiniLM-L6-v2 for embeddings
    • Focuses on answer accuracy, not reasoning style

Reward weights were adaptively adjusted during training from 90%/10% to 40%/60% to balance format adherence with semantic accuracy.
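
As a concrete illustration, the two rewards and the weight schedule could look roughly like the sketch below. These are simplified single-completion functions (the original reward code is not published), and the linear annealing from 90%/10% to 40%/60% is an assumption based only on the start and end weights stated above.

import re
from sentence_transformers import SentenceTransformer, util

_embedder = SentenceTransformer("all-MiniLM-L6-v2")

def format_reward(completion: str) -> float:
    """Reward explicit <think>...</think> reasoning containing at least ~10 words."""
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return 0.0
    n_words = len(match.group(1).split())
    return 1.0 if n_words >= 10 else 0.5  # partial credit for short reasoning

def semantic_reward(completion: str, reference: str) -> float:
    """Cosine similarity between the final answer and the ground-truth answer."""
    answer = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
    embeddings = _embedder.encode([answer, reference], convert_to_tensor=True)
    return float(util.cos_sim(embeddings[0], embeddings[1]))

def combined_reward(completion: str, reference: str, step: int, total_steps: int) -> float:
    """Blend the two rewards, annealing the mix from 90/10 to 40/60 over training."""
    t = min(step / max(total_steps, 1), 1.0)
    w_format = 0.9 - 0.5 * t      # 0.9 -> 0.4
    w_semantic = 0.1 + 0.5 * t    # 0.1 -> 0.6
    return w_format * format_reward(completion) + w_semantic * semantic_reward(completion, reference)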

Usage

Using Transformers (Standard Method)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Load the base model in 4-bit NF4, matching the training quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "OctoMed/OctoMed-7B",
    quantization_config=bnb_config,
    device_map="auto"
)

# Load LoRA adapters
model = PeftModel.from_pretrained(base_model, "AhmedSSoliman/octomed-7b-digital-twin-v1")
tokenizer = AutoTokenizer.from_pretrained("AhmedSSoliman/octomed-7b-digital-twin-v1")

# Generate response
question = "What are the early signs of sepsis and how should it be managed?"
messages = [
    {"role": "system", "content": "You are a medical AI assistant. Think through your reasoning step-by-step using <think> tags before providing your final answer."},
    {"role": "user", "content": question}
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Using Unsloth (Optimized & Recommended)

from unsloth import FastLanguageModel

# Load model with Unsloth optimizations
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="AhmedSSoliman/octomed-7b-digital-twin-v1",
    max_seq_length=4096,
    dtype=None,  # Auto-detect best dtype
    load_in_4bit=True,
    fast_inference=False  # Python 3.13+ compatibility
)

# Enable inference mode for 2x faster generation
FastLanguageModel.for_inference(model)

# Generate response
messages = [
    {"role": "user", "content": "Explain the pathophysiology of type 2 diabetes."}
]

inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(
    input_ids=inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Example Outputs

Question: What causes chest pain in a 45-year-old with hypertension?

Model Response:

<think>
A 45-year-old patient with hypertension presenting with chest pain requires immediate evaluation for cardiac causes. Key considerations:

1. Cardiac causes (most urgent):
   - Acute coronary syndrome (ACS)
   - Myocardial infarction
   - Unstable angina

2. Risk factors present:
   - Age (45, middle-aged male risk)
   - Known hypertension (damages vessels)

3. Differential diagnosis:
   - Costochondritis (musculoskeletal)
   - GERD (gastrointestinal)
   - Pulmonary embolism (respiratory)

Given hypertension history, cardiac workup is priority.
</think>

Chest pain in a 45-year-old with hypertension requires urgent evaluation for acute coronary syndrome. Immediate steps include:

1. **Emergency Assessment**: Call emergency services (911) if severe
2. **ECG**: Within 10 minutes of presentation
3. **Cardiac Biomarkers**: Troponin levels
4. **Risk Stratification**: HEART score or similar

Hypertension is a major risk factor for coronary artery disease. Do not delay seeking medical attention—chest pain in this context could indicate a heart attack requiring immediate intervention.
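
Because the reasoning is delimited by <think>...</think> tags, downstream code can display or strip it independently of the final answer. A minimal parsing sketch, using the response variable from the Transformers example above:

import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split a model response into (reasoning, final_answer)."""
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
    return reasoning, answer

reasoning, answer = split_reasoning(response)
print(answer)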

Hardware Requirements

| Configuration | VRAM Required | Speed |
|---|---|---|
| 4-bit (Recommended) | ~14 GB | Fast |
| 8-bit | ~28 GB | Medium |
| FP16 | ~56 GB | Slow |

Recommended Setup:

  • GPU: NVIDIA RTX 3090/4090, A100, or similar
  • RAM: 32GB+ system memory
  • Python: 3.9-3.13
  • CUDA: 11.8+

Limitations & Disclaimers

⚠️ Medical Disclaimer

THIS MODEL IS FOR RESEARCH AND EDUCATIONAL PURPOSES ONLY. IT IS NOT A SUBSTITUTE FOR PROFESSIONAL MEDICAL ADVICE, DIAGNOSIS, OR TREATMENT.

  • Not FDA Approved: This AI has not been evaluated or approved by any regulatory body
  • No Medical License: The model cannot practice medicine or replace licensed healthcare providers
  • Potential Errors: AI outputs may contain inaccuracies, hallucinations, or outdated information
  • No Emergency Use: Never use this model for medical emergencies—call emergency services immediately
  • Always Consult Professionals: Seek advice from qualified healthcare providers for medical decisions

Known Limitations

  1. Training Data Cutoff: Knowledge may not reflect the latest medical research
  2. Reasoning Artifacts: <think> tags may sometimes contain verbose or redundant reasoning
  3. Multimodal Gap: This LoRA adapter focuses on text; image analysis requires full base model
  4. Demographic Bias: Medical datasets may underrepresent certain populations
  5. Context Window: 4096 tokens limits handling of very long medical histories

Evaluation

The model was evaluated on clinical reasoning tasks with the following metrics:

  • Format Compliance: 85% of responses properly use reasoning tags
  • Semantic Similarity: Average 0.72 cosine similarity to ground truth
  • Reasoning Quality: Median 45 words per reasoning chain
  • Response Coherence: Qualitatively assessed as clear and structured

Note: Formal clinical validation has not been performed.

Citation

If you use this model in your research, please cite:

@misc{octomed-7b-digital-twin-v1,
  author = {Ahmed S. Soliman},
  title = {OctoMed-7B Digital Twin v1: GRPO-Enhanced Medical Reasoning},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/AhmedSSoliman/octomed-7b-digital-twin-v1}},
  note = {Fine-tuned with Group Relative Policy Optimization for transparent clinical reasoning}
}

Also cite the base OctoMed model:

@misc{octomed2025,
  title={OctoMed: Multimodal Medical AI},
  author={OctoMed Team},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/OctoMed/OctoMed-7B}}
}

Acknowledgments

  • Base Model: OctoMed-7B by the OctoMed Team
  • Training Framework: Unsloth for efficient LoRA training
  • Dataset: FreedomIntelligence for medical reasoning data
  • RL Algorithm: TRL library's GRPO implementation

License

This model inherits the Apache 2.0 license from OctoMed-7B. Use responsibly and in compliance with medical AI regulations in your jurisdiction.

Model Card Contact

For questions or issues, please contact:

  • GitHub: AhmedSSoliman
  • HuggingFace: AhmedSSoliman

Developed: December 2025
Framework: Unsloth + TRL + Transformers
Training Method: GRPO (Group Relative Policy Optimization)
