OctoMed-7B Digital Twin v1

A medical reasoning AI fine-tuned with GRPO (Group Relative Policy Optimization) for transparent clinical decision support. This model extends OctoMed's multimodal medical capabilities with enhanced reasoning chains.

Model Description

OctoMed-7B Digital Twin v1 is a 7-billion-parameter medical language model fine-tuned with reinforcement learning via GRPO (Group Relative Policy Optimization), using programmatic format and semantic rewards rather than human feedback. Built on top of OctoMed-7B, a state-of-the-art multimodal medical model, this variant specializes in:

  • Transparent Medical Reasoning: Uses <think>...</think> tags to show step-by-step clinical reasoning
  • Evidence-Based Responses: Trained to provide accurate, semantically grounded medical information
  • Clinical Decision Support: Assists both patients and healthcare professionals with medical queries
  • Multimodal Capabilities: Inherits OctoMed's vision-language understanding (image analysis requires base model)

Key Features

  • 🧠 Structured Reasoning: Explicit reasoning chains for medical transparency
  • 🎯 GRPO Training: Adaptive reward balancing for format (40%) and semantic accuracy (60%)
  • 💾 Parameter Efficient: LoRA adapters with rank 32 (~0.5% trainable parameters)
  • ⚡ 4-bit Quantization: Optimized for deployment on consumer hardware
  • 🏥 Medical Specialization: Fine-tuned on 500 medical reasoning examples

Model Architecture

| Component | Specification |
|---|---|
| Base Model | OctoMed/OctoMed-7B |
| Parameters | 7B (base) + 32M (LoRA adapters) |
| Context Length | 4096 tokens |
| Quantization | 4-bit NF4 |
| LoRA Rank | 32 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training Method | GRPO (Group Relative Policy Optimization) |
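
For reference, the adapter settings in the table correspond roughly to the following PEFT LoraConfig; the rank and target modules are taken from the table, while lora_alpha and lora_dropout are illustrative assumptions not documented in this card.

from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                 # LoRA rank from the table above
    lora_alpha=32,        # assumed; not stated in this card
    lora_dropout=0.0,     # assumed; not stated in this card
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)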

Training Details

Training Configuration

Training Steps: 200 (100 warmup steps)
Batch Size: 4 per device
Gradient Accumulation: 4 steps (effective batch = 16)
Learning Rate: 5e-5 with cosine scheduler
Optimizer: AdamW (8-bit)
Mixed Precision: BF16
Dataset: FreedomIntelligence/medical-o1-reasoning-SFT (500 examples)
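
For orientation, the hyperparameters above map onto TRL's GRPOConfig roughly as sketched below. The exact trainer wiring used for this model is not published, so treat the commented GRPOTrainer call as an assumption; note that TRL expects batch-wise reward callables, so the single-completion helpers sketched under "Reward Functions" would need thin wrappers.

from trl import GRPOConfig, GRPOTrainer

# Illustrative mapping of the listed hyperparameters onto TRL's GRPOConfig.
training_args = GRPOConfig(
    output_dir="octomed-grpo",
    max_steps=200,
    warmup_steps=100,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size 16
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    optim="adamw_8bit",
    bf16=True,
)

# Sketch only; the original training script is not published.
# trainer = GRPOTrainer(
#     model=model,                    # PEFT-wrapped 4-bit base model
#     args=training_args,
#     train_dataset=dataset,          # 500 medical reasoning examples
#     reward_funcs=[format_reward, semantic_reward],  # see "Reward Functions" below
# )
# trainer.train()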

Reward Functions

The model was trained using two complementary reward signals:

  1. Format Reward (40% final weight):

    • Encourages use of <think> reasoning tags
    • Rewards substantial reasoning (10+ words)
    • Scaled rewards for partial compliance
  2. Semantic Reward (60% final weight):

    • Cosine similarity to ground truth answers
    • Uses all-MiniLM-L6-v2 for embeddings
    • Focuses on answer accuracy, not reasoning style

Reward weights were adaptively adjusted during training from 90%/10% to 40%/60% to balance format adherence with semantic accuracy.
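
As a concrete illustration, the two rewards and the weight schedule could look roughly like the sketch below. These are simplified single-completion functions (the original reward code is not published), and the linear annealing from 90%/10% to 40%/60% is an assumption based only on the start and end weights stated above.

import re
from sentence_transformers import SentenceTransformer, util

_embedder = SentenceTransformer("all-MiniLM-L6-v2")

def format_reward(completion: str) -> float:
    """Reward explicit <think>...</think> reasoning containing at least ~10 words."""
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return 0.0
    n_words = len(match.group(1).split())
    return 1.0 if n_words >= 10 else 0.5  # partial credit for short reasoning

def semantic_reward(completion: str, reference: str) -> float:
    """Cosine similarity between the final answer and the ground-truth answer."""
    answer = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
    embeddings = _embedder.encode([answer, reference], convert_to_tensor=True)
    return float(util.cos_sim(embeddings[0], embeddings[1]))

def combined_reward(completion: str, reference: str, step: int, total_steps: int) -> float:
    """Blend the two rewards, annealing the mix from 90/10 to 40/60 over training."""
    t = min(step / max(total_steps, 1), 1.0)
    w_format = 0.9 - 0.5 * t      # 0.9 -> 0.4
    w_semantic = 0.1 + 0.5 * t    # 0.1 -> 0.6
    return w_format * format_reward(completion) + w_semantic * semantic_reward(completion, reference)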

Usage

Using Transformers (Standard Method)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Load the base model in 4-bit NF4, matching the training quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "OctoMed/OctoMed-7B",
    quantization_config=bnb_config,
    device_map="auto"
)

# Load LoRA adapters
model = PeftModel.from_pretrained(base_model, "AhmedSSoliman/octomed-7b-digital-twin-v1")
tokenizer = AutoTokenizer.from_pretrained("AhmedSSoliman/octomed-7b-digital-twin-v1")

# Generate response
question = "What are the early signs of sepsis and how should it be managed?"
messages = [
    {"role": "system", "content": "You are a medical AI assistant. Think through your reasoning step-by-step using <think> tags before providing your final answer."},
    {"role": "user", "content": question}
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Using Unsloth (Optimized & Recommended)

from unsloth import FastLanguageModel

# Load model with Unsloth optimizations
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="AhmedSSoliman/octomed-7b-digital-twin-v1",
    max_seq_length=4096,
    dtype=None,  # Auto-detect best dtype
    load_in_4bit=True,
    fast_inference=False  # Python 3.13+ compatibility
)

# Enable inference mode for 2x faster generation
FastLanguageModel.for_inference(model)

# Generate response
messages = [
    {"role": "user", "content": "Explain the pathophysiology of type 2 diabetes."}
]

inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(
    input_ids=inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Example Outputs

Question: What causes chest pain in a 45-year-old with hypertension?

Model Response:

<think>
A 45-year-old patient with hypertension presenting with chest pain requires immediate evaluation for cardiac causes. Key considerations:

1. Cardiac causes (most urgent):
   - Acute coronary syndrome (ACS)
   - Myocardial infarction
   - Unstable angina

2. Risk factors present:
   - Age (45, middle-aged male risk)
   - Known hypertension (damages vessels)

3. Differential diagnosis:
   - Costochondritis (musculoskeletal)
   - GERD (gastrointestinal)
   - Pulmonary embolism (respiratory)

Given hypertension history, cardiac workup is priority.
</think>

Chest pain in a 45-year-old with hypertension requires urgent evaluation for acute coronary syndrome. Immediate steps include:

1. **Emergency Assessment**: Call emergency services (911) if severe
2. **ECG**: Within 10 minutes of presentation
3. **Cardiac Biomarkers**: Troponin levels
4. **Risk Stratification**: HEART score or similar

Hypertension is a major risk factor for coronary artery disease. Do not delay seeking medical attention—chest pain in this context could indicate a heart attack requiring immediate intervention.
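
Because the reasoning is delimited by <think>...</think> tags, downstream code can display or strip it independently of the final answer. A minimal parsing sketch, using the response variable from the Transformers example above:

import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split a model response into (reasoning, final_answer)."""
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else ""
    answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
    return reasoning, answer

reasoning, answer = split_reasoning(response)
print(answer)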

Hardware Requirements

| Configuration | VRAM Required | Speed |
|---|---|---|
| 4-bit (Recommended) | ~14 GB | Fast |
| 8-bit | ~28 GB | Medium |
| FP16 | ~56 GB | Slow |

Recommended Setup:

  • GPU: NVIDIA RTX 3090/4090, A100, or similar
  • RAM: 32GB+ system memory
  • Python: 3.9-3.13
  • CUDA: 11.8+

Limitations & Disclaimers

⚠️ Medical Disclaimer

THIS MODEL IS FOR RESEARCH AND EDUCATIONAL PURPOSES ONLY. IT IS NOT A SUBSTITUTE FOR PROFESSIONAL MEDICAL ADVICE, DIAGNOSIS, OR TREATMENT.

  • Not FDA Approved: This AI has not been evaluated or approved by any regulatory body
  • No Medical License: The model cannot practice medicine or replace licensed healthcare providers
  • Potential Errors: AI outputs may contain inaccuracies, hallucinations, or outdated information
  • No Emergency Use: Never use this model for medical emergencies—call emergency services immediately
  • Always Consult Professionals: Seek advice from qualified healthcare providers for medical decisions

Known Limitations

  1. Training Data Cutoff: Knowledge may not reflect the latest medical research
  2. Reasoning Artifacts: <think> tags may sometimes contain verbose or redundant reasoning
  3. Multimodal Gap: This LoRA adapter focuses on text; image analysis requires full base model
  4. Demographic Bias: Medical datasets may underrepresent certain populations
  5. Context Window: 4096 tokens limits handling of very long medical histories

Evaluation

The model was evaluated on clinical reasoning tasks with the following metrics:

  • Format Compliance: 85% of responses properly use reasoning tags
  • Semantic Similarity: Average 0.72 cosine similarity to ground truth
  • Reasoning Quality: Median 45 words per reasoning chain
  • Response Coherence: Qualitatively assessed as clear and structured

Note: Formal clinical validation has not been performed.

Citation

If you use this model in your research, please cite:

@misc{octomed-7b-digital-twin-v1,
  author = {Ahmed S. Soliman},
  title = {OctoMed-7B Digital Twin v1: GRPO-Enhanced Medical Reasoning},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/AhmedSSoliman/octomed-7b-digital-twin-v1}},
  note = {Fine-tuned with Group Relative Policy Optimization for transparent clinical reasoning}
}

Also cite the base OctoMed model:

@misc{octomed2025,
  title={OctoMed: Multimodal Medical AI},
  author={OctoMed Team},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/OctoMed/OctoMed-7B}}
}

Acknowledgments

  • Base Model: OctoMed-7B by the OctoMed Team
  • Training Framework: Unsloth for efficient LoRA training
  • Dataset: FreedomIntelligence for medical reasoning data
  • RL Algorithm: TRL library's GRPO implementation

License

This model inherits the Apache 2.0 license from OctoMed-7B. Use responsibly and in compliance with medical AI regulations in your jurisdiction.

Model Card Contact

For questions or issues, please contact:

  • GitHub: AhmedSSoliman
  • HuggingFace: AhmedSSoliman

Developed: December 2025
Framework: Unsloth + TRL + Transformers
Training Method: GRPO (Group Relative Policy Optimization)
