MEGA-GRPO

A molecular optimization model fine-tuned from Meta-Llama-3-8B with Tanimoto-aware GRPO (Group Relative Policy Optimization) on 500K molecular transformations.

Paper: MEGA: A Large-Scale Molecular Editing Dataset for Guided-Action Optimization
Official Repository: https://github.com/nfsrules/MEGA-moledit

Installation

pip install unsloth torch
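
The validation example after the Usage section additionally uses RDKit. This is an assumption for post-processing only, not a requirement of the model itself:

pip install rdkit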

Usage

from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template

# Configuration
max_seq_length = 1024
lora_rank = 32

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "nfsrulesFR/mega-grpo",
    max_seq_length = max_seq_length,
    load_in_4bit = True,
    fast_inference = True,
    max_lora_rank = lora_rank,
    gpu_memory_utilization = 0.6,
)

# Configure tokenizer
tokenizer.padding_side = 'left'
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

tokenizer = get_chat_template(
    tokenizer,
    chat_template="llama-3",
    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},
)

# Generate
input_smiles = "CCO"
task = "Can you make molecule CCO more soluble in water? The output molecule should be similar to the input molecule."

messages = [{"from": "human", "value": task}]

encoded = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,      # needed so input_ids and attention_mask come back together
    return_tensors="pt",
    padding=True,
)

outputs = model.generate(
    input_ids=encoded["input_ids"].cuda(),
    attention_mask=encoded["attention_mask"].cuda(),
    max_new_tokens=64,
    use_cache=True,
    pad_token_id=tokenizer.pad_token_id,
)

response = tokenizer.decode(outputs[0][encoded["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
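
To sanity-check the output, you can validate the generated SMILES and measure how close it stays to the input molecule. The sketch below is not part of the official pipeline: it assumes the model's reply is a bare SMILES string and uses RDKit Morgan fingerprints (radius 2, 2048 bits) for the Tanimoto similarity.

# Post-processing sketch (assumption: the reply contains only a SMILES string)
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def tanimoto(smiles_a, smiles_b):
    # Tanimoto similarity between 2048-bit Morgan fingerprints (radius 2)
    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)
    if mol_a is None or mol_b is None:
        raise ValueError("invalid SMILES")
    fp_a = AllChem.GetMorganFingerprintAsBitVect(mol_a, 2, nBits=2048)
    fp_b = AllChem.GetMorganFingerprintAsBitVect(mol_b, 2, nBits=2048)
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)

output_smiles = response.strip()
if Chem.MolFromSmiles(output_smiles) is not None:
    print("Tanimoto similarity to input:", tanimoto(input_smiles, output_smiles))
else:
    print("Model output is not a valid SMILES:", output_smiles)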

Supported Tasks

Task ID   Description
101       Increase water solubility
102       Decrease water solubility
103       Increase drug-likeness
104       Decrease drug-likeness
105       Increase permeability
106       Decrease permeability
107       Increase hydrogen bond acceptors
108       Increase hydrogen bond donors
201       Increase solubility + HBA
202       Decrease solubility + increase HBA
203       Increase solubility + HBD
204       Decrease solubility + increase HBD
205       Increase solubility + permeability
206       Increase solubility + decrease permeability
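
The exact prompt wording used during training is not documented here; the hypothetical helper below simply mirrors the phrasing of the Usage example for the single-objective tasks, so treat the objective strings as assumptions.

# Hypothetical prompt builder (wording assumed from the Usage example above)
TASK_OBJECTIVES = {
    101: "more soluble in water",
    102: "less soluble in water",
    103: "more drug-like",
    104: "less drug-like",
    105: "more permeable",
    106: "less permeable",
    107: "have more hydrogen bond acceptors",
    108: "have more hydrogen bond donors",
}

def build_prompt(task_id, smiles):
    # Single-objective editing prompt in the style of the Usage example
    objective = TASK_OBJECTIVES[task_id]
    return (
        f"Can you make molecule {smiles} {objective}? "
        "The output molecule should be similar to the input molecule."
    )

print(build_prompt(101, "CCO"))
# Can you make molecule CCO more soluble in water? The output molecule should be similar to the input molecule.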

Model Details

  • Base Model: Meta-Llama-3-8B
  • Training: Tanimoto-aware GRPO on 500K molecular transformations
  • Input: SMILES string + task description
  • Output: Modified SMILES string

Citation

@article{mega2025,
  title={MEGA: A Large-Scale Molecular Editing Dataset for Guided-Action Optimization},
  author={Fernandez, Nelson and Illouz, Maxime and Pinto, Luis and Yang, Entao and Amadou Boubacar, Habiboulaye},
  journal={Under review at International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=wzou4rm3Tt}
}