Update model card for CodeGoat24/UnifiedReward-Think-qwen-7b (Pref-GRPO reward model)

by nielsr HF Staff - opened Aug 31, 2025

This PR updates the model card for CodeGoat24/UnifiedReward-Think-qwen-7b to reflect its role as the pairwise preference reward model in the paper "Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning".

It includes the following improvements:

- Updates the model summary to clarify the model's association with the Pref-GRPO paper.
- Links directly to the Pref-GRPO paper on Hugging Face.
- Updates the project page link to point to the Pref-GRPO project page.
- Adds a link to the Pref-GRPO GitHub repository.
- Adds the metadata entry pipeline_tag: image-text-to-text to improve discoverability on the Hub for multimodal evaluation tasks.
- Adds the metadata entry library_name: transformers to enable the automated "How to use" code snippet.
- Corrects the "Quick Start" code snippet by adding the missing "import requests" statement.
- Updates the citation block to reference the Pref-GRPO paper.
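
For reviewers who want to see what the two metadata entries amount to, here is a minimal sketch of the resulting front matter, rendered in Python with the standard library only. The helper name render_front_matter is hypothetical and not part of this PR; the actual model card may carry further keys (license, base model, datasets) that are not shown here.

```python
# Illustrative sketch only: render the YAML front matter that a model card
# would carry for the two metadata keys named in this PR.
# `render_front_matter` is a hypothetical helper, not a Hub API.

def render_front_matter(metadata: dict[str, str]) -> str:
    """Return a YAML front-matter block for simple string-valued keys."""
    lines = ["---"]
    for key, value in metadata.items():
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


# The two entries this PR adds; any other keys in the real card are omitted.
card_metadata = {
    "pipeline_tag": "image-text-to-text",
    "library_name": "transformers",
}

front_matter = render_front_matter(card_metadata)
print(front_matter)
```

The block between the two --- lines sits at the very top of README.md; the Hub reads it to assign the task filter and to pick the library for the generated usage snippet.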

Please review and merge this PR if everything looks good.

Ready to merge

This branch is ready to get merged automatically.
