Update model card for CodeGoat24/UnifiedReward-Think-qwen-7b (Pref-GRPO reward model)

#1
by nielsr HF Staff - opened

This PR updates the model card for CodeGoat24/UnifiedReward-Think-qwen-7b to reflect its role as the pairwise preference reward model in the paper "Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning".

It includes the following improvements:

  • Updates the model summary to clarify the model's association with the Pref-GRPO paper.
  • Links directly to the Pref-GRPO paper on Hugging Face.
  • Updates the project page link to point to the Pref-GRPO-specific page.
  • Adds a link to the Pref-GRPO GitHub repository.
  • Adds the metadata field pipeline_tag: image-text-to-text to improve discoverability on the Hub for multimodal evaluation tasks.
  • Adds the metadata field library_name: transformers to enable the automated "How to use" code snippet.
  • Corrects the "Quick Start" code snippet by adding the missing import requests.
  • Updates the citation block to reference the Pref-GRPO paper.
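
The two metadata additions listed above go in the YAML front matter at the top of the model card's README.md. A minimal sketch showing only the two fields named in this PR; any other fields the actual card carries (license, base model, datasets, and so on) are omitted here, not known:

```yaml
---
# Only the two fields this PR describes; the real card may contain more.
pipeline_tag: image-text-to-text   # task category used for Hub filtering
library_name: transformers         # library the Hub uses for the "How to use" snippet
---
```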

Please review and merge this PR if everything looks good.

Ready to merge
This branch is ready to get merged automatically.
