Reward model - a transZ Collection



Updated Feb 7

Reward modelling

RLHFlow/SHP-standard

Viewer • Updated May 9, 2024 • 93.3k • 35

Note: Training

transZ/shp

Viewer • Updated Jan 23 • 10.3k • 12

Note: Test and validation

RLHFlow/HH-RLHF-Helpful-standard

Viewer • Updated Apr 27, 2024 • 115k • 171 • 4

Note: Training

transZ/anthropic_helpful_test

Viewer • Updated Jan 23 • 2.33k • 6

Note: Test

RLHFlow/HH-RLHF-Harmless-and-RedTeam-standard

Viewer • Updated May 8, 2024 • 42.3k • 24 • 4

Note: Training

transZ/anthropic_harmless_test

Viewer • Updated Jan 23 • 2.3k • 5

Note: Test

transZ/helpsteer3

Viewer • Updated Feb 7 • 18.6k • 5

Note: Training and testing

RLHFlow/PKU-SafeRLHF-30K-standard

Viewer • Updated Apr 29, 2024 • 26.9k • 12 • 3

Note: Training

transZ/pku_safe_rlhf

Viewer • Updated Feb 7 • 1.22k • 4

Note: Test

HuggingFaceH4/cai-conversation-harmless

Viewer • Updated Feb 2, 2024 • 44.8k • 183 • 16

Note: Training and testing