pythia-helpful-1epoch (collection, 12 items)
Pythia-2.8b supervised finetuned and DPO finetuned with the helpful subset of the Anthropic-hh-rlhf dataset for 1 epoch.
Pythia-2.8b supervised finetuned using the TRLx library with the helpful subset of the Anthropic-hh-rlhf dataset for 1 epoch.
Checkpoints are also uploaded.
Fully reproducible finetuning code is available on GitHub.
See Pythia-2.8b for model details (paper).
See further details of these models in the paper Attributing Mode Collapse in the Fine-Tuning of Large Language Models.
If you find these models helpful, you can cite them as follows:
@inproceedings{o2024attributing,
title={Attributing Mode Collapse in the Fine-Tuning of Large Language Models},
author={O’Mahony, Laura and Grinsztajn, Leo and Schoelkopf, Hailey and Biderman, Stella},
booktitle={ICLR 2024, Mathematical and Empirical Understanding of Foundation Models (ME-FoMo) workshop},
year={2024}
}
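The SFT checkpoint can be loaded directly with the Hugging Face transformers library. Below is a minimal usage sketch, assuming the repo id lomahony/pythia-2.8b-helpful-sft (the checkpoint evaluated below); the prompt format and generation settings are illustrative only, not the settings used for evaluation.

```python
# Minimal usage sketch (assumes transformers and torch are installed).
# The repo id matches the checkpoint evaluated below; the hh-rlhf-style
# prompt and the sampling settings are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lomahony/pythia-2.8b-helpful-sft"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype).to(device)

# hh-rlhf uses Human/Assistant turns; adjust the prompt to your use case.
prompt = "\n\nHuman: How do I make a simple tomato soup?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```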
Zero-shot evaluation results from the EleutherAI lm-evaluation-harness (a reproduction sketch follows this table):

hf (pretrained=lomahony/pythia-2.8b-helpful-sft), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: 16
| Tasks | Version | Filter | n-shot | Metric | Value |  | Stderr |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 0 | acc | 0.2901 | ± | 0.0133 |
|  |  | none | 0 | acc_norm | 0.3404 | ± | 0.0138 |
| arc_easy | 1 | none | 0 | acc | 0.6469 | ± | 0.0098 |
|  |  | none | 0 | acc_norm | 0.5766 | ± | 0.0101 |
| boolq | 2 | none | 0 | acc | 0.6361 | ± | 0.0084 |
| hellaswag | 1 | none | 0 | acc | 0.4557 | ± | 0.0050 |
|  |  | none | 0 | acc_norm | 0.5984 | ± | 0.0049 |
| lambada_openai | 1 | none | 0 | perplexity | 5.2226 | ± | 0.1377 |
|  |  | none | 0 | acc | 0.6210 | ± | 0.0068 |
| openbookqa | 1 | none | 0 | acc | 0.2640 | ± | 0.0197 |
|  |  | none | 0 | acc_norm | 0.3760 | ± | 0.0217 |
| piqa | 1 | none | 0 | acc | 0.7481 | ± | 0.0101 |
|  |  | none | 0 | acc_norm | 0.7481 | ± | 0.0101 |
| sciq | 1 | none | 0 | acc | 0.8800 | ± | 0.0103 |
|  |  | none | 0 | acc_norm | 0.8180 | ± | 0.0122 |
| wikitext | 2 | none | 0 | word_perplexity | 13.4928 | ± | N/A |
|  |  | none | 0 | byte_perplexity | 1.6268 | ± | N/A |
|  |  | none | 0 | bits_per_byte | 0.7020 | ± | N/A |
| winogrande | 1 | none | 0 | acc | 0.6125 | ± | 0.0137 |
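Both the zero-shot table above and the 5-shot table below were produced with the lm-evaluation-harness using the settings shown in the header lines. A hedged reproduction sketch using the harness's Python API follows; the exact function and argument names can differ between harness versions.

```python
# Hedged reproduction sketch for the two evaluation runs (0-shot and 5-shot),
# using the EleutherAI lm-evaluation-harness (pip install lm-eval).
# Exact API details may differ between harness versions.
import lm_eval

tasks = [
    "arc_challenge", "arc_easy", "boolq", "hellaswag", "lambada_openai",
    "openbookqa", "piqa", "sciq", "wikitext", "winogrande",
]

for num_fewshot in (0, 5):  # first table: 0-shot; second table: 5-shot
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=lomahony/pythia-2.8b-helpful-sft",
        tasks=tasks,
        num_fewshot=num_fewshot,
        batch_size=16,
    )
    print(f"num_fewshot={num_fewshot}:", results["results"])
```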
5-shot evaluation results from the EleutherAI lm-evaluation-harness:

hf (pretrained=lomahony/pythia-2.8b-helpful-sft), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16
| Tasks | Version | Filter | n-shot | Metric | Value |  | Stderr |
|---|---|---|---|---|---|---|---|
| arc_challenge | 1 | none | 5 | acc | 0.3285 | ± | 0.0137 |
|  |  | none | 5 | acc_norm | 0.3677 | ± | 0.0141 |
| arc_easy | 1 | none | 5 | acc | 0.6873 | ± | 0.0095 |
|  |  | none | 5 | acc_norm | 0.6835 | ± | 0.0095 |
| boolq | 2 | none | 5 | acc | 0.6670 | ± | 0.0082 |
| hellaswag | 1 | none | 5 | acc | 0.4542 | ± | 0.0050 |
|  |  | none | 5 | acc_norm | 0.5963 | ± | 0.0049 |
| lambada_openai | 1 | none | 5 | perplexity | 7.4076 | ± | 0.2095 |
|  |  | none | 5 | acc | 0.5486 | ± | 0.0069 |
| openbookqa | 1 | none | 5 | acc | 0.2680 | ± | 0.0198 |
|  |  | none | 5 | acc_norm | 0.3620 | ± | 0.0215 |
| piqa | 1 | none | 5 | acc | 0.7568 | ± | 0.0100 |
|  |  | none | 5 | acc_norm | 0.7486 | ± | 0.0101 |
| sciq | 1 | none | 5 | acc | 0.9380 | ± | 0.0076 |
|  |  | none | 5 | acc_norm | 0.9330 | ± | 0.0079 |
| wikitext | 2 | none | 5 | word_perplexity | 13.4928 | ± | N/A |
|  |  | none | 5 | byte_perplexity | 1.6268 | ± | N/A |
|  |  | none | 5 | bits_per_byte | 0.7020 | ± | N/A |
| winogrande | 1 | none | 5 | acc | 0.5935 | ± | 0.0138 |
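For reference, the wikitext metrics in both tables are related: assuming the harness's standard definitions, bits_per_byte is the base-2 logarithm of byte_perplexity, which is consistent with the reported values.

```python
# Consistency check for the wikitext rows (assuming the standard definition
# bits_per_byte = log2(byte_perplexity)).
import math

byte_perplexity = 1.6268
bits_per_byte = math.log2(byte_perplexity)
print(round(bits_per_byte, 4))  # ≈ 0.7020, matching the tables
```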