Update content.py
content.py  CHANGED  (+10 -10)
@@ -26,10 +26,10 @@ SUBMISSION_TEXT = """
 ## Submissions
 Results can be submitted for the test set only. Scores are expressed as the percentage of correct answers for a given split.
 
-Evaluation is done by comparing the newly generated question to the reference questions using
+Evaluation is done by comparing the newly generated question to the reference questions using Semantic Text Similarity, and inheriting the label of the most similar reference. Questions where no reference is found are considered invalid. See the evaluation function [here](https://huggingface.co/spaces/HiTZ/Critical_Questions_Leaderboard/blob/main/app.py#L141), or find more details in the [paper](https://arxiv.org/abs/2505.11341).
 
-We expect submissions to be json-line files with the following format.
-```
+We expect submissions to be JSON files with the following format.
+```json
 {
   "CLINTON_1_1": {
     "intervention_id": "CLINTON_1_1",
@@ -54,20 +54,20 @@ We expect submissions to be json-line files with the following format.
 }
 ```
 
-
+After clicking 'Submit Eval', wait a couple of minutes before refreshing.
+
+If you find any issues, please email blancacalvofigueras@gmail.com
 
-This leaderboard was created using as base the [GAIA-benchmark](https://huggingface.co/spaces/gaia-benchmark/leaderboard)
 """
 
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
 CITATION_BUTTON_TEXT = r"""@misc{figueras2025benchmarkingcriticalquestionsgeneration,
       title={Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models},
-      author={
+      author={Calvo Figueras, Blanca and Rodrigo Agerri},
       year={2025},
-
-
-
-      url={https://arxiv.org/abs/2505.11341},
+      booktitle={2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025)},
+      organization={Association for Computational Linguistics (ACL)},
+      url={https://arxiv.org/abs/2505.11341},
 }"""
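
The JSON example in the diff is cut off at the hunk boundary (file lines 36-53 are not shown), so only the `intervention_id` field is visible. A hedged sketch of writing a submission file in that shape follows; the `cqs` field holding the generated critical questions is an assumption for illustration, so check the full template in the Space before submitting.

```python
# Hedged sketch of producing a submission file: a JSON object keyed by
# intervention id. Only "intervention_id" is confirmed by the diff above;
# the "cqs" field and its structure are assumptions for illustration.
import json

submission = {
    "CLINTON_1_1": {
        "intervention_id": "CLINTON_1_1",
        "cqs": [  # assumed field name for the generated questions
            {"id": 0, "cq": "Is there evidence that the proposed plan would work?"},
        ],
    },
}

with open("submission.json", "w", encoding="utf-8") as f:
    json.dump(submission, f, indent=2, ensure_ascii=False)
```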