Update content.py
content.py  +14 -7  CHANGED
@@ -1,12 +1,12 @@
 TITLE = """<h1 align="center" id="space-title">Critical Questions Leaderboard</h1>"""
 
 INTRODUCTION_TEXT = """
-The Critical Questions Leaderboard is a benchmark which aims at evaluating
+The Critical Questions Leaderboard is a benchmark that aims to evaluate the capacity of technology systems to generate critical questions. (See our [paper](https://arxiv.org/abs/2505.11341) for more details.)
 
 ## Data
 The Critical Questions Dataset is made up of more than 220 interventions associated with potentially critical questions.
 
-The data can be found in [this dataset](Blanca/
+The data can be found in [this dataset](https://huggingface.co/datasets/Blanca/CQs-Gen). The test set is contained in `test.jsonl`.
 
 ## Leaderboard
 Submissions made by our team are labelled "CQ authors". While we report average scores over different runs when possible in our paper, we only report the best run in the leaderboard.
@@ -18,10 +18,9 @@ SUBMISSION_TEXT = """
 ## Submissions
 Results can be submitted for the test set only. Scores are expressed as the percentage of correct answers for a given split.
 
-
-Hence, evaluation is done via quasi exact match between a model’s answer and the ground truth (up to some normalization that is tied to the “type” of the ground truth).
+Evaluation is done by comparing each newly generated question to the reference questions using gemma-2-9b-it, and inheriting the label of the most similar reference. Questions where no reference is found are considered invalid.
 
-We expect submissions to be json-line files with the following format.
+We expect submissions to be json-line files with the following format.
 ```json
 {
   "CLINTON_1_1": {
@@ -47,11 +46,19 @@ We expect submissions to be json-line files with the following format. The first
 }
 ```
 
-Our scoring function can be found [here](https://huggingface.co/spaces/
+Our scoring function can be found [here](https://huggingface.co/spaces/HiTZ/Critical_Questions_Leaderboard/blob/main/scorer.py).
+This leaderboard was created using the [GAIA benchmark leaderboard](https://huggingface.co/spaces/gaia-benchmark/leaderboard) as a base.
 """
 
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
-CITATION_BUTTON_TEXT = r"""
+CITATION_BUTTON_TEXT = r"""@misc{figueras2025benchmarkingcriticalquestionsgeneration,
+      title={Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models},
+      author={Blanca Calvo Figueras and Rodrigo Agerri},
+      year={2025},
+      eprint={2505.11341},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2505.11341},
 }"""
 
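
For readers of the new SUBMISSION_TEXT: the evaluation it describes (compare each generated question to the reference questions with gemma-2-9b-it, inherit the label of the most similar reference, treat unmatched questions as invalid, and report the percentage of correct answers) can be sketched roughly as below. This is only an illustration, not the Space's scorer.py: the `similarity` stand-in, the 0.5 threshold, the submission and reference shapes, and the label names ("Useful", "Invalid") are all assumptions.

```python
# Rough illustration of the scoring flow described in SUBMISSION_TEXT.
# NOT the actual scorer.py: `similarity` is a cheap stand-in for the
# gemma-2-9b-it comparison, and the threshold, label names and data
# shapes are assumed for the sake of the example.
from typing import Optional


def similarity(a: str, b: str) -> float:
    """Toy token-overlap similarity, standing in for the LLM-based judgement."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def most_similar_reference(generated: str, references: list[dict]) -> Optional[dict]:
    """Pick the reference question judged most similar, or None if nothing matches."""
    best = max(references, key=lambda ref: similarity(generated, ref["question"]), default=None)
    if best is None or similarity(generated, best["question"]) < 0.5:  # assumed threshold
        return None
    return best


def score(submission: dict[str, list[str]], references: dict[str, list[dict]]) -> float:
    """Percentage of generated questions whose inherited label counts as correct ('Useful' assumed)."""
    labels = []
    for intervention_id, questions in submission.items():
        for question in questions:
            match = most_similar_reference(question, references.get(intervention_id, []))
            # Questions where no reference is found are considered invalid.
            labels.append(match["label"] if match else "Invalid")
    return 100.0 * sum(label == "Useful" for label in labels) / max(len(labels), 1)
```

The linked scorer.py remains the source of truth for how the gemma-2-9b-it comparison and the label inheritance are actually implemented.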
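
Likewise, a minimal, hypothetical sketch of assembling a submission keyed by intervention IDs such as "CLINTON_1_1". The `hf_hub_download` call is the standard huggingface_hub API, but the field names (`intervention_id`, `intervention`, `cqs`) and the `generate_questions` helper are placeholders; follow the full format shown in SUBMISSION_TEXT and the dataset card rather than this sketch.

```python
# Hypothetical sketch of assembling a submission keyed by intervention IDs.
# The inner payload ("cqs": [...]) and the test.jsonl field names are
# placeholders, not the confirmed format.
import json

from huggingface_hub import hf_hub_download


def generate_questions(intervention_text: str) -> list[str]:
    """Placeholder for whatever system actually produces the critical questions."""
    return [f"What evidence supports the claim: {intervention_text[:40]}...?"]


def build_submission(output_path: str = "submission.json") -> None:
    # Fetch the test split mentioned in INTRODUCTION_TEXT (Blanca/CQs-Gen, test.jsonl).
    test_file = hf_hub_download(repo_id="Blanca/CQs-Gen", filename="test.jsonl", repo_type="dataset")
    submission = {}
    with open(test_file, encoding="utf-8") as f:
        for line in f:
            example = json.loads(line)
            # "intervention_id" and "intervention" are assumed field names.
            submission[example["intervention_id"]] = {
                "cqs": generate_questions(example["intervention"]),
            }
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(submission, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    build_submission()
```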