CATIE

non-profit

Verified

https://www.catie.fr/

CATIE_AQ

catie-aq

AI & ML interests

Create NLP models and datasets for French, for very long sequences, and for the combination of the two ;)

Recent Activity

bourdoiscatie updated a collection about 2 months ago

NanoBEIR-fr 🍺

bourdoiscatie updated a dataset about 2 months ago

CATIE-AQ/NanoBEIR-fr

bourdoiscatie published a dataset about 2 months ago

CATIE-AQ/NanoBEIR-fr

CATIE-AQ's collections (21)

NanoBEIR-fr 🍺

French translation of zeta-alpha-ai's NanoBEIR collection

CATIE-AQ/NanoBEIR-fr

Viewer • Updated Dec 10, 2025 • 63.6k • 58
CATIE-AQ/NanoArguAna-fr

Viewer • Updated Jun 10, 2025 • 3.74k • 12
CATIE-AQ/NanoClimateFEVER-fr

Viewer • Updated Jun 10, 2025 • 3.61k • 7
CATIE-AQ/NanoDBPedia-fr

Viewer • Updated Jun 10, 2025 • 7.25k • 15

CATIE French sparse embedding

A few experiments following the release of Sentence Transformers v5.0. They can be seen as a V0 ahead of the publication of more powerful French sparse models.

CATIE-AQ/CSR_Sparse_Encoder_camembert-large_STS

Feature Extraction • 0.3B • Updated Jul 2, 2025 • 2 • 2
CATIE-AQ/SPLADE_camembert-base_STS

Feature Extraction • 0.1B • Updated Jul 2, 2025 • 36 • 2
CATIE-AQ/SPLADE_moderncamembert-cv2_STS

Feature Extraction • 0.1B • Updated Jul 2, 2025 • 3 • 2
CATIE-AQ/SPLADE_camemberta2.0_STS

Feature Extraction • 0.1B • Updated Jul 2, 2025 • 4 • 2

CATIE English FAT5-Flan

Adapted weights for Google Flan-T5 to use with FAT5

CATIE-AQ/FAT5-small-flan-en

Feature Extraction • 77M • Updated Oct 10, 2024 • 2
CATIE-AQ/FAT5-base-flan-en

Feature Extraction • 0.2B • Updated Oct 10, 2024 • 2
CATIE-AQ/FAT5-large-flan-en

Feature Extraction • 0.8B • Updated Oct 10, 2024 • 11
CATIE-AQ/FAT5-xl-flan-en

Feature Extraction • Updated Dec 1, 2025 • 3

CATIE French prompts datasets

A collection of French prompts datasets created by CATIE.

CATIE-AQ/smoltalk2_LongAlign_64k_context_french_no_think

Viewer • Updated Jul 29, 2025 • 95 • 2 • 1
CATIE-AQ/CFP

Viewer • Updated Nov 3, 2025 • 56.3k • 20
CATIE-AQ/DFP

Viewer • Updated Nov 3, 2025 • 108M • 292 • 8
CATIE-AQ/stsb_multi_mt_fr_prompt_sentence_similarity

Viewer • Updated Jul 15, 2025 • 155k • 21

CATIE French prompts models

A collection of French prompts models created by CATIE.

CATIE-AQ/mistral7B-FR-InstructNLP-LoRA

Text Generation • Updated Sep 22, 2025 • 6 • 3

French caption datasets

Datasets with an image, a prompt question (like "describe this image") and an answer. Can be used to train VLMs.

CATIE-AQ/caption-floschne-xm3600-clean

Viewer • Updated Jul 15, 2025 • 8.56k • 2
CATIE-AQ/caption-vidore-tabfquad_test_subsampled-clean

Viewer • Updated Jul 15, 2025 • 280 • 5
CATIE-AQ/caption-vidore-vdsid_french-clean

Viewer • Updated Jul 15, 2025 • 5k • 15
CATIE-AQ/caption-manu-tabfquad_retrieving-clean

Viewer • Updated Jul 15, 2025 • 1.83k • 4

French table-to-text datasets

In 2021, before the release of LoRA, we were interested in prefix-tuning and wanted to apply it to French, so we had to translate table-to-text data.

CATIE-AQ/web_nlg_french

Viewer • Updated Jul 11, 2025 • 35.4k • 19
CATIE-AQ/e2e_nlg_french

Viewer • Updated Jul 11, 2025 • 33.5k • 3
CATIE-AQ/viggo_french

Viewer • Updated Jul 11, 2025 • 5.1k • 24

CATIE French QA pack

CamemBERT models finetuned on the QA task (SQuAD 2.0 format) + the dataset used (~220,000 rows) + a Space demo

QAmembert ❓

Find answers in French texts using QAmemBERT models

Space • Running
CATIE-AQ/QAmemberta

Question Answering • 0.1B • Updated Nov 26, 2024 • 1 • 1
CATIE-AQ/QAmembert2

Question Answering • 0.1B • Updated Nov 26, 2024 • 4
CATIE-AQ/QAmembert

Question Answering • 0.1B • Updated Nov 26, 2024 • 11 • 14

CATIE French NLI pack

CATIE-AQ/frenchNLI

Viewer • Updated Jul 29, 2025 • 570k • 20 • 1

CATIE French Paraphrase pack

CATIE-AQ/frenchPARAPHRASE

Viewer • Updated Dec 1, 2025 • 255k • 27
CATIE-AQ/french_paraphrase_flan-t5-large

Text Generation • 0.8B • Updated Dec 1, 2025 • 6 • 1
CATIE-AQ/french_paraphrase_flan-t5-base

Text Generation • 0.2B • Updated Dec 1, 2025

XMRec French part (reviews and metadata datasets)

Reviews and metadata datasets from https://xmrec.github.io/data/fr/ by Bonab et al. (2021)

CATIE-AQ/XMRec_reviews_fr

Viewer • Updated Jun 23, 2025 • 48.7k • 1
CATIE-AQ/XMRec_reviews_fr_Arts_Crafts_and_Sewing

Viewer • Updated Jun 23, 2025 • 965 • 4
CATIE-AQ/XMRec_reviews_fr_Automotive

Viewer • Updated Jun 23, 2025 • 458 • 15
CATIE-AQ/XMRec_reviews_fr_Books

Viewer • Updated Jun 23, 2025 • 25.8k • 32

CATIE French dense embedding

CATIE-AQ/distilcamembert-base-embedding

Sentence Similarity • 68.1M • Updated Nov 3, 2025 • 1
CATIE-AQ/camembert-base-embedding

Sentence Similarity • 0.1B • Updated Nov 3, 2025

CATIE French FAT5 UL2

Flash Attention T5 models in French developed by CATIE.

CATIE-AQ/FAT5-small

0.1B • Updated Mar 17, 2025 • 10 • 2
Le FAT5 : Flash Attention T5 ⚡

French version of the blog post introducing the FAT5 model

Space • Running
FAT5 (Flash Attention T5) report ⚡

English version of the blog post introducing the FAT5 model

Space • Running • 9

CATIE French DPO and conversation datasets

By conversation we mean multi-turn exchanges. For classical prompts (i.e. single-turn), see the CATIE French prompts datasets collection.

CATIE-AQ/facebook-community-alignment-dataset_french_dpo

Viewer • Updated Jul 29, 2025 • 71.5k • 4 • 1
CATIE-AQ/aya_french_dpo

Viewer • Updated Jul 29, 2025 • 418 • 3 • 2
CATIE-AQ/facebook_menlo_french_dpo

Viewer • Updated Oct 14, 2025 • 138 • 18 • 1
CATIE-AQ/everyday-conversations-llama3.1-2k-in-french

Viewer • Updated Jul 31, 2025 • 2.38k • 15 • 1

CATIE French think and toolcalling datasets

CATIE-AQ/smoltalk2_smolagents_toolcalling_french

Viewer • Updated Jul 29, 2025 • 9.08k • 17 • 1
CATIE-AQ/smoltalk2_aya_think_dataset_french_split

Viewer • Updated Jul 29, 2025 • 2.79k • 9 • 2

French VQA datasets

Clean VQA datasets with an image, a question and an answer. Can be used to train VLMs.

CATIE-AQ/VQA-floschne-maxm-clean

Viewer • Updated Jul 15, 2025 • 619 • 9
CATIE-AQ/VQA-cmarkea-doc-vqa-clean

Viewer • Updated Jul 15, 2025 • 60.9k • 145
CATIE-AQ/VQA-cmarkea-table-vqa-clean

Viewer • Updated Jul 15, 2025 • 84.1k • 31
CATIE-AQ/VQA-ByteDance-MTVQA-clean

Viewer • Updated Jul 15, 2025 • 3.63k • 7 • 1

French visual retriever datasets

Datasets with an image and a question. Can be used to train visual retrievers (ColPali and co.).

CATIE-AQ/retriever-manu-tabfquad_retrieving-clean

Viewer • Updated Jul 15, 2025 • 1.83k • 24
CATIE-AQ/retriever-vidore-tabfquad_test_subsampled-clean

Viewer • Updated Jul 15, 2025 • 280 • 2
CATIE-AQ/retriever-vidore-vdsid_french-clean

Viewer • Updated Jul 15, 2025 • 5k • 11
CATIE-AQ/retriever-princeton-nlp-CharXiv-clean

Viewer • Updated Jul 15, 2025 • 1.32k • 8

CATIE French NER pack

CamemBERT models finetuned on the NER task (3 or 4 entities) + the datasets used (420,000 or 385,000 rows respectively) + a Space demo

NERmembert 🔍

Find named entities in French texts using NERmemBERT models

Space • Running • 1
CATIE-AQ/NERmembert-base-3entities

Token Classification • 0.1B • Updated Nov 26, 2024 • 29 • 2
CATIE-AQ/NERmembert2-3entities

Token Classification • 0.1B • Updated Dec 3, 2024 • 1
CATIE-AQ/NERmemberta-3entities

Token Classification • 0.1B • Updated Dec 5, 2024 • 474 • 1

CATIE French STS pack

CATIE-AQ/frenchSTS

Viewer • Updated Jul 15, 2025 • 45.7k • 15 • 1

CATIE French Summarization pack

CATIE-AQ/LMF2-1.2B_french_summary

Summarization • 1B • Updated Oct 28, 2025 • 41 • 1
CATIE-AQ/LMF2-700M_french_summary

Summarization • 0.7B • Updated Oct 29, 2025 • 57
CATIE-AQ/LMF2_350M_french_summary

Summarization • 0.4B • Updated Oct 28, 2025 • 5
mradermacher/LMF2-1.2B_french_summary-GGUF

1B • Updated Oct 29, 2025 • 157

CATIE French long sequences datasets

CATIE-AQ/french_books_summaries

Viewer • Updated Jul 15, 2025 • 949 • 14 • 1
CATIE-AQ/french_books

Viewer • Updated Jul 15, 2025 • 2.08k • 13 • 2
CATIE-AQ/french_narrativeqa

Viewer • Updated Jul 15, 2025 • 4.21k • 18 • 1

Team members 9
