French translation of zeta-alpha-ai's NanoBEIR collection
CATIE
non-profit
Verified
AI & ML interests
Create NLP models and datasets for French, for very long sequences, and for the combination of the two ;)
A few experiments following the release of Sentence Transformers v5.0. They can be seen as a V0 ahead of the publication of more powerful French sparse models.
- CATIE-AQ/CSR_Sparse_Encoder_camembert-large_STS (Feature Extraction, 0.3B)
- CATIE-AQ/SPLADE_camembert-base_STS (Feature Extraction, 0.1B)
- CATIE-AQ/SPLADE_moderncamembert-cv2_STS (Feature Extraction, 0.1B)
- CATIE-AQ/SPLADE_camemberta2.0_STS (Feature Extraction, 0.1B)
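For context, sparse encoders such as the SPLADE models listed here produce vectors in which most dimensions are zero, so similarity can be computed over the non-zero entries only. The following is a minimal sketch of that idea in plain Python. The `sparse_dot` helper and all token weights are hypothetical, chosen for illustration, and do not come from the models above.

```python
from typing import Dict


def sparse_dot(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Dot product of two sparse vectors stored as {dimension: weight}."""
    # Iterate over the smaller vector; a dimension absent from either side counts as 0.
    if len(a) > len(b):
        a, b = b, a
    return sum((weight * b[dim] for dim, weight in a.items() if dim in b), 0.0)


# Hypothetical weights, for illustration only.
query = {"chat": 1.5, "animal": 0.5}
doc_close = {"chat": 2.0, "félin": 1.0, "animal": 1.0}
doc_far = {"voiture": 2.0, "route": 1.0}

scores = {
    "doc_close": sparse_dot(query, doc_close),
    "doc_far": sparse_dot(query, doc_far),
}
print(scores)
```

Only dimensions shared by both vectors contribute to the score, which is why a document with no overlapping dimensions scores zero.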
Google Flan-T5 weights adapted for use with FAT5.
A collection of French prompt datasets created by CATIE.
A collection of French prompt models created by CATIE.
Datasets with an image, a prompt question (like "describe this image") and an answer.
Can be used to train VLMs.
In 2021, before the release of LoRA, we were interested in prefix-tuning and wanted to apply it to French, so we had to translate table-to-text data.
CamemBERT models fine-tuned on the QA task (SQuAD 2.0 format), the dataset used (~220,000 rows), and a Space demo.
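To illustrate what "SQuAD 2.0 format" implies for a QA dataset, here is a minimal sketch. It assumes the field layout used by the Hugging Face `squad_v2` dataset (`context`, `question`, and `answers` with `text` and `answer_start`). The two records and the `check_record` helper are hypothetical and are not taken from the CATIE dataset. The distinguishing feature of SQuAD 2.0 is that some questions have no answer in the context, which appears as empty answer lists.

```python
def check_record(record: dict) -> str:
    """Classify a SQuAD 2.0-style record as 'unanswerable', 'ok', or 'misaligned'."""
    texts = record["answers"]["text"]
    starts = record["answers"]["answer_start"]
    if not texts:
        # SQuAD 2.0 allows questions with no answer in the context.
        return "unanswerable"
    context = record["context"]
    for text, start in zip(texts, starts):
        # Each answer must appear in the context at its character offset.
        if context[start:start + len(text)] != text:
            return "misaligned"
    return "ok"


# Hypothetical records, for illustration only.
context = "La tour Eiffel se trouve à Paris."
answerable = {
    "context": context,
    "question": "Où se trouve la tour Eiffel ?",
    "answers": {"text": ["Paris"], "answer_start": [context.index("Paris")]},
}
unanswerable = {
    "context": context,
    "question": "Qui a dessiné la tour Eiffel ?",
    "answers": {"text": [], "answer_start": []},
}

print(check_record(answerable), check_record(unanswerable))
```

A check of this kind can be useful after translating or cleaning a dataset, since character offsets tend to drift when the context text is modified.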
Reviews and metadata datasets from https://xmrec.github.io/data/fr/ by Bonab et al. (2021)
Flash Attention T5 models in French developed by CATIE.
By conversation we mean multi-turn exchanges. For classic single-turn prompts, see the CATIE French prompts datasets collection.
Clean VQA datasets with an image, a question and an answer.
Can be used to train VLMs.
Datasets with an image and a question.
Can be used to train visual retrievers (ColPali and co.).
- CATIE-AQ/retriever-manu-tabfquad_retrieving-clean
- CATIE-AQ/retriever-vidore-tabfquad_test_subsampled-clean
- CATIE-AQ/retriever-vidore-vdsid_french-clean
- CATIE-AQ/retriever-princeton-nlp-CharXiv-clean
CamemBERT models fine-tuned on the NER task (3 or 4 entities), the datasets used (420,000 or 385,000 rows respectively), and a Space demo.
- NERmembert (Space): Find named entities in French texts using NERmemBERT models
- CATIE-AQ/NERmembert-base-3entities (Token Classification, 0.1B)
- CATIE-AQ/NERmembert2-3entities (Token Classification, 0.1B)
- CATIE-AQ/NERmemberta-3entities (Token Classification, 0.1B)
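Token classification models of this kind emit one label per token, and consecutive tokens sharing a label are usually merged into entity spans. Below is a minimal sketch of that merging step. It assumes a simple scheme with an `O` label for non-entities and the hypothetical labels `PER` and `LOC`; the actual label set of the NERmemBERT models is not stated here and may differ. The `group_entities` helper is illustrative, not part of any library.

```python
from typing import List, Tuple


def group_entities(
    tokens: List[str], labels: List[str], outside: str = "O"
) -> List[Tuple[str, str]]:
    """Merge consecutive tokens that share a non-outside label into (label, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = outside
    for token, label in zip(tokens, labels):
        if label != current_label:
            # The label changed: close the span in progress, if there is one.
            if current_label != outside:
                spans.append((current_label, " ".join(current_tokens)))
            current_tokens = []
            current_label = label
        if label != outside:
            current_tokens.append(token)
    # Close a span that runs to the end of the sequence.
    if current_label != outside:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# Hypothetical tokens and labels, for illustration only.
tokens = ["Marie", "Curie", "a", "travaillé", "à", "Paris"]
labels = ["PER", "PER", "O", "O", "O", "LOC"]
entities = group_entities(tokens, labels)
print(entities)
```

Note that this simple scheme cannot separate two adjacent entities of the same type, which is one reason some datasets use B-/I- prefixed labels instead.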