metadata
language:
- en
tags:
- sentence-transformers
- cross-encoder
- generated_from_trainer
- dataset_size:100000
- loss:CrossEntropyLoss
base_model: distilbert/distilroberta-base
datasets:
- sentence-transformers/all-nli
pipeline_tag: text-classification
library_name: sentence-transformers
metrics:
- f1_macro
- f1_micro
- f1_weighted
model-index:
- name: CrossEncoder based on distilbert/distilroberta-base
  results:
  - task:
      type: cross-encoder-classification
      name: Cross Encoder Classification
    dataset:
      name: AllNLI dev
      type: AllNLI-dev
    metrics:
    - type: f1_macro
      value: 0.8471837177220953
      name: F1 Macro
    - type: f1_micro
      value: 0.848
      name: F1 Micro
    - type: f1_weighted
      value: 0.8471638579236317
      name: F1 Weighted
  - task:
      type: cross-encoder-classification
      name: Cross Encoder Classification
    dataset:
      name: AllNLI test
      type: AllNLI-test
    metrics:
    - type: f1_macro
      value: 0.7672948900569446
      name: F1 Macro
    - type: f1_micro
      value: 0.7678571428571429
      name: F1 Micro
    - type: f1_weighted
      value: 0.7681818441932339
      name: F1 Weighted
CrossEncoder based on distilbert/distilroberta-base
This is a Cross Encoder model fine-tuned from distilbert/distilroberta-base on the all-nli dataset using the sentence-transformers library. It computes three scores, one per label, for each pair of texts, which can be used for text pair classification.
Model Details
Model Description
- Model Type: Cross Encoder
- Base model: distilbert/distilroberta-base
- Maximum Sequence Length: 514 tokens
- Number of Output Labels: 3 labels
- Training Dataset: all-nli
- Language: en
Model Sources
- Documentation: Sentence Transformers Documentation
- Documentation: Cross Encoder Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Cross Encoders on Hugging Face
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import CrossEncoder
# Download from the 🤗 Hub
model = CrossEncoder("hajimeni/reranker-distilroberta-base-nli")
# Get scores for pairs of texts
pairs = [
    ['Two women are embracing while holding to go packages.', 'The sisters are hugging goodbye while holding to go packages after just eating lunch.'],
    ['Two women are embracing while holding to go packages.', 'Two woman are holding packages.'],
    ['Two women are embracing while holding to go packages.', 'The men are fighting outside a deli.'],
    ['Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.', 'Two kids in numbered jerseys wash their hands.'],
    ['Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.', 'Two kids at a ballgame wash their hands.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5, 3)
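Each row of the score matrix holds three values, one per label. The following is a minimal sketch of turning such rows into label names. It assumes the label order 0 = entailment, 1 = neutral, 2 = contradiction, which is inferred from the sample rows in this card rather than stated by it, and it treats the scores as raw logits. The example scores are made up for illustration.

```python
import math

# Assumed label order, inferred from the sample rows in this card:
# 0 = entailment, 1 = neutral, 2 = contradiction.
LABELS = ["entailment", "neutral", "contradiction"]


def softmax(row):
    """Convert one row of raw scores into probabilities that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [value / total for value in exps]


def predict_labels(score_rows):
    """Return the highest-scoring label name for each row of scores."""
    names = []
    for row in score_rows:
        best_index = max(range(len(row)), key=lambda i: row[i])
        names.append(LABELS[best_index])
    return names


# Hypothetical scores for two pairs; real values come from model.predict(pairs).
example_scores = [[-1.2, 3.1, -0.7], [4.0, -0.5, -2.3]]
print(predict_labels(example_scores))
print([round(p, 3) for p in softmax(example_scores[0])])
```

With the array returned by `model.predict`, passing `scores.tolist()` to `predict_labels` should work the same way.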
Evaluation
Metrics
Cross Encoder Classification
- Datasets: AllNLI-dev and AllNLI-test
- Evaluated with CrossEncoderClassificationEvaluator
| Metric | AllNLI-dev | AllNLI-test |
|---|---|---|
| f1_macro | 0.8472 | 0.7673 |
| f1_micro | 0.848 | 0.7679 |
| f1_weighted | 0.8472 | 0.7682 |
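The three metrics differ only in how per-class results are aggregated. The sketch below shows one common definition of macro, micro, and weighted F1 for single-label classification. It is an illustration in plain Python, not the evaluator's actual implementation, and the toy labels are made up.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro, micro, weighted) F1 for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    per_class = {}
    tp_total = fp_total = fn_total = 0
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0
        tp_total += tp
        fp_total += fp
        fn_total += fn
    # Macro: unweighted mean of per-class F1.
    macro = sum(per_class.values()) / len(labels)
    # Micro: F1 over pooled counts; equals accuracy for single-label tasks.
    micro_denom = 2 * tp_total + fp_total + fn_total
    micro = 2 * tp_total / micro_denom if micro_denom else 0.0
    # Weighted: per-class F1 averaged by class support.
    weighted = sum(per_class[l] * support[l] for l in labels) / len(y_true)
    return macro, micro, weighted


# Toy example with three classes (hypothetical labels, not from the dataset).
macro, micro, weighted = f1_scores([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
print(round(macro, 4), round(micro, 4), round(weighted, 4))
```

Under this definition micro F1 equals plain accuracy, so an f1_micro of 0.848 on 1,000 dev samples would correspond to 848 correct predictions. Because the classes are nearly balanced, the macro and weighted values land close together.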
Training Details
Training Dataset
all-nli
- Dataset: all-nli at d482672
- Size: 100,000 training samples
- Columns: premise, hypothesis, and label
- Approximate statistics based on the first 1000 samples:

| | premise | hypothesis | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 23 characters; mean: 69.54 characters; max: 227 characters | min: 11 characters; mean: 38.26 characters; max: 131 characters | 0: ~33.40%; 1: ~33.30%; 2: ~33.30% |

- Samples:

| premise | hypothesis | label |
|---|---|---|
| A person on a horse jumps over a broken down airplane. | A person is training his horse for a competition. | 1 |
| A person on a horse jumps over a broken down airplane. | A person is at a diner, ordering an omelette. | 2 |
| A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse. | 0 |

- Loss: CrossEntropyLoss
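For reference, cross-entropy over three labels can be sketched for a single example as below. This is the standard formula written in plain Python, not the library's implementation, and the logits are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one example: -log(softmax(logits)[target])."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[target]


# A model with no preference among 3 labels scores log(3) on any target.
print(round(cross_entropy([0.0, 0.0, 0.0], target=1), 4))

# A confident, correct prediction gives a loss close to zero.
print(cross_entropy([10.0, 0.0, 0.0], target=0) < 0.001)
```

The uniform case evaluates to log(3), about 1.0986, which is close to the first logged training loss of 1.0595 and is what near-random three-way predictions would be expected to produce.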
Evaluation Dataset
all-nli
- Dataset: all-nli at d482672
- Size: 1,000 evaluation samples
- Columns: premise, hypothesis, and label
- Approximate statistics based on the first 1000 samples:

| | premise | hypothesis | label |
|---|---|---|---|
| type | string | string | int |
| details | min: 16 characters; mean: 75.01 characters; max: 229 characters | min: 11 characters; mean: 37.66 characters; max: 116 characters | 0: ~33.10%; 1: ~33.30%; 2: ~33.60% |

- Samples:

| premise | hypothesis | label |
|---|---|---|
| Two women are embracing while holding to go packages. | The sisters are hugging goodbye while holding to go packages after just eating lunch. | 1 |
| Two women are embracing while holding to go packages. | Two woman are holding packages. | 0 |
| Two women are embracing while holding to go packages. | The men are fighting outside a deli. | 2 |

- Loss: CrossEntropyLoss
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- num_train_epochs: 1
- warmup_ratio: 0.1
- bf16: True
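As a rough sketch of what these settings imply, the step count and a linear warmup/decay schedule can be worked out as follows. It assumes a single device, uses the learning_rate of 5e-05 and lr_scheduler_type of linear from the full hyperparameter list, and assumes warmup steps are rounded up. The function `linear_lr` is a hypothetical helper, not a library call.

```python
import math

TRAIN_SAMPLES = 100_000  # training set size reported in this card
BATCH_SIZE = 64          # per_device_train_batch_size
EPOCHS = 1               # num_train_epochs
WARMUP_RATIO = 0.1       # warmup_ratio
PEAK_LR = 5e-05          # learning_rate from the full hyperparameter list

# Assumes one device and gradient_accumulation_steps of 1; the last
# partial batch is kept, hence the ceiling.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
# Assumption: warmup steps are rounded up from total_steps * warmup_ratio.
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def linear_lr(step):
    """Linear warmup to PEAK_LR, then linear decay to zero (hypothetical helper)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - warmup_steps)


print(steps_per_epoch, warmup_steps)  # 1563 157
print(linear_lr(100), linear_lr(warmup_steps), linear_lr(total_steps))
```

The figure of 1563 steps per epoch is consistent with the Epoch column of the training logs, where step 1500 appears at epoch 0.9597.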
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 64
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss | Validation Loss | AllNLI-dev_f1_macro | AllNLI-test_f1_macro |
|---|---|---|---|---|---|
| -1 | -1 | - | - | 0.1665 | - |
| 0.0640 | 100 | 1.0595 | - | - | - |
| 0.1280 | 200 | 0.7 | - | - | - |
| 0.1919 | 300 | 0.6039 | - | - | - |
| 0.2559 | 400 | 0.5821 | - | - | - |
| 0.3199 | 500 | 0.5521 | 0.4509 | 0.8186 | - |
| 0.3839 | 600 | 0.5148 | - | - | - |
| 0.4479 | 700 | 0.5334 | - | - | - |
| 0.5118 | 800 | 0.5125 | - | - | - |
| 0.5758 | 900 | 0.4893 | - | - | - |
| 0.6398 | 1000 | 0.503 | 0.3864 | 0.8554 | - |
| 0.7038 | 1100 | 0.4706 | - | - | - |
| 0.7678 | 1200 | 0.4635 | - | - | - |
| 0.8317 | 1300 | 0.44 | - | - | - |
| 0.8957 | 1400 | 0.459 | - | - | - |
| 0.9597 | 1500 | 0.4481 | 0.3537 | 0.8472 | - |
| -1 | -1 | - | - | - | 0.7673 |
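The evaluation rows in this table can be compared directly. The short sketch below copies them from the log and checks which step had the best dev f1_macro and which had the lowest validation loss. The steps-per-epoch value is an assumption, derived as ceil(100000 / 64) for a single device.

```python
# (step, validation_loss, dev_f1_macro) rows copied from the training log table.
eval_rows = [
    (500, 0.4509, 0.8186),
    (1000, 0.3864, 0.8554),
    (1500, 0.3537, 0.8472),
]
STEPS_PER_EPOCH = 1563  # assumption: ceil(100000 / 64) on a single device

best_f1 = max(eval_rows, key=lambda row: row[2])
best_loss = min(eval_rows, key=lambda row: row[1])

for step, loss, f1 in eval_rows:
    # The fractional epoch reproduces the Epoch column of the table.
    print(step, round(step / STEPS_PER_EPOCH, 4), loss, f1)

print("highest dev f1_macro at step", best_f1[0])
print("lowest validation loss at step", best_loss[0])
```

The two criteria disagree here: dev f1_macro peaks at step 1000 while validation loss is lowest at step 1500. Since load_best_model_at_end is False, the published weights presumably come from the end of training rather than from the step-1000 checkpoint.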
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 4.0.1
- Transformers: 4.50.2
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.5.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}