---
license: mit
datasets:
- openslr/librispeech_asr
language:
- en
pipeline_tag: automatic-speech-recognition
---

# Splitformer

<div align="center" style="line-height: 1;">
<a href="https://github.com/augustgw/early-exit-transformer" target="_blank" style="margin: 2px;">
<img alt="GitHub" src="https://img.shields.io/badge/GitHub-Splitformer-181717?logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://www.arxiv.org/abs/2506.18035" target="_blank" style="margin: 2px;">
<img alt="arXiv" src="https://img.shields.io/badge/arXiv-2506.18035-B31B1B?logo=arxiv&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>

## 1. Overview

**Splitformer** is a 36.7M-parameter Conformer-based ASR model trained from scratch on the 1000 hours of the **LibriSpeech dataset** with an **early-exit objective**.

The architecture introduces **parallel downsampling layers** before the first and last exits, improving recognition performance with minimal additional parameters and no loss of inference speed.

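To illustrate the downsampling idea only (this is not the repository's implementation, and `downsample` is a hypothetical name), a downsampling branch can shorten the time dimension of a feature sequence before it is combined with the main encoder path. A minimal pure-Python sketch, using average pooling as a stand-in for a strided convolution:

```python
def downsample(frames, stride=2):
    """Average-pool consecutive frame vectors with the given stride.

    `frames` is a list of equal-length feature vectors; the output has
    roughly len(frames) / stride frames, mimicking a strided-conv layer.
    """
    pooled = []
    for i in range(0, len(frames) - stride + 1, stride):
        window = frames[i:i + stride]
        # Element-wise mean over the window.
        pooled.append([sum(vals) / stride for vals in zip(*window)])
    return pooled


# Toy 4-frame, 2-dimensional feature sequence.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(downsample(frames))  # → [[2.0, 3.0], [6.0, 7.0]]
```

In the actual model, such a branch runs in parallel with the regular encoder layers, which is why the extra accuracy comes at little cost in latency.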
Our code for training and inference is available in our [GitHub repository](https://github.com/augustgw/early-exit-transformer).

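A common way to use an early-exit model at inference time is to stop at the first exit whose prediction confidence clears a threshold, trading accuracy for compute on easy inputs. The sketch below only illustrates that selection logic; the names (`pick_exit`, `exit_logits`) and the max-softmax confidence measure are illustrative assumptions, not the repository's API:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_exit(exit_logits, threshold=0.9):
    """Return (exit_index, predicted_class) for the first exit whose top
    softmax probability reaches `threshold`; fall back to the last exit."""
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        if max(probs) >= threshold:
            return i, probs.index(max(probs))
    probs = softmax(exit_logits[-1])
    return len(exit_logits) - 1, probs.index(max(probs))


# Toy logits from three exits: the second exit is already confident,
# so deeper layers are never evaluated.
exits = [[0.2, 0.1, 0.0], [4.0, 0.1, 0.0], [6.0, 0.1, 0.0]]
print(pick_exit(exits))  # → (1, 0)
```

The table below shows why this works: word error rate drops steadily with depth, so a confidence-gated exit lets easy utterances leave early while hard ones use the full network.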
## 2. Results on LibriSpeech

Word error rate (WER, %) on the LibriSpeech test sets at each exit layer, compared against an early-exit baseline and fine-tuned Wav2Vec2 and WavLM models:

<table>
<thead>
<tr>
<th rowspan="2">Layer</th>
<th colspan="2">EE-baseline (31.5M)</th>
<th colspan="2">Splitformer (36.7M)</th>
<th colspan="2">Wav2Vec2 (94.0M)</th>
<th colspan="2">WavLM (94.7M)</th>
</tr>
<tr>
<th>test-clean</th>
<th>test-other</th>
<th>test-clean</th>
<th>test-other</th>
<th>test-clean</th>
<th>test-other</th>
<th>test-clean</th>
<th>test-other</th>
</tr>
</thead>
<tbody>
<tr>
<td>2</td>
<td>31.0</td>
<td>51.0</td>
<td>28.1</td>
<td>48.3</td>
<td>33.7</td>
<td>56.0</td>
<td>28.0</td>
<td>48.5</td>
</tr>
<tr>
<td>4</td>
<td>11.7</td>
<td>27.8</td>
<td>10.8</td>
<td>26.4</td>
<td>17.4</td>
<td>36.7</td>
<td>13.9</td>
<td>27.3</td>
</tr>
<tr>
<td>6</td>
<td>7.1</td>
<td>19.8</td>
<td>6.7</td>
<td>19.2</td>
<td>9.6</td>
<td>23.7</td>
<td>8.7</td>
<td>18.4</td>
</tr>
<tr>
<td>8</td>
<td>5.8</td>
<td>16.6</td>
<td>5.5</td>
<td>16.3</td>
<td>5.8</td>
<td>15.9</td>
<td>4.8</td>
<td>12.4</td>
</tr>
<tr>
<td>10</td>
<td>5.3</td>
<td>15.3</td>
<td>5.1</td>
<td>15.1</td>
<td>4.5</td>
<td>12.6</td>
<td>4.0</td>
<td>9.5</td>
</tr>
<tr>
<td>12</td>
<td>5.1</td>
<td>14.8</td>
<td>4.8</td>
<td>14.7</td>
<td>4.3</td>
<td>12.2</td>
<td>3.6</td>
<td>8.8</td>
</tr>
</tbody>
</table>

## 3. Citation

```bibtex
@misc{lasbordes2025splitformer,
  title={Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices},
  author={Maxence Lasbordes and Daniele Falavigna and Alessio Brutti},
  year={2025},
  eprint={2506.18035},
  archivePrefix={arXiv},
  note={Proc. of EUSIPCO 2025},
}
```