---
language:
- de
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- t5
- german
- wechsel
- cross-lingual
datasets:
- unpaywall-scientific
---

# DE-T5-Sci-Transfer-Init

WECHSEL-initialized checkpoint: the English EN-T5-Sci weights combined with the German tokenizer from `GermanT5/t5-efficient-gc4-german-base-nl36`, with embeddings aligned via WECHSEL (static embeddings + bilingual dictionary). **No additional German training** was performed after the transfer. The repository includes `transfer_metadata.pt` with alignment diagnostics.

## Model Details

- Embedding init: orthogonal Procrustes map fit on fastText n-gram embeddings, followed by temperature-weighted mixtures over k-nearest neighbors (see the sketch in the appendix below)
- Special tokens: aligned, with T5 sentinel (`<extra_id_*>`) behavior preserved
- Tokenizer: GermanT5 SentencePiece (files bundled in this repository)

## Evaluation (Global-MMLU, zero-shot)

| Metric | EN | DE |
| --- | --- | --- |
| Overall accuracy | 0.2434 | 0.2463 |
| Humanities | 0.2485 | 0.2559 |
| STEM | 0.2391 | 0.2445 |
| Social Sciences | 0.2317 | 0.2307 |
| Other | 0.2517 | 0.2491 |

German accuracy matches English accuracy immediately after the transfer, demonstrating cross-lingual transfer without a single German gradient step.

## Intended Use

A starting point for German continued pretraining or fine-tuning in settings where English scientific knowledge should be retained but a German tokenizer is required.

## Limitations

- No exposure to German data beyond the embedding alignment; run additional continued pretraining (see the next model in this series) for best performance.
- Still limited to a 512-token context.
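## How to Use

A minimal loading sketch with `transformers`. The repo id below is a placeholder for wherever this checkpoint is hosted; the probe sentence is only an illustration of the preserved sentinel behavior.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Placeholder repo id; replace with the actual hub path of this checkpoint.
model_id = "DE-T5-Sci-Transfer-Init"

tokenizer = AutoTokenizer.from_pretrained(model_id)  # bundled GermanT5 SentencePiece files
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Span-corruption-style probe: sentinel tokens behave as in the English parent.
inputs = tokenizer("Das Experiment zeigt <extra_id_0> Ergebnisse.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```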
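## Appendix: Embedding-Transfer Sketch

A minimal NumPy sketch of the WECHSEL-style initialization described under Model Details, assuming a Procrustes map fit on bilingual-dictionary pairs and a temperature-controlled softmax over k nearest neighbors. This is not the exact transfer code; `k`, `temperature`, and all names here are illustrative.

```python
import numpy as np

def orthogonal_procrustes(A, B):
    # Closed-form solution of min_W ||A @ W - B||_F over orthogonal W
    # (Schönemann, 1966), fit on bilingual-dictionary word pairs.
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

def wechsel_init(E_en, X_en, X_de, W, k=10, temperature=0.1):
    """Mixture initialization of German model embeddings from English ones.

    E_en: (V_en, d_model) English T5 input embeddings
    X_en: (V_en, d_static) English fastText subword vectors
    X_de: (V_de, d_static) German fastText subword vectors
    W:    (d_static, d_static) Procrustes map from German to English space
    """
    # Map German static vectors into the English static space, then compare
    # every German subword against every English subword by cosine similarity.
    X_de_mapped = X_de @ W
    X_de_n = X_de_mapped / np.linalg.norm(X_de_mapped, axis=1, keepdims=True)
    X_en_n = X_en / np.linalg.norm(X_en, axis=1, keepdims=True)
    sims = X_de_n @ X_en_n.T  # (V_de, V_en)

    E_de = np.empty((X_de.shape[0], E_en.shape[1]), dtype=E_en.dtype)
    for i, row in enumerate(sims):
        nn = np.argpartition(row, -k)[-k:]   # indices of k nearest English neighbors
        w = np.exp(row[nn] / temperature)
        w /= w.sum()                         # temperature-weighted softmax
        E_de[i] = w @ E_en[nn]               # mixture of neighbor embeddings
    return E_de
```

Lower temperatures concentrate each German embedding on its single closest English neighbor; higher temperatures blend more neighbors, which is the trade-off the temperature weighting controls.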