---
license: other
datasets:
- DBbun/EEG-250Hz_v1.0
language:
- en
pipeline_tag: feature-extraction
---

# DBbun EEG Encoder: Pretrained Encoder Evaluation and Demo

## Overview

This repository provides a pretrained **EEG encoder** and two demonstration scripts developed by **DBbun LLC**.
The model converts short segments of multi-channel EEG into **128-dimensional embeddings** that summarize the temporal and spectral structure of the signal.
It was trained self-supervised on DBbun’s synthetic multi-patient EEG corpus sampled at **250 Hz** using the **10–20 montage (38 channels)**.
All data are fully synthetic and privacy-safe.

---

## Key Features

- **2-second EEG encoder** trained at 250 Hz (38 channels).
- Produces **128-D embeddings** suitable for:
  - Seizure vs. non-seizure discrimination
  - EEG morphology clustering and visualization
  - Similarity search and retrieval
  - Anomaly and quality detection
  - Downstream feature extraction for ML models
- Includes demonstration scripts for embedding extraction and PCA-based visualization.

---

## Related Dataset

The encoder was trained and evaluated using **[DBbun/EEG-250Hz_v1.0](https://huggingface.co/datasets/DBbun/EEG-250Hz_v1.0)**.
Each file represents one synthetic patient with 38-channel EEG sampled at 250 Hz.
When available, `labels_sec` (0 = non-seizure, 1 = seizure) allows computing a **seizure fraction** or training evaluation probes.

---

## Repository Contents

| File | Description |
|------|-------------|
| `encoder_state.pt` | PyTorch weights (state dictionary). |
| `encoder_traced.pt` | TorchScript version for deployment. |
| `model_def.json` | Model configuration (architecture, channels, latent dimension, dropout, etc.). |
| **`DBbun_EEG_Encoder_Eval_Demo_v1.py`** | Baseline script: loads EEG files, runs the pretrained encoder, and exports embeddings. |
| **`DBbun_EEG_Encoder_Eval_Demo_v2.py`** | Extended demo: includes **PCA visualization** that colors seizure vs. non-seizure embeddings for interpretability. |

---

## Intended Use

This model and accompanying scripts are intended for **research, education, and development** purposes.
They support reproducible EEG feature learning, visualization, and benchmarking without access to real patient data.
They are **not intended for clinical diagnosis or medical use**.

---

## Suggested Applications

- Evaluate representation quality on labeled synthetic EEG.
- Visualize clustering patterns of seizure vs. non-seizure embeddings using PCA.
- Train simple classifiers (e.g., logistic regression, SVM) on 128-D features for benchmarking.
- Apply the encoder as a fixed feature extractor in other time-series tasks.

---

## What Users Can Do with the Model

The **DBbun EEG Encoder (250 Hz)** acts as a **feature extractor**: it converts raw EEG windows into compact **128-dimensional embeddings** that summarize the shape, rhythm, and energy distribution of brain signals.

### ✅ Typical Use Cases

| Goal | What the user does |
|------|--------------------|
| **Feature extraction** | Feed EEG windows (2 s × 38 channels × 250 Hz) into the encoder → obtain 128-D embeddings for each window. |
| **Classification** | Use the embeddings to train a simple model (e.g., logistic regression, random forest, MLP) for tasks such as seizure vs. non-seizure or artifact vs. clean. |
| **Visualization** | Reduce embeddings to 2-D (PCA or UMAP) to explore clusters or signal structure. |
| **Similarity search** | Build a FAISS or Annoy index to find EEG segments that resemble each other in latent space. |
| **Anomaly detection** | Identify rare or abnormal patterns by computing distances to nearest neighbors. |
| **Patient-level summaries** | Average embeddings across all windows from one patient to form a stable EEG “signature.” |

---

### 💾 Use of Precomputed Embeddings

Precomputed embeddings are optional and depend on the user’s objective:

| Scenario | Use precomputed embeddings? | Reason |
|----------|-----------------------------|--------|
| **Quick exploration of results** | ✅ Yes | The file `demo_embeddings.npy` already contains 128-D features ready for clustering, visualization, or linear probes. |
| **Custom EEG data (real or synthetic)** | ❌ No | The pretrained encoder can be applied directly to new EEG windows to generate embeddings. |
| **Cross-model or cross-dataset comparison** | Optional | Both the provided embeddings and newly generated ones can be used for benchmarking and evaluation. |

---

## License

Licensed for non-clinical research and educational use.
For commercial licensing inquiries, please contact **DBbun LLC**.
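
---

## Illustrative Sketch: Preparing 2-Second Windows

The feature-extraction use case above presumes that a recording has already been cut into 2-second windows, and the dataset section mentions deriving a seizure fraction from `labels_sec`. The following is a minimal sketch of both steps, not the logic of the bundled demo scripts. It assumes a recording is held as a NumPy array shaped `(channels, samples)` and that `labels_sec` has one entry per second; the function name `make_windows` and the array layout are hypothetical choices made for illustration.

```python
import numpy as np

FS = 250         # sampling rate stated in this README (Hz)
WIN_SEC = 2      # window length stated in this README (seconds)
N_CHANNELS = 38  # channel count stated in this README


def make_windows(eeg, labels_sec=None, fs=FS, win_sec=WIN_SEC):
    """Split a (channels, samples) array into non-overlapping windows.

    Assumed layout, not confirmed by the repository. Returns an array of
    shape (n_windows, channels, fs * win_sec) and, if per-second labels
    are given, the fraction of seizure seconds inside each window.
    """
    eeg = np.asarray(eeg, dtype=np.float32)
    win_len = fs * win_sec
    n_windows = eeg.shape[1] // win_len  # trailing partial window is dropped
    trimmed = eeg[:, : n_windows * win_len]
    windows = trimmed.reshape(eeg.shape[0], n_windows, win_len).transpose(1, 0, 2)

    if labels_sec is None:
        return windows, None

    labels = np.asarray(labels_sec, dtype=np.float32)
    if labels.shape[0] < n_windows * win_sec:
        raise ValueError("labels_sec is shorter than the windowed recording")
    # Group per-second labels by window and average them.
    frac = labels[: n_windows * win_sec].reshape(n_windows, win_sec).mean(axis=1)
    return windows, frac


# Stand-in data: 11 seconds of random noise instead of a real dataset file.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((N_CHANNELS, FS * 11))
labels_sec = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0]

windows, frac = make_windows(eeg, labels_sec)
print(windows.shape)  # (5, 38, 500)
print(frac)           # seizure fraction per window: 0, 0.5, 1, 0, 0
```

Each window holds 500 samples per channel (2 s × 250 Hz), and the eleventh second is discarded because it does not fill a whole window. Before passing windows to the encoder, it is worth checking `model_def.json` for the expected input layout and any normalization, since neither is specified in this README.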