---
tags:
- walrus
- foundation-model
- physics
- continuum-dynamics
- transformer
- PDE
datasets:
- polymathic-ai/shear_flow
- polymathic-ai/gray_scott_reaction_diffusion
- polymathic-ai/active_matter
- polymathic-ai/turbulent_radiative_layer_2D
- polymathic-ai/supernova_explosion_64
- polymathic-ai/turbulence_gravity_cooling
- polymathic-ai/rayleigh_benard
- polymathic-ai/planetswe
- polymathic-ai/acoustic_scattering_inclusions
- polymathic-ai/MHD_64
- polymathic-ai/rayleigh_taylor_instability
- polymathic-ai/acoustic_scattering_discontinuous
- polymathic-ai/acoustic_scattering_maze
- polymathic-ai/helmholtz_staircase
- polymathic-ai/viscoelastic_instability
- BGLab/FlowBench
license: mit
---

# Walrus: A Cross-Domain Foundation Model for Continuum Dynamics

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![GitHub Repo](https://img.shields.io/badge/GitHub-Repo-blue?logo=github)](https://github.com/PolymathicAI/walrus)
[![arXiv](https://img.shields.io/badge/arXiv-2511.15684-b31b1b.svg)](https://arxiv.org/abs/2511.15684)

Walrus is a large-scale **physics foundation model** capable of modeling a broad range of continuum dynamical systems. Walrus is trained jointly across **19 diverse physical domains** spanning:

- astrophysics
- geoscience
- rheology
- plasma physics
- acoustics
- classical fluids

These systems cover diverse boundary conditions and physical parameterizations. The model is optimized to serve as a **general-purpose surrogate** for physical simulation and a **strong initialization** for downstream fine-tuning on new PDE systems.

---

# Model Description

Walrus is a **1.3B-parameter space–time Transformer** trained autoregressively to predict the temporal evolution of physical fields.

A simulation snapshot at time t is written as u(t), and the difference between two consecutive snapshots is defined as:

Δu(t+1) = u(t+1) − u(t)

Given a short history of snapshots

U(t) = [u(t − τ + 1), ..., u(t)],

the model M predicts the next state as:

u(t+1) ≈ u(t) + M(U(t))

### Key architectural components

- **Adaptive-compute patch embedding**
  - Token count is automatically balanced across resolutions
  - Enables mixing 2D and 3D datasets efficiently
- **Patch jittering**
  - A harmonic-analysis–motivated augmentation technique
  - Reduces aliasing and spectral artifacts
  - Improves long-horizon stability on 17 of 19 pretraining datasets
- **Tensor-law–aware data augmentation**
  - 2D data embedded into 3D through plane rotations
  - Vector/tensor fields rotated with the correct physical transformation laws (see the rotation sketch after the pretraining details)
- **Asymmetric normalization**
  - Inputs are normalized by their RMS over space-time, and the predicted Δu is de-normalized using the RMS of Δu (see the sketch after the pretraining details)

---

# Pretraining Details

Walrus is pretrained on 19 physical datasets with:

- **Loss**: Per-field normalized L1 loss
- **Optimizer**: AdamW
- **Batching**: System-uniform hierarchical sampling
- **Time-striding**: Random stride (1–5) per training example
- **Patch jitter range**: Uniform per-axis random offset
- **Dimensional unification**: 2D fields embedded as thin 3D volumes

The model was pretrained on 96 **NVIDIA H100 GPUs** using distributed HSDP (4 GPUs per shard group), with the data sampling scheme matched to the distributed layout to minimize wasted compute.
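The sketch below illustrates how the per-sample ingredients above could fit together: random time-striding, embedding of 2D fields as thin 3D volumes, and the asymmetric normalization of inputs and target increments. It is an illustrative reading of the recipe, not the training code from the repository; the function name `make_training_example`, the argument layout, and the choice to compute the Δ statistics from the true target increment are assumptions for illustration.

```python
import torch

def make_training_example(traj, model, tau=4, max_stride=5, eps=1e-6):
    """Build one training example from a trajectory of shape (T, C, H, W) or (T, C, D, H, W).

    Illustrative sketch only: batching and the actual model API are omitted.
    """
    # 1. Random time-striding: subsample the trajectory with stride s in [1, max_stride].
    s = torch.randint(1, max_stride + 1, (1,)).item()
    t0 = torch.randint(0, traj.shape[0] - s * tau, (1,)).item()
    window = traj[t0 : t0 + s * tau + 1 : s]              # tau + 1 snapshots

    # 2. Dimensional unification: give 2D fields a singleton depth axis.
    if window.dim() == 4:                                  # (T, C, H, W) -> (T, C, 1, H, W)
        window = window.unsqueeze(2)

    history, target = window[:-1], window[-1]              # U(t), u(t+1)
    delta = target - history[-1]                           # Δu(t+1) = u(t+1) − u(t)

    # 3a. Asymmetric normalization, input side: per-field RMS over space and time.
    in_rms = history.pow(2).mean(dim=(0, 2, 3, 4), keepdim=True).sqrt().clamp_min(eps)
    history_n = history / in_rms

    # 3b. Target side: scale the increment per field by the RMS of Δu.
    delta_rms = delta.pow(2).mean(dim=(1, 2, 3), keepdim=True).sqrt().clamp_min(eps)
    delta_n = delta / delta_rms

    # Per-field normalized L1 loss between predicted and true increments (normalized units).
    pred_n = model(history_n)                              # hypothetical call, same shape as delta_n
    loss = (pred_n - delta_n).abs().mean()
    # At inference the prediction would be de-normalized: u_next = history[-1] + pred_n * delta_rms
    return loss
```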
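The tensor-law-aware augmentation must rotate both the sampling grid and the field components; rotating only the grid would silently break the physics. The snippet below shows the simplest instance of the transformation law, a 90° in-plane rotation of a 2D vector field. It is a generic illustration of the principle, not the augmentation code used in the repository, and the sign convention depends on how the array axes map to physical coordinates.

```python
import torch

def rotate_vector_field_90(v):
    """Rotate a 2D vector field by 90° about the out-of-plane axis.

    v: tensor of shape (2, H, W) holding the (vx, vy) components on a regular grid.
    A vector field transforms under a rotation R as v'(x) = R v(R^{-1} x), so both
    the grid and the components are rotated.
    """
    # Rotate the sampling grid in the (H, W) plane.
    v_grid = torch.rot90(v, k=1, dims=(1, 2))
    # Rotate the components: for a 90° rotation, (vx, vy) -> (-vy, vx).
    vx, vy = v_grid[0], v_grid[1]
    return torch.stack((-vy, vx), dim=0)
```

Rank-2 tensor fields transform analogously as T' = R T Rᵀ, and the same logic extends to the plane rotations used to embed 2D data into 3D.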
---

# Intended Use

This pretrained checkpoint is suitable for:

- ✔ Next-step prediction
- ✔ Fast surrogate simulation
- ✔ Autoregressive rollout of physical systems
- ✔ Transfer learning to new physical settings

A minimal rollout sketch is given at the end of this card.

# Resources

- Paper: https://arxiv.org/pdf/2511.15684
- GitHub: https://github.com/PolymathicAI/walrus
- Tutorial: https://github.com/PolymathicAI/walrus/demo_notebooks

Note that the training code in the repository is closely coupled with tools from [the Well](https://github.com/PolymathicAI/the_well), so it can be beneficial to format data to match that schema. If that is not possible, the tutorial shows how to use the model without Well-formatted data.

# Demonstrated downstream tasks

We demonstrate the strong performance of Walrus by fine-tuning it on a range of challenging downstream tasks, as shown in the paper. The fine-tuned Walrus checkpoints for these tasks are available at:

- PDEGym CE-RM: https://huggingface.co/polymathic-ai/walrus_ft_CE-RM/tree/main
- PDEBench CNS Turbulent: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_64_Turb/tree/main
- PDEBench CNS Random: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_128_Rand/tree/main
- FlowBench FPO Skeleton: https://huggingface.co/polymathic-ai/walrus_ft_flowbench_skelenton/tree/main
- The Well Post-merger Neutron Star: https://huggingface.co/polymathic-ai/walrus_ft_post_neutron_star_merger/tree/main
- The Well Convective Envelope RSG: https://huggingface.co/polymathic-ai/walrus_ft_convective_envelope_rsg/tree/main
- PDEArena Conditioned Incompressible NS: https://huggingface.co/polymathic-ai/walrus_ft_pdearena_ins/tree/main
- BubbleML 2.0 PoolBoil Subcooled: https://huggingface.co/polymathic-ai/walrus_ft_bubbleML_poolboil/tree/main

Additional checkpoints not included in the Walrus collection on HF can be found [here](https://users.flatironinstitute.org/~polymathic/data/walrus_project_checkpoints/), though the endpoint is a bit finicky. More fine-tuning checkpoints will continue to be added to HF over time.
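For autoregressive rollout, the model is applied repeatedly to its own predictions following the update rule u(t+1) ≈ u(t) + M(U(t)). The loop below is a minimal sketch of that pattern; `model`, its input/output conventions, and the assumption that it returns the increment Δu directly are placeholders for illustration, and normalization is omitted for brevity. See the tutorial notebooks linked above for the actual loading and preprocessing API.

```python
import torch

@torch.no_grad()
def rollout(model, history, n_steps):
    """Autoregressive rollout: u(t+1) = u(t) + M(U(t)), feeding predictions back in.

    history: tensor of shape (tau, C, D, H, W) holding the most recent tau snapshots.
    Returns a tensor of shape (n_steps, C, D, H, W) with the predicted trajectory.
    """
    states = []
    for _ in range(n_steps):
        delta = model(history)                            # predicted increment Δu(t+1)
        next_state = history[-1] + delta                  # u(t+1) = u(t) + M(U(t))
        states.append(next_state)
        # Slide the history window forward by one step.
        history = torch.cat([history[1:], next_state.unsqueeze(0)], dim=0)
    return torch.stack(states, dim=0)
```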