---
tags:
- walrus
- foundation-model
- physics
- continuum-dynamics
- transformer
- PDE
datasets:
- polymathic-ai/shear_flow
- polymathic-ai/gray_scott_reaction_diffusion
- polymathic-ai/active_matter
- polymathic-ai/turbulent_radiative_layer_2D
- polymathic-ai/supernova_explosion_64
- polymathic-ai/turbulence_gravity_cooling
- polymathic-ai/rayleigh_benard
- polymathic-ai/planetswe
- polymathic-ai/acoustic_scattering_inclusions
- polymathic-ai/MHD_64
- polymathic-ai/rayleigh_taylor_instability
- polymathic-ai/acoustic_scattering_discontinuous
- polymathic-ai/acoustic_scattering_maze
- polymathic-ai/helmholtz_staircase
- polymathic-ai/viscoelastic_instability
- BGLab/FlowBench
license: mit
---
# Walrus: A Cross-Domain Foundation Model for Continuum Dynamics
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![GitHub Repo](https://img.shields.io/badge/GitHub-Repo-blue?logo=github)](https://github.com/PolymathicAI/walrus)
[![arXiv](https://img.shields.io/badge/arXiv-2511.15684-b31b1b.svg)](https://arxiv.org/abs/2511.15684)
Walrus is a large-scale **physics foundation model** capable of modeling a broad range of continuum dynamical systems.
It is trained jointly across **19 diverse physical domains** spanning:
- astrophysics
- geoscience
- rheology
- plasma physics
- acoustics
- classical fluids
These systems have diverse boundary conditions and physical parameterizations. The model is optimized to serve as a **general-purpose surrogate** for physical simulation and as a **strong initialization** for downstream fine-tuning on new PDE systems.
---
# Model Description
Walrus is a **1.3B-parameter space–time Transformer** trained autoregressively to predict the temporal evolution of physical fields. A simulation snapshot at time `t` is written as `u(t)`, and the difference between consecutive snapshots is

`Δu(t+1) = u(t+1) − u(t)`

Given a short history of `τ` snapshots

`U(t) = [u(t − τ + 1), ..., u(t)]`

the model `M` predicts the next state as

`u(t+1) ≈ u(t) + M(U(t))`
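
In practice, predictions are rolled out autoregressively: the predicted state is appended to the history window and the oldest snapshot is dropped. A minimal sketch of this loop, assuming a generic `model` callable and a `(τ, channels, *spatial)` history tensor (this is illustrative, not the actual Walrus API):

```python
import torch

def rollout(model, history, n_steps):
    """Autoregressive rollout: history holds the last tau snapshots,
    shaped (tau, channels, *spatial)."""
    states = []
    for _ in range(n_steps):
        delta = model(history)                    # M(U(t)) ~ Δu(t+1)
        next_state = history[-1] + delta          # u(t+1) = u(t) + Δu(t+1)
        states.append(next_state)
        # Slide the window: drop the oldest snapshot, append the prediction.
        history = torch.cat([history[1:], next_state.unsqueeze(0)], dim=0)
    return torch.stack(states)
```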
### Key architectural components
- **Adaptive-compute patch embedding**
- Token count automatically balanced across resolutions
- Enables mixing 2D and 3D datasets efficiently
- **Patch Jittering**
- A harmonic-analysis–motivated augmentation technique
- Reduces aliasing and spectral artifacts
- Improves long-horizon stability across 17/19 pretraining datasets
- **Tensor-law–aware data augmentation**
- 2D data embedded into 3D through plane rotations
- Vector/tensor fields rotated with correct physical transformations
- **Asymmetric normalization**
  - Inputs normalized by their RMS over space–time
  - Predicted Δu de-normalized using the RMS of Δu (see the sketch after this list)
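
A minimal sketch of the asymmetric normalization scheme; the per-channel statistics and exact reduction axes here are assumptions, not the exact Walrus implementation:

```python
import torch

def rms(x, dims, eps=1e-6):
    """Root-mean-square over the given dimensions, kept broadcastable."""
    return x.pow(2).mean(dim=dims, keepdim=True).sqrt() + eps

def predict_step(model, history):
    """history: (tau, channels, *spatial) window of past snapshots, tau >= 2."""
    dims = (0, *range(2, history.ndim))             # reduce over time and space
    u_rms = rms(history, dims)                      # input scale: RMS of u
    d_rms = rms(torch.diff(history, dim=0), dims)   # output scale: RMS of Δu
    delta_hat = model(history / u_rms)              # model sees normalized inputs
    return history[-1] + delta_hat * d_rms          # de-normalize with Δ statistics
```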
---
# Pretraining Details
Walrus is pretrained on 19 physical datasets with:
- **Loss**: Per-field normalized L1 loss (sketched after this list)
- **Optimizer**: AdamW
- **Batching**: System-uniform hierarchical sampling
- **Time-striding**: Random stride (1–5) per training example (sketched after this list)
- **Patch jitter range**: Uniform per-axis random offset
- **Dimensional unification**: 2D fields embedded as thin 3D volumes
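
A sketch of two of these ingredients, the per-field normalized L1 loss and random time-striding, under assumed tensor layouts; the exact normalization statistics and window construction in Walrus may differ:

```python
import torch

def per_field_normalized_l1(pred, target, eps=1e-6):
    """L1 loss where each field (channel) is rescaled by the magnitude of
    its target, so fields with very different scales contribute comparably.
    Shapes: (batch, channels, *spatial)."""
    dims = (0, *range(2, target.ndim))              # reduce over batch and space
    scale = target.abs().mean(dim=dims, keepdim=True) + eps
    return ((pred - target).abs() / scale).mean()

def sample_window(trajectory, tau):
    """Draw a history/target pair with a random time stride in {1..5}.
    trajectory: (T, channels, *spatial) with T > 5 * tau."""
    stride = torch.randint(1, 6, ()).item()
    t0 = torch.randint(0, trajectory.shape[0] - tau * stride, ()).item()
    window = trajectory[t0 : t0 + (tau + 1) * stride : stride]
    return window[:-1], window[-1]                  # history U(t), target u(t+1)
```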
The model was pretrained on 96 **NVIDIA H100 GPUs** using hybrid-sharded data parallelism (HSDP, 4 GPUs per shard group), with batch sampling matched to the distributed layout to minimize wasted compute.
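
A sketch of how such a hybrid-sharded layout can be expressed with PyTorch FSDP; the mesh shape matches the 96-GPU, 4-way-shard description above, but `build_walrus()` is a hypothetical stand-in and the actual training harness lives in the GitHub repo:

```python
# Run under torchrun across 96 ranks (e.g. 12 nodes x 8 GPUs).
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# 96 GPUs = 24 replica groups, each sharding parameters 4 ways.
mesh = init_device_mesh("cuda", (24, 4), mesh_dim_names=("replicate", "shard"))

model = build_walrus().cuda()  # hypothetical constructor; see the repo
model = FSDP(model, sharding_strategy=ShardingStrategy.HYBRID_SHARD, device_mesh=mesh)
```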
---
# Intended Use
This pretrained checkpoint is suitable for:
- ✔ Next-step prediction
- ✔ Fast surrogate simulation
- ✔ Autoregressive rollout of physical systems
- ✔ Transfer learning to new physical settings
---
# Resources
- Paper: https://arxiv.org/pdf/2511.15684
- GitHub: https://github.com/PolymathicAI/walrus
- Tutorial: https://github.com/PolymathicAI/walrus/demo_notebooks
Note that the training code in the repository is closely coupled with tools from [the Well](https://github.com/PolymathicAI/the_well), so
it can be beneficial to format data to match that schema. If that's not possible, the tutorial shows how to use the model
without Well-formatted data.
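
To fetch the pretrained weights, the standard `huggingface_hub` download works; the repo id below assumes this model card's repository, and model construction itself is covered in the tutorial notebooks:

```python
from huggingface_hub import snapshot_download

# Download the checkpoint and config files locally
# (repo id assumed from this model card).
local_dir = snapshot_download(repo_id="polymathic-ai/walrus")
print(local_dir)
```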
---
# Demonstrated Downstream Tasks
We demonstrate Walrus's strong performance by fine-tuning it on a range of challenging downstream tasks, as described in the paper.
The fine-tuned Walrus checkpoints for these tasks are available at the following paths:
- **PDEGym CE-RM**: https://huggingface.co/polymathic-ai/walrus_ft_CE-RM/tree/main
- **PDEBench CNS Turbulent**: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_64_Turb/tree/main
- **PDEBench CNS Random**: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_128_Rand/tree/main
- **FlowBench FPO Skeleton**: https://huggingface.co/polymathic-ai/walrus_ft_flowbench_skelenton/tree/main
- **The Well Post-merger Neutron Star**: https://huggingface.co/polymathic-ai/walrus_ft_post_neutron_star_merger/tree/main
- **The Well Convective Envelope RSG**: https://huggingface.co/polymathic-ai/walrus_ft_convective_envelope_rsg/tree/main
- **PDEArena Conditioned Incompressible NS**: https://huggingface.co/polymathic-ai/walrus_ft_pdearena_ins/tree/main
- **BubbleML 2.0 PoolBoil Subcooled**: https://huggingface.co/polymathic-ai/walrus_ft_bubbleML_poolboil/tree/main
Additional checkpoints not included in the Walrus collection on HF can be found [here](https://users.flatironinstitute.org/~polymathic/data/walrus_project_checkpoints/), though the endpoint is a bit finicky.
More fine-tuned checkpoints will be added to HF over time.