|
|
--- |
|
|
tags: |
|
|
- walrus |
|
|
- foundation-model |
|
|
- physics |
|
|
- continuum-dynamics |
|
|
- transformer |
|
|
- PDE |
|
|
datasets: |
|
|
- polymathic-ai/shear_flow |
|
|
- polymathic-ai/gray_scott_reaction_diffusion |
|
|
- polymathic-ai/active_matter |
|
|
- polymathic-ai/turbulent_radiative_layer_2D |
|
|
- polymathic-ai/supernova_explosion_64 |
|
|
- polymathic-ai/turbulence_gravity_cooling |
|
|
- polymathic-ai/rayleigh_benard |
|
|
- polymathic-ai/planetswe |
|
|
- polymathic-ai/acoustic_scattering_inclusions |
|
|
- polymathic-ai/MHD_64 |
|
|
- polymathic-ai/rayleigh_taylor_instability |
|
|
- polymathic-ai/acoustic_scattering_discontinuous |
|
|
- polymathic-ai/acoustic_scattering_maze |
|
|
- polymathic-ai/helmholtz_staircase |
|
|
- polymathic-ai/viscoelastic_instability |
|
|
- BGLab/FlowBench |
|
|
license: mit |
|
|
--- |
|
|
|
|
|
# Walrus: A Cross-Domain Foundation Model for Continuum Dynamics |
|
|
|
|
|
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

[![GitHub](https://img.shields.io/badge/GitHub-PolymathicAI%2Fwalrus-blue)](https://github.com/PolymathicAI/walrus)

[![arXiv](https://img.shields.io/badge/arXiv-2511.15684-b31b1b.svg)](https://arxiv.org/abs/2511.15684)
|
|
|
|
|
Walrus is a large-scale **physics foundation model** capable of modeling a broad range of continuum dynamical systems. |
|
|
|
|
|
Walrus is trained jointly across **19 diverse physical domains** spanning: |
|
|
- astrophysics |
|
|
- geoscience |
|
|
- rheology |
|
|
- plasma physics |
|
|
- acoustics |
|
|
- classical fluids |
|
|
|
|
|
These systems have diverse boundary conditions and physical parameterizations. The model is optimized to serve as a **general-purpose surrogate** for physical simulation and a **strong initialization** for downstream fine-tuning on new PDE systems. |
|
|
|
|
|
--- |
|
|
|
|
|
# Model Description |
|
|
|
|
|
Walrus is a **1.3B-parameter space–time Transformer** trained autoregressively to predict the temporal evolution of physical fields. A simulation snapshot at time t is written as u(t).
|
|
|
|
|
We define the difference between two consecutive snapshots as: |
|
|
Δu(t+1) = u(t+1) − u(t) |
|
|
|
|
|
Given a short history of the last τ snapshots:
|
|
U(t) = [u(t − τ + 1), ..., u(t)] |
|
|
|
|
|
The model predicts the next state using: |
|
|
u(t+1) ≈ u(t) + M(U(t)) |
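In code, a single update step looks like the minimal sketch below; the tensor layout and function names are illustrative assumptions, not the exact Walrus interface:

```python
import torch

def predict_next(model: torch.nn.Module, history: torch.Tensor) -> torch.Tensor:
    """One autoregressive step: u(t+1) ≈ u(t) + M(U(t)).

    `history` is assumed to stack the last τ snapshots along dim 1,
    with shape [batch, τ, fields, *spatial].
    """
    u_t = history[:, -1]    # most recent snapshot u(t)
    delta = model(history)  # the network predicts the increment Δu(t+1)
    return u_t + delta
```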
|
|
|
|
|
### Key architectural components |
|
|
|
|
|
- **Adaptive-compute patch embedding** |
|
|
- Token count automatically balanced across resolutions |
|
|
- Enables mixing 2D and 3D datasets efficiently |
|
|
|
|
|
- **Patch Jittering** |
|
|
- A harmonic-analysis–motivated augmentation technique |
|
|
- Reduces aliasing and spectral artifacts |
|
|
- Improves long-horizon stability across 17/19 pretraining datasets |
|
|
|
|
|
- **Tensor-law–aware data augmentation** |
|
|
- 2D data embedded into 3D through plane rotations |
|
|
- Vector/tensor fields rotated with correct physical transformations |
|
|
|
|
|
- **Asymmetric normalization** (sketched below)

  - Inputs normalized by their RMS over space and time

  - Predicted Δu de-normalized by the RMS of the increments Δu
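A minimal sketch of this scheme, assuming a `[batch, time, fields, depth, height, width]` layout and Δu statistics estimated from the history window (the exact statistics used in pretraining may differ):

```python
import torch

def asymmetric_norm_step(model, history, eps=1e-6):
    """history: [B, T, C, D, H, W]; returns the predicted u(t+1)."""
    # Normalize inputs by their per-field RMS over time and space.
    rms_in = history.pow(2).mean(dim=(1, 3, 4, 5), keepdim=True).sqrt()
    delta_hat = model(history / (rms_in + eps))  # normalized Δu prediction

    # De-normalize with the RMS of the increments Δu instead, estimated
    # here (an assumption) from consecutive differences in the history.
    diffs = history[:, 1:] - history[:, :-1]
    rms_delta = diffs.pow(2).mean(dim=(1, 3, 4, 5), keepdim=True).sqrt()

    return history[:, -1] + delta_hat * rms_delta.squeeze(1)
```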
|
|
|
|
|
--- |
|
|
|
|
|
# Pretraining Details |
|
|
|
|
|
Walrus is pretrained on 19 physical datasets with:
|
|
|
|
|
- **Loss**: Per-field normalized L1 loss (sketched below)
|
|
- **Optimizer**: AdamW |
|
|
- **Batching**: System-uniform hierarchical sampling |
|
|
- **Time-striding**: Random stride (1–5) per training example (see the sketch after this list)
|
|
- **Patch jitter range**: Uniform per-axis random offset |
|
|
- **Dimensional unification**: 2D fields embedded as thin 3D volumes |
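As referenced above, random time-striding could be implemented with a sampling helper along these lines (a hypothetical sketch; `tau` and the trajectory layout are illustrative):

```python
import torch

def sample_strided_window(traj: torch.Tensor, tau: int, max_stride: int = 5):
    """traj: full trajectory [T, C, *spatial]. Draws a random stride in
    1..max_stride and a random start, returning tau history frames plus
    the target frame, all separated by the sampled stride."""
    stride = int(torch.randint(1, max_stride + 1, ()).item())
    span = tau * stride  # distance from first history frame to the target
    assert traj.shape[0] > span, "trajectory too short for this stride"
    start = int(torch.randint(0, traj.shape[0] - span, ()).item())
    idx = torch.arange(start, start + span + 1, stride)
    window = traj[idx]   # tau + 1 frames
    return window[:-1], window[-1]  # history U(t), target u(t+1)
```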
|
|
|
|
|
The model was pretrained on 96 **NVIDIA H100 GPUs** using distributed HSDP (4 GPUs per shard group), with the sampling scheme matched to the sharding structure to minimize wasted compute.
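The per-field normalized L1 loss referenced above might look like the following sketch; the exact normalization statistic is an assumption for illustration:

```python
import torch

def per_field_normalized_l1(pred, target, eps=1e-6):
    """pred, target: [B, C, *spatial]. Each physical field (channel) is
    scaled by the mean magnitude of its target so that small-amplitude
    fields contribute comparably to large ones."""
    spatial_dims = tuple(range(2, target.ndim))
    scale = target.abs().mean(dim=spatial_dims, keepdim=True)
    return ((pred - target).abs() / (scale + eps)).mean()
```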
|
|
|
|
|
--- |
|
|
|
|
|
# Intended Use |
|
|
|
|
|
This pretrained checkpoint is suitable for: |
|
|
|
|
|
### ✔ Next-step prediction |
|
|
### ✔ Fast surrogate simulation |
|
|
### ✔ Autoregressive rollout of physical systems |
|
|
### ✔ Transfer learning to new physical settings |
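Rollout composes the one-step update repeatedly, feeding each prediction back into the history window; a minimal sketch under the same assumed tensor layout as above:

```python
import torch

@torch.no_grad()
def rollout(model, history, n_steps):
    """Autoregressive rollout. history: [B, τ, C, *spatial]."""
    states = []
    for _ in range(n_steps):
        u_next = history[:, -1] + model(history)  # u(t+1) = u(t) + M(U(t))
        states.append(u_next)
        # Slide the window: drop the oldest frame, append the prediction.
        history = torch.cat([history[:, 1:], u_next.unsqueeze(1)], dim=1)
    return torch.stack(states, dim=1)  # [B, n_steps, C, *spatial]
```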
|
|
|
|
|
# Resources |
|
|
|
|
|
Paper: https://arxiv.org/pdf/2511.15684 |
|
|
GitHub: https://github.com/PolymathicAI/walrus
|
|
Tutorial: https://github.com/PolymathicAI/walrus/demo_notebooks |
|
|
|
|
|
Note that the training code in the repository is closely coupled with tools from [the Well](https://github.com/PolymathicAI/the_well), so it is often easiest to format data to match that schema. If that is not possible, the tutorial shows how to use the model without Well-formatted data.
|
|
|
|
|
|
|
|
# Demonstrated downstream tasks |
|
|
|
|
|
We demonstrate the strong performance of Walrus by fine-tuning it on a range of challenging downstream tasks, as shown in the paper.

The fine-tuned Walrus checkpoints for these tasks are available at the following paths:
|
|
|
|
|
- **PDEGym CE-RM**: https://huggingface.co/polymathic-ai/walrus_ft_CE-RM/tree/main

- **PDEBench CNS Turbulent**: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_64_Turb/tree/main

- **PDEBench CNS Random**: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_128_Rand/tree/main

- **FlowBench FPO Skeleton**: https://huggingface.co/polymathic-ai/walrus_ft_flowbench_skelenton/tree/main

- **The Well Post-merger Neutron Star**: https://huggingface.co/polymathic-ai/walrus_ft_post_neutron_star_merger/tree/main

- **The Well Convective Envelope RSG**: https://huggingface.co/polymathic-ai/walrus_ft_convective_envelope_rsg/tree/main

- **PDEArena Conditioned Incompressible NS**: https://huggingface.co/polymathic-ai/walrus_ft_pdearena_ins/tree/main

- **BubbleML 2.0 PoolBoil Subcooled**: https://huggingface.co/polymathic-ai/walrus_ft_bubbleML_poolboil/tree/main
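To fetch any of these checkpoints programmatically, the standard `huggingface_hub` client works; loading the downloaded weights into a model then follows the tutorial notebooks:

```python
from huggingface_hub import snapshot_download

# Downloads all files of a fine-tuned checkpoint into the local HF cache
# and returns the local directory path.
local_dir = snapshot_download(repo_id="polymathic-ai/walrus_ft_CE-RM")
print(local_dir)
```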
|
|
|
|
|
|
|
|
Additional checkpoints not included in the Walrus collection on HF can be found [here](https://users.flatironinstitute.org/~polymathic/data/walrus_project_checkpoints/), though the endpoint is a bit finicky.
|
|
|
|
|
More fine-tuned checkpoints will be added to HF over time.