---
tags:
- walrus
- foundation-model
- physics
- continuum-dynamics
- transformer
- PDE
datasets:
- polymathic-ai/shear_flow
- polymathic-ai/gray_scott_reaction_diffusion
- polymathic-ai/active_matter
- polymathic-ai/turbulent_radiative_layer_2D
- polymathic-ai/supernova_explosion_64
- polymathic-ai/turbulence_gravity_cooling
- polymathic-ai/rayleigh_benard
- polymathic-ai/planetswe
- polymathic-ai/acoustic_scattering_inclusions
- polymathic-ai/MHD_64
- polymathic-ai/rayleigh_taylor_instability
- polymathic-ai/acoustic_scattering_discontinuous
- polymathic-ai/acoustic_scattering_maze
- polymathic-ai/helmholtz_staircase
- polymathic-ai/viscoelastic_instability
- BGLab/FlowBench
license: mit
---
# Walrus: A Cross-Domain Foundation Model for Continuum Dynamics
[License: MIT](https://opensource.org/licenses/MIT)
[Code](https://github.com/PolymathicAI/walrus)
[Paper (arXiv:2511.15684)](https://arxiv.org/abs/2511.15684)
Walrus is a large-scale **physics foundation model** capable of modeling a broad range of continuum dynamical systems.
Walrus is trained jointly across **19 diverse physical domains** spanning:
- astrophysics
- geoscience
- rheology
- plasma physics
- acoustics
- classical fluids
These systems have diverse boundary conditions and physical parameterizations. The model is optimized to serve as a **general-purpose surrogate** for physical simulation and a **strong initialization** for downstream fine-tuning on new PDE systems.
---
# Model Description
Walrus is a **1.3B-parameter space–time Transformer** trained autoregressively to predict the temporal evolution of physical fields. A simulation snapshot at time t is written as u(t).
We define the difference between two consecutive snapshots as:
Δu(t+1) = u(t+1) − u(t)
Given a short history of snapshots:
U(t) = [u(t − τ + 1), ..., u(t)]
The model predicts the next state using:
u(t+1) ≈ u(t) + M(U(t))
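Concretely, one prediction step is a residual update on the latest snapshot. A minimal PyTorch sketch (the tensor layout and the `model` callable are illustrative assumptions, not the released API):

```python
import torch

def predict_next_state(model, history):
    """One autoregressive step: u(t+1) ≈ u(t) + M(U(t)).

    history: U(t) = [u(t − τ + 1), ..., u(t)] as a tensor of shape
    (batch, τ, channels, *spatial); `model` stands in for M.
    """
    delta = model(history)           # M(U(t)) predicts Δu(t+1)
    return history[:, -1] + delta    # residual update from the latest snapshot
```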
### Key architectural components
- **Adaptive-compute patch embedding**
- Token count automatically balanced across resolutions
- Enables mixing 2D and 3D datasets efficiently
- **Patch Jittering**
- A harmonic-analysis–motivated augmentation technique
- Reduces aliasing and spectral artifacts
- Improves long-horizon stability across 17/19 pretraining datasets
- **Tensor-law–aware data augmentation**
- 2D data embedded into 3D through plane rotations
- Vector/tensor fields rotated with correct physical transformations
- **Asymmetric normalization**
  - Inputs normalized by the RMS of u over space-time
  - Predicted Δu de-normalized using the RMS of Δu
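As a rough illustration of the asymmetric normalization scheme (the tensor layout and reduction axes here are assumptions; the exact statistics used in Walrus may differ):

```python
import torch

def asymmetric_step(model, history, eps=1e-6):
    """history: (batch, time, channels, *spatial).
    Inputs are scaled by the RMS of u over space-time; the predicted
    Δu is rescaled by the RMS of the historical deltas."""
    dims = (1,) + tuple(range(3, history.ndim))      # time + spatial axes
    in_rms = history.pow(2).mean(dim=dims, keepdim=True).sqrt() + eps

    deltas = history[:, 1:] - history[:, :-1]        # Δu within the history
    d_rms = deltas.pow(2).mean(dim=dims, keepdim=True).sqrt() + eps

    pred_delta = model(history / in_rms) * d_rms.squeeze(1)
    return history[:, -1] + pred_delta
```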
---
# Pretraining Details
Walrus is pretrained on 19 physical datasets with:
- **Loss**: Per-field normalized L1 loss
- **Optimizer**: AdamW
- **Batching**: System-uniform hierarchical sampling
- **Time-striding**: Random stride (1–5) per training example
- **Patch jitter range**: Uniform per-axis random offset
- **Dimensional unification**: 2D fields embedded as thin 3D volumes
The model was pretrained on 96 **NVIDIA H100 GPUs** using distributed HSDP (4 GPUs per shard group), with sampling aligned to the sharding structure to minimize idle compute.
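To make two of the items above concrete, here is a minimal sketch of a per-field normalized L1 loss and random time-striding (helper names and shapes are hypothetical; the actual training code lives in the GitHub repository):

```python
import random
import torch

def per_field_normalized_l1(pred, target, eps=1e-6):
    """pred/target: (batch, channels, *spatial); each channel is one field."""
    spatial = tuple(range(2, target.ndim))
    scale = target.abs().mean(dim=spatial, keepdim=True) + eps  # per-field scale
    return ((pred - target).abs() / scale).mean()

def strided_window(trajectory, t, tau):
    """Sample a τ-snapshot history ending at time t with a random stride 1–5."""
    stride = random.randint(1, 5)
    idx = [t - s * stride for s in reversed(range(tau))]
    return trajectory[:, idx]
```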
---
# Intended Use
This pretrained checkpoint is suitable for:
### ✔ Next-step prediction
### ✔ Fast surrogate simulation
### ✔ Autoregressive rollout of physical systems
### ✔ Transfer learning to new physical settings
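For rollout, the one-step predictor is applied recursively while sliding the history window. A minimal sketch (shapes and names are illustrative, not the released inference API):

```python
import torch

@torch.no_grad()
def rollout(model, history, n_steps):
    """Autoregressive rollout: u(t+1) = u(t) + M(U(t)), repeated n_steps times.
    history: (batch, τ, channels, *spatial)."""
    preds = []
    for _ in range(n_steps):
        u_next = history[:, -1] + model(history)
        preds.append(u_next)
        # Slide the window: drop the oldest snapshot, append the prediction
        history = torch.cat([history[:, 1:], u_next.unsqueeze(1)], dim=1)
    return torch.stack(preds, dim=1)
```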
# Resources
Paper: https://arxiv.org/pdf/2511.15684
GitHub: https://github.com/PolymathicAI/walrus
Tutorial: https://github.com/PolymathicAI/walrus/demo_notebooks
Note that the training code in the repository is closely coupled with tools from [the Well](https://github.com/PolymathicAI/the_well), so
it can be beneficial to format data to match that schema. If that is not possible, the tutorial shows how to use the model
without Well-formatted data.
# Demonstrated downstream tasks
We demonstrate Walrus's strong performance by fine-tuning it on a range of challenging downstream tasks, as shown in the paper.
The fine-tuned Walrus checkpoints for these tasks are available at the following paths:
### PDEGym CE-RM: https://huggingface.co/polymathic-ai/walrus_ft_CE-RM/tree/main
### PDEBench CNS Turbulent: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_64_Turb/tree/main
### PDEBench CNS Random: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_128_Rand/tree/main
### FlowBench FPO Skelenton: https://huggingface.co/polymathic-ai/walrus_ft_flowbench_skelenton/tree/main
### The Well Postmerger Neutron Star: https://huggingface.co/polymathic-ai/walrus_ft_post_neutron_star_merger/tree/main
### The Well Convective envelope RSG: https://huggingface.co/polymathic-ai/walrus_ft_convective_envelope_rsg/tree/main
### PDEArena Conditioned Incompressible NS: https://huggingface.co/polymathic-ai/walrus_ft_pdearena_ins/tree/main
### BubbleML 2.0 PoolBoil Subcooled: https://huggingface.co/polymathic-ai/walrus_ft_bubbleML_poolboil/tree/main
Additional checkpoints not included in the Walrus collection on HF can be found [here](https://users.flatironinstitute.org/~polymathic/data/walrus_project_checkpoints/), though the endpoint can be a bit finicky.
More fine-tuning checkpoints will continue to be added to HF over time.
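To fetch one of these checkpoints locally, the standard `huggingface_hub` download works (the repo ID is taken from the list above; loading the weights into the architecture follows the tutorial linked in Resources):

```python
from huggingface_hub import snapshot_download

# Downloads all files of the fine-tuned checkpoint repo to a local cache dir
local_dir = snapshot_download(repo_id="polymathic-ai/walrus_ft_CE-RM")
print(local_dir)
```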