---
tags:
- walrus
- foundation-model
- physics
- continuum-dynamics
- transformer
- PDE
datasets:
- polymathic-ai/shear_flow
- polymathic-ai/gray_scott_reaction_diffusion
- polymathic-ai/active_matter
- polymathic-ai/turbulent_radiative_layer_2D
- polymathic-ai/supernova_explosion_64
- polymathic-ai/turbulence_gravity_cooling
- polymathic-ai/rayleigh_benard
- polymathic-ai/planetswe
- polymathic-ai/acoustic_scattering_inclusions
- polymathic-ai/MHD_64
- polymathic-ai/rayleigh_taylor_instability
- polymathic-ai/acoustic_scattering_discontinuous
- polymathic-ai/acoustic_scattering_maze
- polymathic-ai/helmholtz_staircase
- polymathic-ai/viscoelastic_instability
- BGLab/FlowBench
license: mit
---

# Walrus: A Cross-Domain Foundation Model for Continuum Dynamics

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![GitHub Repo](https://img.shields.io/badge/GitHub-Repo-blue?logo=github)](https://github.com/PolymathicAI/walrus)
[![arXiv](https://img.shields.io/badge/arXiv-2511.15684-b31b1b.svg)](https://arxiv.org/abs/2511.15684)

Walrus is a large-scale **physics foundation model** capable of modeling a broad range of continuum dynamical systems. Walrus is trained jointly across **19 diverse physical domains** spanning:

- astrophysics
- geoscience
- rheology
- plasma physics
- acoustics
- classical fluids

These systems cover diverse boundary conditions and physical parameterizations. The model is optimized to serve as a **general-purpose surrogate** for physical simulation and a **strong initialization** for downstream fine-tuning on new PDE systems.

---

# Model Description

Walrus is a **1.3B-parameter space–time Transformer** trained autoregressively to predict the temporal evolution of physical fields.

A simulation snapshot at time t is written as u(t), and the difference between two consecutive snapshots is defined as:

Δu(t+1) = u(t+1) − u(t)

Given a short history of snapshots

U(t) = [u(t − τ + 1), ..., u(t)],

the model M predicts the next state as:

u(t+1) ≈ u(t) + M(U(t))

### Key architectural components

- **Adaptive-compute patch embedding**
  - Token count is automatically balanced across resolutions
  - Enables mixing 2D and 3D datasets efficiently
- **Patch jittering**
  - A harmonic-analysis–motivated augmentation technique
  - Reduces aliasing and spectral artifacts
  - Improves long-horizon stability on 17 of 19 pretraining datasets
- **Tensor-law–aware data augmentation**
  - 2D data embedded into 3D through plane rotations
  - Vector/tensor fields rotated with the correct physical transformation laws (see the rotation sketch after the pretraining details)
- **Asymmetric normalization**
  - Inputs are normalized by their RMS over space-time, and the predicted Δu is de-normalized using the RMS of Δu (see the sketch after the pretraining details)

---

# Pretraining Details

Walrus is pretrained on 19 physical datasets with:

- **Loss**: Per-field normalized L1 loss
- **Optimizer**: AdamW
- **Batching**: System-uniform hierarchical sampling
- **Time-striding**: Random stride (1–5) per training example
- **Patch jitter range**: Uniform per-axis random offset
- **Dimensional unification**: 2D fields embedded as thin 3D volumes

The model was pretrained on 96 **NVIDIA H100 GPUs** using distributed HSDP (4 GPUs per shard group), with the data sampling scheme matched to the distributed layout to minimize wasted compute.
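The sketch below illustrates how the per-sample ingredients above could fit together: random time-striding, embedding of 2D fields as thin 3D volumes, and the asymmetric normalization of inputs and target increments. It is an illustrative reading of the recipe, not the training code from the repository; the function name `make_training_example`, the argument layout, and the choice to compute the Δ statistics from the true target increment are assumptions for illustration.

```python
import torch

def make_training_example(traj, model, tau=4, max_stride=5, eps=1e-6):
    """Build one training example from a trajectory of shape (T, C, H, W) or (T, C, D, H, W).

    Illustrative sketch only: batching and the actual model API are omitted.
    """
    # 1. Random time-striding: subsample the trajectory with stride s in [1, max_stride].
    s = torch.randint(1, max_stride + 1, (1,)).item()
    t0 = torch.randint(0, traj.shape[0] - s * tau, (1,)).item()
    window = traj[t0 : t0 + s * tau + 1 : s]              # tau + 1 snapshots

    # 2. Dimensional unification: give 2D fields a singleton depth axis.
    if window.dim() == 4:                                  # (T, C, H, W) -> (T, C, 1, H, W)
        window = window.unsqueeze(2)

    history, target = window[:-1], window[-1]              # U(t), u(t+1)
    delta = target - history[-1]                           # Δu(t+1) = u(t+1) − u(t)

    # 3a. Asymmetric normalization, input side: per-field RMS over space and time.
    in_rms = history.pow(2).mean(dim=(0, 2, 3, 4), keepdim=True).sqrt().clamp_min(eps)
    history_n = history / in_rms

    # 3b. Target side: scale the increment per field by the RMS of Δu.
    delta_rms = delta.pow(2).mean(dim=(1, 2, 3), keepdim=True).sqrt().clamp_min(eps)
    delta_n = delta / delta_rms

    # Per-field normalized L1 loss between predicted and true increments (normalized units).
    pred_n = model(history_n)                              # hypothetical call, same shape as delta_n
    loss = (pred_n - delta_n).abs().mean()
    # At inference the prediction would be de-normalized: u_next = history[-1] + pred_n * delta_rms
    return loss
```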
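The tensor-law-aware augmentation must rotate both the sampling grid and the field components; rotating only the grid would silently break the physics. The snippet below shows the simplest instance of the transformation law, a 90° in-plane rotation of a 2D vector field. It is a generic illustration of the principle, not the augmentation code used in the repository, and the sign convention depends on how the array axes map to physical coordinates.

```python
import torch

def rotate_vector_field_90(v):
    """Rotate a 2D vector field by 90° about the out-of-plane axis.

    v: tensor of shape (2, H, W) holding the (vx, vy) components on a regular grid.
    A vector field transforms under a rotation R as v'(x) = R v(R^{-1} x), so both
    the grid and the components are rotated.
    """
    # Rotate the sampling grid in the (H, W) plane.
    v_grid = torch.rot90(v, k=1, dims=(1, 2))
    # Rotate the components: for a 90° rotation, (vx, vy) -> (-vy, vx).
    vx, vy = v_grid[0], v_grid[1]
    return torch.stack((-vy, vx), dim=0)
```

Rank-2 tensor fields transform analogously as T' = R T Rᵀ, and the same logic extends to the plane rotations used to embed 2D data into 3D.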
---

# Intended Use

This pretrained checkpoint is suitable for:

- ✔ Next-step prediction
- ✔ Fast surrogate simulation
- ✔ Autoregressive rollout of physical systems
- ✔ Transfer learning to new physical settings

A minimal rollout sketch is given at the end of this card.

# Resources

- Paper: https://arxiv.org/pdf/2511.15684
- GitHub: https://github.com/PolymathicAI/walrus
- Tutorial: https://github.com/PolymathicAI/walrus/demo_notebooks

Note that the training code in the repository is closely coupled with tools from [the Well](https://github.com/PolymathicAI/the_well), so it can be beneficial to format data to match that schema. If that is not possible, the tutorial shows how to use the model without Well-formatted data.

# Demonstrated downstream tasks

We demonstrate the strong performance of Walrus by fine-tuning it on a range of challenging downstream tasks, as shown in the paper. The fine-tuned Walrus checkpoints for these tasks are available at:

- PDEGym CE-RM: https://huggingface.co/polymathic-ai/walrus_ft_CE-RM/tree/main
- PDEBench CNS Turbulent: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_64_Turb/tree/main
- PDEBench CNS Random: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_128_Rand/tree/main
- FlowBench FPO Skeleton: https://huggingface.co/polymathic-ai/walrus_ft_flowbench_skelenton/tree/main
- The Well Post-merger Neutron Star: https://huggingface.co/polymathic-ai/walrus_ft_post_neutron_star_merger/tree/main
- The Well Convective Envelope RSG: https://huggingface.co/polymathic-ai/walrus_ft_convective_envelope_rsg/tree/main
- PDEArena Conditioned Incompressible NS: https://huggingface.co/polymathic-ai/walrus_ft_pdearena_ins/tree/main
- BubbleML 2.0 PoolBoil Subcooled: https://huggingface.co/polymathic-ai/walrus_ft_bubbleML_poolboil/tree/main

Additional checkpoints not included in the Walrus collection on HF can be found [here](https://users.flatironinstitute.org/~polymathic/data/walrus_project_checkpoints/), though the endpoint is a bit finicky. More fine-tuning checkpoints will continue to be added to HF over time.
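For autoregressive rollout, the model is applied repeatedly to its own predictions following the update rule u(t+1) ≈ u(t) + M(U(t)). The loop below is a minimal sketch of that pattern; `model`, its input/output conventions, and the assumption that it returns the increment Δu directly are placeholders for illustration, and normalization is omitted for brevity. See the tutorial notebooks linked above for the actual loading and preprocessing API.

```python
import torch

@torch.no_grad()
def rollout(model, history, n_steps):
    """Autoregressive rollout: u(t+1) = u(t) + M(U(t)), feeding predictions back in.

    history: tensor of shape (tau, C, D, H, W) holding the most recent tau snapshots.
    Returns a tensor of shape (n_steps, C, D, H, W) with the predicted trajectory.
    """
    states = []
    for _ in range(n_steps):
        delta = model(history)                            # predicted increment Δu(t+1)
        next_state = history[-1] + delta                  # u(t+1) = u(t) + M(U(t))
        states.append(next_state)
        # Slide the history window forward by one step.
        history = torch.cat([history[1:], next_state.unsqueeze(0)], dim=0)
    return torch.stack(states, dim=0)
```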