pmukhop committed on
Commit
38cd852
·
0 Parent(s):

Initial walrus commit

Files changed (4)
  1. README.md +134 -0
  2. extended_config.yaml +328 -0
  3. walrus.pt +3 -0
  4. walrus.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,134 @@
+ ---
+ tags:
+ - walrus
+ - foundation-model
+ - physics
+ - continuum-dynamics
+ - transformer
+ - PDE
+ datasets:
+ - polymathic-ai/shear_flow
+ - polymathic-ai/gray_scott_reaction_diffusion
+ - polymathic-ai/active_matter
+ - polymathic-ai/turbulent_radiative_layer_2D
+ - polymathic-ai/supernova_explosion_64
+ - polymathic-ai/turbulence_gravity_cooling
+ - polymathic-ai/rayleigh_benard
+ - polymathic-ai/planetswe
+ - polymathic-ai/acoustic_scattering_inclusions
+ - polymathic-ai/MHD_64
+ - polymathic-ai/rayleigh_taylor_instability
+ - polymathic-ai/acoustic_scattering_discontinuous
+ - polymathic-ai/acoustic_scattering_maze
+ - polymathic-ai/helmholtz_staircase
+ - polymathic-ai/viscoelastic_instability
+ - BGLab/FlowBench
+ license: mit
+ ---
+
+ # Walrus: A Cross-Domain Foundation Model for Continuum Dynamics
+
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+ [![GitHub Repo](https://img.shields.io/badge/GitHub-Repo-blue?logo=github)](https://github.com/PolymathicAI/walrus)
+ [![arXiv](https://img.shields.io/badge/arXiv-2511.15684-b31b1b.svg)](https://arxiv.org/abs/2511.15684)
+
+ Walrus is a large-scale **physics foundation model** capable of modeling a broad range of continuum dynamical systems.
+
+ Walrus is trained jointly on **19 diverse physical datasets** spanning:
+ - astrophysics
+ - geoscience
+ - rheology
+ - plasma physics
+ - acoustics
+ - classical fluids
+
+ These systems cover diverse boundary conditions and physical parameterizations. The model is optimized to serve as a **general-purpose surrogate** for physical simulation and a **strong initialization** for downstream fine-tuning on new PDE systems.
+
+ ---
+
+ # Model Description
+
+ Walrus is a **1.3B-parameter space–time Transformer** trained autoregressively to predict the temporal evolution of physical fields. A simulation snapshot at time t is written as u(t).
+
+ We define the difference between two consecutive snapshots as:
+
+ Δu(t+1) = u(t+1) − u(t)
+
+ Given a short history of τ snapshots:
+
+ U(t) = [u(t − τ + 1), ..., u(t)]
+
+ the model M predicts the next state via:
+
+ u(t+1) ≈ u(t) + M(U(t))
+
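+ A minimal sketch of this update rule in PyTorch (the model call signature and the [batch, time, field, space] tensor layout are illustrative, not the exact Walrus API):
+
+ ```python
+ import torch
+
+ def rollout(model, history, n_steps):
+     """Autoregressive rollout. history: [B, T, C, H, W] with T = tau."""
+     preds = []
+     for _ in range(n_steps):
+         delta = model(history)           # M(U(t)) predicts Δu(t+1)
+         u_next = history[:, -1] + delta  # u(t+1) ≈ u(t) + M(U(t))
+         preds.append(u_next)
+         # slide the window: drop the oldest snapshot, append the prediction
+         history = torch.cat([history[:, 1:], u_next.unsqueeze(1)], dim=1)
+     return torch.stack(preds, dim=1)
+ ```
+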
+ ### Key architectural components
+
+ - **Adaptive-compute patch embedding**
+   - Token count automatically balanced across resolutions
+   - Enables mixing 2D and 3D datasets efficiently
+
+ - **Patch jittering** (see the second sketch after this list)
+   - A harmonic-analysis–motivated augmentation technique
+   - Reduces aliasing and spectral artifacts
+   - Improves long-horizon stability on 17 of the 19 pretraining datasets
+
+ - **Tensor-law–aware data augmentation**
+   - 2D data embedded into 3D through plane rotations
+   - Vector/tensor fields rotated with the correct physical transformations
+
+ - **Asymmetric normalization**
+   - Inputs are normalized by their RMS over space and time
+   - The predicted Δu is de-normalized using the RMS of the input differences Δu (sketched below)
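+
+ A sketch of the asymmetric normalization idea (the tensor layout and `eps` are illustrative; the actual logic lives in `walrus.trainer.normalization_strat`):
+
+ ```python
+ import torch
+
+ def normalize_inputs(u, eps=1e-6):
+     """u: [B, T, C, H, W]. RMS over space-time, per sample and per field."""
+     rms = u.pow(2).mean(dim=(1, 3, 4), keepdim=True).add(eps).sqrt()
+     return u / rms, rms
+
+ def denormalize_delta(delta_pred, history, eps=1e-6):
+     """Scale the predicted delta by the RMS of consecutive differences."""
+     diffs = history[:, 1:] - history[:, :-1]
+     delta_rms = diffs.pow(2).mean(dim=(1, 3, 4), keepdim=True).add(eps).sqrt()
+     return delta_pred * delta_rms.squeeze(1)  # per-sample, per-field scale
+ ```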
+
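+ Patch jittering can be pictured as a random per-axis shift of the patch grid before tokenization. A toy 2D illustration (assuming periodic boundaries, which is what makes `torch.roll` appropriate here; the real augmentation also handles non-periodic padding):
+
+ ```python
+ import torch
+
+ def jitter_patches(u, patch_size=8):
+     """u: [B, C, H, W]. Roll by a uniform per-axis offset in [0, patch_size)
+     so patch boundaries land differently each step, reducing grid-aligned
+     aliasing and spectral artifacts."""
+     off_h = int(torch.randint(0, patch_size, ()))
+     off_w = int(torch.randint(0, patch_size, ()))
+     return torch.roll(u, shifts=(off_h, off_w), dims=(-2, -1))
+ ```
+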
+ ---
+
+ # Pretraining Details
+
+ Walrus is pretrained on 19 physical datasets with:
+
+ - **Loss**: Per-field normalized L1 loss (sketched below)
+ - **Optimizer**: AdamW
+ - **Batching**: System-uniform hierarchical sampling
+ - **Time-striding**: Random stride (1–5) per training example
+ - **Patch jitter range**: Uniform per-axis random offset
+ - **Dimensional unification**: 2D fields embedded as thin 3D volumes
+
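+ A per-field normalized L1 loss might look like the following sketch (the reduction axes and normalization constant are illustrative, not the exact pretraining objective):
+
+ ```python
+ import torch
+
+ def per_field_normalized_l1(pred, target, eps=1e-6):
+     """pred, target: [B, C, H, W]. Normalizing per field (channel) keeps
+     fields with very different magnitudes on a comparable scale."""
+     abs_err = (pred - target).abs().mean(dim=(0, 2, 3))      # per-field MAE
+     scale = target.abs().mean(dim=(0, 2, 3)).clamp_min(eps)  # per-field magnitude
+     return (abs_err / scale).mean()
+ ```
+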
+ The model was pretrained on 96 **NVIDIA H100 GPUs** using HSDP (4 GPUs per shard group), with the sampling strategy matched to the distribution layout to minimize dead weight in each batch.
+
+ ---
+
+ # Intended Use
+
+ This pretrained checkpoint is suitable for:
+
+ ### ✔ Next-step prediction
+ ### ✔ Fast surrogate simulation
+ ### ✔ Autoregressive rollout of physical systems
+ ### ✔ Transfer learning to new physical settings
+
+ # Resources
+
+ - Paper: https://arxiv.org/pdf/2511.15684
+ - GitHub: https://github.com/PolymathicAI/walrus
+ - Tutorial: https://github.com/PolymathicAI/walrus/demo_notebooks
+
+ Note: the training code in the repository is closely coupled with tools from [the Well](https://github.com/PolymathicAI/the_well), so it is usually easiest to format your data to match that schema. If that is not possible, the tutorial shows how to use the model without Well-formatted data.
+
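+ To try the weights in this repository directly, something like the following sketch should work (it assumes `walrus.safetensors` holds a flat state dict and that you have already built the model architecture from the GitHub repo):
+
+ ```python
+ import torch
+ from safetensors.torch import load_file
+
+ # model = ...  # construct the Walrus architecture from the walrus repo first
+ state_dict = load_file("walrus.safetensors")  # or: torch.load("walrus.pt", map_location="cpu")
+ # model.load_state_dict(state_dict)
+ ```
+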
+ # Demonstrated downstream tasks
+
+ We demonstrate the strong performance of Walrus by fine-tuning it on a range of challenging downstream tasks, as shown in the paper.
+ The fine-tuned Walrus checkpoints for these downstream tasks are available at the following paths:
+
+ ### PDEGym CE-RM: https://huggingface.co/polymathic-ai/walrus_ft_CE-RM/tree/main
+ ### PDEBench CNS Turbulent: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_64_Turb/tree/main
+ ### PDEBench CNS Random: https://huggingface.co/polymathic-ai/walrus_ft_CNS3D_128_Rand/tree/main
+ ### FlowBench FPO Skeleton: https://huggingface.co/polymathic-ai/walrus_ft_flowbench_skelenton/tree/main
+ ### The Well Postmerger Neutron Star: https://huggingface.co/polymathic-ai/walrus_ft_post_neutron_star_merger/tree/main
+ ### The Well Convective Envelope RSG: https://huggingface.co/polymathic-ai/walrus_ft_convective_envelope_rsg/tree/main
+ ### PDEArena Conditioned Incompressible NS: https://huggingface.co/polymathic-ai/walrus_ft_pdearena_ins/tree/main
+ ### BubbleML 2.0 PoolBoil Subcooled: https://huggingface.co/polymathic-ai/walrus_ft_bubbleML_poolboil/tree/main
+
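+ Each of these is a standard Hugging Face model repository, so a checkpoint can be fetched programmatically with `huggingface_hub` (the filename below is hypothetical; check each repo's file listing for the actual name):
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Download one fine-tuned checkpoint; the exact filename varies by repository.
+ path = hf_hub_download(
+     repo_id="polymathic-ai/walrus_ft_CE-RM",
+     filename="walrus.safetensors",  # hypothetical; see the repo's "Files" tab
+ )
+ ```
+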
+ Additional checkpoints not included in the Walrus collection on HF can be found [here](https://users.flatironinstitute.org/~polymathic/data/walrus_project_checkpoints/), though that endpoint can be a bit finicky.
+
+ More fine-tuning checkpoints will continue to be added to HF over time.
extended_config.yaml ADDED
@@ -0,0 +1,328 @@
+ data_workers: 10
+ name: Walrus-wella-delta-Isotr[Space-Adapt-]-AdamW-0.0002
+ automatic_setup: true
+ trainer:
+   _target_: walrus.trainer.Trainer
+   max_epoch: 200
+   val_frequency: 10
+   rollout_val_frequency: 10
+   short_validation_length: 20
+   max_rollout_steps: 200
+   num_time_intervals: 5
+   enable_amp: false
+   loss_fn:
+     _target_: the_well.benchmark.metrics.MAE
+   formatter:
+     _target_: hydra.utils.get_class
+     path: walrus.data.well_to_multi_transformer.ChannelsFirstWithTimeFormatter
+   revin:
+     _target_: walrus.trainer.normalization_strat.SamplewiseRevNormalization
+     _partial_: true
+   prediction_type: delta
+   grad_acc_steps: 4
+   image_validation: true
+   video_validation: true
+   gradient_log_level: 0
+   clip_gradient: 10
+   log_interval: 200
+   loss_multiplier: 100.0
+   lr_scheduler_per_step: false
+   skip_spectral_metrics: true
+ optimizer:
+   _target_: torch.optim.AdamW
+   weight_decay: 0.0001
+   eps: 1.0e-10
+   lr: 0.0002
+ lr_scheduler:
+   _target_: walrus.optim.schedulers.InverseSqrtLinearWarmupSqrtCooldown
+   warmup_epochs: 10
+   cooldown_epochs: 10
+   warmup_lr_factor: 0.1
+   cooldown_lr_factor: 0.001
+ model:
+   encoder:
+     _partial_: true
+     _target_: walrus.models.encoders.vstride_encoder.SpaceBagAdaptiveDVstrideEncoder
+     learned_pad: true
+     base_kernel_size1d:
+     - - 4
+       - 4
+     base_kernel_size2d:
+     - - 8
+       - 4
+     - - 8
+       - 4
+     base_kernel_size3d:
+     - - 8
+       - 4
+     - - 8
+       - 4
+     - - 8
+       - 4
+     groups: 12
+     kernel_scales_seq:
+     - - 2
+       - 2
+     - - 4
+       - 2
+     - - 4
+       - 4
+     - - 8
+       - 4
+     variable_downsample: true
+     variable_deterministic_ds: true
+     activation:
+       _partial_: true
+       _target_: torch.nn.SiLU
+   decoder:
+     _partial_: true
+     _target_: walrus.models.decoders.vstride_decoder.AdaptiveDVstrideDecoder
+     learned_pad: true
+     base_kernel_size1d:
+     - - 4
+       - 4
+     base_kernel_size2d:
+     - - 8
+       - 4
+     - - 8
+       - 4
+     base_kernel_size3d:
+     - - 8
+       - 4
+     - - 8
+       - 4
+     - - 8
+       - 4
+     groups: 12
+     activation:
+       _partial_: true
+       _target_: torch.nn.SiLU
+   processor:
+     space_mixing:
+       _partial_: true
+       _target_: walrus.models.spatial_blocks.full_attention.FullAttention
+       num_heads: 16
+       mlp_dim: null
+     time_mixing:
+       _partial_: true
+       _target_: walrus.models.temporal_blocks.axial_time_attention.AxialTimeAttention
+       num_heads: 16
+       bias_type: rel
+     channel_mixing:
+       _partial_: true
+       _target_: torch.nn.Identity
+     _partial_: true
+     _target_: walrus.models.spatiotemporal_blocks.space_time_split.SpaceTimeSplitBlock
+   norm_layer:
+     _partial_: true
+     _target_: walrus.models.shared_utils.normalization.RMSGroupNorm
+   _target_: walrus.models.IsotropicModel
+   hidden_dim: 1408
+   projection_dim: 48
+   intermediate_dim: 352
+   processor_blocks: 40
+   drop_path: 0.05
+   groups: 16
+   max_d: 3
+   static_axes: true
+   weight_tied_axes: false
+   causal_in_time: true
+   include_d:
+   - 2
+   - 3
+   override_dimensionality: 0
+   jitter_patches: true
+   gradient_checkpointing_freq: 2
+   use_periodic_fixed_jitter: true
+   input_field_drop: 0.0
+ data:
+   field_index_map_override:
+     closed_boundary: 0
+     open_boundary: 1
+     bias_correction: 2
+     pressure: 3
+     velocity_x: 4
+     velocity_y: 5
+     velocity_z: 6
+     zeros_like_density: 7
+     speed_of_sound: 8
+     concentration: 9
+     D_xx: 10
+     D_xy: 11
+     D_xz: 12
+     D_yx: 13
+     D_yy: 14
+     D_yz: 15
+     D_zx: 16
+     D_zy: 17
+     D_zz: 18
+     E_xx: 19
+     E_xy: 20
+     E_xz: 21
+     E_yx: 22
+     E_yy: 23
+     E_yz: 24
+     E_zx: 25
+     E_zy: 26
+     E_zz: 27
+     density: 28
+     energy: 29
+     velocity_r: 30
+     velocity_theta: 31
+     velocity_phi: 32
+     momentum_x: 33
+     momentum_y: 34
+     momentum_z: 35
+     pressure_re: 36
+     pressure_im: 37
+     mask: 38
+     magnetic_field_x: 39
+     magnetic_field_y: 40
+     magnetic_field_z: 41
+     A: 42
+     B: 43
+     height: 44
+     internal_energy: 45
+     temperature: 46
+     electron_fraction: 47
+     entropy: 48
+     magnetic_field_log_r: 49
+     magnetic_field_theta: 50
+     magnetic_field_phi: 51
+     velocity_log_r: 52
+     buoyancy: 53
+     tracer: 54
+     log10_density: 55
+     log10_temperature: 56
+     c_zz: 57
+     C_xx: 58
+     C_xy: 59
+     C_xz: 60
+     C_yx: 61
+     C_yy: 62
+     C_yz: 63
+     C_zx: 64
+     C_zy: 65
+     C_zz: 66
+   transform:
+     train:
+       _target_: the_well.data.augmentation.RandomRotation90
+       p: 1.0
+   well_base_path: /mnt/gpuxl/polymathic/the_well/datasets/
+   wandb_data_name: well_allmain_only
+   module_parameters:
+     _target_: walrus.data.MixedWellDataModule
+     batch_size: 2
+     n_steps_input: 6
+     n_steps_output: 1
+     min_dt_stride: 1
+     max_dt_stride: 5
+     max_samples: 2000
+   well_dataset_info:
+     active_matter:
+       include_filters: []
+       exclude_filters: []
+     planetswe:
+       include_filters: []
+       exclude_filters: []
+     acoustic_scattering_maze:
+       include_filters: []
+       exclude_filters: []
+       field_transforms:
+         density: torch.zeros_like
+     acoustic_scattering_inclusions:
+       include_filters: []
+       exclude_filters: []
+       field_transforms:
+         density: torch.zeros_like
+     acoustic_scattering_discontinuous:
+       include_filters: []
+       exclude_filters: []
+       field_transforms:
+         density: torch.zeros_like
+     euler_multi_quadrants_openBC:
+       include_filters: []
+       exclude_filters: []
+     euler_multi_quadrants_periodicBC:
+       include_filters: []
+       exclude_filters: []
+     gray_scott_reaction_diffusion:
+       include_filters: []
+       exclude_filters: []
+     rayleigh_benard:
+       include_filters: []
+       exclude_filters: []
+     shear_flow:
+       include_filters: []
+       exclude_filters: []
+     turbulent_radiative_layer_2D:
+       include_filters: []
+       exclude_filters: []
+     helmholtz_staircase:
+       include_filters: []
+       exclude_filters: []
+     viscoelastic_instability:
+       include_filters: []
+       exclude_filters: []
+     supernova_explosion_128:
+       include_filters: []
+       exclude_filters: []
+       step_downsample_factor: 0.5
+       batch_downsample_factor: 0.5
+       field_transforms:
+         density: torch.log10
+         temperature: torch.log10
+     turbulence_gravity_cooling:
+       include_filters: []
+       exclude_filters: []
+       step_downsample_factor: 0.5
+       batch_downsample_factor: 0.5
+       field_transforms:
+         density: torch.log10
+         temperature: torch.log10
+     turbulent_radiative_layer_3D:
+       include_filters: []
+       exclude_filters: []
+       step_downsample_factor: 0.5
+       batch_downsample_factor: 0.5
+       field_transforms:
+         density: torch.log10
+         temperature: torch.log10
+     MHD_64:
+       include_filters: []
+       exclude_filters: []
+       step_downsample_factor: 0.5
+       batch_downsample_factor: 0.5
+     rayleigh_taylor_instability:
+       include_filters: []
+       exclude_filters: []
+       step_downsample_factor: 0.5
+       batch_downsample_factor: 0.5
+     flowbench_FPO_NS_2D_512x128_harmonics:
+       include_filters: []
+       exclude_filters: []
+       path: /mnt/gpuxl/polymathic/WellFormattedExternalData/flowbench/flowbench_FPO_NS_2D_512x128_harmonics
+ auto_resume: true
+ folder_override: ''
+ checkpoint_override: ''
+ config_override:
+   validation_mode: false
+   frozen_components:
+   - model
+ distribution:
+   distribution_type: hsdp
+   local_size: 4
+ logger:
+   wandb: true
+   wandb_project_name: walrus_Training_Attempts
+ checkpoint:
+   _target_: walrus.trainer.checkpoints.CheckPointer
+   save_dir: /mnt/home/polymathic/ceph/walrus_logging/runs/Walrus_ft_major_v2-wella-delta-Isotr[Space-Adapt-]-AdamW-0.0002/0/checkpoints
+   load_checkpoint_path: null
+   coalesced_checkpoint_path: null
+   save_best: true
+   checkpoint_frequency: 20
+   align_fields: true
+   load_chkpt_after_finetuning_expansion: false
+ finetuning_mods: {}
+ experiment_dir: /mnt/home/polymathic/ceph/walrus_logging/runs
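
The `_target_`/`_partial_` keys above follow the Hydra instantiation convention, so individual blocks of this config can be materialized directly. A minimal sketch (assuming `hydra-core` and `omegaconf` are installed, the file is saved locally, and `optimizer` sits at the top level as reconstructed above):

```python
import torch
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.load("extended_config.yaml")

# Example: build the AdamW optimizer described by the config.
dummy = torch.nn.Linear(4, 4)  # stand-in for the real model's parameters
optimizer = instantiate(cfg.optimizer, params=dummy.parameters())
print(optimizer)  # AdamW with lr=0.0002, weight_decay=0.0001, eps=1e-10
```
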
walrus.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7c5338a8ca88cdc36f8479dc4fe136416fed0d0b82521380998d2a14c8a01c3f
+ size 5145064530
walrus.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8d96dc428879c51a9d979f3d855cf2843ebb3e29790190fab34226db8aeec194
+ size 5144892644