Commit History
Merge pull request #25 from MotifTechnologies/fix/invalidate_cache_adamw b61425a unverified
TaehyunKim committed on
Invalidate AdamW tensor caches on load_state_dict [skip-build] 89b6099
draft commit for cpu_offload (#23) 10848ab unverified
Replace toy PP tests with real-model-based pipeline tests [skip-build] 67f7e11
Add correctness verification to PP tests using fully_shard [skip-build] a4d1f34
Remove correctness check from PP tests, focus on deadlock detection [skip-build] c0bbf2e
Add PP + dp_replicate deadlock regression tests [skip-build] cd587a6
Update fast path comment to reflect current behavior [skip-build] 7e33533
Update comment to reflect use_local_synchronization behavior [skip-build] 3f5cf49
Fix deadlock in construct_shard_mesh with PP + dp_replicate > 1 da7e5da
Apply pre-commit formatting (isort) [skip-build] 96b287c
Add MoE uneven shard test with mixed expert and non-expert params [skip-build] bdada12
Add uneven shard correctness test [skip-build] 1a97671
Add optimization docs and update implementation guide [skip-build] 14040eb
Update tests for MoE and parallel optimizations [skip-build] 81f49fe
Muon optimizer: expert batching, parallel caching, A2A overlap [skip-build] 0f37d63
Optimize pipeline: batched update, zero-copy scatter, prelaunch gather [skip-build] 2816b64
Cache AdamW placement grouping and tensor lists [skip-build] 8ca2492
Add torch.compile, CUDA graph, and compiled momentum [skip-build] e74d98f
Apply suggestions from code review cdaaf4f
TaehyunKim and Copilot committed on
Add mhc_attn, mhc_ffn, lambda_proj to skip_keys ba293d0
Remove verbose param_groups summary logging 24f0957
Support multi-component expert_keys (e.g. "experts.w1") 5a99e12
Extract is_expert_param() helper to consolidate expert key matching e615b1c
Include original (pre-normalize) FQN in is_muon logging 135fc66
Add info-level logging for param group classification (Muon vs AdamW) 1118752
Use component-level matching for expert_keys to avoid shared_experts collision f008017
Normalize parameter FQNs to handle torch.compile / checkpoint wrappers 95a620f
Merge pull request #17 from MotifTechnologies/optimal-ns-coefficients b220459 unverified
Apply pre-commit formatting (yapf) [skip-build] bf30b9b
Add max_iter cap and non-finite checks to _optimal_quintic [skip-build] 206b280
Apply pre-commit formatting (yapf, isort) [skip-build] aff01db
Add comment explaining _coeffs_list and Polar Express vs former NS [skip-build] abaa449
Replace hardcoded NS coefficients with analytically optimal ones [skip-build] 573242f
Refactor pipeline to async generator pattern (#16) 33929c0 unverified
Support mHC (#15) ae32572 unverified
Update arxiv URL fa059da
Support param group with various placements (#13) e2b41e5 unverified
Merge pull request #14 from MotifTechnologies/fix_bug_in_fsdp 5458c82 unverified
TaehyunKim committed on
Add built binary [skip-build] 6ec5093
github-actions[bot] committed on
Fix bug in FSDP 811726c
feat(workflow): add Slack notifications for build start, success, and failure [skip-build] (#12) 0b8d958 unverified
Merge pull request #11 from MotifTechnologies/ca1207-patch-1 53deea3 unverified
TaehyunKim committed on
Add built binary [skip-build] de5bead
github-actions[bot] committed on
Update torch-ext/optimizer/muon.py b0230e7 unverified
TaehyunKim committed on
Update torch-ext/optimizer/muon.py ff2fcfb unverified
TaehyunKim committed on
Update muon.py c16b438 unverified
TaehyunKim committed on
Merge pull request #10 from MotifTechnologies/fix_a2a_gs_assert 4f71bc9 unverified
TaehyunKim committed on