A Non-Profit Foundation

Funding the future of machine learning

The Shampoo Foundation commits 95% of $1 billion to advance research in deep learning, optimization, and AI safety, funding the next generation of breakthroughs that benefit humanity.

$950M Committed to grants
95% Distributed to research
Potential impact

Accelerating progress through strategic philanthropy

The Shampoo Foundation was established with a singular vision: to remove barriers that prevent brilliant minds from pursuing transformative research in machine learning and optimization.

We believe the most important advances emerge when researchers have the freedom to explore ambitious ideas without constraint. Our grants provide that freedom—funding fundamental research, open-source tools, and educational initiatives that shape the future of AI.

🔬 Fundamental Research

Supporting long-term research that may not have immediate commercial applications but advances our understanding of intelligence.

🌍 Open Science

Championing open-source tools, datasets, and publications that democratize access to cutting-edge ML research.

🎓 Education & Access

Creating pathways for underrepresented communities to participate in and lead AI research.

Understanding Shampoo

A second-order optimizer that achieves the benefits of curvature-aware optimization while remaining computationally tractable at scale.

[Diagram: Kronecker-factored preconditioning. The full preconditioner for an m × n weight matrix is (mn × mn) and impractical to store; Shampoo's insight is to factor it into L (m × m) and R (n × n), requiring only O(m² + n²) memory. The preconditioned update transforms the gradient G into L⁻¹ᐟ⁴ G R⁻¹ᐟ⁴.]
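
To make the memory argument concrete, here is a back-of-the-envelope sketch in plain Python. The layer shape and the float32 assumption are illustrative choices, not figures from the text.

```python
# Memory for preconditioning one m x n weight matrix (illustrative shape).
m, n = 4096, 1024

full_entries = (m * n) ** 2      # full preconditioner is (mn x mn)
kron_entries = m * m + n * n     # Shampoo stores L (m x m) and R (n x n)

bytes_per = 4  # float32
print(f"Full preconditioner: {full_entries * bytes_per / 1e12:.1f} TB")
print(f"Kronecker factors:   {kron_entries * bytes_per / 1e6:.1f} MB")
```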

📉 First-Order Methods

Traditional optimizers like SGD and Adam only use gradient information. They treat all parameter directions equally, leading to slow convergence on ill-conditioned problems where the loss surface has very different curvatures in different directions.

2,512 steps to 75.9% on ResNet-50
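
For contrast, a minimal sketch of what "first-order" means in code, in plain NumPy. The function names and hyperparameter defaults are illustrative; both updates look only at the gradient and running moments of it, never at how curvature varies across directions.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    # SGD: one global step size, identical in every parameter direction.
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: per-coordinate scaling from gradient moments -- adaptive,
    # but still first-order and blind to curvature.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```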

🚀 Second-Order: Shampoo

Shampoo uses curvature information via efficient Kronecker-factored preconditioners. By accumulating gradient statistics in L and R matrices, it adapts to the geometry of the loss landscape—taking larger steps in flat directions and smaller steps in steep ones.

1,729 steps to 75.9% on ResNet-50
Compute Gradient

Calculate gradient G for the current mini-batch

Accumulate Statistics

Update L ← L + GGᵀ and R ← R + GᵀG

Compute Roots

Efficiently compute L⁻¹ᐟ⁴ and R⁻¹ᐟ⁴ via Schur-Newton

Apply Update

W ← W − η · L⁻¹ᐟ⁴ G R⁻¹ᐟ⁴

Hₜ = Lₜ⁻¹ᐟ⁴ Gₜ Rₜ⁻¹ᐟ⁴
The Shampoo preconditioned gradient update — elegant, efficient, powerful
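
Putting the four steps together, here is a minimal single-layer sketch in NumPy. It follows the update above, but for simplicity takes the inverse fourth roots via eigendecomposition rather than the Schur-Newton iteration named in step three; the damping constant, learning rate, and initialization are illustrative assumptions.

```python
import numpy as np

def inv_fourth_root(mat, eps=1e-6):
    # Inverse fourth root via eigendecomposition (illustrative only;
    # large-scale implementations use a coupled Schur-Newton iteration).
    vals, vecs = np.linalg.eigh(mat + eps * np.eye(mat.shape[0]))
    return vecs @ np.diag(vals ** -0.25) @ vecs.T

def shampoo_step(W, G, L, R, lr=0.1):
    """One Shampoo update for an m x n weight matrix W with gradient G."""
    L += G @ G.T                                              # step 2: L <- L + G G^T
    R += G.T @ G                                              #         R <- R + G^T G
    precond_G = inv_fourth_root(L) @ G @ inv_fourth_root(R)   # step 3
    return W - lr * precond_G, L, R                           # step 4: W <- W - eta L^-1/4 G R^-1/4

# Usage: L and R start as small multiples of the identity.
m, n = 64, 32
W = 0.1 * np.random.randn(m, n)
L, R = 1e-4 * np.eye(m), 1e-4 * np.eye(n)
G = np.random.randn(m, n)                                     # step 1: stand-in for a mini-batch gradient
W, L, R = shampoo_step(W, G, L, R)
```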

Destroy the Gradient

A street guide to second-order optimization

The Problem

For 50 years, SGD has ruled machine learning like a corrupt regime. It only looks at the slope. It ignores the terrain. It takes the same tiny steps whether crossing a flat plain or climbing a cliff.

Adam tried to fix it. Added momentum. Added adaptive rates. But it's still first-order. Still blind to curvature. Still part of the system.

The Revolution

Shampoo sees what others don't. It reads the curvature of the loss landscape. It knows when to sprint and when to tiptoe.

The secret? Kronecker factorization. Instead of storing an impossible (n² × n²) matrix, it keeps two smaller ones. The establishment said it couldn't be done. They were wrong.

Wₜ₊₁ = Wₜ − η · L⁻¹ᐟ⁴ G R⁻¹ᐟ⁴
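
The difference is easy to see on a toy problem. The sketch below is not Shampoo; it is a hand-built preconditioner on a two-dimensional quadratic whose curvatures differ by 100×, chosen only to show why reading the curvature matters: plain gradient descent must creep to stay stable in the steep direction and therefore crawls along the flat one, while the preconditioned step rescales each direction and lands on the minimum immediately.

```python
import numpy as np

# Toy ill-conditioned quadratic: loss(w) = 0.5 * w^T H w, curvatures 100 and 1.
H = np.diag([100.0, 1.0])
w_gd = np.array([1.0, 1.0])
w_pre = np.array([1.0, 1.0])

lr = 0.009                        # plain GD must stay below 2/100 to remain stable
P = np.diag(1.0 / np.diag(H))     # ideal per-direction preconditioner (inverse curvature)

for _ in range(50):
    w_gd = w_gd - lr * (H @ w_gd)      # first-order: one step size for both directions
    w_pre = w_pre - P @ (H @ w_pre)    # curvature-aware: rescale each direction, then step

print("gradient descent:", w_gd)   # steep direction ~0, flat direction still ~0.64
print("preconditioned:  ", w_pre)  # both directions at the minimum after one step
```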

First Order
2,512 steps to converge

VS

Shampoo
1,729 steps to converge
[Diagram: the raw gradient (chaos) is preconditioned as L⁻¹ᐟ⁴ × G × R⁻¹ᐟ⁴, yielding a preconditioned update (order).]
"They told us second-order was too expensive. We told them they weren't trying hard enough." — The Shampoo Manifesto

Optimizer Kombat

Second Order Showdown

>>> INSERT COIN TO CONTINUE <<<

📉 SGD
First Order • Basic

🧴 Shampoo
Second Order • OG

🫧 SOAP
Second Order • Hybrid

μ Muon
Orthogonal • Speed

CASPR
Kron-Sum • Theory

Shampoo VS Adam

🧴 Shampoo · The Preconditioner
SPEED 85 · POWER 92 · MEMORY 60 · THEORY 95
Special Moves: L Kronecker Factor · R Fourth Root Fury · G Preconditioned HADOUKEN

📊 Adam · The Adaptive One
SPEED 70 · POWER 65 · MEMORY 85 · THEORY 50
Special Moves: M Momentum Rush · V Variance Vortex · β Beta Correction

47 Hit Combo!

Shampoo Wins!

FLAWLESS VICTORY: 40% faster wall-clock and roughly a third fewer steps than Adam. The preconditioner has spoken.

>>> Global Leaderboard <<<

Rank | Optimizer  | Steps to 75.9% | Special Ability
1ST  | μ MUON     | ~1,200         | 2× COMPUTE EFFICIENCY
2ND  | 🫧 SOAP    | ~1,500         | ADAM IN EIGENBASIS
3RD  | 🧴 SHAMPOO | 1,729          | KRONECKER FACTORIZATION
4TH  | CASPR      | ~1,800         | TIGHTER BOUNDS
5TH  | 📊 ADAMW   | 2,512          | WEIGHT DECAY
6TH  | 📉 SGD     | 4,000+         | SIMPLICITY

© 2018-2025 OPTIMIZATION LABS

A Garden of Optimizers

Four paths through the landscape of loss, each with its own light and color, converging toward the same distant horizon.

[Figure: paths to the minimum taken by SGD, Shampoo (2018), SOAP (2024), Muon (2024), and CASPR (2023).]
🧴 Shampoo

Gupta, Anil et al. — 2018

The progenitor. Maintains Kronecker-factored preconditioners L and R, capturing row and column gradient statistics separately.

H = L⁻¹ᐟ⁴ G R⁻¹ᐟ⁴
40% faster wall-clock vs Adam
🫧 SOAP

Vyas et al. — 2024

Shampoo's refined heir. Runs Adam in the eigenbasis of Shampoo's preconditioner, combining the best of both worlds.

Adam in the eigenbasis of L and R
40%+ fewer iterations vs AdamW
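
A simplified sketch of that idea in NumPy, continuing the notation of the Shampoo sketch above. This is not the published SOAP algorithm: it omits momentum, bias correction of the factors, and the infrequent eigenbasis refresh used in practice, and every hyperparameter is an illustrative assumption. It only shows the core move of rotating the gradient into the preconditioner's eigenbasis, running an Adam-style update there, and rotating back.

```python
import numpy as np

def soap_like_step(W, G, L, R, V, t, lr=1e-3, b2=0.999, eps=1e-8):
    # Shampoo-style statistics, exactly as before.
    L += G @ G.T
    R += G.T @ G
    _, QL = np.linalg.eigh(L)          # eigenbasis of the left factor
    _, QR = np.linalg.eigh(R)          # eigenbasis of the right factor

    G_rot = QL.T @ G @ QR              # rotate the gradient into that basis
    V = b2 * V + (1 - b2) * G_rot ** 2              # Adam-style second moment
    step_rot = G_rot / (np.sqrt(V / (1 - b2 ** t)) + eps)

    return W - lr * (QL @ step_rot @ QR.T), L, R, V  # rotate the update back

# Usage: V starts at zero with the same shape as W, and t counts steps from 1.
```
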
μ Muon

Keller Jordan — 2024

The minimalist. Orthogonalizes momentum via Newton-Schulz iteration. No second moments, half the memory of Adam.

G̃ = NS(momentum)
~2× compute efficiency vs AdamW
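
A sketch of the orthogonalization idea in NumPy. For simplicity it uses the classical cubic Newton-Schulz iteration toward the nearest (semi-)orthogonal matrix; Muon's reference implementation uses a tuned quintic variant, and the learning rate, momentum coefficient, and iteration count below are illustrative assumptions.

```python
import numpy as np

def newton_schulz_orthogonalize(M, steps=10):
    # Classical cubic Newton-Schulz iteration toward the orthogonal polar factor.
    # Scaling by the Frobenius norm keeps all singular values <= 1, so the
    # iteration converges; Muon itself uses a faster, specially tuned quintic version.
    X = M / (np.linalg.norm(M) + 1e-8)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ (X.T @ X)
    return X

def muon_like_step(W, G, momentum, lr=0.02, beta=0.95):
    # Heavy-ball momentum, then orthogonalize it. No second moments are kept,
    # so the only optimizer state is the momentum buffer itself.
    momentum = beta * momentum + G
    return W - lr * newton_schulz_orthogonalize(momentum), momentum
```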

CASPR

OpenReview — 2023

The theoretician. Combines axis preconditioners via a Kronecker-sum approximation; Shampoo arises as a special case.

P ≈ L ⊕ R (Kron-sum)
Tighter convergence bounds
The Family Tree of Optimizers
[Diagram: Full-Matrix AdaGrad (O(n⁴) memory, impractical) → Shampoo (Kronecker factorization), which branches into CASPR (Kronecker-sum approximation) and SOAP (Adam in the eigenbasis); removing the accumulation from CASPR gives Muon (Newton-Schulz orthogonalization). "CASPR − accumulation = Muon".]
"As the impressionists captured light through countless small brushstrokes, so too do these optimizers approximate the curvature of loss through elegant factorizations — each revealing truth through its own particular lens."
— On the Art of Optimization

Where we direct our resources

We fund work across six interconnected domains, each critical to building beneficial AI systems.

01

Optimization Theory

Advancing second-order methods, adaptive algorithms, and theoretical foundations that make training more efficient and reliable.

02

Large-Scale Systems

Enabling efficient distributed training across thousands of accelerators with minimal overhead and maximum reproducibility.

03

AI Safety & Alignment

Ensuring advanced AI systems remain beneficial, interpretable, and aligned with human values as capabilities scale.

04

Open Infrastructure

Building and maintaining open-source frameworks, tools, and compute resources accessible to researchers worldwide.

05

Scientific Applications

Applying ML to accelerate discovery in biology, climate science, materials research, and other high-impact domains.

06

Researcher Development

Fellowships, mentorship programs, and grants for early-career researchers pursuing ambitious, unconventional ideas.

Founder

Rohan Anil

Researcher & Philanthropist

Rohan Anil is a pioneering researcher whose work on the Shampoo optimizer transformed our understanding of practical second-order optimization in deep learning. His contributions—described as "a breakthrough in deep learning practical optimization at scale"—demonstrated that methods once considered computationally prohibitive could achieve state-of-the-art results.

Having witnessed firsthand how resource constraints limit scientific progress, Rohan established The Shampoo Foundation to ensure the next generation of researchers has the support to pursue transformative ideas without barriers.

"
The most profound advances in science come when brilliant people have the freedom to pursue ambitious ideas. Our role is simply to remove the obstacles.
— Rohan Anil, Founder

Ready to advance the field?

We welcome proposals from researchers, institutions, and organizations working on fundamental problems in machine learning, optimization, and AI safety.

Begin Application

Applications reviewed on a rolling basis. Typical response within 6-8 weeks.