🔧 Contrib
Algorithms or wrappers that do not yet meet the Inclusion Criteria or are not supported by the main library.
|
The ACProp optimizer. |
|
AdEMAMix. |
|
ADOPT (Adaptive Optimization with Provable Theoretical guarantees). |
|
Simplified AdEMAMix. |
|
Rescale updates according to the COntinuous COin Betting algorithm. |
|
State for COntinuous COin Betting. |
|
Learning rate free AdamW by D-Adaptation. |
|
State of the GradientTransformation returned by dadapt_adamw. |
|
Aggregates gradients based on the DPSGD algorithm. |
|
State containing PRNGKey for differentially_private_aggregate. |
|
Distance over Gradients (DoG) optimizer. |
|
State for DoG optimizer. |
|
Distance over weighted Gradients optimizer. |
|
State for DoWG optimizer. |
|
The DPSGD optimizer. |
|
GaLore: Memory-efficient training via gradient lowrank projection. |
|
State for the GaLore optimizer. |
|
The MADGRAD optimizer. |
|
State for the MADGRAD optimizer. |
|
Mechanic - a black box learning rate tuner/optimizer. |
|
State of the GradientTransformation returned by mechanize. |
|
Adaptive Learning Rates for SGD with momentum. |
|
State of the GradientTransformation returned by momo. |
|
Adaptive Learning Rates for Adam(W). |
|
State of the |
|
Muon: Momentum Orthogonalized by Newton-Schulz. |
|
State for the Muon algorithm. |
|
Learning rate free AdamW with Prodigy. |
|
State of the GradientTransformation returned by prodigy. |
|
Implementation of SAM (Sharpness Aware Minimization). |
|
State of the GradientTransformation returned by sam. |
|
Turns base_optimizer into a schedule-free optimizer. |
|
Schedule-Free wrapper for AdamW. |
|
Params for evaluation of |
|
Schedule-Free wrapper for SGD. |
|
State for schedule_free. |
|
Sophia optimizer. |
|
State for the Sophia optimizer. |
|
Splits the real and imaginary components of complex updates into two. |
|
Maintains the inner transformation state for split_real_and_imaginary. |
|
Scale updates according to the AdEMAMix algorithm. |
|
State for the AdEMAMix algorithm. |
|
Scale updates according to the Simplified AdEMAMix optimizer. |
|
State for the Simplified AdEMAMix optimizer. |
|
Rescale updates according to the ADOPT algorithm. |
|
Rescale updates according to ACProp (asynchronous version of AdaBelief). |
|
Rescale updates according to the MADGRAD algorithm. |
|
Rescale updates according to the Muon algorithm. |
|
Returns a GradientTransformationExtraArgs computing the Hessian diagonal. |
|
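As a rough illustration of the entry above that aggregates gradients based on the DPSGD algorithm, the following pure-Python sketch shows the general technique: clip each per-example gradient to a maximum L2 norm, sum the clipped gradients, add Gaussian noise scaled by the clipping norm, and average. It is not the library's implementation. The function names `clip_by_l2_norm` and `dp_aggregate`, the parameter names, and the use of flat Python lists are assumptions made for this example only.

```python
import math
import random


def clip_by_l2_norm(grad, max_norm):
    """Scale `grad` down so that its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def dp_aggregate(per_example_grads, l2_norm_clip, noise_multiplier, rng):
    """Clip per-example gradients, sum them, add Gaussian noise, then average.

    Hypothetical helper for illustration; gradients are flat lists of floats.
    """
    clipped = [clip_by_l2_norm(g, l2_norm_clip) for g in per_example_grads]
    # Sum coordinate-wise over the batch.
    summed = [sum(column) for column in zip(*clipped)]
    # Noise standard deviation is proportional to the clipping norm.
    noise_std = l2_norm_clip * noise_multiplier
    noisy = [s + rng.gauss(0.0, noise_std) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


# Two per-example gradients; the first has norm 5 and gets clipped to norm 1.
grads = [[3.0, 4.0], [0.3, 0.4]]

# With a zero noise multiplier the result is just the mean of the clipped gradients.
no_noise = dp_aggregate(grads, l2_norm_clip=1.0, noise_multiplier=0.0,
                        rng=random.Random(0))

# With a positive noise multiplier the result is randomized (seeded here).
with_noise = dp_aggregate(grads, l2_norm_clip=1.0, noise_multiplier=1.1,
                          rng=random.Random(0))
```

Clipping bounds how much any single example can influence the aggregate, which is what lets the added noise provide a differential-privacy guarantee. In practice the per-example gradients would be JAX arrays or pytrees rather than Python lists.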