Optimizers

adabelief(learning_rate[, b1, b2, eps, ...])

The AdaBelief optimizer.

adadelta([learning_rate, rho, eps, ...])

The Adadelta optimizer.

adan(learning_rate[, b1, b2, b3, eps, ...])

The ADAptive Nesterov momentum algorithm (Adan).

adafactor(learning_rate, ...)

The Adafactor optimizer.

adagrad(learning_rate[, ...])

The Adagrad optimizer.

adam(learning_rate[, b1, b2, eps, eps_root, ...])

The Adam optimizer.

adamw(learning_rate[, b1, b2, eps, ...])

Adam with weight decay regularization.

adamax(learning_rate[, b1, b2, eps])

A variant of the Adam optimizer that uses the infinity norm.

adamaxw(learning_rate[, b1, b2, eps, ...])

Adamax with weight decay regularization.

amsgrad(learning_rate[, b1, b2, eps, ...])

The AMSGrad optimizer.

fromage(learning_rate[, min_norm])

The Frobenius matched gradient descent (Fromage) optimizer.

lamb(learning_rate[, b1, b2, eps, eps_root, ...])

The LAMB optimizer.

lars(learning_rate[, weight_decay, ...])

The LARS optimizer.

lbfgs(learning_rate, memory_size, ...)

The L-BFGS optimizer.

lion(learning_rate[, b1, b2, mu_dtype, ...])

The Lion optimizer.

nadam(learning_rate[, b1, b2, eps, ...])

The NAdam optimizer.

nadamw(learning_rate[, b1, b2, eps, ...])

NAdamW optimizer, implemented as part of the AdamW optimizer.

noisy_sgd(learning_rate[, eta, gamma, key, seed])

A variant of SGD with added noise.

novograd(learning_rate[, b1, b2, eps, ...])

The NovoGrad optimizer.

optimistic_gradient_descent(learning_rate[, ...])

An Optimistic Gradient Descent optimizer.

optimistic_adam_v2(learning_rate, *[, ...])

The Optimistic Adam optimizer.

polyak_sgd([max_learning_rate, scaling, ...])

SGD with Polyak step-size.

radam(learning_rate[, b1, b2, eps, ...])

The Rectified Adam optimizer.

rmsprop(learning_rate[, decay, eps, ...])

A flexible RMSProp optimizer.

sgd(learning_rate[, momentum, nesterov, ...])

A canonical Stochastic Gradient Descent optimizer.

sign_sgd(learning_rate)

A variant of SGD using only the signs of the gradient components.

signum(learning_rate[, beta, accumulator_dtype])

A variant of SGD using signs of the components of an EMA of the gradient.

sm3(learning_rate[, momentum])

The SM3 optimizer.

yogi(learning_rate[, b1, b2, eps])

The Yogi optimizer.

rprop(learning_rate[, eta_minus, eta_plus, ...])

The Rprop optimizer.