optax.scale_by_adam

optax.scale_by_adam(b1: jax.typing.ArrayLike = 0.9, b2: jax.typing.ArrayLike = 0.999, eps: jax.typing.ArrayLike = 1e-08, eps_root: jax.typing.ArrayLike = 0.0, mu_dtype: str | type[Any] | dtype | SupportsDType | None = None, *, nesterov: bool = False) → optax.GradientTransformation

Rescale updates according to the Adam algorithm.

See optax.adam() for more details.

Parameters:

b1 – Decay rate for the exponentially weighted average of grads.
b2 – Decay rate for the exponentially weighted average of squared grads.
eps – Term added to the denominator to improve numerical stability.
eps_root – Term added to the denominator inside the square-root to improve numerical stability when backpropagating gradients through the rescaling.
mu_dtype – Optional dtype to be used for the first-order accumulator; if None, the dtype is inferred from params and updates.
nesterov – Whether to use Nesterov momentum. The variant of Adam with Nesterov momentum is described in [Dozat 2016].
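
To make the role of each parameter concrete, the rescaling can be sketched as the standard Adam update without the learning-rate factor, assuming nesterov=False. The symbols g_t (incoming update), m_t and v_t (first- and second-moment accumulators) and t (step count, starting at 1) are notation introduced here for illustration and are not names from the signature above.

```latex
\begin{align*}
m_t &= b_1 \, m_{t-1} + (1 - b_1) \, g_t \\
v_t &= b_2 \, v_{t-1} + (1 - b_2) \, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - b_1^{\,t}} \\
\hat{v}_t &= \frac{v_t}{1 - b_2^{\,t}} \\
u_t &= \frac{\hat{m}_t}{\sqrt{\hat{v}_t + \mathrm{eps\_root}} + \mathrm{eps}}
\end{align*}
```

In this sketch eps sits outside the square root and eps_root inside it, matching the parameter descriptions above; mu_dtype only affects the dtype in which m_t is stored.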

Returns:

An optax.GradientTransformation object.
