optax.schedules.cosine_onecycle_schedule
optax.schedules.cosine_onecycle_schedule(transition_steps: int, peak_value: jax.typing.ArrayLike, pct_start: float = 0.3, div_factor: float = 25.0, final_div_factor: float = 10000.0) -> base.Schedule
Returns a function which implements the onecycle learning rate schedule.
This schedule increases the learning rate and then decreases it in a cosine-like manner. The number of steps over which the learning rate increases is determined by the pct_start argument. The maximum value of the learning rate is determined by the peak_value argument, the initial value is given by the formula init_value = peak_value / div_factor, and the final value is determined by the final_div_factor argument.
- Parameters:
transition_steps – Number of steps over which annealing takes place.
peak_value – Maximum value attained by the schedule once the fraction pct_start of the cycle (in number of steps) has elapsed.
pct_start – The fraction of the cycle (in number of steps) spent increasing the learning rate, e.g. 0.3 for 30%.
div_factor – Determines the initial value via init_value = peak_value / div_factor.
final_div_factor – Determines the final value via final_value = init_value / final_div_factor.
- Returns:
schedule – A function that maps step counts to values.
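Example

The formulas above can be checked with a small pure-Python sketch. It is an illustration of the documented behaviour, not optax's implementation: the function name onecycle_sketch and the helper cosine_interp are hypothetical, and the half-cosine easing used between the three anchor values is an assumption about what "cosine-like" means here.

```python
import math


def onecycle_sketch(step, transition_steps, peak_value,
                    pct_start=0.3, div_factor=25.0, final_div_factor=1e4):
    """Hypothetical sketch of a cosine one-cycle schedule (not optax code)."""
    # Anchor values, taken from the formulas in the parameter descriptions.
    init_value = peak_value / div_factor
    final_value = init_value / final_div_factor
    # Number of steps spent increasing the learning rate.
    warmup_steps = int(pct_start * transition_steps)

    def cosine_interp(start, end, pct):
        # Half-cosine easing: returns `start` at pct=0 and `end` at pct=1.
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)

    if step < warmup_steps:
        # Increasing phase: init_value -> peak_value.
        return cosine_interp(init_value, peak_value, step / warmup_steps)
    if step < transition_steps:
        # Decreasing phase: peak_value -> final_value.
        pct = (step - warmup_steps) / (transition_steps - warmup_steps)
        return cosine_interp(peak_value, final_value, pct)
    # After the cycle ends, stay at the final value.
    return final_value


# With peak_value=0.1 and the default factors:
#   init_value  = 0.1 / 25     (about 0.004)
#   final_value = 0.004 / 1e4  (about 4e-7)
for s in (0, 150, 300, 650, 1000):
    print(s, onecycle_sketch(s, transition_steps=1000, peak_value=0.1))
```

With optax installed, the corresponding call would be optax.schedules.cosine_onecycle_schedule(transition_steps=1000, peak_value=0.1), which returns a schedule function to be evaluated at a step count; its values may differ from this sketch in floating-point detail.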
References
Smith et al., Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates, 2017.