optax.schedules.linear_onecycle_schedule
- optax.schedules.linear_onecycle_schedule(transition_steps: int, peak_value: jax.typing.ArrayLike, pct_start: float = 0.3, pct_final: float = 0.85, div_factor: float = 25.0, final_div_factor: float = 10000.0) -> base.Schedule
Returns a learning rate schedule with three linear phases.
Phase 1, from iteration 0 to pct_start * transition_steps: the learning rate increases linearly from peak_value / div_factor to peak_value.
Phase 2, from iteration pct_start * transition_steps to pct_final * transition_steps: the learning rate decreases linearly from peak_value back to the initial peak_value / div_factor.
Phase 3, for the remaining steps: the learning rate interpolates linearly between peak_value / div_factor and the final value init_value / final_div_factor, that is, peak_value / (div_factor * final_div_factor). If final_div_factor is larger than 1, this is a decreasing phase.
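The three phases can be sketched in plain Python. This is an illustrative re-implementation, not optax's own code: the function name onecycle_sketch is hypothetical, the phase boundaries are assumed to be truncated to whole steps, and the final value is assumed to follow final_value = init_value / final_div_factor as given in the parameter description.

```python
def onecycle_sketch(step, transition_steps, peak_value,
                    pct_start=0.3, pct_final=0.85,
                    div_factor=25.0, final_div_factor=1e4):
    """Illustrative three-phase linear schedule (hypothetical helper, not optax code)."""
    init_value = peak_value / div_factor
    # Assumption: final value as stated in the parameter description.
    final_value = init_value / final_div_factor
    # Assumption: phase boundaries are truncated to whole steps.
    start = int(pct_start * transition_steps)
    final = int(pct_final * transition_steps)

    def lerp(a, b, t):
        # Linear interpolation from a (t = 0) to b (t = 1).
        return a + (b - a) * t

    if step < start:
        # Phase 1: ramp up from init_value to peak_value.
        return lerp(init_value, peak_value, step / start)
    if step < final:
        # Phase 2: ramp back down from peak_value to init_value.
        return lerp(peak_value, init_value, (step - start) / (final - start))
    if step < transition_steps:
        # Phase 3: interpolate from init_value to final_value.
        return lerp(init_value, final_value,
                    (step - final) / (transition_steps - final))
    # Past the end of the cycle, hold the final value.
    return final_value
```

With transition_steps=100 and the default fractions, the sketch places the peak at step 30 and returns to the initial value at step 85.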
- Parameters:
transition_steps – Number of steps over which annealing takes place.
peak_value – Maximum value attained by the schedule, reached at pct_start percent of the cycle (in number of steps).
pct_start – The percentage of the cycle (in number of steps) spent increasing the learning rate, given as a fraction (0.3 means 30%).
pct_final – The percentage of the cycle (in number of steps) spent increasing to peak_value then decreasing back to init_value, given as a fraction (0.85 means 85%).
div_factor – Determines the initial value via init_value = peak_value / div_factor.
final_div_factor – Determines the final value via final_value = init_value / final_div_factor.
- Returns:
- schedule
A function that maps step counts to values.
References
Smith et al., Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates, 2017