optax.schedules.linear_onecycle_schedule

optax.schedules.linear_onecycle_schedule(transition_steps: int, peak_value: jax.typing.ArrayLike, pct_start: float = 0.3, pct_final: float = 0.85, div_factor: float = 25.0, final_div_factor: float = 10000.0) -> base.Schedule

Returns a learning rate schedule with three linear phases.

  • Phase 1, from iteration 0 to pct_start * transition_steps. The learning rate increases linearly from peak_value / div_factor to peak_value.

  • Phase 2, from iteration pct_start * transition_steps to pct_final * transition_steps. The learning rate decreases linearly from peak_value back to the initial peak_value/div_factor.

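  • Phase 3, from iteration pct_final * transition_steps to transition_steps. The learning rate interpolates linearly between peak_value / div_factor and the final value, which the final_div_factor parameter defines as init_value / final_div_factor, where init_value = peak_value / div_factor. If final_div_factor is larger than 1, this is a decreasing phase.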
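The three phases can be sketched in plain Python as follows. This is an illustrative reference only, not the optax implementation: the function name onecycle_sketch, the truncation of phase boundaries to whole steps, and the clamping of out-of-range steps are assumptions of this sketch. It also assumes the final value is init_value / final_div_factor, as stated in the description of final_div_factor.

```python
def onecycle_sketch(step, transition_steps, peak_value, pct_start=0.3,
                    pct_final=0.85, div_factor=25.0, final_div_factor=1e4):
    """Hypothetical three-phase linear one-cycle value at `step`."""
    init_value = peak_value / div_factor
    # Assumption: final value follows the final_div_factor description.
    final_value = init_value / final_div_factor

    # Phase boundaries, truncated to whole steps (assumption of this sketch).
    end_phase1 = int(pct_start * transition_steps)
    end_phase2 = int(pct_final * transition_steps)
    end_phase3 = transition_steps

    def lerp(start, stop, fraction):
        # Linear interpolation between start and stop.
        return start + (stop - start) * fraction

    # Clamp the step into [0, transition_steps] (assumption of this sketch).
    step = min(max(step, 0), end_phase3)

    if step < end_phase1:
        # Phase 1: warm up from init_value to peak_value.
        return lerp(init_value, peak_value, step / end_phase1)
    if step < end_phase2:
        # Phase 2: decay from peak_value back to init_value.
        fraction = (step - end_phase1) / (end_phase2 - end_phase1)
        return lerp(peak_value, init_value, fraction)
    if end_phase3 == end_phase2:
        return final_value
    # Phase 3: move from init_value to final_value.
    fraction = (step - end_phase2) / (end_phase3 - end_phase2)
    return lerp(init_value, final_value, fraction)


# With 100 steps and the default fractions, the phase boundaries fall at
# steps 30 and 85.
values = [onecycle_sketch(s, 100, 1.0) for s in (0, 30, 85, 100)]
```

Here values holds the learning rate at the start, at the peak, at the end of phase 2, and at the final step.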

Parameters:
  • transition_steps – Number of steps over which annealing takes place.

  • peak_value – Maximum value attained by the schedule, reached after a fraction pct_start of the cycle (in number of steps).

  • pct_start – The fraction of the cycle (in number of steps) spent increasing the learning rate.

  • pct_final – The fraction of the cycle (in number of steps) spent increasing to peak_value and then decreasing back to init_value.

  • div_factor – Determines the initial value via init_value = peak_value / div_factor.

  • final_div_factor – Determines the final value via final_value = init_value / final_div_factor.

Returns:
  • schedule – A function that maps step counts to values.

References

Smith et al., Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates, 2017.