Abstract
A modified Adam optimizer bias-correction scheme yields smaller gradient updates early in training by retaining the second-moment bias correction while removing the first-moment bias correction.
Here I present a small update to the bias-correction term in the Adam optimizer that has the advantage of making smaller gradient updates over the first several steps of training. With the default bias correction, Adam can actually take larger-than-requested gradient steps early in training. By keeping only the well-justified bias correction of the second-moment gradient estimate, v_t, and dropping the bias correction on the first-moment estimate, m_t, we obtain these more desirable gradient-update properties in the early steps. The default implementation of Adam may be as sensitive as it is to the hyperparameters β_1 and β_2 partly because of the originally proposed bias-correction procedure and its behavior in the early steps.
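The change can be stated in a few lines against the standard Adam update. Below is a minimal sketch in NumPy, assuming the standard Adam recurrences and default hyperparameters (β_1 = 0.9, β_2 = 0.999, ε = 1e-8); the function name and interface are hypothetical illustrations, not code from the paper.

```python
import numpy as np

def adam_step_no_m_correction(param, grad, m, v, t,
                              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One update step: bias-correct v_t only, leave m_t uncorrected.

    Hypothetical sketch of the abstract's idea; names and defaults
    follow standard Adam conventions, not code from the paper.
    """
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate (uncorrected)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    v_hat = v / (1 - beta2 ** t)              # bias-correct v_t only
    # Early in training m is still biased toward zero, so the effective
    # step size shrinks rather than inflating past the requested lr.
    param = param - lr * m / (np.sqrt(v_hat) + eps)
    return param, m, v

# Example: a few steps on the toy loss f(x) = 0.5 * x**2
param = np.array([1.0])
m = np.zeros_like(param)
v = np.zeros_like(param)
for t in range(1, 6):
    grad = param  # gradient of 0.5 * x**2
    param, m, v = adam_step_no_m_correction(param, grad, m, v, t)
```

At t = 1 with the default β_1 = 0.9, the uncorrected m equals 0.1·g, so the step magnitude is roughly 0.1·lr rather than lr, illustrating the smaller early updates described above.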