Since its inception by Cauchy in 1847, the gradient descent algorithm has lacked guidance on how to efficiently set the learning rate. This paper identifies a concept, defines metrics, and introduces algorithms to provide such guidance. The result is a family of algorithms (Neograd) based on a *constant ρ ansatz*, where ρ is a metric based on the error of the updates. This allows the learning rate to be adjusted at each step, using a formulaic estimate based on ρ; it is no longer necessary to do trial runs beforehand to estimate a single learning rate for an entire optimization run. The additional cost of computing this metric is trivial. One member of this family of algorithms, NeogradM, can quickly reach much lower cost function values than other first-order algorithms. Comparisons are made mainly between NeogradM and Adam on an array of test functions and on a neural network model for identifying hand-written digits; the results show great performance improvement.
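The abstract does not give the paper's definition of ρ or its update formula, so the following is only a minimal illustrative sketch of the general idea: measure the relative error of each first-order update against its linear prediction, and rescale the learning rate to keep that error near a target. All names (`rho`, `neograd_like`, `rho_target`) and the specific rescaling rule are assumptions for illustration, not the algorithm from the paper.

```python
import numpy as np

def rho(f, x, g, lr):
    """Relative error of the linear (first-order) prediction for one step.

    An illustrative error metric, not the paper's definition of rho.
    """
    x_new = x - lr * g
    actual = f(x_new) - f(x)          # true change in the cost
    predicted = -lr * np.dot(g, g)    # first-order Taylor prediction
    return abs(actual - predicted) / (abs(predicted) + 1e-12)

def neograd_like(f, grad, x0, lr=0.1, rho_target=0.1, steps=50):
    """Gradient descent with the learning rate adapted each step so that
    the error metric stays near rho_target (a hypothetical scheme)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad(x)
        r = rho(f, x, g, lr)
        if r > 1e-12:
            # rescale toward the target, clipped to avoid wild jumps
            lr *= np.clip(rho_target / r, 0.5, 2.0)
        x = x - lr * g
    return x, lr
```

For a simple quadratic such as f(x) = ‖x‖², the linear prediction's relative error grows with the step size, so the scheme settles the learning rate at a value where the error metric matches the target, without any prior tuning run.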

1 mentions:
Date: 2020/10/16 05:21

## Referring Tweets

@rasbt Hah, what a day! This (Adabelief) is the 2nd new optimizer I discovered today. Next to the freshly uploaded Neograd, which just saw on arXiv earlier today: t.co/8I6x3ILv64 (the saying goes "all good things come in threes" right?). GitHub repo here: t.co/AcIyWx609a t.co/2jZk6PVvLX
