[2211.09760] VeLO: Training Versatile Learned Optimizers by Scaling Up

While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach behind the success of deep learning to learn versatile optimizers. We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates. Meta-trained with approximately four thousand TPU-months of compute on a wide variety of optimization tasks, our optimizer not only exhibits compelling performance, but optimizes in interesting and unexpected ways. It requires no hyperparameter tuning, instead automatically adapting to the specifics of the problem being optimized. We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive optimizer benchmark suite with baselines at velo-code.github.io.
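The abstract describes the optimizer as "a small neural network that ingests gradients and outputs parameter updates." Below is a minimal Python/NumPy sketch of that idea. It is not VeLO's implementation. VeLO is reported to use a per-tensor LSTM that generates the weights of a small per-parameter MLP, and those weights come from large-scale meta-training. In this sketch, the class `TinyLearnedOptimizer`, its input features (gradient, momentum, parameter value), and its random placeholder weights are all hypothetical. The "direction times exponentiated magnitude" output is also an assumption, borrowed from the general learned-optimizer literature.

```python
import numpy as np


class TinyLearnedOptimizer:
    """Hypothetical per-parameter learned optimizer (illustrative sketch only).

    A small MLP maps per-parameter features (gradient, momentum, parameter
    value) to an update. The weights are random placeholders standing in for
    meta-learned ones; no meta-training is performed here.
    """

    def __init__(self, hidden=8, beta=0.9, scale=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        # Placeholder weights (assumption): a real learned optimizer would
        # obtain these by meta-training over many optimization tasks.
        self.w1 = rng.normal(0.0, 0.5, size=(3, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.5, size=(hidden, 2))
        self.beta = beta
        self.scale = scale

    def init_state(self, params):
        # One momentum accumulator entry per parameter.
        return np.zeros_like(params)

    def update(self, params, grads, momentum):
        # Exponential moving average of the gradients.
        momentum = self.beta * momentum + (1.0 - self.beta) * grads
        # One feature row per parameter: shape (num_params, 3).
        feats = np.stack([grads.ravel(), momentum.ravel(), params.ravel()], axis=-1)
        # Divide each feature column by its root-mean-square across parameters.
        feats = feats / np.sqrt(np.mean(feats ** 2, axis=0, keepdims=True) + 1e-8)
        # The same tiny MLP is applied independently to every parameter.
        hidden = np.tanh(feats @ self.w1 + self.b1)
        out = hidden @ self.w2
        # Two outputs per parameter: a direction and a log-magnitude.
        direction, log_mag = out[:, 0], out[:, 1]
        step = self.scale * direction * np.exp(np.clip(log_mag, -5.0, 5.0))
        new_params = params - step.reshape(params.shape)
        return new_params, momentum


# Usage: one update step on a small parameter tensor.
opt = TinyLearnedOptimizer()
params = np.ones((4, 3))
state = opt.init_state(params)
grads = 2.0 * params  # e.g. the gradient of sum(params**2)
params, state = opt.update(params, grads, state)
```

Every parameter passes through the same small network, so the per-step cost grows linearly with parameter count. The outer meta-training loop that would tune `w1` and `w2` across many tasks is omitted.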

4 mentions: @Luke_Metz, @HochreiterSepp, @typedfemale, @kamikudakun

Referring Tweets

@Luke_Metz
@Luke_Metz Tired of having to manually tune optimizers? We’re excited to release VeLO, the first hparam-free, super versatile learned optimizer that outperforms hand-designed optimizers on real world problems. It was trained on thousands of TPU months of compute. 1/N t.co/0vcHSJio9U
@HochreiterSepp
@HochreiterSepp ArXiv t.co/twaDdqtkVl: Meta-learning an optimizer for deep learning that is realized via an LSTM with gradients as inputs and updates as output. Trained 4000 TPU months. Better than SGD and ADAM variants. Is open source. We did the same in 2001: t.co/kIL6fNh6mB
@typedfemale
@typedfemale (from: t.co/HLrAbCUD0O)
@kamikudakun
@kamikudakun VeLO: Training highly versatile learned optimizers through scale-up (VeLO: Training Versatile Learned Optimizers by Scaling Up) t.co/U28cLX5EdX

Related Entries

[2208.02814] Conformal Risk Control
0 users, 4 mentions 2022/08/05 19:37
[2210.05546] What does a deep neural network confidently perceive? The effective dimension of high c...
0 users, 3 mentions 2022/10/13 15:09
[2211.00247] Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement...
0 users, 2 mentions 2022/11/04 22:39
[2211.05641] Regression as Classification: Influence of Task Formulation on Neural Network Features
0 users, 4 mentions 2022/11/11 10:37
[2211.09066] Teaching Algorithmic Reasoning via In-context Learning
0 users, 3 mentions 2022/11/17 22:39
