[1708.02182] Regularizing and Optimizing LSTM Language Models

Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this paper, we consider the specific problem of word-level language modeling and investigate strategies for regularizing and optimizing LSTM-based models. We propose the weight-dropped LSTM, which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization. Further, we introduce NT-ASGD, a variant of the averaged stochastic gradient method, wherein the averaging trigger is determined using a non-monotonic condition as opposed to being tuned by the user. Using these and other regularization strategies, we achieve state-of-the-art word-level perplexities on two data sets: 57.3 on Penn Treebank and 65.8 on WikiText-2. In exploring the effectiveness of a neural cache in conjunction with our proposed model, we achieve an even lower state-of-
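The NT-ASGD trigger mentioned in the abstract can be sketched roughly as follows. This is a minimal sketch, assuming that validation perplexity is checked at a fixed logging interval and that the "non-monotonic condition" means: start averaging once the current perplexity is worse than the best value logged before the most recent `n` checks. The class and method names (`NonMonotonicTrigger`, `update`, `IterateAverager`, `nonmono`) are hypothetical, not the authors' code, and weights are plain Python float lists only to keep the example self-contained.

```python
from typing import List, Optional


class NonMonotonicTrigger:
    """Hypothetical helper: decides when to switch from SGD to averaged SGD."""

    def __init__(self, nonmono: int) -> None:
        self.nonmono = nonmono              # non-monotone interval n (assumed name)
        self.logs: List[float] = []         # validation perplexities logged so far
        self.trigger_step: Optional[int] = None

    def update(self, val_ppl: float, step: int) -> bool:
        """Record one validation check; return True once averaging has started."""
        if self.trigger_step is None:
            n = self.nonmono
            # Trigger only if val_ppl is worse than the best value logged
            # before the most recent n checks (a non-monotonic criterion).
            if len(self.logs) > n and val_ppl > min(self.logs[:-n]):
                self.trigger_step = step
            self.logs.append(val_ppl)
        return self.trigger_step is not None


class IterateAverager:
    """Running mean of the weight iterates collected after the trigger fires."""

    def __init__(self) -> None:
        self.count = 0
        self.mean: List[float] = []

    def add(self, weights: List[float]) -> None:
        self.count += 1
        if self.count == 1:
            self.mean = list(weights)
        else:
            # Incremental mean: m_new = m + (w - m) / count
            self.mean = [m + (w - m) / self.count
                         for m, w in zip(self.mean, weights)]


# Usage: here `step` is just the index of the validation check (an assumption).
trigger = NonMonotonicTrigger(nonmono=2)
history = [100.0, 90.0, 85.0, 88.0, 87.0, 91.0]
started = [trigger.update(v, step) for step, v in enumerate(history)]
print(started)               # [False, False, False, False, False, True]
print(trigger.trigger_step)  # 5

averager = IterateAverager()
averager.add([1.0, 2.0])
averager.add([3.0, 4.0])
print(averager.mean)         # [2.0, 3.0]
```

The appeal of a rule like this is that the user supplies only the window `n` rather than hand-picking the iteration at which averaging begins; the returned model would then be the averaged weights rather than the last iterate.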

2 mentions: @icoxfog417 @Smerity
Date: 2020/02/10 12:54

Referring Tweets

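@icoxfog417 A study proposing regularization and optimization methods for LSTMs. It proposes a range of techniques, but applying DropConnect to the weights on the recurrent connection (h_t-1) can be done outside the cell, so it works with cells such as CuDNNLSTM that are fast but do not support dropout, giving both speed and regularization. Clear gains were confirmed on both PTB and WikiText-2. t.co/cQ9e2OI0KP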
@Smerity @stanfordnlp @yaringal @NvidiaAI A bit of good news though - @yaringal-style dropout / DropConnect is one of the few things still possible with a blackbox LSTM implementation! You can apply dropout to the RNN recurrent weights themselves and then run a batch with the blackbox LSTM =] See t.co/gf64rSNlPU t.co/z8hvdUuEFp
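The trick described in these tweets (mask the recurrent weight matrix first, then hand it to an unmodified recurrent routine) can be illustrated with the following sketch. It assumes a plain tanh RNN, h_t = tanh(W x_t + U h_{t-1}), as a stand-in for the opaque fused LSTM kernel, and assumes inverted-dropout scaling of surviving weights by 1/(1-p). The function names (`drop_connect`, `blackbox_rnn`, `weight_dropped_forward`) are hypothetical. The key point is that one mask on `U` is sampled per forward pass and reused at every timestep.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def drop_connect(weights: Matrix, p: float, rng: random.Random) -> Matrix:
    """Zero each weight independently with probability p; rescale survivors
    by 1/(1-p) (inverted-dropout scaling, an assumption of this sketch)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    return [[w * scale if rng.random() >= p else 0.0 for w in row]
            for row in weights]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def blackbox_rnn(xs: List[Vector], W: Matrix, U: Matrix, h0: Vector) -> List[Vector]:
    """Stand-in for an opaque fused cell: it knows nothing about dropout."""
    h = h0
    outputs = []
    for x in xs:
        h = [math.tanh(a + b) for a, b in zip(matvec(W, x), matvec(U, h))]
        outputs.append(h)
    return outputs


def weight_dropped_forward(xs, W, U, h0, p, rng, training=True):
    # Sample ONE mask on the hidden-to-hidden matrix for the whole sequence,
    # then run the unmodified recurrent routine with the masked matrix.
    U_used = drop_connect(U, p, rng) if training else U
    return blackbox_rnn(xs, W, U_used, h0)


# Usage with toy 2x2 weights (hypothetical values).
rng = random.Random(0)
W = [[0.5, -0.2], [0.1, 0.3]]
U = [[0.4, 0.1], [-0.3, 0.2]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h0 = [0.0, 0.0]
train_out = weight_dropped_forward(xs, W, U, h0, p=0.5, rng=rng)
eval_out = weight_dropped_forward(xs, W, U, h0, p=0.5, rng=rng, training=False)
```

Because the dropout lives entirely in how `U` is prepared, the recurrent loop itself stays untouched, which is why the approach remains compatible with a fast kernel that exposes no dropout option.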

Related Entries

GitHub - neulab/lrlm: Code for the paper "Latent Relation Language Models" at AAAI-20.
0 users, 2 mentions 2020/02/10 12:54
Modeling Children's Language Acquisition and NN Language Models
0 users, 0 mentions 2018/10/05 03:23
Generalized Language Models
1 users, 24 mentions 2019/02/03 02:18
GitHub - facebookresearch/XLM: PyTorch original implementation of Cross-lingual Language Model Pretr...
0 users, 4 mentions 2019/09/03 17:17
GitHub - facebookresearch/XLM: PyTorch original implementation of Cross-lingual Language Model Pretr...
0 users, 10 mentions 2019/08/21 02:16