[2005.09137] Weak-Attention Suppression For Transformer Based Speech Recognition

Transformers, originally proposed for natural language processing (NLP) tasks, have recently achieved great success in automatic speech recognition (ASR). However, unlike text units, adjacent acoustic units (i.e., frames) are highly correlated, and long-distance dependencies between them are weak. This suggests that ASR will likely benefit from sparse and localized attention. In this paper, we propose Weak-Attention Suppression (WAS), a method that dynamically induces sparsity in attention probabilities. We demonstrate that WAS leads to consistent Word Error Rate (WER) improvement over strong transformer baselines. On the widely used LibriSpeech benchmark, the proposed method reduced WER by 10% on test-clean and 5% on test-other for streamable transformers, resulting in a new state of the art among streaming models. Further analysis shows that WAS learns to suppress attention over non-critical and redundant continuous acoustic frames, and is more likely to suppress past frames than future ones.
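The suppression step described in the abstract can be sketched roughly as follows. This is a minimal illustrative interpretation, not the paper's exact formulation: the function name, the per-query threshold (mean minus a scaled standard deviation of the attention probabilities), and the hyperparameter `gamma` are assumptions, since the abstract does not spell out the precise criterion. Probabilities below the dynamic threshold are zeroed and the remaining mass is renormalized, which yields sparse, localized attention.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def weak_attention_suppression(scores, gamma=0.5):
    """Hedged sketch of weak-attention suppression (WAS).

    scores: array of shape (num_queries, num_keys) of raw attention logits.
    gamma:  assumed hyperparameter scaling the dynamic threshold.

    For each query, probabilities below (mean - gamma * std) of that
    query's attention distribution are zeroed, then the surviving
    probabilities are renormalized to sum to 1.
    """
    probs = softmax(scores, axis=-1)
    mean = probs.mean(axis=-1, keepdims=True)
    std = probs.std(axis=-1, keepdims=True)
    threshold = mean - gamma * std          # dynamic, per-query threshold
    kept = np.where(probs >= threshold, probs, 0.0)
    # The maximum probability is always >= the mean, so at least one
    # entry survives and the denominator is never zero.
    return kept / kept.sum(axis=-1, keepdims=True)
```

For example, with logits `[[5.0, 1.0, 0.0, 0.0]]` the three small probabilities fall below the threshold and all mass concentrates on the first key, illustrating how the method prunes weak, redundant frames while keeping a valid distribution.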

Keywords: attention
Date: 2020/05/20 14:21
