Huawei & Tsinghua U Method Boosts Task-Agnostic BERT Distillation Efficiency by Reusing Teacher Model Parameters | Synced

Powerful large-scale pretrained language models such as Google’s BERT have been a game-changer in natural language processing (NLP) and beyond. These impressive achievements, however, come with huge computational and memory demands, which make such models difficult to deploy on resource-restricted devices. Previous studies have proposed task-agnostic BERT distillation to tackle

Keywords: bert
Date: 2021/05/04 15:16

