[2002.03936v1] Subclass Distillation

After a large "teacher" neural network has been trained on labeled data, the probabilities it assigns to incorrect classes reveal a lot about how the teacher generalizes. By training a small "student" model to match these probabilities, most of the teacher's generalization ability can be transferred to the student, often producing a much better small model than training the student directly on the training data. The transfer works best when there are many possible classes, because more is then revealed about the function the teacher has learned. In cases where there are only a few possible classes, we show that the transfer can be improved by forcing the teacher to divide each class into many subclasses that it invents during supervised training; the student is then trained to match the subclass probabilities. For datasets where there are known, natural subclasses, we demonstrate that the teacher learns similar subclasses and th…
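The mechanics described above can be sketched in a few lines: the teacher's (sub)class logits are softened with a temperature, subclass probabilities are summed within each class to recover class probabilities, and the student is trained to match the teacher's softened distribution. This is a minimal numpy sketch, not the paper's implementation; the function names, the temperature value, and the 2-classes-by-2-subclasses layout are illustrative assumptions.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-softened softmax; higher T exposes more of the
    # teacher's "dark knowledge" about incorrect classes.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def class_probs_from_subclasses(subclass_logits, n_classes, n_subclasses, T=1.0):
    # The teacher emits n_classes * n_subclasses logits; summing subclass
    # probabilities within each class recovers the class probabilities,
    # so the usual supervised loss can still be applied at the class level.
    p_sub = softmax(subclass_logits, T)
    return p_sub.reshape(-1, n_classes, n_subclasses).sum(axis=-1)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy between the teacher's softened subclass distribution
    # and the student's; the student matches subclass probabilities directly.
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    return -(p_teacher * log_p_student).sum(axis=-1).mean()
```

By Gibbs' inequality the loss is minimized when the student's softened distribution equals the teacher's, which is what makes matching subclass probabilities a well-posed transfer objective.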

2 mentions: @re_mahmoudi
Date: 2020/02/11 11:21

Related Entries

Read more GitHub - victordibia/handtracking: Building a Real-time Hand-Detector using Neural Networks (SSD) on...
0 users, 1 mentions 2019/11/09 15:51
Read more Neural networks for Graph Data (NeurIPS2018 reading group @PFN)
25 users, 9 mentions 2019/01/26 09:46
Read more [1911.12116] Analysis of Explainers of Black Box Deep Neural Networks for Computer Vision: A Survey
0 users, 4 mentions 2019/11/28 23:20
Read more GitHub - thunlp/GNNPapers: Must-read papers on graph neural networks (GNN)
1 users, 0 mentions 2019/08/18 08:16
Read more Deep Forest: Toward an alternative to Deep Neural Networks - Qiita
0 users, 0 mentions 2018/04/25 17:22