[1906.02715] Visualizing and Measuring the Geometry of BERT

Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.
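The "mathematical argument" mentioned in the abstract concerns how parse-tree distance can be recovered from squared Euclidean distance between embeddings. As a rough sketch of the kind of statement involved (the notation below is paraphrased, not quoted from the paper): call an embedding of a tree's nodes "Pythagorean" if squared Euclidean distance reproduces tree distance; such an embedding always exists.

    % Sketch in LaTeX; f, V, d_tree, and P(x) are my notation, not the paper's.
    \[
      f \colon V \to \mathbb{R}^{\,n},
      \qquad
      \lVert f(x) - f(y) \rVert^{2} \;=\; d_{\mathrm{tree}}(x, y)
      \quad \text{for all nodes } x, y \in V.
    \]
    % One explicit construction: root the tree, assign each edge its own
    % orthonormal basis vector, and map each node x to the sum of the basis
    % vectors on its root path P(x). Orthonormality then gives
    \[
      \lVert f(x) - f(y) \rVert^{2}
        \;=\; \bigl\lvert P(x) \,\triangle\, P(y) \bigr\rvert
        \;=\; d_{\mathrm{tree}}(x, y).
    \]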

9 mentions: @wattenberg, @viegasf, @keigohtr, @societyoftrees, @dtsbourg, @wzuidema, @yousui_t_ken, @innatok
Keywords: bert
Date: 2019/06/09 03:48

Referring Tweets

@wattenberg How does a neural net represent language? See the visualizations and geometry in this PAIR team paper https://t.co/55GO5lJtsl and blog post https://t.co/jsjwY4Dl4y https://t.co/ZkG81UBHcE
@viegasf Analyzing and visualizing syntax trees in the high-dimensional spaces of neural nets. Check out the new PAIR paper on BERT geometry https://t.co/SPryH5mqnB And the blog post on “Language, trees, and geometry in neural networks” https://t.co/6hMthb5QNL https://t.co/NhTYvhb8aV
@societyoftrees Nice geometrical explanation for the squared distance relation between parsed trees & euclidean contextual embeddings: https://t.co/dOBmepRaXX, building on https://t.co/Q8OlUVUMjC https://t.co/W4Cd5Nr1uS
@innatok Speaking about the mixture of physical and virtual reality. Great paper on word-embedding vector distances vs. Euclidean (squared) ones #NLP #AI https://t.co/2zZWwJg1Lt
@dtsbourg More cool work investigating the linguistic features of BERT! --- "Visualizing and Measuring the Geometry of BERT" by @_coenen, Reif, Yuan et al. https://t.co/Yet0CAsv2C https://t.co/e4FJ0ao4zw
@wzuidema Jawahar, Sagot, Seddah What does BERT learn about the structure of language? ACL2019 https://t.co/ucLAOXS3Jh Coenen, Reif, Yuan, Kim, Pearce, Viégas, Wattenberg Visualizing and Measuring the Geometry of BERT https://t.co/9un6mZ8Q0J Blog: https://t.co/EU5lrQDv1N
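The squared-distance relation highlighted in the @societyoftrees tweet can be checked numerically for the tree construction sketched above. The following is a minimal Python sketch; the toy tree, node labels, and helper names are illustrative assumptions, not taken from the paper or its code.

    # Numerical check of the squared-distance ("Pythagorean") tree embedding.
    # The toy tree and helper names here are illustrative only.
    import itertools
    import numpy as np

    # A small rooted tree given as child -> parent (node 0 is the root).
    parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
    nodes = [0, 1, 2, 3, 4, 5]
    edges = list(parent.items())  # one edge per non-root node

    def root_path_edges(v):
        # Set of edges on the path from the root down to node v.
        path = set()
        while v in parent:
            path.add((v, parent[v]))
            v = parent[v]
        return path

    def tree_distance(u, v):
        # Path length between u and v: symmetric difference of root paths.
        return len(root_path_edges(u) ^ root_path_edges(v))

    # Pythagorean embedding: one orthonormal basis vector per edge; each node
    # is the sum of the basis vectors along its root path.
    basis = {e: np.eye(len(edges))[i] for i, e in enumerate(edges)}
    embed = {v: sum((basis[e] for e in root_path_edges(v)),
                    np.zeros(len(edges)))
             for v in nodes}

    for u, v in itertools.combinations(nodes, 2):
        squared = float(np.sum((embed[u] - embed[v]) ** 2))
        assert squared == tree_distance(u, v), (u, v)
    print("squared Euclidean distance matches tree distance for every pair")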

Related Entries

GitHub - soskek/bert-chainer: Chainer implementation of "BERT: Pre-training of Deep Bidirectional Tr...
7 users, 0 mentions 2018/12/02 18:01
[DL Hacks] BERT: Pre-training of Deep Bidirectional Transformers for L…
4 users, 5 mentions 2018/12/07 04:31
[DL Reading Group] BERT: Pre-training of Deep Bidirectional Transformers for Lang…
0 users, 0 mentions 2018/10/20 12:15
GitHub - huggingface/pytorch-pretrained-BERT: The Big-&-Extending-Repository-of-Transformers: PyTorc...
1 user, 7 mentions 2019/03/04 21:47
The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) – Jay Alammar – Visualizing ...
0 users, 7 mentions 2019/03/01 00:47