[1905.09755] Misspelling Oblivious Word Embeddings

In this paper we present a method to learn word embeddings that are resilient to misspellings. Existing word embeddings have limited applicability to malformed texts, which contain a non-negligible amount of out-of-vocabulary words. We propose a method that combines fastText's subword representations with a supervised task of learning misspelling patterns. In our method, misspellings of each word are embedded close to their correct variants. We train these embeddings on a new dataset that we are releasing publicly. Finally, we experimentally demonstrate the advantages of this approach on both intrinsic and extrinsic NLP tasks using public test sets.
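To make the idea concrete, here is a minimal toy sketch (not the authors' released code) of the two ingredients the abstract names: fastText-style word vectors built from character n-grams, plus a supervised term that pulls a misspelling's embedding toward its correct variant. All names (`ToySubwordEmbedder`, `train_pair`, the learning rate, and the n-gram range) are invented for illustration.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    """fastText-style character n-grams with boundary markers."""
    w = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams += [w[i:i + n] for i in range(len(w) - n + 1)]
    return grams

class ToySubwordEmbedder:
    """Toy illustration: a word vector is the mean of its n-gram vectors."""

    def __init__(self, dim=16, seed=0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        self.vecs = {}  # n-gram -> vector, created lazily

    def _vec(self, gram):
        if gram not in self.vecs:
            self.vecs[gram] = self.rng.normal(scale=0.1, size=self.dim)
        return self.vecs[gram]

    def embed(self, word):
        grams = char_ngrams(word)
        return sum(self._vec(g) for g in grams) / len(grams)

    def misspelling_loss(self, misspelling, correct):
        # Supervised term: squared distance between the misspelling's
        # embedding and the correct word's embedding.
        d = self.embed(misspelling) - self.embed(correct)
        return float(d @ d)

    def train_pair(self, misspelling, correct, lr=0.5, steps=50):
        # Gradient descent on the misspelling-proximity term alone
        # (the real model also optimizes the fastText semantic loss).
        for _ in range(steps):
            diff = self.embed(misspelling) - self.embed(correct)
            grams_m = char_ngrams(misspelling)
            grams_c = char_ngrams(correct)
            for g in grams_m:  # move the misspelling toward the word
                self.vecs[g] = self._vec(g) - lr * (2.0 / len(grams_m)) * diff
            for g in grams_c:  # and the word slightly toward the misspelling
                self.vecs[g] = self._vec(g) + lr * (2.0 / len(grams_c)) * diff
```

After a few updates on a pair such as ("wrod", "word"), the squared distance between the two embeddings shrinks, which is the effect the loss term is designed to produce; the published model learns such pairs at scale from the released misspelling dataset.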

2 mentions: @icoxfog417, @plimantour
Keywords: embedding
Date: 2019/09/10 02:17

Referring Tweets

@icoxfog417 A proposal for distributed representations robust to misspellings. Building on fastText, it introduces a loss term that pulls the embeddings of misspelled words close to those of the correct words. t.co/flJfVDnYNd

Related Entries

Read more Neural Text Embeddings for Information Retrieval (WSDM 2017)
5 users, 0 mentions 2018/12/05 22:16
Read more Document Embedding Techniques - Towards Data Science
0 users, 4 mentions 2019/09/09 13:56
Read more [DL Reading Group] Learning an Embedding Space for Transferable Robot Skills
0 users, 0 mentions 2018/04/24 10:16
Read more Document Embedding Techniques - Towards Data Science
0 users, 6 mentions 2019/09/12 19:05
Read more Variational Autoencoder in Tensorflow - facial expression low dimensional embedding - Machine learni...
0 users, 0 mentions 2018/04/22 03:40