[2008.09470] Top2Vec: Distributed Representations of Topics

Topic modeling is used to discover latent semantic structure, usually referred to as topics, in a large collection of documents. The most widely used methods are Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis. Despite their popularity, they have several weaknesses. To achieve optimal results, they often require the number of topics to be known in advance, custom stop-word lists, stemming, and lemmatization. Additionally, these methods rely on bag-of-words representations of documents, which ignore the ordering and semantics of words. Distributed representations of documents and words have gained popularity due to their ability to capture the semantics of words and documents. We present $\texttt{top2vec}$, which leverages joint document and word semantic embedding to find $\textit{topic vectors}$. This model does not require stop-word lists, stemming, or lemmatization, and it automatically finds the number of topics. The resulting topic vectors are jointly embedded with
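To make the idea of a topic vector in a joint embedding space concrete, here is a minimal sketch under stated assumptions, not the paper's implementation. It assumes document and word vectors already live in the same space and that each document has already been assigned a cluster label. The clustering step that produces those labels is not described in this abstract and is simply taken as given. A topic vector is taken to be the centroid of a cluster's document vectors. The topic's descriptive words are the word vectors nearest to that centroid by cosine similarity. The function names, the `-1` "noise" label convention, and the toy 2-D vectors are all hypothetical illustrations.

```python
import math
from collections import defaultdict


def normalize(v):
    """Scale a (non-zero) vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity: dot product of the two unit-length vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def topic_vectors(doc_vectors, cluster_labels):
    """Average the document vectors in each cluster into one topic vector.

    Hypothetical convention: label -1 marks noise documents, which are skipped.
    The number of topics is the number of distinct non-noise labels, so it is
    not fixed in advance.
    """
    groups = defaultdict(list)
    for vec, label in zip(doc_vectors, cluster_labels):
        if label != -1:
            groups[label].append(vec)
    topics = {}
    for label, vecs in groups.items():
        dim = len(vecs[0])
        topics[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return topics


def topic_words(topic_vec, word_vectors, k=3):
    """Return the k words whose vectors are closest (by cosine) to topic_vec."""
    ranked = sorted(
        word_vectors,
        key=lambda w: cosine(topic_vec, word_vectors[w]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-D joint embedding (made-up numbers, for illustration only).
docs = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8], [0.5, 0.5]]
labels = [0, 0, 1, 1, -1]  # two clusters plus one noise document
words = {
    "goal": [1.0, 0.05],
    "match": [0.8, 0.2],
    "vote": [0.05, 1.0],
    "ballot": [0.2, 0.9],
}

topics = topic_vectors(docs, labels)
for label, vec in sorted(topics.items()):
    print(label, topic_words(vec, words, k=2))
```

With these toy vectors, topic 0 lands near "goal" and "match" and topic 1 near "vote" and "ballot". Averaging in the shared space is what lets nearby word vectors act as a topic's description without stop-word lists or stemming.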

1 mention: @upura0
Date: 2020/10/18 02:22

Referring Tweets

@upura0 An introductory article on "Top2Vec" t.co/LpQIHYMzsG, which was posted to arXiv in August 2020. It seems to be a kind of topic model. The GitHub README t.co/sGYoydHxHE has the details. TOP2VEC: New way of topic modelling by sagar pundir in @TDataScience t.co/6na744vOX1


Related Entries

2nd nlpaper.challenge NLP/CV Exchange Study Session - connpass
0 users, 18 mentions 2019/01/28 09:46
3rd nlpaper.challenge NLP/CV Exchange Study Session - connpass
0 users, 14 mentions 2019/02/20 12:47
3rd nlpaper.challenge NLP/CV Exchange Study Session - connpass
0 users, 16 mentions 2019/02/21 09:47
4th nlpaper.challenge NLP/CV Exchange Study Session (Final Session) - connpass
0 users, 21 mentions 2019/03/25 14:17
1st NLP/CV Cutting-Edge Study Session - connpass
0 users, 20 mentions 2019/06/21 12:48