[2102.07033] PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

Open-domain Question Answering models which directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared to conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowledge. However, these models lack the accuracy of retrieve-and-read systems, as substantially less knowledge is covered by the available QA-pairs relative to text corpora like Wikipedia. To facilitate improved QA-pair models, we introduce Probably Asked Questions (PAQ), a very large resource of 65M automatically-generated QA-pairs. We introduce a new QA-pair retriever, RePAQ, to complement PAQ. We find that PAQ preempts and caches test questions, enabling RePAQ to match the accuracy of recent retrieve-and-read models, whilst being significantly faster. Using PAQ, we train CBQA models which outp
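The QA-pair retrieval idea described in the abstract can be illustrated with a minimal sketch. This is not RePAQ: the class name `ToyQAPairRetriever`, the token-overlap (Jaccard) scoring, the `min_score` abstention threshold, and the sample QA-pairs are all assumptions made for illustration, standing in for a learned retriever over a resource like PAQ. The sketch only shows why such a system can return the matched question alongside its answer and can be updated at test time by adding a pair.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class ToyQAPairRetriever:
    """Hypothetical stand-in for a QA-pair retriever (not the RePAQ model)."""

    def __init__(self, qa_pairs=()):
        self.entries = []
        for question, answer in qa_pairs:
            self.add(question, answer)

    def add(self, question, answer):
        """Add a QA-pair; no retraining is involved in this toy version."""
        self.entries.append((question, answer, tokenize(question)))

    def answer(self, question, min_score=0.5):
        """Return the answer of the most similar stored question.

        Returns None when no stored question reaches min_score
        (an assumed abstention rule, chosen for this sketch).
        """
        query = tokenize(question)
        best, best_score = None, 0.0
        for stored_q, stored_a, stored_tokens in self.entries:
            score = jaccard(query, stored_tokens)
            if score > best_score:
                best, best_score = (stored_q, stored_a), score
        if best is None or best_score < min_score:
            return None
        return {
            "answer": best[1],
            "matched_question": best[0],  # the retrieved pair explains the answer
            "score": best_score,
        }


retriever = ToyQAPairRetriever([
    ("who wrote the origin of species", "Charles Darwin"),
    ("what is the capital of france", "Paris"),
])
print(retriever.answer("Who wrote On the Origin of Species?"))
print(retriever.answer("how tall is mount everest"))  # no close match stored

# Updating at test time: add one pair, then ask again.
retriever.add("how tall is mount everest", "8,849 m")
print(retriever.answer("how tall is mount everest"))
```

A real system would replace the token-overlap score with a trained question encoder and an approximate nearest-neighbour index, but the interface (look up the closest stored question, return its answer) stays the same.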

8 mentions: @PSH_Lewis @peteskomoroch @ak92501 @Montreal_AI @PSH_Lewis @cyberandy @arxivabs
Date: 2021/02/22 02:21

Referring Tweets

@PSH_Lewis Check the paper out for full details! t.co/MA8BTtWBi6 10/10
@peteskomoroch New Dataset: "PAQ – 65 Million Probably-Asked Questions and What You Can Do With Them" from @PSH_Lewis & team @ FB, extracted using Wikipedia, models implemented in @PyTorch using @huggingface Transformers. QA Pairs Data: t.co/oklGIdA0kr Paper: t.co/FdiZQmJsyQ
@ak92501 PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them pdf: t.co/rFoyYe3zkR abs: t.co/0QGt6mTtB6 github: t.co/dXoN1LCTDw t.co/Gk3xyOy5XF

Related Entries

Read more [2008.11649] Discrete Word Embedding for Logical Natural Language Understanding
0 users, 5 mentions 2020/08/28 17:22
Read more [2009.14794] Rethinking Attention with Performers
0 users, 6 mentions 2020/10/01 02:21
Read more [2010.12683] Long Document Ranking with Query-Directed Sparse Transformer
0 users, 3 mentions 2020/10/28 15:51
Read more [2101.06561] GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation
0 users, 7 mentions 2021/01/20 05:21
Read more [2101.12176] On the Origin of Implicit Regularization in Stochastic Gradient Descent
0 users, 6 mentions 2021/01/30 03:51