[2010.04736] Evaluating and Characterizing Human Rationales

Two main approaches for evaluating the quality of machine-generated rationales are: 1) using human rationales as a gold standard; and 2) automated metrics based on how rationales affect model behavior. An open question, however, is how human rationales fare under these automated metrics. Analyzing a variety of datasets and models, we find that human rationales do not necessarily perform well on them. To unpack this finding, we propose improved metrics that account for model-dependent baseline performance. We then propose two methods to further characterize rationale quality: one based on model retraining and one based on "fidelity curves", which reveal properties such as irrelevance and redundancy. Our work leads to actionable suggestions for evaluating and characterizing rationales.
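As a concrete illustration of the kind of behavior-based metric the abstract refers to (not the paper's exact definitions), the sketch below computes an erasure-style fidelity score, the drop in a model's confidence when the rationale tokens are deleted, and traces a simple fidelity curve by growing the rationale from the highest-scored tokens. The function names, the `predict_proba` classifier interface, and the token-scoring scheme are all assumptions made for this example.

```python
# Illustrative sketch only: an erasure-based fidelity score and a fidelity
# curve over rationale sizes. Definitions are assumptions, not the paper's
# exact metrics (which add model-dependent baseline corrections).

from typing import Callable, List, Sequence


def erasure_fidelity(
    tokens: Sequence[str],
    rationale_mask: Sequence[bool],               # True where a token is in the rationale
    predict_proba: Callable[[List[str]], float],  # P(predicted class | tokens)
) -> float:
    """Confidence drop after deleting the rationale tokens.

    Higher values mean the model relied more heavily on the rationale.
    """
    full_prob = predict_proba(list(tokens))
    reduced = [t for t, in_r in zip(tokens, rationale_mask) if not in_r]
    return full_prob - predict_proba(reduced)


def fidelity_curve(
    tokens: Sequence[str],
    token_scores: Sequence[float],                # per-token importance, e.g. annotator votes
    predict_proba: Callable[[List[str]], float],
    fractions: Sequence[float] = (0.1, 0.2, 0.4, 0.6, 0.8, 1.0),
) -> List[float]:
    """Fidelity at increasing rationale sizes, tracing one curve."""
    order = sorted(range(len(tokens)), key=lambda i: -token_scores[i])
    curve = []
    for frac in fractions:
        k = max(1, round(frac * len(tokens)))
        mask = [False] * len(tokens)
        for i in order[:k]:
            mask[i] = True
        curve.append(erasure_fidelity(tokens, mask, predict_proba))
    return curve


if __name__ == "__main__":
    # Toy classifier: confidence grows with how many sentiment words remain.
    POSITIVE = {"love", "great"}
    toy_predict = lambda toks: min(1.0, 0.5 + 0.25 * sum(t in POSITIVE for t in toks))
    tokens = ["i", "love", "this", "great", "movie"]
    scores = [0.0, 0.9, 0.1, 0.8, 0.2]  # pretend human annotation scores
    print(fidelity_curve(tokens, scores, toy_predict))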

Date: 2020/10/18 18:53

Related Entries

[1908.07123] Estimating Attention Flow in Online Video Network
0 users, 9 mentions 2019/08/24 15:47
[1910.04386] Dialog on a canvas with a machine
0 users, 21 mentions 2019/10/11 02:18
[2001.11274] Scalable Psychological Momentum Forecasting in Esports
0 users, 7 mentions 2020/02/02 15:51
[2005.07062] Simulation-Based Inference for Global Health Decisions
0 users, 2 mentions 2020/05/18 11:21
[2008.02323] Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering
0 users, 8 mentions 2020/08/07 03:51