Matches in SemOpenAlex for { <https://semopenalex.org/work/W3213233110> ?p ?o ?g. }
- W3213233110 endingPage "7542" @default.
- W3213233110 startingPage "7529" @default.
- W3213233110 abstract "Building compositional explanations requires models to combine two or more facts that, together, describe why the answer to a question is correct. Typically, these “multi-hop” explanations are evaluated relative to one (or a small number of) gold explanations. In this work, we show these evaluations substantially underestimate model performance, both in terms of the relevance of included facts, as well as the completeness of model-generated explanations, because models regularly discover and produce valid explanations that are different than gold explanations. To address this, we construct a large corpus of 126k domain-expert (science teacher) relevance ratings that augment a corpus of explanations to standardized science exam questions, discovering 80k additional relevant facts not rated as gold. We build three strong models based on different methodologies (generation, ranking, and schemas), and empirically show that while expert-augmented ratings provide better estimates of explanation quality, both original (gold) and expert-augmented automatic evaluations still substantially underestimate performance by up to 36% when compared with full manual expert judgements, with different models being disproportionately affected. This poses a significant methodological challenge to accurately evaluating explanations produced by compositional reasoning models." @default.
- W3213233110 created "2021-11-22" @default.
- W3213233110 creator A5003273878 @default.
- W3213233110 creator A5007471680 @default.
- W3213233110 creator A5030776937 @default.
- W3213233110 creator A5081688167 @default.
- W3213233110 date "2021-11-01" @default.
- W3213233110 modified "2023-09-27" @default.
- W3213233110 title "On the Challenges of Evaluating Compositional Explanations in Multi-Hop Inference: Relevance, Completeness, and Expert Ratings" @default.
- W3213233110 cites W2164777277 @default.
- W3213233110 cites W2337282450 @default.
- W3213233110 cites W2558203065 @default.
- W3213233110 cites W2606964149 @default.
- W3213233110 cites W2794325560 @default.
- W3213233110 cites W2804897457 @default.
- W3213233110 cites W2889787757 @default.
- W3213233110 cites W2890894339 @default.
- W3213233110 cites W2950246755 @default.
- W3213233110 cites W2963318894 @default.
- W3213233110 cites W2963341956 @default.
- W3213233110 cites W2965373594 @default.
- W3213233110 cites W2983719617 @default.
- W3213233110 cites W2983995706 @default.
- W3213233110 cites W2996848635 @default.
- W3213233110 cites W3017018726 @default.
- W3213233110 cites W3034830866 @default.
- W3213233110 cites W3035164567 @default.
- W3213233110 cites W3082274269 @default.
- W3213233110 cites W3089695206 @default.
- W3213233110 cites W3098824823 @default.
- W3213233110 cites W3099655892 @default.
- W3213233110 cites W3104242913 @default.
- W3213233110 cites W3104992282 @default.
- W3213233110 cites W3117019976 @default.
- W3213233110 cites W3119370638 @default.
- W3213233110 cites W3120569501 @default.
- W3213233110 cites W3125833936 @default.
- W3213233110 cites W3129831491 @default.
- W3213233110 cites W3130196849 @default.
- W3213233110 cites W3159892921 @default.
- W3213233110 cites W3169722921 @default.
- W3213233110 cites W3172327250 @default.
- W3213233110 hasPublicationYear "2021" @default.
- W3213233110 type Work @default.
- W3213233110 sameAs 3213233110 @default.
- W3213233110 citedByCount "0" @default.
- W3213233110 crossrefType "proceedings-article" @default.
- W3213233110 hasAuthorship W3213233110A5003273878 @default.
- W3213233110 hasAuthorship W3213233110A5007471680 @default.
- W3213233110 hasAuthorship W3213233110A5030776937 @default.
- W3213233110 hasAuthorship W3213233110A5081688167 @default.
- W3213233110 hasConcept C111472728 @default.
- W3213233110 hasConcept C119857082 @default.
- W3213233110 hasConcept C134306372 @default.
- W3213233110 hasConcept C138885662 @default.
- W3213233110 hasConcept C154945302 @default.
- W3213233110 hasConcept C15744967 @default.
- W3213233110 hasConcept C158154518 @default.
- W3213233110 hasConcept C17231256 @default.
- W3213233110 hasConcept C17744445 @default.
- W3213233110 hasConcept C180747234 @default.
- W3213233110 hasConcept C189430467 @default.
- W3213233110 hasConcept C199360897 @default.
- W3213233110 hasConcept C199539241 @default.
- W3213233110 hasConcept C204321447 @default.
- W3213233110 hasConcept C23123220 @default.
- W3213233110 hasConcept C2522767166 @default.
- W3213233110 hasConcept C2776214188 @default.
- W3213233110 hasConcept C2779530757 @default.
- W3213233110 hasConcept C2780801425 @default.
- W3213233110 hasConcept C33923547 @default.
- W3213233110 hasConcept C41008148 @default.
- W3213233110 hasConceptScore W3213233110C111472728 @default.
- W3213233110 hasConceptScore W3213233110C119857082 @default.
- W3213233110 hasConceptScore W3213233110C134306372 @default.
- W3213233110 hasConceptScore W3213233110C138885662 @default.
- W3213233110 hasConceptScore W3213233110C154945302 @default.
- W3213233110 hasConceptScore W3213233110C15744967 @default.
- W3213233110 hasConceptScore W3213233110C158154518 @default.
- W3213233110 hasConceptScore W3213233110C17231256 @default.
- W3213233110 hasConceptScore W3213233110C17744445 @default.
- W3213233110 hasConceptScore W3213233110C180747234 @default.
- W3213233110 hasConceptScore W3213233110C189430467 @default.
- W3213233110 hasConceptScore W3213233110C199360897 @default.
- W3213233110 hasConceptScore W3213233110C199539241 @default.
- W3213233110 hasConceptScore W3213233110C204321447 @default.
- W3213233110 hasConceptScore W3213233110C23123220 @default.
- W3213233110 hasConceptScore W3213233110C2522767166 @default.
- W3213233110 hasConceptScore W3213233110C2776214188 @default.
- W3213233110 hasConceptScore W3213233110C2779530757 @default.
- W3213233110 hasConceptScore W3213233110C2780801425 @default.
- W3213233110 hasConceptScore W3213233110C33923547 @default.
- W3213233110 hasConceptScore W3213233110C41008148 @default.
- W3213233110 hasLocation W32132331101 @default.
- W3213233110 hasOpenAccess W3213233110 @default.
- W3213233110 hasPrimaryLocation W32132331101 @default.
- W3213233110 hasRelatedWork W2037602277 @default.
- W3213233110 hasRelatedWork W2742632194 @default.
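
Each row above follows the shape `- subject predicate object @graph.`, so the listing can be grouped by predicate with a few lines of code. The following is a minimal sketch in Python, not part of SemOpenAlex: the helper name `parse_listing`, the regular expression, and the assumed row layout are all inferred from this listing alone and may not hold for other pages.

```python
import re
from collections import defaultdict

# Assumed row layout "- SUBJECT PREDICATE OBJECT @GRAPH." (inferred from the
# listing above, not taken from any SemOpenAlex specification).
LINE_RE = re.compile(r'^- (\S+) (\S+) (.+) @(\w+)\.$')

def parse_listing(lines):
    """Hypothetical helper: collect object values per predicate."""
    by_predicate = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip the header line and anything that does not fit
        _subject, predicate, obj, _graph = m.groups()
        # Literals appear in double quotes; strip one pair of outer quotes.
        if len(obj) >= 2 and obj[0] == '"' and obj[-1] == '"':
            obj = obj[1:-1]
        by_predicate[predicate].append(obj)
    return by_predicate

# A few rows copied from the listing above.
sample = [
    'Matches in SemOpenAlex for { <https://semopenalex.org/work/W3213233110> ?p ?o ?g. }',
    '- W3213233110 endingPage "7542" @default.',
    '- W3213233110 startingPage "7529" @default.',
    '- W3213233110 cites W2164777277 @default.',
    '- W3213233110 cites W2337282450 @default.',
]
parsed = parse_listing(sample)

# Page span derived from the startingPage and endingPage literals.
page_count = int(parsed["endingPage"][0]) - int(parsed["startingPage"][0]) + 1
```

Grouping by predicate keeps repeated properties such as `cites` or `hasConcept` together as lists, while single-valued properties such as `title` end up as one-element lists.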