Matches in SemOpenAlex for { <https://semopenalex.org/work/W2896780650> ?p ?o ?g. }
- W2896780650 abstract "For evaluating generation systems, automatic metrics such as BLEU cost nothing to run but have been shown to correlate poorly with human judgment, leading to systematic bias against certain model improvements. On the other hand, averaging human judgments, the unbiased gold standard, is often too expensive. In this paper, we use control variates to combine automatic metrics with human evaluation to obtain an unbiased estimator with lower cost than human evaluation alone. In practice, however, we obtain only a 7-13% cost reduction on evaluating summarization and open-response question answering systems. We then prove that our estimator is optimal: there is no unbiased estimator with lower cost. Our theory further highlights the two fundamental bottlenecks—the automatic metric and the prompt shown to human evaluators—both of which need to be improved to obtain greater cost savings." @default.
- W2896780650 created "2018-10-26" @default.
- W2896780650 creator A5024150057 @default.
- W2896780650 creator A5025255782 @default.
- W2896780650 creator A5034715099 @default.
- W2896780650 date "2018-01-01" @default.
- W2896780650 modified "2023-10-13" @default.
- W2896780650 title "The price of debiasing automatic metrics in natural language evaluation" @default.
- W2896780650 cites W1544827683 @default.
- W2896780650 cites W1861492603 @default.
- W2896780650 cites W1956340063 @default.
- W2896780650 cites W2030265142 @default.
- W2896780650 cites W2070150502 @default.
- W2896780650 cites W2101105183 @default.
- W2896780650 cites W2108682071 @default.
- W2896780650 cites W2117010802 @default.
- W2896780650 cites W2119196781 @default.
- W2896780650 cites W2123442489 @default.
- W2896780650 cites W2133459682 @default.
- W2896780650 cites W2149327368 @default.
- W2896780650 cites W2295951612 @default.
- W2896780650 cites W2341401723 @default.
- W2896780650 cites W2469104253 @default.
- W2896780650 cites W2525778437 @default.
- W2896780650 cites W2558203065 @default.
- W2896780650 cites W2593751037 @default.
- W2896780650 cites W2605035112 @default.
- W2896780650 cites W2606974598 @default.
- W2896780650 cites W2615953416 @default.
- W2896780650 cites W2626154462 @default.
- W2896780650 cites W2745039414 @default.
- W2896780650 cites W2759567155 @default.
- W2896780650 cites W2760672101 @default.
- W2896780650 cites W2916445322 @default.
- W2896780650 cites W2963173382 @default.
- W2896780650 cites W2963527228 @default.
- W2896780650 cites W2963672599 @default.
- W2896780650 cites W2963863909 @default.
- W2896780650 cites W2963903950 @default.
- W2896780650 cites W2963963993 @default.
- W2896780650 doi "https://doi.org/10.18653/v1/p18-1060" @default.
- W2896780650 hasPublicationYear "2018" @default.
- W2896780650 type Work @default.
- W2896780650 sameAs 2896780650 @default.
- W2896780650 citedByCount "85" @default.
- W2896780650 countsByYear W28967806502018 @default.
- W2896780650 countsByYear W28967806502019 @default.
- W2896780650 countsByYear W28967806502020 @default.
- W2896780650 countsByYear W28967806502021 @default.
- W2896780650 countsByYear W28967806502022 @default.
- W2896780650 countsByYear W28967806502023 @default.
- W2896780650 crossrefType "proceedings-article" @default.
- W2896780650 hasAuthorship W2896780650A5024150057 @default.
- W2896780650 hasAuthorship W2896780650A5025255782 @default.
- W2896780650 hasAuthorship W2896780650A5034715099 @default.
- W2896780650 hasBestOaLocation W28967806501 @default.
- W2896780650 hasConcept C105795698 @default.
- W2896780650 hasConcept C119857082 @default.
- W2896780650 hasConcept C154945302 @default.
- W2896780650 hasConcept C15744967 @default.
- W2896780650 hasConcept C162324750 @default.
- W2896780650 hasConcept C165646398 @default.
- W2896780650 hasConcept C170858558 @default.
- W2896780650 hasConcept C176217482 @default.
- W2896780650 hasConcept C185429906 @default.
- W2896780650 hasConcept C188147891 @default.
- W2896780650 hasConcept C191393472 @default.
- W2896780650 hasConcept C21547014 @default.
- W2896780650 hasConcept C2779458634 @default.
- W2896780650 hasConcept C33923547 @default.
- W2896780650 hasConcept C41008148 @default.
- W2896780650 hasConceptScore W2896780650C105795698 @default.
- W2896780650 hasConceptScore W2896780650C119857082 @default.
- W2896780650 hasConceptScore W2896780650C154945302 @default.
- W2896780650 hasConceptScore W2896780650C15744967 @default.
- W2896780650 hasConceptScore W2896780650C162324750 @default.
- W2896780650 hasConceptScore W2896780650C165646398 @default.
- W2896780650 hasConceptScore W2896780650C170858558 @default.
- W2896780650 hasConceptScore W2896780650C176217482 @default.
- W2896780650 hasConceptScore W2896780650C185429906 @default.
- W2896780650 hasConceptScore W2896780650C188147891 @default.
- W2896780650 hasConceptScore W2896780650C191393472 @default.
- W2896780650 hasConceptScore W2896780650C21547014 @default.
- W2896780650 hasConceptScore W2896780650C2779458634 @default.
- W2896780650 hasConceptScore W2896780650C33923547 @default.
- W2896780650 hasConceptScore W2896780650C41008148 @default.
- W2896780650 hasLocation W28967806501 @default.
- W2896780650 hasOpenAccess W2896780650 @default.
- W2896780650 hasPrimaryLocation W28967806501 @default.
- W2896780650 hasRelatedWork W1495104519 @default.
- W2896780650 hasRelatedWork W2171721708 @default.
- W2896780650 hasRelatedWork W2199432031 @default.
- W2896780650 hasRelatedWork W3214527415 @default.
- W2896780650 hasRelatedWork W4224293420 @default.
- W2896780650 hasRelatedWork W4281684980 @default.
- W2896780650 hasRelatedWork W4287887864 @default.
- W2896780650 hasRelatedWork W4362554880 @default.
- W2896780650 hasRelatedWork W4386875279 @default.
- W2896780650 hasRelatedWork W4225584739 @default.
- W2896780650 isParatext "false" @default.
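
The abstract above describes combining automatic metrics with human judgments through control variates. The following is a minimal sketch of a generic control-variate mean estimator, not the authors' implementation. The function name `control_variate_mean` and its parameters are hypothetical. It assumes the automatic metric's mean over the full dataset (`metric_mean`) is known, since the metric is cheap to compute on every output.

```python
import statistics


def control_variate_mean(human, metric, metric_mean):
    """Estimate the mean human judgment using the automatic metric as a control variate.

    human       -- human scores for a small annotated sample
    metric      -- automatic metric scores for the same sample
    metric_mean -- mean of the automatic metric over the full dataset (assumed known)
    """
    n = len(human)
    if n != len(metric) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 items")

    h_bar = statistics.fmean(human)
    m_bar = statistics.fmean(metric)

    var_m = statistics.variance(metric)  # sample variance of the metric
    if var_m == 0:
        # A constant metric carries no information; fall back to the plain mean.
        return h_bar

    # Sample covariance between human scores and metric scores.
    cov = sum((h - h_bar) * (m - m_bar) for h, m in zip(human, metric)) / (n - 1)

    # Variance-minimizing coefficient. Estimating it from the same sample
    # can introduce a small finite-sample bias (an assumption of this sketch).
    alpha = cov / var_m

    # Correct the human mean by how far the sample's metric mean drifted
    # from the known full-dataset metric mean.
    return h_bar - alpha * (m_bar - metric_mean)
```

With this kind of estimator, the variance reduction grows with the correlation between the metric and human judgment, which is consistent with the abstract's remark that the automatic metric is one of the bottlenecks for cost savings.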