Matches in SemOpenAlex for { <https://semopenalex.org/work/W3088916145> ?p ?o ?g. }
- W3088916145 abstract "There is an increasing focus on model-based dialog evaluation metrics such as ADEM, RUBER, and the more recent BERT-based metrics. These models aim to assign a high score to all relevant responses and a low score to all irrelevant responses. Ideally, such models should be trained using multiple relevant and irrelevant responses for any given context. However, no such data is publicly available, and hence existing models are usually trained using a single relevant response and multiple randomly selected responses from other contexts (random negatives). To allow for better training and robust evaluation of model-based metrics, we introduce the DailyDialog++ dataset, consisting of (i) five relevant responses for each context and (ii) five adversarially crafted irrelevant responses for each context. Using this dataset, we first show that even in the presence of multiple correct references, n-gram-based metrics and embedding-based metrics do not perform well at separating relevant responses from even random negatives. While model-based metrics perform better than n-gram-based and embedding-based metrics on random negatives, their performance drops substantially when evaluated on adversarial examples. To check if large-scale pretraining could help, we propose a new BERT-based evaluation metric called DEB, which is pretrained on 727M Reddit conversations and then finetuned on our dataset. DEB significantly outperforms existing models, showing better correlation with human judgements and better performance on random negatives (88.27% accuracy). However, its performance again drops substantially when evaluated on adversarial responses, thereby highlighting that even large-scale pretrained evaluation models are not robust to the adversarial examples in our dataset. The dataset and code are publicly available." @default.
- W3088916145 created "2020-10-01" @default.
- W3088916145 creator A5047163223 @default.
- W3088916145 creator A5050036814 @default.
- W3088916145 creator A5060553255 @default.
- W3088916145 creator A5087678154 @default.
- W3088916145 date "2020-09-23" @default.
- W3088916145 modified "2023-09-27" @default.
- W3088916145 title "Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining" @default.
- W3088916145 cites W1518951372 @default.
- W3088916145 cites W1654173042 @default.
- W3088916145 cites W1924770834 @default.
- W3088916145 cites W2101105183 @default.
- W3088916145 cites W2123301721 @default.
- W3088916145 cites W2140054881 @default.
- W3088916145 cites W2154652894 @default.
- W3088916145 cites W2163074454 @default.
- W3088916145 cites W2250645967 @default.
- W3088916145 cites W2328886022 @default.
- W3088916145 cites W2581637843 @default.
- W3088916145 cites W2729046720 @default.
- W3088916145 cites W2891103209 @default.
- W3088916145 cites W2916772188 @default.
- W3088916145 cites W2916898195 @default.
- W3088916145 cites W2953356739 @default.
- W3088916145 cites W2962786758 @default.
- W3088916145 cites W2962883855 @default.
- W3088916145 cites W2963326483 @default.
- W3088916145 cites W2963341956 @default.
- W3088916145 cites W2963403868 @default.
- W3088916145 cites W2963499246 @default.
- W3088916145 cites W2963527228 @default.
- W3088916145 cites W2963544536 @default.
- W3088916145 cites W2963790827 @default.
- W3088916145 cites W2963801581 @default.
- W3088916145 cites W2963825865 @default.
- W3088916145 cites W2963879591 @default.
- W3088916145 cites W2964178377 @default.
- W3088916145 cites W2966292672 @default.
- W3088916145 cites W2988937804 @default.
- W3088916145 cites W2996403597 @default.
- W3088916145 cites W630532510 @default.
- W3088916145 hasPublicationYear "2020" @default.
- W3088916145 type Work @default.
- W3088916145 sameAs 3088916145 @default.
- W3088916145 citedByCount "0" @default.
- W3088916145 crossrefType "posted-content" @default.
- W3088916145 hasAuthorship W3088916145A5047163223 @default.
- W3088916145 hasAuthorship W3088916145A5050036814 @default.
- W3088916145 hasAuthorship W3088916145A5060553255 @default.
- W3088916145 hasAuthorship W3088916145A5087678154 @default.
- W3088916145 hasConcept C119857082 @default.
- W3088916145 hasConcept C121332964 @default.
- W3088916145 hasConcept C124101348 @default.
- W3088916145 hasConcept C136764020 @default.
- W3088916145 hasConcept C151730666 @default.
- W3088916145 hasConcept C154945302 @default.
- W3088916145 hasConcept C162324750 @default.
- W3088916145 hasConcept C169258074 @default.
- W3088916145 hasConcept C173853756 @default.
- W3088916145 hasConcept C176217482 @default.
- W3088916145 hasConcept C204321447 @default.
- W3088916145 hasConcept C21547014 @default.
- W3088916145 hasConcept C2778755073 @default.
- W3088916145 hasConcept C2779343474 @default.
- W3088916145 hasConcept C37736160 @default.
- W3088916145 hasConcept C41008148 @default.
- W3088916145 hasConcept C41608201 @default.
- W3088916145 hasConcept C62520636 @default.
- W3088916145 hasConcept C86803240 @default.
- W3088916145 hasConceptScore W3088916145C119857082 @default.
- W3088916145 hasConceptScore W3088916145C121332964 @default.
- W3088916145 hasConceptScore W3088916145C124101348 @default.
- W3088916145 hasConceptScore W3088916145C136764020 @default.
- W3088916145 hasConceptScore W3088916145C151730666 @default.
- W3088916145 hasConceptScore W3088916145C154945302 @default.
- W3088916145 hasConceptScore W3088916145C162324750 @default.
- W3088916145 hasConceptScore W3088916145C169258074 @default.
- W3088916145 hasConceptScore W3088916145C173853756 @default.
- W3088916145 hasConceptScore W3088916145C176217482 @default.
- W3088916145 hasConceptScore W3088916145C204321447 @default.
- W3088916145 hasConceptScore W3088916145C21547014 @default.
- W3088916145 hasConceptScore W3088916145C2778755073 @default.
- W3088916145 hasConceptScore W3088916145C2779343474 @default.
- W3088916145 hasConceptScore W3088916145C37736160 @default.
- W3088916145 hasConceptScore W3088916145C41008148 @default.
- W3088916145 hasConceptScore W3088916145C41608201 @default.
- W3088916145 hasConceptScore W3088916145C62520636 @default.
- W3088916145 hasConceptScore W3088916145C86803240 @default.
- W3088916145 hasLocation W30889161451 @default.
- W3088916145 hasOpenAccess W3088916145 @default.
- W3088916145 hasPrimaryLocation W30889161451 @default.
- W3088916145 hasRelatedWork W2553022200 @default.
- W3088916145 hasRelatedWork W2798597814 @default.
- W3088916145 hasRelatedWork W2964316912 @default.
- W3088916145 hasRelatedWork W2971192944 @default.
- W3088916145 hasRelatedWork W2971870866 @default.
- W3088916145 hasRelatedWork W2982807168 @default.
- W3088916145 hasRelatedWork W2983165789 @default.
- W3088916145 hasRelatedWork W3007641164 @default.
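
The listing above follows a regular "- subject predicate object @graph." layout, so it can be regrouped by predicate for easier reading. The following is a minimal sketch under that assumption; the names `parse_line` and `group_by_predicate` are hypothetical helpers written for this illustration and are not part of SemOpenAlex or any library. It handles only the line shape shown in this listing, not general RDF syntax.

```python
import re
from collections import defaultdict

# Assumed line shape (taken from the listing above, not from a spec):
#   - <subject> <predicate> <object> @<graph>.
# The object is either a bare token or a double-quoted literal.
LINE_RE = re.compile(r'^- (\S+) (\S+) (.+) @(\w+)\.$')


def parse_line(line):
    """Split one listing line into (subject, predicate, object, graph).

    Returns None for lines that do not match the assumed shape.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Drop the surrounding quotes from literal values such as dates.
    if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
        obj = obj[1:-1]
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect object values under each predicate, preserving order."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _, predicate, obj, _ = parsed
            grouped[predicate].append(obj)
    return dict(grouped)


# A few lines copied from the listing above.
sample = [
    '- W3088916145 created "2020-10-01" @default.',
    '- W3088916145 creator A5047163223 @default.',
    '- W3088916145 creator A5050036814 @default.',
    '- W3088916145 type Work @default.',
]
record = group_by_predicate(sample)
print(record["creator"])
```

Grouping by predicate keeps multi-valued properties such as `creator`, `cites`, and `hasRelatedWork` together as lists, which is usually the more convenient shape for counting or comparing them.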