Matches in SemOpenAlex for { <https://semopenalex.org/work/W3018444312> ?p ?o ?g. }
- W3018444312 abstract "Large datasets are essential for many NLP tasks. Current publicly available open-domain dialogue datasets offer a trade-off between size and quality (e.g. DailyDialog vs. OpenSubtitles). We aim to close this gap by building a high-quality dataset consisting of 14.8M utterances in English. We extract and process dialogues from publicly available online books. We present a detailed description of our pipeline and heuristics and an error analysis of extracted dialogues. Better response quality can be achieved in zero-shot and finetuning settings by training on our data than on the larger but much noisier OpenSubtitles dataset. Researchers can easily build their versions of the dataset by adjusting various trade-off parameters. The code can be extended to further languages with limited effort." @default.
- W3018444312 created "2020-05-01" @default.
- W3018444312 creator A5016775285 @default.
- W3018444312 creator A5029370035 @default.
- W3018444312 date "2020-04-27" @default.
- W3018444312 modified "2023-09-28" @default.
- W3018444312 title "The Gutenberg Dialogue Dataset." @default.
- W3018444312 cites W1522301498 @default.
- W3018444312 cites W2101105183 @default.
- W3018444312 cites W2125320996 @default.
- W3018444312 cites W2166637769 @default.
- W3018444312 cites W2328886022 @default.
- W3018444312 cites W2581637843 @default.
- W3018444312 cites W2759361123 @default.
- W3018444312 cites W2783549597 @default.
- W3018444312 cites W2884970917 @default.
- W3018444312 cites W2889581899 @default.
- W3018444312 cites W2890276793 @default.
- W3018444312 cites W2891744372 @default.
- W3018444312 cites W2891826200 @default.
- W3018444312 cites W2914204778 @default.
- W3018444312 cites W2950142196 @default.
- W3018444312 cites W2951990924 @default.
- W3018444312 cites W2962767298 @default.
- W3018444312 cites W2962821719 @default.
- W3018444312 cites W2963050684 @default.
- W3018444312 cites W2963206148 @default.
- W3018444312 cites W2963341956 @default.
- W3018444312 cites W2963403868 @default.
- W3018444312 cites W2963475460 @default.
- W3018444312 cites W2963527228 @default.
- W3018444312 cites W2963544536 @default.
- W3018444312 cites W2963790827 @default.
- W3018444312 cites W2963825865 @default.
- W3018444312 cites W2964110616 @default.
- W3018444312 cites W2964134121 @default.
- W3018444312 cites W2964178377 @default.
- W3018444312 cites W2964352131 @default.
- W3018444312 cites W2969389652 @default.
- W3018444312 cites W2970597249 @default.
- W3018444312 cites W2972437240 @default.
- W3018444312 cites W2996287690 @default.
- W3018444312 cites W3000779003 @default.
- W3018444312 cites W3023786569 @default.
- W3018444312 cites W3037026762 @default.
- W3018444312 cites W630532510 @default.
- W3018444312 hasPublicationYear "2020" @default.
- W3018444312 type Work @default.
- W3018444312 sameAs 3018444312 @default.
- W3018444312 citedByCount "1" @default.
- W3018444312 countsByYear W30184443122021 @default.
- W3018444312 crossrefType "posted-content" @default.
- W3018444312 hasAuthorship W3018444312A5016775285 @default.
- W3018444312 hasAuthorship W3018444312A5029370035 @default.
- W3018444312 hasConcept C111472728 @default.
- W3018444312 hasConcept C111919701 @default.
- W3018444312 hasConcept C119857082 @default.
- W3018444312 hasConcept C124101348 @default.
- W3018444312 hasConcept C127705205 @default.
- W3018444312 hasConcept C134306372 @default.
- W3018444312 hasConcept C138885662 @default.
- W3018444312 hasConcept C154945302 @default.
- W3018444312 hasConcept C177264268 @default.
- W3018444312 hasConcept C199360897 @default.
- W3018444312 hasConcept C204321447 @default.
- W3018444312 hasConcept C23123220 @default.
- W3018444312 hasConcept C2776760102 @default.
- W3018444312 hasConcept C2779530757 @default.
- W3018444312 hasConcept C33923547 @default.
- W3018444312 hasConcept C36503486 @default.
- W3018444312 hasConcept C41008148 @default.
- W3018444312 hasConcept C43521106 @default.
- W3018444312 hasConcept C98045186 @default.
- W3018444312 hasConceptScore W3018444312C111472728 @default.
- W3018444312 hasConceptScore W3018444312C111919701 @default.
- W3018444312 hasConceptScore W3018444312C119857082 @default.
- W3018444312 hasConceptScore W3018444312C124101348 @default.
- W3018444312 hasConceptScore W3018444312C127705205 @default.
- W3018444312 hasConceptScore W3018444312C134306372 @default.
- W3018444312 hasConceptScore W3018444312C138885662 @default.
- W3018444312 hasConceptScore W3018444312C154945302 @default.
- W3018444312 hasConceptScore W3018444312C177264268 @default.
- W3018444312 hasConceptScore W3018444312C199360897 @default.
- W3018444312 hasConceptScore W3018444312C204321447 @default.
- W3018444312 hasConceptScore W3018444312C23123220 @default.
- W3018444312 hasConceptScore W3018444312C2776760102 @default.
- W3018444312 hasConceptScore W3018444312C2779530757 @default.
- W3018444312 hasConceptScore W3018444312C33923547 @default.
- W3018444312 hasConceptScore W3018444312C36503486 @default.
- W3018444312 hasConceptScore W3018444312C41008148 @default.
- W3018444312 hasConceptScore W3018444312C43521106 @default.
- W3018444312 hasConceptScore W3018444312C98045186 @default.
- W3018444312 hasLocation W30184443121 @default.
- W3018444312 hasOpenAccess W3018444312 @default.
- W3018444312 hasPrimaryLocation W30184443121 @default.
- W3018444312 hasRelatedWork W2180008620 @default.
- W3018444312 hasRelatedWork W2318294835 @default.
- W3018444312 hasRelatedWork W2742353928 @default.
- W3018444312 hasRelatedWork W2761590056 @default.
- W3018444312 hasRelatedWork W2928893288 @default.
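
The listing above follows a regular "- subject predicate object @graph." shape, so it can be loaded into a simple lookup table. The following is a minimal sketch, assuming that line shape holds for every row; the names `LINE_RE`, `parse_line`, and `group_by_predicate` are hypothetical helpers introduced here for illustration and are not part of SemOpenAlex or any library.

```python
import re
from collections import defaultdict

# Assumed line shape: "- <subject> <predicate> <object> @<graph>."
# This pattern is an illustration, not an official SemOpenAlex format.
LINE_RE = re.compile(r'^- (\S+) (\S+) (.+) @(\S+)\.$')


def parse_line(line):
    """Split one listing line into (subject, predicate, object, graph).

    Returns None when the line does not match the assumed shape.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Quoted objects are literals; drop the surrounding double quotes.
    if len(obj) >= 2 and obj[0] == '"' and obj[-1] == '"':
        obj = obj[1:-1]
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect object values per predicate, preserving listing order."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _, predicate, obj, _ = parsed
            grouped[predicate].append(obj)
    return dict(grouped)


# A few rows copied from the listing above.
sample = [
    '- W3018444312 title "The Gutenberg Dialogue Dataset." @default.',
    '- W3018444312 cites W1522301498 @default.',
    '- W3018444312 cites W2101105183 @default.',
    '- W3018444312 type Work @default.',
]
records = group_by_predicate(sample)
print(records["title"])
print(records["cites"])
```

Grouping by predicate keeps repeated properties such as `cites` or `hasConcept` together as lists, while single-valued properties such as `title` end up as one-element lists.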