Matches in SemOpenAlex for { <https://semopenalex.org/work/W3098985263> ?p ?o ?g. }
Showing items 1 to 86 of 86 with 100 items per page.
- W3098985263 endingPage "3021" @default.
- W3098985263 startingPage "3008" @default.
- W3098985263 abstract "As language models become more powerful, training and evaluation are increasingly bottlenecked by the data and metrics used for a particular task. For example, summarization models are often trained to predict human reference summaries and evaluated using ROUGE, but both of these metrics are rough proxies for what we really care about---summary quality. In this work, we show that it is possible to significantly improve summary quality by training a model to optimize for human preferences. We collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy using reinforcement learning. We apply our method to a version of the TL;DR dataset of Reddit posts and find that our models significantly outperform both human reference summaries and much larger models fine-tuned with supervised learning alone. Our models also transfer to CNN/DM news articles, producing summaries nearly as good as the human reference without any news-specific fine-tuning. We conduct extensive analyses to understand our human feedback dataset and fine-tuned models. We establish that our reward model generalizes to new datasets, and that optimizing our reward model results in better summaries than optimizing ROUGE according to humans. We hope the evidence from our paper motivates machine learning researchers to pay closer attention to how their training loss affects the model behavior they actually want." @default.
- W3098985263 created "2020-11-23" @default.
- W3098985263 creator A5004295653 @default.
- W3098985263 creator A5019575601 @default.
- W3098985263 creator A5037736834 @default.
- W3098985263 creator A5051250767 @default.
- W3098985263 creator A5066197394 @default.
- W3098985263 creator A5068949174 @default.
- W3098985263 creator A5081413425 @default.
- W3098985263 creator A5089747113 @default.
- W3098985263 creator A5051743369 @default.
- W3098985263 date "2020-09-02" @default.
- W3098985263 modified "2023-09-29" @default.
- W3098985263 title "Learning to summarize from human feedback" @default.
- W3098985263 hasPublicationYear "2020" @default.
- W3098985263 type Work @default.
- W3098985263 sameAs 3098985263 @default.
- W3098985263 citedByCount "14" @default.
- W3098985263 countsByYear W30989852632020 @default.
- W3098985263 countsByYear W30989852632021 @default.
- W3098985263 crossrefType "proceedings-article" @default.
- W3098985263 hasAuthorship W3098985263A5004295653 @default.
- W3098985263 hasAuthorship W3098985263A5019575601 @default.
- W3098985263 hasAuthorship W3098985263A5037736834 @default.
- W3098985263 hasAuthorship W3098985263A5051250767 @default.
- W3098985263 hasAuthorship W3098985263A5051743369 @default.
- W3098985263 hasAuthorship W3098985263A5066197394 @default.
- W3098985263 hasAuthorship W3098985263A5068949174 @default.
- W3098985263 hasAuthorship W3098985263A5081413425 @default.
- W3098985263 hasAuthorship W3098985263A5089747113 @default.
- W3098985263 hasConcept C111472728 @default.
- W3098985263 hasConcept C119857082 @default.
- W3098985263 hasConcept C137293760 @default.
- W3098985263 hasConcept C138885662 @default.
- W3098985263 hasConcept C150899416 @default.
- W3098985263 hasConcept C154945302 @default.
- W3098985263 hasConcept C162324750 @default.
- W3098985263 hasConcept C170858558 @default.
- W3098985263 hasConcept C187736073 @default.
- W3098985263 hasConcept C2779530757 @default.
- W3098985263 hasConcept C2780451532 @default.
- W3098985263 hasConcept C41008148 @default.
- W3098985263 hasConcept C97541855 @default.
- W3098985263 hasConceptScore W3098985263C111472728 @default.
- W3098985263 hasConceptScore W3098985263C119857082 @default.
- W3098985263 hasConceptScore W3098985263C137293760 @default.
- W3098985263 hasConceptScore W3098985263C138885662 @default.
- W3098985263 hasConceptScore W3098985263C150899416 @default.
- W3098985263 hasConceptScore W3098985263C154945302 @default.
- W3098985263 hasConceptScore W3098985263C162324750 @default.
- W3098985263 hasConceptScore W3098985263C170858558 @default.
- W3098985263 hasConceptScore W3098985263C187736073 @default.
- W3098985263 hasConceptScore W3098985263C2779530757 @default.
- W3098985263 hasConceptScore W3098985263C2780451532 @default.
- W3098985263 hasConceptScore W3098985263C41008148 @default.
- W3098985263 hasConceptScore W3098985263C97541855 @default.
- W3098985263 hasLocation W30989852631 @default.
- W3098985263 hasOpenAccess W3098985263 @default.
- W3098985263 hasPrimaryLocation W30989852631 @default.
- W3098985263 hasRelatedWork W200488758 @default.
- W3098985263 hasRelatedWork W2419451029 @default.
- W3098985263 hasRelatedWork W2951574953 @default.
- W3098985263 hasRelatedWork W2952059673 @default.
- W3098985263 hasRelatedWork W2955013659 @default.
- W3098985263 hasRelatedWork W2963820331 @default.
- W3098985263 hasRelatedWork W2965373594 @default.
- W3098985263 hasRelatedWork W2970892365 @default.
- W3098985263 hasRelatedWork W2972415475 @default.
- W3098985263 hasRelatedWork W3046127589 @default.
- W3098985263 hasRelatedWork W3082115681 @default.
- W3098985263 hasRelatedWork W3084850341 @default.
- W3098985263 hasRelatedWork W3087340488 @default.
- W3098985263 hasRelatedWork W3125020117 @default.
- W3098985263 hasRelatedWork W3152682942 @default.
- W3098985263 hasRelatedWork W3157855984 @default.
- W3098985263 hasRelatedWork W3167695799 @default.
- W3098985263 hasRelatedWork W3176960401 @default.
- W3098985263 hasRelatedWork W3193171773 @default.
- W3098985263 hasRelatedWork W3213754662 @default.
- W3098985263 hasVolume "33" @default.
- W3098985263 isParatext "false" @default.
- W3098985263 isRetracted "false" @default.
- W3098985263 magId "3098985263" @default.
- W3098985263 workType "article" @default.
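The abstract describes training a model on human comparisons between two summaries so that it predicts which one the human preferred. A minimal sketch of such a pairwise objective is shown below, assuming a logistic (Bradley-Terry style) comparison model over scalar reward scores, where P(preferred wins) = sigmoid(r_preferred - r_rejected). The function names `preference_loss` and `mean_preference_loss` are hypothetical, and the reward scores are assumed to come from a learned model that is not shown here.

```python
import math
from typing import Sequence, Tuple


def preference_loss(r_preferred: float, r_rejected: float) -> float:
    """Negative log-likelihood that the human-preferred summary wins.

    Assumes (hypothetically) a logistic comparison model:
    P(preferred > rejected) = sigmoid(r_preferred - r_rejected).
    """
    diff = r_preferred - r_rejected
    # -log(sigmoid(diff)) == log(1 + exp(-diff)).
    # Branch on the sign so exp() never receives a large positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def mean_preference_loss(pairs: Sequence[Tuple[float, float]]) -> float:
    """Average loss over (reward of preferred, reward of rejected) pairs."""
    if not pairs:
        raise ValueError("need at least one comparison")
    return sum(preference_loss(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Equal scores give log(2); a larger margin for the preferred summary lowers the loss.
    print(mean_preference_loss([(0.0, 0.0), (2.0, 0.0), (0.5, 1.5)]))
```

Minimizing this loss pushes the model to score the human-preferred summary above the rejected one. Per the abstract, the resulting model is then used as the reward function when fine-tuning the summarization policy with reinforcement learning.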