Matches in SemOpenAlex for { <https://semopenalex.org/work/W2947037594> ?p ?o ?g. }
- W2947037594 abstract "Standard reinforcement learning methods aim to master one way of solving a task whereas there may exist multiple near-optimal policies. Being able to identify this collection of near-optimal policies can allow a domain expert to efficiently explore the space of reasonable solutions. Unfortunately, existing approaches that quantify uncertainty over policies are not ultimately relevant to finding policies with qualitatively distinct behaviors. In this work, we formalize the difference between policies as a difference between the distribution of trajectories induced by each policy, which encourages diversity with respect to both state visitation and action choices. We derive a gradient-based optimization technique that can be combined with existing policy gradient methods to now identify diverse collections of well-performing policies. We demonstrate our approach on benchmarks and a healthcare task." @default.
- W2947037594 created "2019-06-07" @default.
- W2947037594 creator A5031617463 @default.
- W2947037594 creator A5038771285 @default.
- W2947037594 date "2019-05-31" @default.
- W2947037594 modified "2023-09-23" @default.
- W2947037594 title "Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies" @default.
- W2947037594 cites W1898928487 @default.
- W2947037594 cites W192920577 @default.
- W2947037594 cites W1987725948 @default.
- W2947037594 cites W2011861268 @default.
- W2947037594 cites W2104733512 @default.
- W2947037594 cites W2110097068 @default.
- W2947037594 cites W2121506959 @default.
- W2947037594 cites W2125865219 @default.
- W2947037594 cites W2128349740 @default.
- W2947037594 cites W2151083897 @default.
- W2947037594 cites W2153484303 @default.
- W2947037594 cites W2155027007 @default.
- W2947037594 cites W2159272820 @default.
- W2947037594 cites W2575572349 @default.
- W2947037594 cites W2594103415 @default.
- W2947037594 cites W2604373826 @default.
- W2947037594 cites W2606757878 @default.
- W2947037594 cites W2736601468 @default.
- W2947037594 cites W2739341730 @default.
- W2947037594 cites W2757504960 @default.
- W2947037594 cites W2785720656 @default.
- W2947037594 cites W2785940258 @default.
- W2947037594 cites W2803839069 @default.
- W2947037594 cites W2896893468 @default.
- W2947037594 cites W2962839548 @default.
- W2947037594 cites W2963400359 @default.
- W2947037594 cites W2963438456 @default.
- W2947037594 cites W2963906781 @default.
- W2947037594 cites W2964174623 @default.
- W2947037594 hasPublicationYear "2019" @default.
- W2947037594 type Work @default.
- W2947037594 sameAs 2947037594 @default.
- W2947037594 citedByCount "6" @default.
- W2947037594 countsByYear W29470375942019 @default.
- W2947037594 countsByYear W29470375942020 @default.
- W2947037594 countsByYear W29470375942021 @default.
- W2947037594 crossrefType "posted-content" @default.
- W2947037594 hasAuthorship W2947037594A5031617463 @default.
- W2947037594 hasAuthorship W2947037594A5038771285 @default.
- W2947037594 hasConcept C110121322 @default.
- W2947037594 hasConcept C111919701 @default.
- W2947037594 hasConcept C126255220 @default.
- W2947037594 hasConcept C127413603 @default.
- W2947037594 hasConcept C134306372 @default.
- W2947037594 hasConcept C154945302 @default.
- W2947037594 hasConcept C162324750 @default.
- W2947037594 hasConcept C177264268 @default.
- W2947037594 hasConcept C17744445 @default.
- W2947037594 hasConcept C18762648 @default.
- W2947037594 hasConcept C187736073 @default.
- W2947037594 hasConcept C199360897 @default.
- W2947037594 hasConcept C199539241 @default.
- W2947037594 hasConcept C2778572836 @default.
- W2947037594 hasConcept C2780451532 @default.
- W2947037594 hasConcept C2781316041 @default.
- W2947037594 hasConcept C33923547 @default.
- W2947037594 hasConcept C36503486 @default.
- W2947037594 hasConcept C41008148 @default.
- W2947037594 hasConcept C78519656 @default.
- W2947037594 hasConcept C97541855 @default.
- W2947037594 hasConceptScore W2947037594C110121322 @default.
- W2947037594 hasConceptScore W2947037594C111919701 @default.
- W2947037594 hasConceptScore W2947037594C126255220 @default.
- W2947037594 hasConceptScore W2947037594C127413603 @default.
- W2947037594 hasConceptScore W2947037594C134306372 @default.
- W2947037594 hasConceptScore W2947037594C154945302 @default.
- W2947037594 hasConceptScore W2947037594C162324750 @default.
- W2947037594 hasConceptScore W2947037594C177264268 @default.
- W2947037594 hasConceptScore W2947037594C17744445 @default.
- W2947037594 hasConceptScore W2947037594C18762648 @default.
- W2947037594 hasConceptScore W2947037594C187736073 @default.
- W2947037594 hasConceptScore W2947037594C199360897 @default.
- W2947037594 hasConceptScore W2947037594C199539241 @default.
- W2947037594 hasConceptScore W2947037594C2778572836 @default.
- W2947037594 hasConceptScore W2947037594C2780451532 @default.
- W2947037594 hasConceptScore W2947037594C2781316041 @default.
- W2947037594 hasConceptScore W2947037594C33923547 @default.
- W2947037594 hasConceptScore W2947037594C36503486 @default.
- W2947037594 hasConceptScore W2947037594C41008148 @default.
- W2947037594 hasConceptScore W2947037594C78519656 @default.
- W2947037594 hasConceptScore W2947037594C97541855 @default.
- W2947037594 hasOpenAccess W2947037594 @default.
- W2947037594 hasRelatedWork W1784395471 @default.
- W2947037594 hasRelatedWork W2000975201 @default.
- W2947037594 hasRelatedWork W2007113996 @default.
- W2947037594 hasRelatedWork W2103098029 @default.
- W2947037594 hasRelatedWork W2620289825 @default.
- W2947037594 hasRelatedWork W2910781732 @default.
- W2947037594 hasRelatedWork W2952561542 @default.
- W2947037594 hasRelatedWork W2964297722 @default.
- W2947037594 hasRelatedWork W2966128956 @default.
- W2947037594 hasRelatedWork W2977099601 @default.
- W2947037594 hasRelatedWork W3081226161 @default.
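The title names Maximum Mean Discrepancy (MMD), and the abstract describes comparing two policies through the distributions of trajectories they induce. Below is a minimal sketch of that comparison, computing a biased (V-statistic) estimate of squared MMD between two samples of trajectories. It assumes fixed-length trajectories, each step encoded as one numeric feature vector (for example, state concatenated with action), and a Gaussian RBF kernel with a hypothetical bandwidth `sigma`. The helper names are illustrative. This record does not give the paper's actual kernel, trajectory encoding, or gradient estimator.

```python
import math
from typing import Sequence

# A trajectory is a sequence of per-step feature vectors (assumed: state + action concatenated).
Trajectory = Sequence[Sequence[float]]


def flatten(traj: Trajectory) -> list[float]:
    """Concatenate per-step vectors into one vector (assumes equal-length trajectories)."""
    return [x for step in traj for x in step]


def rbf_kernel(a: Sequence[float], b: Sequence[float], sigma: float = 1.0) -> float:
    """Gaussian RBF kernel exp(-||a - b||^2 / (2 sigma^2))."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same flattened length")
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def mmd_squared(xs: list[Trajectory], ys: list[Trajectory], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two samples of trajectories."""
    fx = [flatten(t) for t in xs]
    fy = [flatten(t) for t in ys]

    def mean_k(p: list[list[float]], q: list[list[float]]) -> float:
        # Average kernel value over all pairs drawn from p and q.
        return sum(rbf_kernel(a, b, sigma) for a in p for b in q) / (len(p) * len(q))

    return mean_k(fx, fx) + mean_k(fy, fy) - 2.0 * mean_k(fx, fy)


# Usage: one-step trajectories with 2-D features, sampled from three hypothetical policies.
xs = [[[0.0, 0.0]], [[0.1, 0.0]]]
near = [[[0.05, 0.0]], [[0.15, 0.0]]]
far = [[[3.0, 3.0]], [[3.1, 3.0]]]

print(mmd_squared(xs, near), mmd_squared(xs, far))
```

A policy whose trajectories sit far from another's yields a larger MMD. Adding such a term to a policy-gradient objective is one way to reward behavioral diversity; how the paper weights or differentiates it is not stated here.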