Matches in SemOpenAlex for { <https://semopenalex.org/work/W3045080532> ?p ?o ?g. }
- W3045080532 abstract "Distributional reinforcement learning (RL) has achieved state-of-the-art performance in Atari games by recasting the traditional RL into a distribution estimation problem, explicitly estimating the probability distribution instead of the expectation of a total return. The bottleneck in distributional RL lies in the estimation of this distribution where one must resort to an approximate representation of the return distributions which are infinite-dimensional. Most existing methods focus on learning a set of predefined statistic functionals of the return distributions requiring involved projections to maintain the order statistics. We take a different perspective using deterministic sampling wherein we approximate the return distributions with a set of deterministic particles that are not attached to any predefined statistic functional, allowing us to freely approximate the return distributions. The learning is then interpreted as evolution of these particles so that a distance between the return distribution and its target distribution is minimized. This learning aim is realized via maximum mean discrepancy (MMD) distance which in turn leads to a simpler loss amenable to backpropagation. Experiments on the suite of Atari 2600 games show that our algorithm outperforms the standard distributional RL baselines and sets a new record in the Atari games for non-distributed agents." @default.
- W3045080532 created "2020-07-29" @default.
- W3045080532 creator A5011012522 @default.
- W3045080532 creator A5027440196 @default.
- W3045080532 creator A5045540854 @default.
- W3045080532 date "2020-07-24" @default.
- W3045080532 modified "2023-09-26" @default.
- W3045080532 title "Distributional Reinforcement Learning with Maximum Mean Discrepancy." @default.
- W3045080532 cites W1458771408 @default.
- W3045080532 cites W1710476689 @default.
- W3045080532 cites W1946137962 @default.
- W3045080532 cites W1992154527 @default.
- W3045080532 cites W2033436114 @default.
- W3045080532 cites W2100600008 @default.
- W3045080532 cites W2100677568 @default.
- W3045080532 cites W2104128541 @default.
- W3045080532 cites W2111181349 @default.
- W3045080532 cites W2119567691 @default.
- W3045080532 cites W2145339207 @default.
- W3045080532 cites W2155968351 @default.
- W3045080532 cites W2212660284 @default.
- W3045080532 cites W2313791856 @default.
- W3045080532 cites W2341171179 @default.
- W3045080532 cites W2511837229 @default.
- W3045080532 cites W2619903301 @default.
- W3045080532 cites W2765302304 @default.
- W3045080532 cites W2803308811 @default.
- W3045080532 cites W2905342215 @default.
- W3045080532 cites W2949374561 @default.
- W3045080532 cites W2950292946 @default.
- W3045080532 cites W2951338616 @default.
- W3045080532 cites W2952023104 @default.
- W3045080532 cites W2953318193 @default.
- W3045080532 cites W2963423916 @default.
- W3045080532 cites W2963477884 @default.
- W3045080532 cites W2964121744 @default.
- W3045080532 cites W2968113041 @default.
- W3045080532 cites W2970036354 @default.
- W3045080532 cites W3139377883 @default.
- W3045080532 cites W2951873965 @default.
- W3045080532 hasPublicationYear "2020" @default.
- W3045080532 type Work @default.
- W3045080532 sameAs 3045080532 @default.
- W3045080532 citedByCount "6" @default.
- W3045080532 countsByYear W30450805322021 @default.
- W3045080532 crossrefType "posted-content" @default.
- W3045080532 hasAuthorship W3045080532A5011012522 @default.
- W3045080532 hasAuthorship W3045080532A5027440196 @default.
- W3045080532 hasAuthorship W3045080532A5045540854 @default.
- W3045080532 hasConcept C105795698 @default.
- W3045080532 hasConcept C126255220 @default.
- W3045080532 hasConcept C149635348 @default.
- W3045080532 hasConcept C154945302 @default.
- W3045080532 hasConcept C177264268 @default.
- W3045080532 hasConcept C17744445 @default.
- W3045080532 hasConcept C199360897 @default.
- W3045080532 hasConcept C199539241 @default.
- W3045080532 hasConcept C2776359362 @default.
- W3045080532 hasConcept C2780513914 @default.
- W3045080532 hasConcept C33923547 @default.
- W3045080532 hasConcept C41008148 @default.
- W3045080532 hasConcept C89128539 @default.
- W3045080532 hasConcept C94625758 @default.
- W3045080532 hasConcept C97541855 @default.
- W3045080532 hasConceptScore W3045080532C105795698 @default.
- W3045080532 hasConceptScore W3045080532C126255220 @default.
- W3045080532 hasConceptScore W3045080532C149635348 @default.
- W3045080532 hasConceptScore W3045080532C154945302 @default.
- W3045080532 hasConceptScore W3045080532C177264268 @default.
- W3045080532 hasConceptScore W3045080532C17744445 @default.
- W3045080532 hasConceptScore W3045080532C199360897 @default.
- W3045080532 hasConceptScore W3045080532C199539241 @default.
- W3045080532 hasConceptScore W3045080532C2776359362 @default.
- W3045080532 hasConceptScore W3045080532C2780513914 @default.
- W3045080532 hasConceptScore W3045080532C33923547 @default.
- W3045080532 hasConceptScore W3045080532C41008148 @default.
- W3045080532 hasConceptScore W3045080532C89128539 @default.
- W3045080532 hasConceptScore W3045080532C94625758 @default.
- W3045080532 hasConceptScore W3045080532C97541855 @default.
- W3045080532 hasLocation W30450805321 @default.
- W3045080532 hasOpenAccess W3045080532 @default.
- W3045080532 hasPrimaryLocation W30450805321 @default.
- W3045080532 hasRelatedWork W1550698229 @default.
- W3045080532 hasRelatedWork W1552704443 @default.
- W3045080532 hasRelatedWork W15601695 @default.
- W3045080532 hasRelatedWork W1991548885 @default.
- W3045080532 hasRelatedWork W2153846128 @default.
- W3045080532 hasRelatedWork W2339451794 @default.
- W3045080532 hasRelatedWork W2484199440 @default.
- W3045080532 hasRelatedWork W2548641835 @default.
- W3045080532 hasRelatedWork W2565272766 @default.
- W3045080532 hasRelatedWork W2807557496 @default.
- W3045080532 hasRelatedWork W2963423916 @default.
- W3045080532 hasRelatedWork W2970036354 @default.
- W3045080532 hasRelatedWork W2987414108 @default.
- W3045080532 hasRelatedWork W3013875875 @default.
- W3045080532 hasRelatedWork W3047914308 @default.
- W3045080532 hasRelatedWork W3174145161 @default.
- W3045080532 hasRelatedWork W3174436986 @default.
- W3045080532 hasRelatedWork W3180887884 @default.
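
The abstract above describes approximating a return distribution with a set of particles and minimizing a maximum mean discrepancy (MMD) distance to a target distribution. As a rough illustration only, and not the authors' implementation, the following sketch computes a biased squared-MMD estimate between two one-dimensional particle sets. The Gaussian kernel, the `bandwidth` parameter, and the function names `gaussian_kernel` and `mmd_squared` are assumptions made for this example; the record itself does not specify them.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Pairwise Gaussian kernel matrix between two 1-D particle arrays."""
    # Broadcasting builds a (len(x), len(y)) matrix of pairwise differences.
    diff = x[:, None] - y[None, :]
    return np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))


def mmd_squared(particles, targets, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two particle sets."""
    k_xx = gaussian_kernel(particles, particles, bandwidth).mean()
    k_yy = gaussian_kernel(targets, targets, bandwidth).mean()
    k_xy = gaussian_kernel(particles, targets, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Hypothetical particle sets standing in for a predicted and a target return distribution.
predicted = np.array([0.0, 1.0, 2.0])
target = np.array([5.0, 6.0, 7.0])
print(mmd_squared(predicted, target))
```

The estimate is zero when both particle sets coincide and grows as they move apart, which is the property that lets it serve as a loss. In a learning setting the particles would typically be network outputs and the estimate would be minimized by gradient descent, which needs an automatic differentiation library rather than plain NumPy.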