Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287018575> ?p ?o ?g. }
Showing items 1 to 88 of 88 with 100 items per page.
- W4287018575 abstract "Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied with an open-source library rliable, to prevent unreliable results from stagnating the field." @default.
- W4287018575 created "2022-07-25" @default.
- W4287018575 creator A5001087292 @default.
- W4287018575 creator A5015580523 @default.
- W4287018575 creator A5031185465 @default.
- W4287018575 creator A5068291173 @default.
- W4287018575 creator A5070953294 @default.
- W4287018575 date "2021-08-30" @default.
- W4287018575 modified "2023-10-07" @default.
- W4287018575 title "Deep Reinforcement Learning at the Edge of the Statistical Precipice" @default.
- W4287018575 doi "https://doi.org/10.48550/arxiv.2108.13264" @default.
- W4287018575 hasPublicationYear "2021" @default.
- W4287018575 type Work @default.
- W4287018575 citedByCount "0" @default.
- W4287018575 crossrefType "posted-content" @default.
- W4287018575 hasAuthorship W4287018575A5001087292 @default.
- W4287018575 hasAuthorship W4287018575A5015580523 @default.
- W4287018575 hasAuthorship W4287018575A5031185465 @default.
- W4287018575 hasAuthorship W4287018575A5068291173 @default.
- W4287018575 hasAuthorship W4287018575A5070953294 @default.
- W4287018575 hasBestOaLocation W42870185751 @default.
- W4287018575 hasConcept C105795698 @default.
- W4287018575 hasConcept C119857082 @default.
- W4287018575 hasConcept C127413603 @default.
- W4287018575 hasConcept C13280743 @default.
- W4287018575 hasConcept C149782125 @default.
- W4287018575 hasConcept C154945302 @default.
- W4287018575 hasConcept C159985019 @default.
- W4287018575 hasConcept C166957645 @default.
- W4287018575 hasConcept C185798385 @default.
- W4287018575 hasConcept C192562407 @default.
- W4287018575 hasConcept C201995342 @default.
- W4287018575 hasConcept C202444582 @default.
- W4287018575 hasConcept C205649164 @default.
- W4287018575 hasConcept C2524010 @default.
- W4287018575 hasConcept C2780451532 @default.
- W4287018575 hasConcept C28719098 @default.
- W4287018575 hasConcept C33923547 @default.
- W4287018575 hasConcept C41008148 @default.
- W4287018575 hasConcept C41426520 @default.
- W4287018575 hasConcept C44249647 @default.
- W4287018575 hasConcept C4679612 @default.
- W4287018575 hasConcept C79581498 @default.
- W4287018575 hasConcept C95457728 @default.
- W4287018575 hasConcept C9652623 @default.
- W4287018575 hasConcept C97541855 @default.
- W4287018575 hasConceptScore W4287018575C105795698 @default.
- W4287018575 hasConceptScore W4287018575C119857082 @default.
- W4287018575 hasConceptScore W4287018575C127413603 @default.
- W4287018575 hasConceptScore W4287018575C13280743 @default.
- W4287018575 hasConceptScore W4287018575C149782125 @default.
- W4287018575 hasConceptScore W4287018575C154945302 @default.
- W4287018575 hasConceptScore W4287018575C159985019 @default.
- W4287018575 hasConceptScore W4287018575C166957645 @default.
- W4287018575 hasConceptScore W4287018575C185798385 @default.
- W4287018575 hasConceptScore W4287018575C192562407 @default.
- W4287018575 hasConceptScore W4287018575C201995342 @default.
- W4287018575 hasConceptScore W4287018575C202444582 @default.
- W4287018575 hasConceptScore W4287018575C205649164 @default.
- W4287018575 hasConceptScore W4287018575C2524010 @default.
- W4287018575 hasConceptScore W4287018575C2780451532 @default.
- W4287018575 hasConceptScore W4287018575C28719098 @default.
- W4287018575 hasConceptScore W4287018575C33923547 @default.
- W4287018575 hasConceptScore W4287018575C41008148 @default.
- W4287018575 hasConceptScore W4287018575C41426520 @default.
- W4287018575 hasConceptScore W4287018575C44249647 @default.
- W4287018575 hasConceptScore W4287018575C4679612 @default.
- W4287018575 hasConceptScore W4287018575C79581498 @default.
- W4287018575 hasConceptScore W4287018575C95457728 @default.
- W4287018575 hasConceptScore W4287018575C9652623 @default.
- W4287018575 hasConceptScore W4287018575C97541855 @default.
- W4287018575 hasLocation W42870185751 @default.
- W4287018575 hasLocation W42870185752 @default.
- W4287018575 hasOpenAccess W4287018575 @default.
- W4287018575 hasPrimaryLocation W42870185751 @default.
- W4287018575 hasRelatedWork W1511772879 @default.
- W4287018575 hasRelatedWork W2042205862 @default.
- W4287018575 hasRelatedWork W2083794993 @default.
- W4287018575 hasRelatedWork W2127898439 @default.
- W4287018575 hasRelatedWork W2186315912 @default.
- W4287018575 hasRelatedWork W2588591308 @default.
- W4287018575 hasRelatedWork W2979471250 @default.
- W4287018575 hasRelatedWork W3170750609 @default.
- W4287018575 hasRelatedWork W3195664246 @default.
- W4287018575 hasRelatedWork W4379115841 @default.
- W4287018575 isParatext "false" @default.
- W4287018575 isRetracted "false" @default.
- W4287018575 workType "article" @default.
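
The abstract above names interquartile mean scores and interval estimates as the recommended way to report aggregate performance. As a rough illustration only, the sketch below computes an interquartile mean (the mean of the middle 50% of scores) and a plain percentile-bootstrap interval around it. This is not the rliable library's API, and it uses a simple bootstrap over a flat list of scores rather than any task-stratified procedure; the function names, parameters, and the example scores are all hypothetical.

```python
import random
from statistics import fmean


def interquartile_mean(scores):
    """Mean of the middle 50% of scores: drop 25% of values from each tail."""
    ordered = sorted(scores)
    cut = int(0.25 * len(ordered))  # number of values dropped per tail
    return fmean(ordered[cut:len(ordered) - cut])


def bootstrap_ci(scores, stat=interquartile_mean, reps=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for `stat` (assumed, simplified procedure)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        stat([rng.choice(scores) for _ in range(n)]) for _ in range(reps)
    )
    lower = estimates[int((alpha / 2) * reps)]
    upper = estimates[int((1 - alpha / 2) * reps) - 1]
    return lower, upper


# Made-up normalized scores from eight runs, with one outlier.
example_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 100.0]
iqm = interquartile_mean(example_scores)
low, high = bootstrap_ci(example_scores)
print(f"IQM = {iqm:.3f}, 95% interval = [{low:.3f}, {high:.3f}]")
```

In this made-up sample the outlier falls in the trimmed upper quarter, so it does not affect the interquartile mean, whereas it would dominate a plain mean.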