Matches in SemOpenAlex for { <https://semopenalex.org/work/W4366736145> ?p ?o ?g. }
Showing items 1 to 57 of 57 with 100 items per page.
- W4366736145 abstract "Deep reinforcement learning algorithms that learn policies by trial-and-error must learn from limited amounts of data collected by actively interacting with the environment. While many prior works have shown that proper regularization techniques are crucial for enabling data-efficient RL, a general understanding of the bottlenecks in data-efficient RL has remained unclear. Consequently, it has been difficult to devise a universal technique that works well across all domains. In this paper, we attempt to understand the primary bottleneck in sample-efficient deep RL by examining several potential hypotheses such as non-stationarity, excessive action distribution shift, and overfitting. We perform thorough empirical analysis on state-based DeepMind control suite (DMC) tasks in a controlled and systematic way to show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms, and prior methods that lead to good performance do in fact, control the validation TD error to be low. This observation gives us a robust principle for making deep RL efficient: we can hill-climb on the validation TD error by utilizing any form of regularization techniques from supervised learning. We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks." @default.
- W4366736145 created "2023-04-24" @default.
- W4366736145 creator A5026322200 @default.
- W4366736145 creator A5047802575 @default.
- W4366736145 creator A5069290071 @default.
- W4366736145 creator A5086657309 @default.
- W4366736145 date "2023-04-20" @default.
- W4366736145 modified "2023-10-01" @default.
- W4366736145 title "Efficient Deep Reinforcement Learning Requires Regulating Overfitting" @default.
- W4366736145 doi "https://doi.org/10.48550/arxiv.2304.10466" @default.
- W4366736145 hasPublicationYear "2023" @default.
- W4366736145 type Work @default.
- W4366736145 citedByCount "0" @default.
- W4366736145 crossrefType "posted-content" @default.
- W4366736145 hasAuthorship W4366736145A5026322200 @default.
- W4366736145 hasAuthorship W4366736145A5047802575 @default.
- W4366736145 hasAuthorship W4366736145A5069290071 @default.
- W4366736145 hasAuthorship W4366736145A5086657309 @default.
- W4366736145 hasBestOaLocation W43667361451 @default.
- W4366736145 hasConcept C108583219 @default.
- W4366736145 hasConcept C119857082 @default.
- W4366736145 hasConcept C149635348 @default.
- W4366736145 hasConcept C154945302 @default.
- W4366736145 hasConcept C22019652 @default.
- W4366736145 hasConcept C2776135515 @default.
- W4366736145 hasConcept C2780513914 @default.
- W4366736145 hasConcept C41008148 @default.
- W4366736145 hasConcept C50644808 @default.
- W4366736145 hasConcept C5465570 @default.
- W4366736145 hasConcept C97541855 @default.
- W4366736145 hasConceptScore W4366736145C108583219 @default.
- W4366736145 hasConceptScore W4366736145C119857082 @default.
- W4366736145 hasConceptScore W4366736145C149635348 @default.
- W4366736145 hasConceptScore W4366736145C154945302 @default.
- W4366736145 hasConceptScore W4366736145C22019652 @default.
- W4366736145 hasConceptScore W4366736145C2776135515 @default.
- W4366736145 hasConceptScore W4366736145C2780513914 @default.
- W4366736145 hasConceptScore W4366736145C41008148 @default.
- W4366736145 hasConceptScore W4366736145C50644808 @default.
- W4366736145 hasConceptScore W4366736145C5465570 @default.
- W4366736145 hasConceptScore W4366736145C97541855 @default.
- W4366736145 hasLocation W43667361451 @default.
- W4366736145 hasOpenAccess W4366736145 @default.
- W4366736145 hasPrimaryLocation W43667361451 @default.
- W4366736145 hasRelatedWork W2963680188 @default.
- W4366736145 hasRelatedWork W3018907748 @default.
- W4366736145 hasRelatedWork W3041434171 @default.
- W4366736145 hasRelatedWork W3099765033 @default.
- W4366736145 hasRelatedWork W3186840088 @default.
- W4366736145 hasRelatedWork W3186919929 @default.
- W4366736145 hasRelatedWork W4287064118 @default.
- W4366736145 hasRelatedWork W4287725140 @default.
- W4366736145 hasRelatedWork W4361732492 @default.
- W4366736145 hasRelatedWork W4362499066 @default.
- W4366736145 isParatext "false" @default.
- W4366736145 isRetracted "false" @default.
- W4366736145 workType "article" @default.