Matches in SemOpenAlex for { <https://semopenalex.org/work/W2988703099> ?p ?o ?g. }
Showing items 1 to 90 of 90 with 100 items per page.
- W2988703099 abstract "Off-policy deep reinforcement learning (RL) algorithms are incapable of learning solely from batch offline data without online interactions with the environment, due to the phenomenon known as extrapolation error. This is often due to past data available in the replay buffer that may be quite different from the data distribution under the current policy. We argue that most off-policy learning methods fundamentally suffer from a state distribution shift due to the mismatch between the state visitation distribution of the data collected by the behavior and target policies. This data distribution shift between current and past samples can significantly impact the performance of most modern off-policy based policy optimization algorithms. In this work, we first do a systematic analysis of state distribution mismatch in off-policy learning, and then develop a novel off-policy policy optimization method to constrain the state distribution shift. To do this, we first estimate the state distribution based on features of the state, using a density estimator and then develop a novel constrained off-policy gradient objective that minimizes the state distribution shift. Our experimental results on continuous control tasks show that minimizing this distribution mismatch can significantly improve performance in most popular practical off-policy policy gradient algorithms." @default.
- W2988703099 created "2019-11-22" @default.
- W2988703099 creator A5024575600 @default.
- W2988703099 creator A5048667820 @default.
- W2988703099 creator A5079926596 @default.
- W2988703099 date "2019-11-16" @default.
- W2988703099 modified "2023-09-27" @default.
- W2988703099 title "Off-Policy Policy Gradient Algorithms by Constraining the State Distribution Shift." @default.
- W2988703099 cites W1514587017 @default.
- W2988703099 cites W1771410628 @default.
- W2988703099 cites W1959608418 @default.
- W2988703099 cites W2155027007 @default.
- W2988703099 cites W2158782408 @default.
- W2988703099 cites W2165150801 @default.
- W2988703099 cites W2173248099 @default.
- W2988703099 cites W2592754049 @default.
- W2988703099 cites W2904453761 @default.
- W2988703099 cites W2936304709 @default.
- W2988703099 cites W2949608212 @default.
- W2988703099 cites W2952594289 @default.
- W2988703099 cites W2953334758 @default.
- W2988703099 cites W2962902376 @default.
- W2988703099 cites W2963477884 @default.
- W2988703099 cites W2963744705 @default.
- W2988703099 cites W2963923407 @default.
- W2988703099 hasPublicationYear "2019" @default.
- W2988703099 type Work @default.
- W2988703099 sameAs 2988703099 @default.
- W2988703099 citedByCount "0" @default.
- W2988703099 crossrefType "posted-content" @default.
- W2988703099 hasAuthorship W2988703099A5024575600 @default.
- W2988703099 hasAuthorship W2988703099A5048667820 @default.
- W2988703099 hasAuthorship W2988703099A5079926596 @default.
- W2988703099 hasConcept C105795698 @default.
- W2988703099 hasConcept C110121322 @default.
- W2988703099 hasConcept C11413529 @default.
- W2988703099 hasConcept C119857082 @default.
- W2988703099 hasConcept C126255220 @default.
- W2988703099 hasConcept C132459708 @default.
- W2988703099 hasConcept C134306372 @default.
- W2988703099 hasConcept C154945302 @default.
- W2988703099 hasConcept C185429906 @default.
- W2988703099 hasConcept C2524010 @default.
- W2988703099 hasConcept C2776036281 @default.
- W2988703099 hasConcept C33923547 @default.
- W2988703099 hasConcept C41008148 @default.
- W2988703099 hasConcept C48103436 @default.
- W2988703099 hasConcept C97541855 @default.
- W2988703099 hasConceptScore W2988703099C105795698 @default.
- W2988703099 hasConceptScore W2988703099C110121322 @default.
- W2988703099 hasConceptScore W2988703099C11413529 @default.
- W2988703099 hasConceptScore W2988703099C119857082 @default.
- W2988703099 hasConceptScore W2988703099C126255220 @default.
- W2988703099 hasConceptScore W2988703099C132459708 @default.
- W2988703099 hasConceptScore W2988703099C134306372 @default.
- W2988703099 hasConceptScore W2988703099C154945302 @default.
- W2988703099 hasConceptScore W2988703099C185429906 @default.
- W2988703099 hasConceptScore W2988703099C2524010 @default.
- W2988703099 hasConceptScore W2988703099C2776036281 @default.
- W2988703099 hasConceptScore W2988703099C33923547 @default.
- W2988703099 hasConceptScore W2988703099C41008148 @default.
- W2988703099 hasConceptScore W2988703099C48103436 @default.
- W2988703099 hasConceptScore W2988703099C97541855 @default.
- W2988703099 hasLocation W29887030991 @default.
- W2988703099 hasOpenAccess W2988703099 @default.
- W2988703099 hasPrimaryLocation W29887030991 @default.
- W2988703099 hasRelatedWork W2145288976 @default.
- W2988703099 hasRelatedWork W2444691181 @default.
- W2988703099 hasRelatedWork W2751325639 @default.
- W2988703099 hasRelatedWork W2763287778 @default.
- W2988703099 hasRelatedWork W2893470574 @default.
- W2988703099 hasRelatedWork W2950849043 @default.
- W2988703099 hasRelatedWork W2951713996 @default.
- W2988703099 hasRelatedWork W2964068481 @default.
- W2988703099 hasRelatedWork W2970954742 @default.
- W2988703099 hasRelatedWork W2979895333 @default.
- W2988703099 hasRelatedWork W2986781835 @default.
- W2988703099 hasRelatedWork W3006151906 @default.
- W2988703099 hasRelatedWork W3009509472 @default.
- W2988703099 hasRelatedWork W3033120977 @default.
- W2988703099 hasRelatedWork W3046758693 @default.
- W2988703099 hasRelatedWork W3085501055 @default.
- W2988703099 hasRelatedWork W3095986187 @default.
- W2988703099 hasRelatedWork W3124201714 @default.
- W2988703099 hasRelatedWork W3159360018 @default.
- W2988703099 hasRelatedWork W3212781603 @default.
- W2988703099 isParatext "false" @default.
- W2988703099 isRetracted "false" @default.
- W2988703099 magId "2988703099" @default.
- W2988703099 workType "article" @default.