Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287183716> ?p ?o ?g. }
Showing items 1 to 75 of 75 with 100 items per page.
- W4287183716 abstract "We study the problem of model-free reinforcement learning, which is often solved following the principle of Generalized Policy Iteration (GPI). While GPI is typically an interplay between policy evaluation and policy improvement, most conventional model-free methods assume the independence of the granularity and other details of the GPI steps, despite the inherent connections between them. In this paper, we present a method that regularizes the inconsistency between policy evaluation and policy improvement, leading to a conflict averse GPI solution with reduced functional approximation error. To this end, we formulate a novel learning paradigm where taking the policy evaluation step is equivalent to some compensation of performing policy improvement, and thus effectively alleviates the gradient conflict between the two GPI steps. We also show that the form of our proposed solution is equivalent to performing entropy-regularized policy improvement and therefore prevents the policy from being trapped in suboptimal solutions. We conduct extensive experiments to evaluate our method on the Arcade Learning Environment (ALE). Empirical results show that our method outperforms several strong baselines in major evaluation domains." @default.
- W4287183716 created "2022-07-25" @default.
- W4287183716 creator A5043460546 @default.
- W4287183716 creator A5044410512 @default.
- W4287183716 creator A5079541585 @default.
- W4287183716 creator A5082814141 @default.
- W4287183716 creator A5085317869 @default.
- W4287183716 date "2021-05-09" @default.
- W4287183716 modified "2023-09-28" @default.
- W4287183716 title "CASA: Bridging the Gap between Policy Improvement and Policy Evaluation with Conflict Averse Policy Iteration" @default.
- W4287183716 doi "https://doi.org/10.48550/arxiv.2105.03923" @default.
- W4287183716 hasPublicationYear "2021" @default.
- W4287183716 type Work @default.
- W4287183716 citedByCount "0" @default.
- W4287183716 crossrefType "posted-content" @default.
- W4287183716 hasAuthorship W4287183716A5043460546 @default.
- W4287183716 hasAuthorship W4287183716A5044410512 @default.
- W4287183716 hasAuthorship W4287183716A5079541585 @default.
- W4287183716 hasAuthorship W4287183716A5082814141 @default.
- W4287183716 hasAuthorship W4287183716A5085317869 @default.
- W4287183716 hasBestOaLocation W42871837161 @default.
- W4287183716 hasConcept C105795698 @default.
- W4287183716 hasConcept C106301342 @default.
- W4287183716 hasConcept C111919701 @default.
- W4287183716 hasConcept C119857082 @default.
- W4287183716 hasConcept C121332964 @default.
- W4287183716 hasConcept C126255220 @default.
- W4287183716 hasConcept C154945302 @default.
- W4287183716 hasConcept C162324750 @default.
- W4287183716 hasConcept C174348530 @default.
- W4287183716 hasConcept C177774035 @default.
- W4287183716 hasConcept C21547014 @default.
- W4287183716 hasConcept C2778915421 @default.
- W4287183716 hasConcept C2779436431 @default.
- W4287183716 hasConcept C31258907 @default.
- W4287183716 hasConcept C33923547 @default.
- W4287183716 hasConcept C35651441 @default.
- W4287183716 hasConcept C41008148 @default.
- W4287183716 hasConcept C62520636 @default.
- W4287183716 hasConcept C97541855 @default.
- W4287183716 hasConceptScore W4287183716C105795698 @default.
- W4287183716 hasConceptScore W4287183716C106301342 @default.
- W4287183716 hasConceptScore W4287183716C111919701 @default.
- W4287183716 hasConceptScore W4287183716C119857082 @default.
- W4287183716 hasConceptScore W4287183716C121332964 @default.
- W4287183716 hasConceptScore W4287183716C126255220 @default.
- W4287183716 hasConceptScore W4287183716C154945302 @default.
- W4287183716 hasConceptScore W4287183716C162324750 @default.
- W4287183716 hasConceptScore W4287183716C174348530 @default.
- W4287183716 hasConceptScore W4287183716C177774035 @default.
- W4287183716 hasConceptScore W4287183716C21547014 @default.
- W4287183716 hasConceptScore W4287183716C2778915421 @default.
- W4287183716 hasConceptScore W4287183716C2779436431 @default.
- W4287183716 hasConceptScore W4287183716C31258907 @default.
- W4287183716 hasConceptScore W4287183716C33923547 @default.
- W4287183716 hasConceptScore W4287183716C35651441 @default.
- W4287183716 hasConceptScore W4287183716C41008148 @default.
- W4287183716 hasConceptScore W4287183716C62520636 @default.
- W4287183716 hasConceptScore W4287183716C97541855 @default.
- W4287183716 hasLocation W42871837161 @default.
- W4287183716 hasOpenAccess W4287183716 @default.
- W4287183716 hasPrimaryLocation W42871837161 @default.
- W4287183716 hasRelatedWork W1498070128 @default.
- W4287183716 hasRelatedWork W1565638106 @default.
- W4287183716 hasRelatedWork W1594844924 @default.
- W4287183716 hasRelatedWork W2006341133 @default.
- W4287183716 hasRelatedWork W2315101603 @default.
- W4287183716 hasRelatedWork W2909382770 @default.
- W4287183716 hasRelatedWork W2910166048 @default.
- W4287183716 hasRelatedWork W3040662175 @default.
- W4287183716 hasRelatedWork W4210912933 @default.
- W4287183716 hasRelatedWork W4287183716 @default.
- W4287183716 isParatext "false" @default.
- W4287183716 isRetracted "false" @default.
- W4287183716 workType "article" @default.