Matches in SemOpenAlex for { <https://semopenalex.org/work/W3206369550> ?p ?o ?g. }
Showing items 1 to 82 of 82 with 100 items per page.
- W3206369550 abstract "The recent remarkable progress of deep reinforcement learning (DRL) stands on regularization of policy for stable and efficient learning. A popular method, named proximal policy optimization (PPO), has been introduced for this purpose. PPO clips density ratio of the latest and baseline policies with a threshold, while its minimization target is unclear. As another problem of PPO, the symmetric threshold is given numerically while the density ratio itself is in asymmetric domain, thereby causing unbalanced regularization of the policy. This paper therefore proposes a new variant of PPO by considering a regularization problem of relative Pearson (RPE) divergence, so-called PPO-RPE. This regularization yields the clear minimization target, which constrains the latest policy to the baseline one. Through its analysis, the intuitive threshold-based design consistent with the asymmetry of the threshold and the domain of density ratio can be derived. Through four benchmark tasks, PPO-RPE performed as well as or better than the conventional methods in terms of the task performance by the learned policy." @default.
- W3206369550 created "2021-10-25" @default.
- W3206369550 creator A5051304187 @default.
- W3206369550 date "2021-05-30" @default.
- W3206369550 modified "2023-09-27" @default.
- W3206369550 title "Proximal Policy Optimization with Relative Pearson Divergence" @default.
- W3206369550 cites W1990970321 @default.
- W3206369550 cites W2022365837 @default.
- W3206369550 cites W2101895600 @default.
- W3206369550 cites W2131940723 @default.
- W3206369550 cites W2141559645 @default.
- W3206369550 cites W2145339207 @default.
- W3206369550 cites W2257979135 @default.
- W3206369550 cites W2900582619 @default.
- W3206369550 cites W2919115771 @default.
- W3206369550 cites W2952021385 @default.
- W3206369550 cites W2962834855 @default.
- W3206369550 cites W3080884797 @default.
- W3206369550 cites W3104595455 @default.
- W3206369550 cites W3111951952 @default.
- W3206369550 doi "https://doi.org/10.1109/icra48506.2021.9560856" @default.
- W3206369550 hasPublicationYear "2021" @default.
- W3206369550 type Work @default.
- W3206369550 sameAs 3206369550 @default.
- W3206369550 citedByCount "5" @default.
- W3206369550 countsByYear W32063695502022 @default.
- W3206369550 countsByYear W32063695502023 @default.
- W3206369550 crossrefType "proceedings-article" @default.
- W3206369550 hasAuthorship W3206369550A5051304187 @default.
- W3206369550 hasBestOaLocation W32063695502 @default.
- W3206369550 hasConcept C11413529 @default.
- W3206369550 hasConcept C121332964 @default.
- W3206369550 hasConcept C126255220 @default.
- W3206369550 hasConcept C13280743 @default.
- W3206369550 hasConcept C138885662 @default.
- W3206369550 hasConcept C147764199 @default.
- W3206369550 hasConcept C154945302 @default.
- W3206369550 hasConcept C185798385 @default.
- W3206369550 hasConcept C205649164 @default.
- W3206369550 hasConcept C207390915 @default.
- W3206369550 hasConcept C2776135515 @default.
- W3206369550 hasConcept C33923547 @default.
- W3206369550 hasConcept C38976095 @default.
- W3206369550 hasConcept C41008148 @default.
- W3206369550 hasConcept C41895202 @default.
- W3206369550 hasConcept C62520636 @default.
- W3206369550 hasConcept C97541855 @default.
- W3206369550 hasConceptScore W3206369550C11413529 @default.
- W3206369550 hasConceptScore W3206369550C121332964 @default.
- W3206369550 hasConceptScore W3206369550C126255220 @default.
- W3206369550 hasConceptScore W3206369550C13280743 @default.
- W3206369550 hasConceptScore W3206369550C138885662 @default.
- W3206369550 hasConceptScore W3206369550C147764199 @default.
- W3206369550 hasConceptScore W3206369550C154945302 @default.
- W3206369550 hasConceptScore W3206369550C185798385 @default.
- W3206369550 hasConceptScore W3206369550C205649164 @default.
- W3206369550 hasConceptScore W3206369550C207390915 @default.
- W3206369550 hasConceptScore W3206369550C2776135515 @default.
- W3206369550 hasConceptScore W3206369550C33923547 @default.
- W3206369550 hasConceptScore W3206369550C38976095 @default.
- W3206369550 hasConceptScore W3206369550C41008148 @default.
- W3206369550 hasConceptScore W3206369550C41895202 @default.
- W3206369550 hasConceptScore W3206369550C62520636 @default.
- W3206369550 hasConceptScore W3206369550C97541855 @default.
- W3206369550 hasLocation W32063695501 @default.
- W3206369550 hasLocation W32063695502 @default.
- W3206369550 hasOpenAccess W3206369550 @default.
- W3206369550 hasPrimaryLocation W32063695501 @default.
- W3206369550 hasRelatedWork W188948664 @default.
- W3206369550 hasRelatedWork W2001828740 @default.
- W3206369550 hasRelatedWork W2113068060 @default.
- W3206369550 hasRelatedWork W2973285469 @default.
- W3206369550 hasRelatedWork W2997970126 @default.
- W3206369550 hasRelatedWork W3009509472 @default.
- W3206369550 hasRelatedWork W3089548822 @default.
- W3206369550 hasRelatedWork W3115948200 @default.
- W3206369550 hasRelatedWork W4225311246 @default.
- W3206369550 hasRelatedWork W4226248274 @default.
- W3206369550 isParatext "false" @default.
- W3206369550 isRetracted "false" @default.
- W3206369550 magId "3206369550" @default.
- W3206369550 workType "article" @default.
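
The abstract above notes that PPO clips the density ratio with a numerically symmetric threshold even though the ratio itself lives in an asymmetric domain (it is positive and centred on 1). The following is a minimal sketch of that observation only, using the standard PPO clipped surrogate; it is not the PPO-RPE objective from the paper. The function names and the threshold value eps=0.2 are illustrative assumptions, not taken from this record.

```python
import math


def ppo_clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for one sample (a quantity to maximize).

    ratio is the density ratio of the latest policy to the baseline policy.
    eps is the clipping threshold; 0.2 is an assumed, commonly used value.
    """
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Pessimistic bound: take the smaller of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


def log_clip_bounds(eps: float = 0.2) -> tuple[float, float]:
    """Return the clip interval [1 - eps, 1 + eps] expressed in log-ratio space."""
    return math.log(1.0 - eps), math.log(1.0 + eps)


if __name__ == "__main__":
    lower, upper = log_clip_bounds(0.2)
    # The interval is symmetric around 1 in ratio space but not around 0 in
    # log-ratio space: the lower side allows a larger change than the upper side.
    print(f"log-ratio bounds: lower={lower:.4f}, upper={upper:.4f}")
    print(f"|lower| > upper: {abs(lower) > upper}")
```

In log-ratio space the bounds for eps=0.2 are roughly -0.223 and +0.182, so the same numerical threshold permits a larger decrease than increase in the policy's log-probability. This is one way to read the "unbalanced regularization" that the abstract attributes to PPO.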