Matches in SemOpenAlex for { <https://semopenalex.org/work/W4377866472> ?p ?o ?g. }
Showing items 1 to 79 of 79 with 100 items per page.
- W4377866472 abstract "Meta-Reinforcement Learning (MRL) is a promising framework for training agents that can quickly adapt to new environments and tasks. In this work, we study the MRL problem under the policy gradient formulation, where we propose a novel algorithm that uses Moreau envelope surrogate regularizers to jointly learn a meta-policy that is adjustable to the environment of each individual task. Our algorithm, called Moreau Envelope Meta-Reinforcement Learning (MEMRL), learns a meta-policy that can adapt to a distribution of tasks by efficiently updating the policy parameters using a combination of gradient-based optimization and Moreau Envelope regularization. Moreau Envelopes provide a smooth approximation of the policy optimization problem, which enables us to apply standard optimization techniques and converge to an appropriate stationary point. We provide a detailed analysis of the MEMRL algorithm, where we show a sublinear convergence rate to a first-order stationary point for non-convex policy gradient optimization. We finally show the effectiveness of MEMRL on a multi-task 2D-navigation problem." @default.
- W4377866472 created "2023-05-24" @default.
- W4377866472 creator A5000289640 @default.
- W4377866472 creator A5020076936 @default.
- W4377866472 creator A5047076392 @default.
- W4377866472 date "2023-05-20" @default.
- W4377866472 modified "2023-10-17" @default.
- W4377866472 title "On First-Order Meta-Reinforcement Learning with Moreau Envelopes" @default.
- W4377866472 doi "https://doi.org/10.48550/arxiv.2305.12216" @default.
- W4377866472 hasPublicationYear "2023" @default.
- W4377866472 type Work @default.
- W4377866472 citedByCount "0" @default.
- W4377866472 crossrefType "posted-content" @default.
- W4377866472 hasAuthorship W4377866472A5000289640 @default.
- W4377866472 hasAuthorship W4377866472A5020076936 @default.
- W4377866472 hasAuthorship W4377866472A5047076392 @default.
- W4377866472 hasBestOaLocation W43778664721 @default.
- W4377866472 hasConcept C11413529 @default.
- W4377866472 hasConcept C115680565 @default.
- W4377866472 hasConcept C117160843 @default.
- W4377866472 hasConcept C126255220 @default.
- W4377866472 hasConcept C134306372 @default.
- W4377866472 hasConcept C137836250 @default.
- W4377866472 hasConcept C154945302 @default.
- W4377866472 hasConcept C162324750 @default.
- W4377866472 hasConcept C178635117 @default.
- W4377866472 hasConcept C187736073 @default.
- W4377866472 hasConcept C189237950 @default.
- W4377866472 hasConcept C2776135515 @default.
- W4377866472 hasConcept C2777303404 @default.
- W4377866472 hasConcept C2780451532 @default.
- W4377866472 hasConcept C33923547 @default.
- W4377866472 hasConcept C38652104 @default.
- W4377866472 hasConcept C41008148 @default.
- W4377866472 hasConcept C50522688 @default.
- W4377866472 hasConcept C554190296 @default.
- W4377866472 hasConcept C65155139 @default.
- W4377866472 hasConcept C76155785 @default.
- W4377866472 hasConcept C89109886 @default.
- W4377866472 hasConcept C97541855 @default.
- W4377866472 hasConceptScore W4377866472C11413529 @default.
- W4377866472 hasConceptScore W4377866472C115680565 @default.
- W4377866472 hasConceptScore W4377866472C117160843 @default.
- W4377866472 hasConceptScore W4377866472C126255220 @default.
- W4377866472 hasConceptScore W4377866472C134306372 @default.
- W4377866472 hasConceptScore W4377866472C137836250 @default.
- W4377866472 hasConceptScore W4377866472C154945302 @default.
- W4377866472 hasConceptScore W4377866472C162324750 @default.
- W4377866472 hasConceptScore W4377866472C178635117 @default.
- W4377866472 hasConceptScore W4377866472C187736073 @default.
- W4377866472 hasConceptScore W4377866472C189237950 @default.
- W4377866472 hasConceptScore W4377866472C2776135515 @default.
- W4377866472 hasConceptScore W4377866472C2777303404 @default.
- W4377866472 hasConceptScore W4377866472C2780451532 @default.
- W4377866472 hasConceptScore W4377866472C33923547 @default.
- W4377866472 hasConceptScore W4377866472C38652104 @default.
- W4377866472 hasConceptScore W4377866472C41008148 @default.
- W4377866472 hasConceptScore W4377866472C50522688 @default.
- W4377866472 hasConceptScore W4377866472C554190296 @default.
- W4377866472 hasConceptScore W4377866472C65155139 @default.
- W4377866472 hasConceptScore W4377866472C76155785 @default.
- W4377866472 hasConceptScore W4377866472C89109886 @default.
- W4377866472 hasConceptScore W4377866472C97541855 @default.
- W4377866472 hasLocation W43778664721 @default.
- W4377866472 hasOpenAccess W4377866472 @default.
- W4377866472 hasPrimaryLocation W43778664721 @default.
- W4377866472 hasRelatedWork W1981039871 @default.
- W4377866472 hasRelatedWork W2034295546 @default.
- W4377866472 hasRelatedWork W2102267274 @default.
- W4377866472 hasRelatedWork W2401692973 @default.
- W4377866472 hasRelatedWork W2515721268 @default.
- W4377866472 hasRelatedWork W2946319938 @default.
- W4377866472 hasRelatedWork W2970927156 @default.
- W4377866472 hasRelatedWork W2971336218 @default.
- W4377866472 hasRelatedWork W2981915592 @default.
- W4377866472 hasRelatedWork W4293352224 @default.
- W4377866472 isParatext "false" @default.
- W4377866472 isRetracted "false" @default.
- W4377866472 workType "article" @default.
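
The abstract above says that Moreau envelopes give a smooth approximation of the policy optimization problem. As a generic illustration of that smoothing idea only (this is not the MEMRL algorithm, whose exact surrogate is not given in this listing), the sketch below evaluates the standard Moreau envelope M_lam f(x) = min_y [ f(y) + (y - x)^2 / (2 * lam) ] for the nonsmooth function f(y) = |y| by brute-force grid search and compares it with its known closed form, the Huber function. The function names, the grid bounds, and the value lam = 0.5 are assumptions chosen for the example.

```python
def moreau_envelope(f, x, lam, lo=-5.0, hi=5.0, n=100001):
    """Approximate M_lam f(x) = min_y f(y) + (y - x)^2 / (2 * lam) on a uniform grid.

    The grid [lo, hi] with n points is an assumption for this sketch; it must
    contain the true minimizer for the approximation to be accurate.
    """
    step = (hi - lo) / (n - 1)
    best = float("inf")
    for i in range(n):
        y = lo + i * step
        value = f(y) + (y - x) ** 2 / (2.0 * lam)
        if value < best:
            best = value
    return best


def huber(x, lam):
    """Closed-form Moreau envelope of |x|: quadratic near zero, linear in the tails."""
    if abs(x) <= lam:
        return x * x / (2.0 * lam)
    return abs(x) - lam / 2.0


if __name__ == "__main__":
    lam = 0.5  # hypothetical smoothing parameter
    for x in (-2.0, -0.2, 0.0, 0.3, 1.5):
        approx = moreau_envelope(abs, x, lam)
        exact = huber(x, lam)
        print(f"x={x:+.1f}  grid={approx:.6f}  closed_form={exact:.6f}")
```

The envelope agrees with |x| up to a constant shift of lam / 2 away from the origin and replaces the kink at zero with a quadratic, which is what makes gradient-based methods applicable to the smoothed objective.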