Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287869881> ?p ?o ?g. }
Showing items 1 to 76 of 76 with 100 items per page.
- W4287869881 abstract "Exploration is widely regarded as one of the most challenging aspects of reinforcement learning (RL), with many naive approaches succumbing to exponential sample complexity. To isolate the challenges of exploration, we propose a new reward-free RL framework. In the exploration phase, the agent first collects trajectories from an MDP $\mathcal{M}$ without a pre-specified reward function. After exploration, it is tasked with computing near-optimal policies for $\mathcal{M}$ under a collection of given reward functions. This framework is particularly suitable when there are many reward functions of interest, or when the reward function is shaped by an external agent to elicit desired behavior. We give an efficient algorithm that conducts $\tilde{\mathcal{O}}(S^2A\,\mathrm{poly}(H)/\epsilon^2)$ episodes of exploration and returns $\epsilon$-suboptimal policies for an arbitrary number of reward functions. We achieve this by finding exploratory policies that visit each significant state with probability proportional to its maximum visitation probability under any possible policy. Moreover, our planning procedure can be instantiated by any black-box approximate planner, such as value iteration or natural policy gradient. We also give a nearly-matching $\Omega(S^2AH^2/\epsilon^2)$ lower bound, demonstrating the near-optimality of our algorithm in this setting." @default.
- W4287869881 created "2022-07-26" @default.
- W4287869881 creator A5015082848 @default.
- W4287869881 creator A5037154191 @default.
- W4287869881 creator A5037463304 @default.
- W4287869881 creator A5082311500 @default.
- W4287869881 date "2020-02-07" @default.
- W4287869881 modified "2023-09-29" @default.
- W4287869881 title "Reward-Free Exploration for Reinforcement Learning" @default.
- W4287869881 hasPublicationYear "2020" @default.
- W4287869881 type Work @default.
- W4287869881 citedByCount "0" @default.
- W4287869881 crossrefType "posted-content" @default.
- W4287869881 hasAuthorship W4287869881A5015082848 @default.
- W4287869881 hasAuthorship W4287869881A5037154191 @default.
- W4287869881 hasAuthorship W4287869881A5037463304 @default.
- W4287869881 hasAuthorship W4287869881A5082311500 @default.
- W4287869881 hasBestOaLocation W42878698811 @default.
- W4287869881 hasConcept C105795698 @default.
- W4287869881 hasConcept C11413529 @default.
- W4287869881 hasConcept C114614502 @default.
- W4287869881 hasConcept C121332964 @default.
- W4287869881 hasConcept C126255220 @default.
- W4287869881 hasConcept C134306372 @default.
- W4287869881 hasConcept C14036430 @default.
- W4287869881 hasConcept C14646407 @default.
- W4287869881 hasConcept C151376022 @default.
- W4287869881 hasConcept C154945302 @default.
- W4287869881 hasConcept C165064840 @default.
- W4287869881 hasConcept C2776999362 @default.
- W4287869881 hasConcept C2779557605 @default.
- W4287869881 hasConcept C33923547 @default.
- W4287869881 hasConcept C36686422 @default.
- W4287869881 hasConcept C41008148 @default.
- W4287869881 hasConcept C48103436 @default.
- W4287869881 hasConcept C62520636 @default.
- W4287869881 hasConcept C78458016 @default.
- W4287869881 hasConcept C86803240 @default.
- W4287869881 hasConcept C97541855 @default.
- W4287869881 hasConceptScore W4287869881C105795698 @default.
- W4287869881 hasConceptScore W4287869881C11413529 @default.
- W4287869881 hasConceptScore W4287869881C114614502 @default.
- W4287869881 hasConceptScore W4287869881C121332964 @default.
- W4287869881 hasConceptScore W4287869881C126255220 @default.
- W4287869881 hasConceptScore W4287869881C134306372 @default.
- W4287869881 hasConceptScore W4287869881C14036430 @default.
- W4287869881 hasConceptScore W4287869881C14646407 @default.
- W4287869881 hasConceptScore W4287869881C151376022 @default.
- W4287869881 hasConceptScore W4287869881C154945302 @default.
- W4287869881 hasConceptScore W4287869881C165064840 @default.
- W4287869881 hasConceptScore W4287869881C2776999362 @default.
- W4287869881 hasConceptScore W4287869881C2779557605 @default.
- W4287869881 hasConceptScore W4287869881C33923547 @default.
- W4287869881 hasConceptScore W4287869881C36686422 @default.
- W4287869881 hasConceptScore W4287869881C41008148 @default.
- W4287869881 hasConceptScore W4287869881C48103436 @default.
- W4287869881 hasConceptScore W4287869881C62520636 @default.
- W4287869881 hasConceptScore W4287869881C78458016 @default.
- W4287869881 hasConceptScore W4287869881C86803240 @default.
- W4287869881 hasConceptScore W4287869881C97541855 @default.
- W4287869881 hasLocation W42878698811 @default.
- W4287869881 hasOpenAccess W4287869881 @default.
- W4287869881 hasPrimaryLocation W42878698811 @default.
- W4287869881 hasRelatedWork W10377101 @default.
- W4287869881 hasRelatedWork W10520729 @default.
- W4287869881 hasRelatedWork W10913952 @default.
- W4287869881 hasRelatedWork W1279312 @default.
- W4287869881 hasRelatedWork W2191283 @default.
- W4287869881 hasRelatedWork W3551423 @default.
- W4287869881 hasRelatedWork W3746686 @default.
- W4287869881 hasRelatedWork W4776762 @default.
- W4287869881 hasRelatedWork W6242441 @default.
- W4287869881 hasRelatedWork W7318248 @default.
- W4287869881 isParatext "false" @default.
- W4287869881 isRetracted "false" @default.
- W4287869881 workType "article" @default.
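The two-phase protocol described in the abstract (collect reward-free trajectories once, then hand the data to a black-box planner such as value iteration for any number of given reward functions) can be sketched for a small tabular, finite-horizon MDP as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm. The exploration phase here simply rolls out a uniformly random policy from an assumed fixed start state 0, whereas the paper constructs exploratory policies that visit each significant state with probability proportional to its maximum visitation probability. The function names `explore_uniform`, `empirical_model`, and `value_iteration`, the MDP sizes, and the random rewards are all hypothetical.

```python
import numpy as np

def explore_uniform(P, n_episodes, H, rng):
    """Reward-free exploration phase (hypothetical simplification): roll out a
    uniformly random policy and count observed (s, a, s') transitions.
    No reward function is used here."""
    S, A, _ = P.shape
    counts = np.zeros((S, A, S))
    for _ in range(n_episodes):
        s = 0  # assumption: every episode starts in state 0
        for _ in range(H):
            a = rng.integers(A)
            s_next = rng.choice(S, p=P[s, a])
            counts[s, a, s_next] += 1
            s = s_next
    return counts

def empirical_model(counts):
    """Maximum-likelihood transition estimate; unvisited (s, a) pairs fall back
    to a uniform next-state distribution."""
    S = counts.shape[0]
    n = counts.sum(axis=2, keepdims=True)
    return np.where(n > 0, counts / np.maximum(n, 1), 1.0 / S)

def value_iteration(P_hat, r, H):
    """Finite-horizon value iteration on the estimated model for a reward r of
    shape (S, A). Returns a greedy policy pi[h, s] and values V[h, s]."""
    S, A, _ = P_hat.shape
    V = np.zeros((H + 1, S))
    pi = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):
        Q = r + P_hat @ V[h + 1]  # (S, A, S) @ (S,) -> (S, A)
        pi[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return pi, V

rng = np.random.default_rng(0)
S, A, H = 5, 2, 4
P = rng.dirichlet(np.ones(S), size=(S, A))  # hypothetical true MDP, shape (S, A, S)

# Phase 1: explore once, without any reward.
counts = explore_uniform(P, n_episodes=2000, H=H, rng=rng)
P_hat = empirical_model(counts)

# Phase 2: reuse the same data to plan for several reward functions.
rewards = [rng.uniform(size=(S, A)) for _ in range(3)]
policies = [value_iteration(P_hat, r, H)[0] for r in rewards]
```

The point of the sketch is the separation of concerns: `counts` and `P_hat` are computed once and never look at a reward, so adding a fourth reward function costs only another planning call, not more exploration. How close the resulting policies are to optimal depends on how well the exploration data covers the states that matter, which is exactly what the paper's exploratory policies are designed to guarantee and what uniform exploration does not.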