Matches in SemOpenAlex for { <https://semopenalex.org/work/W4366459483> ?p ?o ?g. }
Showing items 1 to 77 of 77, with 100 items per page.
- W4366459483 abstract "An appropriate reward function is of paramount importance in specifying a task in reinforcement learning (RL). Yet, it is known to be extremely challenging in practice to design a correct reward function for even simple tasks. Human-in-the-loop (HiL) RL allows humans to communicate complex goals to the RL agent by providing various types of feedback. However, despite achieving great empirical successes, HiL RL usually requires too much feedback from a human teacher and also suffers from insufficient theoretical understanding. In this paper, we focus on addressing this issue from a theoretical perspective, aiming to provide provably feedback-efficient algorithmic frameworks that take human-in-the-loop to specify rewards of given tasks. We provide an active-learning-based RL algorithm that first explores the environment without specifying a reward function and then asks a human teacher for only a few queries about the rewards of the task at some state-action pairs. After that, the algorithm guarantees to provide a nearly optimal policy for the task with high probability. We show that, even with the presence of random noise in the feedback, the algorithm only takes $\widetilde{O}(H \dim_{R}^2)$ queries on the reward function to provide an $\epsilon$-optimal policy for any $\epsilon > 0$. Here $H$ is the horizon of the RL environment, and $\dim_{R}$ specifies the complexity of the function class representing the reward function. In contrast, standard RL algorithms need to query the reward function at least $\Omega(\operatorname{poly}(d, 1/\epsilon))$ state-action pairs, where $d$ depends on the complexity of the environmental transition." @default.
- W4366459483 created "2023-04-22" @default.
- W4366459483 creator A5041144949 @default.
- W4366459483 creator A5072096775 @default.
- W4366459483 date "2023-04-18" @default.
- W4366459483 modified "2023-09-25" @default.
- W4366459483 title "Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning" @default.
- W4366459483 doi "https://doi.org/10.48550/arxiv.2304.08944" @default.
- W4366459483 hasPublicationYear "2023" @default.
- W4366459483 type Work @default.
- W4366459483 citedByCount "0" @default.
- W4366459483 crossrefType "posted-content" @default.
- W4366459483 hasAuthorship W4366459483A5041144949 @default.
- W4366459483 hasAuthorship W4366459483A5072096775 @default.
- W4366459483 hasBestOaLocation W43664594831 @default.
- W4366459483 hasConcept C111472728 @default.
- W4366459483 hasConcept C11413529 @default.
- W4366459483 hasConcept C119857082 @default.
- W4366459483 hasConcept C120665830 @default.
- W4366459483 hasConcept C121332964 @default.
- W4366459483 hasConcept C12713177 @default.
- W4366459483 hasConcept C138885662 @default.
- W4366459483 hasConcept C14036430 @default.
- W4366459483 hasConcept C154945302 @default.
- W4366459483 hasConcept C162324750 @default.
- W4366459483 hasConcept C187736073 @default.
- W4366459483 hasConcept C188116033 @default.
- W4366459483 hasConcept C192209626 @default.
- W4366459483 hasConcept C2777212361 @default.
- W4366459483 hasConcept C2780451532 @default.
- W4366459483 hasConcept C2780586882 @default.
- W4366459483 hasConcept C2780791683 @default.
- W4366459483 hasConcept C41008148 @default.
- W4366459483 hasConcept C48103436 @default.
- W4366459483 hasConcept C62520636 @default.
- W4366459483 hasConcept C78458016 @default.
- W4366459483 hasConcept C86803240 @default.
- W4366459483 hasConcept C97541855 @default.
- W4366459483 hasConceptScore W4366459483C111472728 @default.
- W4366459483 hasConceptScore W4366459483C11413529 @default.
- W4366459483 hasConceptScore W4366459483C119857082 @default.
- W4366459483 hasConceptScore W4366459483C120665830 @default.
- W4366459483 hasConceptScore W4366459483C121332964 @default.
- W4366459483 hasConceptScore W4366459483C12713177 @default.
- W4366459483 hasConceptScore W4366459483C138885662 @default.
- W4366459483 hasConceptScore W4366459483C14036430 @default.
- W4366459483 hasConceptScore W4366459483C154945302 @default.
- W4366459483 hasConceptScore W4366459483C162324750 @default.
- W4366459483 hasConceptScore W4366459483C187736073 @default.
- W4366459483 hasConceptScore W4366459483C188116033 @default.
- W4366459483 hasConceptScore W4366459483C192209626 @default.
- W4366459483 hasConceptScore W4366459483C2777212361 @default.
- W4366459483 hasConceptScore W4366459483C2780451532 @default.
- W4366459483 hasConceptScore W4366459483C2780586882 @default.
- W4366459483 hasConceptScore W4366459483C2780791683 @default.
- W4366459483 hasConceptScore W4366459483C41008148 @default.
- W4366459483 hasConceptScore W4366459483C48103436 @default.
- W4366459483 hasConceptScore W4366459483C62520636 @default.
- W4366459483 hasConceptScore W4366459483C78458016 @default.
- W4366459483 hasConceptScore W4366459483C86803240 @default.
- W4366459483 hasConceptScore W4366459483C97541855 @default.
- W4366459483 hasLocation W43664594831 @default.
- W4366459483 hasOpenAccess W4366459483 @default.
- W4366459483 hasPrimaryLocation W43664594831 @default.
- W4366459483 hasRelatedWork W2170607316 @default.
- W4366459483 hasRelatedWork W2416943787 @default.
- W4366459483 hasRelatedWork W2734912394 @default.
- W4366459483 hasRelatedWork W2923653485 @default.
- W4366459483 hasRelatedWork W2945115303 @default.
- W4366459483 hasRelatedWork W3022038857 @default.
- W4366459483 hasRelatedWork W4220782901 @default.
- W4366459483 hasRelatedWork W4288348115 @default.
- W4366459483 hasRelatedWork W4319083788 @default.
- W4366459483 hasRelatedWork W4327778759 @default.
- W4366459483 isParatext "false" @default.
- W4366459483 isRetracted "false" @default.
- W4366459483 workType "article" @default.
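The abstract above describes a two-phase scheme: reward-free exploration of the environment, followed by a small number of (possibly noisy) reward queries to a human teacher, and finally planning with the learned reward. The snippet below is a minimal illustrative sketch of that loop in a tiny tabular MDP; it is not the authors' implementation, and all names (`human_oracle`, `explore_reward_free`, `active_reward_queries`, `plan`) and the query-selection heuristic are assumptions made for illustration.

```python
# Minimal sketch of the active reward-learning loop described in the abstract,
# in a small random tabular MDP. Illustrative only; not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)

# A tiny random tabular MDP with horizon H and an unknown reward function.
S, A, H = 6, 3, 5
P = rng.dirichlet(np.ones(S), size=(S, A))        # transition kernel: P[s, a] is a next-state distribution
true_reward = rng.uniform(0.0, 1.0, size=(S, A))  # hidden reward, visible only through a noisy human oracle

def human_oracle(s, a, noise=0.1):
    """Noisy human feedback about the reward of a state-action pair (assumed query model)."""
    return true_reward[s, a] + rng.normal(0.0, noise)

# Phase 1: reward-free exploration to find which state-action pairs matter.
def explore_reward_free(episodes=200):
    counts = np.zeros((S, A))
    for _ in range(episodes):
        s = 0
        for _ in range(H):
            a = rng.integers(A)                   # random policy as a stand-in for a reward-free explorer
            counts[s, a] += 1
            s = rng.choice(S, p=P[s, a])
    return counts

# Phase 2: spend a small query budget on the human teacher.
def active_reward_queries(counts, budget=10, queries_per_pair=3):
    reward_hat = np.zeros((S, A))
    # Query the most frequently visited pairs (a simple proxy for the informative pairs).
    for idx in np.argsort(counts, axis=None)[::-1][:budget]:
        s, a = np.unravel_index(idx, (S, A))
        # Average a few noisy answers to dampen the feedback noise.
        reward_hat[s, a] = np.mean([human_oracle(s, a) for _ in range(queries_per_pair)])
    return reward_hat

# Phase 3: plan with the learned reward via finite-horizon value iteration.
def plan(reward_hat):
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = reward_hat + P @ V                    # Q[s, a] = r_hat[s, a] + E[V(s')]
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V

counts = explore_reward_free()
reward_hat = active_reward_queries(counts)
policy, V = plan(reward_hat)
print("Estimated value of the returned policy at the initial state:", V[0])
```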