Matches in SemOpenAlex for { <https://semopenalex.org/work/W3036002380> ?p ?o ?g. }
- W3036002380 abstract "Efficient exploration is one of the main challenges in reinforcement learning (RL). Most existing sample-efficient algorithms assume the existence of a single reward function during exploration. In many practical scenarios, however, there is not a single underlying reward function to guide the exploration, for instance, when an agent needs to learn many skills simultaneously, or multiple conflicting objectives need to be balanced. To address these challenges, we propose the \textit{task-agnostic RL} framework: In the exploration phase, the agent first collects trajectories by exploring the MDP without the guidance of a reward function. After exploration, it aims at finding near-optimal policies for $N$ tasks, given the collected trajectories augmented with \textit{sampled rewards} for each task. We present an efficient task-agnostic RL algorithm, \textsc{UCBZero}, that finds $\epsilon$-optimal policies for $N$ arbitrary tasks after at most $\tilde O(\log(N)H^5SA/\epsilon^2)$ exploration episodes. We also provide an $\Omega(\log (N)H^2SA/\epsilon^2)$ lower bound, showing that the $\log$ dependency on $N$ is unavoidable. Furthermore, we provide an $N$-independent sample complexity bound of \textsc{UCBZero} in the statistically easier setting when the ground truth reward functions are known." @default.
- W3036002380 created "2020-06-25" @default.
- W3036002380 creator A5022094334 @default.
- W3036002380 creator A5027711113 @default.
- W3036002380 creator A5071995111 @default.
- W3036002380 date "2020-06-16" @default.
- W3036002380 modified "2023-09-24" @default.
- W3036002380 title "Task-agnostic Exploration in Reinforcement Learning" @default.
- W3036002380 cites W107583932 @default.
- W3036002380 cites W1850488217 @default.
- W3036002380 cites W2110144538 @default.
- W3036002380 cites W2129670787 @default.
- W3036002380 cites W2132876566 @default.
- W3036002380 cites W2227909145 @default.
- W3036002380 cites W2489939061 @default.
- W3036002380 cites W2528489519 @default.
- W3036002380 cites W2620290674 @default.
- W3036002380 cites W2769648743 @default.
- W3036002380 cites W2787933113 @default.
- W3036002380 cites W2788781499 @default.
- W3036002380 cites W2892490014 @default.
- W3036002380 cites W2899637793 @default.
- W3036002380 cites W2907502549 @default.
- W3036002380 cites W2914261249 @default.
- W3036002380 cites W2937206389 @default.
- W3036002380 cites W2963049774 @default.
- W3036002380 cites W2964001908 @default.
- W3036002380 cites W2964054583 @default.
- W3036002380 cites W2964118262 @default.
- W3036002380 cites W2964299116 @default.
- W3036002380 cites W3004977066 @default.
- W3036002380 cites W3020325294 @default.
- W3036002380 cites W567721252 @default.
- W3036002380 hasPublicationYear "2020" @default.
- W3036002380 type Work @default.
- W3036002380 sameAs 3036002380 @default.
- W3036002380 citedByCount "9" @default.
- W3036002380 countsByYear W30360023802020 @default.
- W3036002380 countsByYear W30360023802021 @default.
- W3036002380 crossrefType "posted-content" @default.
- W3036002380 hasAuthorship W3036002380A5022094334 @default.
- W3036002380 hasAuthorship W3036002380A5027711113 @default.
- W3036002380 hasAuthorship W3036002380A5071995111 @default.
- W3036002380 hasConcept C118615104 @default.
- W3036002380 hasConcept C121332964 @default.
- W3036002380 hasConcept C14036430 @default.
- W3036002380 hasConcept C154945302 @default.
- W3036002380 hasConcept C162324750 @default.
- W3036002380 hasConcept C187736073 @default.
- W3036002380 hasConcept C19768560 @default.
- W3036002380 hasConcept C198531522 @default.
- W3036002380 hasConcept C2778445095 @default.
- W3036002380 hasConcept C2779557605 @default.
- W3036002380 hasConcept C2780451532 @default.
- W3036002380 hasConcept C33923547 @default.
- W3036002380 hasConcept C36686422 @default.
- W3036002380 hasConcept C41008148 @default.
- W3036002380 hasConcept C62520636 @default.
- W3036002380 hasConcept C63553672 @default.
- W3036002380 hasConcept C78458016 @default.
- W3036002380 hasConcept C86803240 @default.
- W3036002380 hasConcept C97355855 @default.
- W3036002380 hasConcept C97541855 @default.
- W3036002380 hasConceptScore W3036002380C118615104 @default.
- W3036002380 hasConceptScore W3036002380C121332964 @default.
- W3036002380 hasConceptScore W3036002380C14036430 @default.
- W3036002380 hasConceptScore W3036002380C154945302 @default.
- W3036002380 hasConceptScore W3036002380C162324750 @default.
- W3036002380 hasConceptScore W3036002380C187736073 @default.
- W3036002380 hasConceptScore W3036002380C19768560 @default.
- W3036002380 hasConceptScore W3036002380C198531522 @default.
- W3036002380 hasConceptScore W3036002380C2778445095 @default.
- W3036002380 hasConceptScore W3036002380C2779557605 @default.
- W3036002380 hasConceptScore W3036002380C2780451532 @default.
- W3036002380 hasConceptScore W3036002380C33923547 @default.
- W3036002380 hasConceptScore W3036002380C36686422 @default.
- W3036002380 hasConceptScore W3036002380C41008148 @default.
- W3036002380 hasConceptScore W3036002380C62520636 @default.
- W3036002380 hasConceptScore W3036002380C63553672 @default.
- W3036002380 hasConceptScore W3036002380C78458016 @default.
- W3036002380 hasConceptScore W3036002380C86803240 @default.
- W3036002380 hasConceptScore W3036002380C97355855 @default.
- W3036002380 hasConceptScore W3036002380C97541855 @default.
- W3036002380 hasLocation W30360023801 @default.
- W3036002380 hasOpenAccess W3036002380 @default.
- W3036002380 hasPrimaryLocation W30360023801 @default.
- W3036002380 hasRelatedWork W1850488217 @default.
- W3036002380 hasRelatedWork W208428353 @default.
- W3036002380 hasRelatedWork W2112422203 @default.
- W3036002380 hasRelatedWork W2120678009 @default.
- W3036002380 hasRelatedWork W2962723383 @default.
- W3036002380 hasRelatedWork W2963049774 @default.
- W3036002380 hasRelatedWork W2964054583 @default.
- W3036002380 hasRelatedWork W2964299116 @default.
- W3036002380 hasRelatedWork W2972500268 @default.
- W3036002380 hasRelatedWork W3001490981 @default.
- W3036002380 hasRelatedWork W3004977066 @default.
- W3036002380 hasRelatedWork W3020949915 @default.
- W3036002380 hasRelatedWork W3034893468 @default.
- W3036002380 hasRelatedWork W3036498527 @default.
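
The listing above follows a regular `- SUBJECT PREDICATE OBJECT @GRAPH.` shape, so it can be regrouped by predicate with a few lines of Python. This is a minimal sketch that assumes only the line layout visible in this listing; the names `LINE_RE`, `parse_line`, and `group_by_predicate` are hypothetical helpers, not part of SemOpenAlex or any library, and the pattern is not a general RDF parser.

```python
import re
from collections import defaultdict

# Assumed line layout (taken from this listing only):
#   - SUBJECT PREDICATE OBJECT @GRAPH.
# OBJECT is either a bare token or a double-quoted literal.
LINE_RE = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(".*"|\S+)\s+@(\S+?)\.$')


def parse_line(line):
    """Return (subject, predicate, object, graph), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Strip the surrounding quotes from literal values.
    if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
        obj = obj[1:-1]
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect object values per predicate, preserving listing order."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _, predicate, obj, _ = parsed
            grouped[predicate].append(obj)
    return dict(grouped)


# A few lines copied from the listing above.
sample = [
    '- W3036002380 title "Task-agnostic Exploration in Reinforcement Learning" @default.',
    '- W3036002380 cites W107583932 @default.',
    '- W3036002380 cites W1850488217 @default.',
    '- W3036002380 citedByCount "9" @default.',
]
grouped = group_by_predicate(sample)
```

Grouping this way makes it easy to count, for example, how many `cites` or `hasRelatedWork` entries a work has. Lines that do not fit the assumed layout are skipped rather than raising an error.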