Matches in SemOpenAlex for { <https://semopenalex.org/work/W2206072005> ?p ?o ?g. }
Showing items 1 to 95 of 95 with 100 items per page.
- W2206072005 endingPage "440" @default.
- W2206072005 startingPage "435" @default.
- W2206072005 abstract "Recent Reinforcement Learning (RL) algorithms, such as R-MAX, make (with high probability) only a small number of poor decisions. In practice, these algorithms do not scale well as the number of states grows because the algorithms spend too much effort exploring. We introduce an RL algorithm State TArgeted R-MAX (STAR-MAX) that explores a subset of the state space, called the exploration envelope ξ. When ξ equals the total state space, STAR-MAX behaves identically to R-MAX. When ξ is a subset of the state space, to keep exploration within ξ, a recovery rule β is needed. We compared existing algorithms with our algorithm employing various exploration envelopes. With an appropriate choice of ξ, STAR-MAX scales far better than existing RL algorithms as the number of states increases. A possible drawback of our algorithm is its dependence on a good choice of ξ and β. However, we show that an effective recovery rule β can be learned on-line and ξ can be learned from demonstrations. We also find that even randomly sampled exploration envelopes can improve cumulative rewards compared to R-MAX. We expect these results to lead to more efficient methods for RL in large-scale problems." @default.
- W2206072005 created "2016-06-24" @default.
- W2206072005 creator A5006380946 @default.
- W2206072005 creator A5011352362 @default.
- W2206072005 date "2011-08-04" @default.
- W2206072005 modified "2023-09-24" @default.
- W2206072005 title "Scaling Up Reinforcement Learning through Targeted Exploration" @default.
- W2206072005 cites W107583932 @default.
- W2206072005 cites W1505937442 @default.
- W2206072005 cites W1515851193 @default.
- W2206072005 cites W15411808 @default.
- W2206072005 cites W1769562380 @default.
- W2206072005 cites W183249136 @default.
- W2206072005 cites W1993711637 @default.
- W2206072005 cites W2121863487 @default.
- W2206072005 cites W2123447947 @default.
- W2206072005 cites W2162206751 @default.
- W2206072005 cites W2165792602 @default.
- W2206072005 cites W2166265228 @default.
- W2206072005 cites W2489939061 @default.
- W2206072005 doi "https://doi.org/10.1609/aaai.v25i1.7929" @default.
- W2206072005 hasPublicationYear "2011" @default.
- W2206072005 type Work @default.
- W2206072005 sameAs 2206072005 @default.
- W2206072005 citedByCount "3" @default.
- W2206072005 countsByYear W22060720052012 @default.
- W2206072005 countsByYear W22060720052013 @default.
- W2206072005 countsByYear W22060720052020 @default.
- W2206072005 crossrefType "journal-article" @default.
- W2206072005 hasAuthorship W2206072005A5006380946 @default.
- W2206072005 hasAuthorship W2206072005A5011352362 @default.
- W2206072005 hasBestOaLocation W22060720051 @default.
- W2206072005 hasConcept C105795698 @default.
- W2206072005 hasConcept C111919701 @default.
- W2206072005 hasConcept C11413529 @default.
- W2206072005 hasConcept C121332964 @default.
- W2206072005 hasConcept C126255220 @default.
- W2206072005 hasConcept C134306372 @default.
- W2206072005 hasConcept C154945302 @default.
- W2206072005 hasConcept C2524010 @default.
- W2206072005 hasConcept C2778572836 @default.
- W2206072005 hasConcept C2778755073 @default.
- W2206072005 hasConcept C2780897414 @default.
- W2206072005 hasConcept C33923547 @default.
- W2206072005 hasConcept C41008148 @default.
- W2206072005 hasConcept C48103436 @default.
- W2206072005 hasConcept C554190296 @default.
- W2206072005 hasConcept C62520636 @default.
- W2206072005 hasConcept C65155139 @default.
- W2206072005 hasConcept C72434380 @default.
- W2206072005 hasConcept C76155785 @default.
- W2206072005 hasConcept C97541855 @default.
- W2206072005 hasConcept C99844830 @default.
- W2206072005 hasConceptScore W2206072005C105795698 @default.
- W2206072005 hasConceptScore W2206072005C111919701 @default.
- W2206072005 hasConceptScore W2206072005C11413529 @default.
- W2206072005 hasConceptScore W2206072005C121332964 @default.
- W2206072005 hasConceptScore W2206072005C126255220 @default.
- W2206072005 hasConceptScore W2206072005C134306372 @default.
- W2206072005 hasConceptScore W2206072005C154945302 @default.
- W2206072005 hasConceptScore W2206072005C2524010 @default.
- W2206072005 hasConceptScore W2206072005C2778572836 @default.
- W2206072005 hasConceptScore W2206072005C2778755073 @default.
- W2206072005 hasConceptScore W2206072005C2780897414 @default.
- W2206072005 hasConceptScore W2206072005C33923547 @default.
- W2206072005 hasConceptScore W2206072005C41008148 @default.
- W2206072005 hasConceptScore W2206072005C48103436 @default.
- W2206072005 hasConceptScore W2206072005C554190296 @default.
- W2206072005 hasConceptScore W2206072005C62520636 @default.
- W2206072005 hasConceptScore W2206072005C65155139 @default.
- W2206072005 hasConceptScore W2206072005C72434380 @default.
- W2206072005 hasConceptScore W2206072005C76155785 @default.
- W2206072005 hasConceptScore W2206072005C97541855 @default.
- W2206072005 hasConceptScore W2206072005C99844830 @default.
- W2206072005 hasIssue "1" @default.
- W2206072005 hasLocation W22060720051 @default.
- W2206072005 hasOpenAccess W2206072005 @default.
- W2206072005 hasPrimaryLocation W22060720051 @default.
- W2206072005 hasRelatedWork W1492014007 @default.
- W2206072005 hasRelatedWork W2094557321 @default.
- W2206072005 hasRelatedWork W2923653485 @default.
- W2206072005 hasRelatedWork W2957776456 @default.
- W2206072005 hasRelatedWork W2990393949 @default.
- W2206072005 hasRelatedWork W3103643887 @default.
- W2206072005 hasRelatedWork W3170446423 @default.
- W2206072005 hasRelatedWork W3173185086 @default.
- W2206072005 hasRelatedWork W3196472998 @default.
- W2206072005 hasRelatedWork W4287598111 @default.
- W2206072005 hasVolume "25" @default.
- W2206072005 isParatext "false" @default.
- W2206072005 isRetracted "false" @default.
- W2206072005 magId "2206072005" @default.
- W2206072005 workType "article" @default.