Matches in SemOpenAlex for { <https://semopenalex.org/work/W2946300093> ?p ?o ?g. }
- W2946300093 abstract "Exploration in reinforcement learning (RL) suffers from the curse of dimensionality when the state-action space is large. A common practice is to parameterize the high-dimensional value and policy functions using given features. However, existing methods either have no theoretical guarantee or suffer a regret that is exponential in the planning horizon $H$. In this paper, we propose an online RL algorithm, namely the MatrixRL, that leverages ideas from linear bandit to learn a low-dimensional representation of the probability transition model while carefully balancing the exploitation-exploration tradeoff. We show that MatrixRL achieves a regret bound $O\big(H^2 d \log T \sqrt{T}\big)$, where $d$ is the number of features. MatrixRL has an equivalent kernelized version, which is able to work with an arbitrary kernel Hilbert space without using explicit features. In this case, the kernelized MatrixRL satisfies a regret bound $O\big(H^2 \widetilde{d} \log T \sqrt{T}\big)$, where $\widetilde{d}$ is the effective dimension of the kernel space. To the best of our knowledge, for RL using features or kernels, our results are the first regret bounds that are near-optimal in time $T$ and dimension $d$ (or $\widetilde{d}$) and polynomial in the planning horizon $H$." @default.
- W2946300093 created "2019-05-29" @default.
- W2946300093 creator A5052683643 @default.
- W2946300093 creator A5072096775 @default.
- W2946300093 date "2019-05-24" @default.
- W2946300093 modified "2023-09-27" @default.
- W2946300093 title "Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound." @default.
- W2946300093 cites W107583932 @default.
- W2946300093 cites W1487320471 @default.
- W2946300093 cites W1510073064 @default.
- W2946300093 cites W1526654727 @default.
- W2946300093 cites W1646707810 @default.
- W2946300093 cites W1757796397 @default.
- W2946300093 cites W1850488217 @default.
- W2946300093 cites W1969276875 @default.
- W2946300093 cites W1977655452 @default.
- W2946300093 cites W2061753713 @default.
- W2946300093 cites W2112420033 @default.
- W2946300093 cites W2119738618 @default.
- W2946300093 cites W2120678009 @default.
- W2946300093 cites W2123447947 @default.
- W2946300093 cites W2123979492 @default.
- W2946300093 cites W2129670787 @default.
- W2946300093 cites W2137766593 @default.
- W2946300093 cites W2139418546 @default.
- W2946300093 cites W2144902422 @default.
- W2946300093 cites W2145339207 @default.
- W2946300093 cites W2341171179 @default.
- W2946300093 cites W2394933259 @default.
- W2946300093 cites W2518564545 @default.
- W2946300093 cites W2530849036 @default.
- W2946300093 cites W2604884452 @default.
- W2946300093 cites W2604960773 @default.
- W2946300093 cites W2766447205 @default.
- W2946300093 cites W2769648743 @default.
- W2946300093 cites W2792619169 @default.
- W2946300093 cites W2805861379 @default.
- W2946300093 cites W2899637793 @default.
- W2946300093 cites W2911793117 @default.
- W2946300093 cites W2921704027 @default.
- W2946300093 cites W2948677277 @default.
- W2946300093 cites W2950238385 @default.
- W2946300093 cites W2962937842 @default.
- W2946300093 cites W2963049774 @default.
- W2946300093 cites W2963271096 @default.
- W2946300093 cites W2963797557 @default.
- W2946300093 cites W2964178973 @default.
- W2946300093 cites W2964299116 @default.
- W2946300093 cites W3036081633 @default.
- W2946300093 cites W50486269 @default.
- W2946300093 hasPublicationYear "2019" @default.
- W2946300093 type Work @default.
- W2946300093 sameAs 2946300093 @default.
- W2946300093 citedByCount "27" @default.
- W2946300093 countsByYear W29463000932018 @default.
- W2946300093 countsByYear W29463000932019 @default.
- W2946300093 countsByYear W29463000932020 @default.
- W2946300093 countsByYear W29463000932021 @default.
- W2946300093 crossrefType "posted-content" @default.
- W2946300093 hasAuthorship W2946300093A5052683643 @default.
- W2946300093 hasAuthorship W2946300093A5072096775 @default.
- W2946300093 hasConcept C105795698 @default.
- W2946300093 hasConcept C106189395 @default.
- W2946300093 hasConcept C111030470 @default.
- W2946300093 hasConcept C111919701 @default.
- W2946300093 hasConcept C114614502 @default.
- W2946300093 hasConcept C118615104 @default.
- W2946300093 hasConcept C126255220 @default.
- W2946300093 hasConcept C134306372 @default.
- W2946300093 hasConcept C154945302 @default.
- W2946300093 hasConcept C159176650 @default.
- W2946300093 hasConcept C159886148 @default.
- W2946300093 hasConcept C2524010 @default.
- W2946300093 hasConcept C2778572836 @default.
- W2946300093 hasConcept C28761237 @default.
- W2946300093 hasConcept C33676613 @default.
- W2946300093 hasConcept C33923547 @default.
- W2946300093 hasConcept C41008148 @default.
- W2946300093 hasConcept C50817715 @default.
- W2946300093 hasConcept C74193536 @default.
- W2946300093 hasConcept C77553402 @default.
- W2946300093 hasConcept C97541855 @default.
- W2946300093 hasConceptScore W2946300093C105795698 @default.
- W2946300093 hasConceptScore W2946300093C106189395 @default.
- W2946300093 hasConceptScore W2946300093C111030470 @default.
- W2946300093 hasConceptScore W2946300093C111919701 @default.
- W2946300093 hasConceptScore W2946300093C114614502 @default.
- W2946300093 hasConceptScore W2946300093C118615104 @default.
- W2946300093 hasConceptScore W2946300093C126255220 @default.
- W2946300093 hasConceptScore W2946300093C134306372 @default.
- W2946300093 hasConceptScore W2946300093C154945302 @default.
- W2946300093 hasConceptScore W2946300093C159176650 @default.
- W2946300093 hasConceptScore W2946300093C159886148 @default.
- W2946300093 hasConceptScore W2946300093C2524010 @default.
- W2946300093 hasConceptScore W2946300093C2778572836 @default.
- W2946300093 hasConceptScore W2946300093C28761237 @default.
- W2946300093 hasConceptScore W2946300093C33676613 @default.
- W2946300093 hasConceptScore W2946300093C33923547 @default.
- W2946300093 hasConceptScore W2946300093C41008148 @default.
- W2946300093 hasConceptScore W2946300093C50817715 @default.