Matches in SemOpenAlex for { <https://semopenalex.org/work/W4311432352> ?p ?o ?g. }
Showing items 1 to 73 of 73 with 100 items per page.
- W4311432352 abstract "We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition dynamic can be parameterized as a linear function of a given feature mapping, we propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde O(d\sqrt{H^3K})$, where $d$ is the dimension of the feature mapping, $H$ is the planning horizon, and $K$ is the number of episodes. Our algorithm is based on a weighted linear regression scheme with a carefully designed weight, which depends on a new variance estimator that (1) directly estimates the variance of the \emph{optimal} value function, (2) monotonically decreases with respect to the number of episodes to ensure a better estimation accuracy, and (3) uses a rare-switching policy to update the value function estimator to control the complexity of the estimated value function class. Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest." @default.
- W4311432352 created "2022-12-26" @default.
- W4311432352 creator A5047358306 @default.
- W4311432352 creator A5051448391 @default.
- W4311432352 creator A5063820180 @default.
- W4311432352 creator A5080386620 @default.
- W4311432352 date "2022-12-12" @default.
- W4311432352 modified "2023-10-17" @default.
- W4311432352 title "Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes" @default.
- W4311432352 doi "https://doi.org/10.48550/arxiv.2212.06132" @default.
- W4311432352 hasPublicationYear "2022" @default.
- W4311432352 type Work @default.
- W4311432352 citedByCount "0" @default.
- W4311432352 crossrefType "posted-content" @default.
- W4311432352 hasAuthorship W4311432352A5047358306 @default.
- W4311432352 hasAuthorship W4311432352A5051448391 @default.
- W4311432352 hasAuthorship W4311432352A5063820180 @default.
- W4311432352 hasAuthorship W4311432352A5080386620 @default.
- W4311432352 hasBestOaLocation W43114323521 @default.
- W4311432352 hasConcept C105795698 @default.
- W4311432352 hasConcept C106189395 @default.
- W4311432352 hasConcept C11413529 @default.
- W4311432352 hasConcept C114614502 @default.
- W4311432352 hasConcept C126255220 @default.
- W4311432352 hasConcept C14036430 @default.
- W4311432352 hasConcept C14646407 @default.
- W4311432352 hasConcept C149728462 @default.
- W4311432352 hasConcept C154945302 @default.
- W4311432352 hasConcept C159886148 @default.
- W4311432352 hasConcept C165464430 @default.
- W4311432352 hasConcept C185429906 @default.
- W4311432352 hasConcept C33676613 @default.
- W4311432352 hasConcept C33923547 @default.
- W4311432352 hasConcept C41008148 @default.
- W4311432352 hasConcept C50817715 @default.
- W4311432352 hasConcept C78458016 @default.
- W4311432352 hasConcept C86803240 @default.
- W4311432352 hasConcept C97541855 @default.
- W4311432352 hasConceptScore W4311432352C105795698 @default.
- W4311432352 hasConceptScore W4311432352C106189395 @default.
- W4311432352 hasConceptScore W4311432352C11413529 @default.
- W4311432352 hasConceptScore W4311432352C114614502 @default.
- W4311432352 hasConceptScore W4311432352C126255220 @default.
- W4311432352 hasConceptScore W4311432352C14036430 @default.
- W4311432352 hasConceptScore W4311432352C14646407 @default.
- W4311432352 hasConceptScore W4311432352C149728462 @default.
- W4311432352 hasConceptScore W4311432352C154945302 @default.
- W4311432352 hasConceptScore W4311432352C159886148 @default.
- W4311432352 hasConceptScore W4311432352C165464430 @default.
- W4311432352 hasConceptScore W4311432352C185429906 @default.
- W4311432352 hasConceptScore W4311432352C33676613 @default.
- W4311432352 hasConceptScore W4311432352C33923547 @default.
- W4311432352 hasConceptScore W4311432352C41008148 @default.
- W4311432352 hasConceptScore W4311432352C50817715 @default.
- W4311432352 hasConceptScore W4311432352C78458016 @default.
- W4311432352 hasConceptScore W4311432352C86803240 @default.
- W4311432352 hasConceptScore W4311432352C97541855 @default.
- W4311432352 hasLocation W43114323521 @default.
- W4311432352 hasOpenAccess W4311432352 @default.
- W4311432352 hasPrimaryLocation W43114323521 @default.
- W4311432352 hasRelatedWork W2103708221 @default.
- W4311432352 hasRelatedWork W2963713569 @default.
- W4311432352 hasRelatedWork W3091875946 @default.
- W4311432352 hasRelatedWork W3111617249 @default.
- W4311432352 hasRelatedWork W3176362036 @default.
- W4311432352 hasRelatedWork W3182614517 @default.
- W4311432352 hasRelatedWork W3212295265 @default.
- W4311432352 hasRelatedWork W4284890489 @default.
- W4311432352 hasRelatedWork W4287102143 @default.
- W4311432352 hasRelatedWork W4287555357 @default.
- W4311432352 isParatext "false" @default.
- W4311432352 isRetracted "false" @default.
- W4311432352 workType "article" @default.