Matches in SemOpenAlex for { <https://semopenalex.org/work/W3129608001> ?p ?o ?g. }
- W3129608001 abstract "We study reinforcement learning in an infinite-horizon average-reward setting with linear function approximation, where the transition probability function of the underlying Markov Decision Process (MDP) admits a linear form over a feature mapping of the current state, action, and next state. We propose a new algorithm, UCRL2-VTR, which can be seen as an extension of the UCRL2 algorithm with linear function approximation. We show that UCRL2-VTR with a Bernstein-type bonus can achieve a regret of $\tilde{O}(d\sqrt{DT})$, where $d$ is the dimension of the feature mapping, $T$ is the horizon, and $D$ is the diameter of the MDP. We also prove a matching lower bound of $\tilde{\Omega}(d\sqrt{DT})$, which suggests that the proposed UCRL2-VTR is minimax optimal up to logarithmic factors. To the best of our knowledge, our algorithm is the first nearly minimax optimal RL algorithm with function approximation in the infinite-horizon average-reward setting." @default.
- W3129608001 created "2021-03-01" @default.
- W3129608001 creator A5051448391 @default.
- W3129608001 creator A5076385060 @default.
- W3129608001 creator A5080386620 @default.
- W3129608001 date "2021-02-14" @default.
- W3129608001 modified "2023-09-23" @default.
- W3129608001 title "Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation" @default.
- W3129608001 cites W1512069656 @default.
- W3129608001 cites W1662803991 @default.
- W3129608001 cites W1850488217 @default.
- W3129608001 cites W2077902449 @default.
- W3129608001 cites W2119567691 @default.
- W3129608001 cites W2119738618 @default.
- W3129608001 cites W2120678009 @default.
- W3129608001 cites W2121863487 @default.
- W3129608001 cites W2145339207 @default.
- W3129608001 cites W2545659366 @default.
- W3129608001 cites W2769648743 @default.
- W3129608001 cites W2832404192 @default.
- W3129608001 cites W2945496654 @default.
- W3129608001 cites W2959895084 @default.
- W3129608001 cites W2963600139 @default.
- W3129608001 cites W2963767098 @default.
- W3129608001 cites W2964000194 @default.
- W3129608001 cites W2970355847 @default.
- W3129608001 cites W2970981002 @default.
- W3129608001 cites W2981322253 @default.
- W3129608001 cites W2991929641 @default.
- W3129608001 cites W3004970331 @default.
- W3129608001 cites W3029753614 @default.
- W3129608001 cites W3034871777 @default.
- W3129608001 cites W3035273634 @default.
- W3129608001 cites W3041070598 @default.
- W3129608001 cites W3129154373 @default.
- W3129608001 cites W3157247518 @default.
- W3129608001 cites W3196847620 @default.
- W3129608001 cites W2013866489 @default.
- W3129608001 cites W3197999671 @default.
- W3129608001 doi "https://doi.org/10.48550/arxiv.2102.07301" @default.
- W3129608001 hasPublicationYear "2021" @default.
- W3129608001 type Work @default.
- W3129608001 sameAs 3129608001 @default.
- W3129608001 citedByCount "3" @default.
- W3129608001 countsByYear W31296080012021 @default.
- W3129608001 crossrefType "posted-content" @default.
- W3129608001 hasAuthorship W3129608001A5051448391 @default.
- W3129608001 hasAuthorship W3129608001A5076385060 @default.
- W3129608001 hasAuthorship W3129608001A5080386620 @default.
- W3129608001 hasBestOaLocation W31296080011 @default.
- W3129608001 hasConcept C105795698 @default.
- W3129608001 hasConcept C106189395 @default.
- W3129608001 hasConcept C114614502 @default.
- W3129608001 hasConcept C117148685 @default.
- W3129608001 hasConcept C118615104 @default.
- W3129608001 hasConcept C126255220 @default.
- W3129608001 hasConcept C134306372 @default.
- W3129608001 hasConcept C14036430 @default.
- W3129608001 hasConcept C149728462 @default.
- W3129608001 hasConcept C154945302 @default.
- W3129608001 hasConcept C159176650 @default.
- W3129608001 hasConcept C159886148 @default.
- W3129608001 hasConcept C165064840 @default.
- W3129608001 hasConcept C2524010 @default.
- W3129608001 hasConcept C28761237 @default.
- W3129608001 hasConcept C33676613 @default.
- W3129608001 hasConcept C33923547 @default.
- W3129608001 hasConcept C39927690 @default.
- W3129608001 hasConcept C41008148 @default.
- W3129608001 hasConcept C50817715 @default.
- W3129608001 hasConcept C78458016 @default.
- W3129608001 hasConcept C86803240 @default.
- W3129608001 hasConcept C97541855 @default.
- W3129608001 hasConceptScore W3129608001C105795698 @default.
- W3129608001 hasConceptScore W3129608001C106189395 @default.
- W3129608001 hasConceptScore W3129608001C114614502 @default.
- W3129608001 hasConceptScore W3129608001C117148685 @default.
- W3129608001 hasConceptScore W3129608001C118615104 @default.
- W3129608001 hasConceptScore W3129608001C126255220 @default.
- W3129608001 hasConceptScore W3129608001C134306372 @default.
- W3129608001 hasConceptScore W3129608001C14036430 @default.
- W3129608001 hasConceptScore W3129608001C149728462 @default.
- W3129608001 hasConceptScore W3129608001C154945302 @default.
- W3129608001 hasConceptScore W3129608001C159176650 @default.
- W3129608001 hasConceptScore W3129608001C159886148 @default.
- W3129608001 hasConceptScore W3129608001C165064840 @default.
- W3129608001 hasConceptScore W3129608001C2524010 @default.
- W3129608001 hasConceptScore W3129608001C28761237 @default.
- W3129608001 hasConceptScore W3129608001C33676613 @default.
- W3129608001 hasConceptScore W3129608001C33923547 @default.
- W3129608001 hasConceptScore W3129608001C39927690 @default.
- W3129608001 hasConceptScore W3129608001C41008148 @default.
- W3129608001 hasConceptScore W3129608001C50817715 @default.
- W3129608001 hasConceptScore W3129608001C78458016 @default.
- W3129608001 hasConceptScore W3129608001C86803240 @default.
- W3129608001 hasConceptScore W3129608001C97541855 @default.
- W3129608001 hasLocation W31296080011 @default.
- W3129608001 hasOpenAccess W3129608001 @default.
- W3129608001 hasPrimaryLocation W31296080011 @default.
- W3129608001 hasRelatedWork W1850488217 @default.
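The abstract describes an MDP whose transition kernel is linear in a feature map of (state, action, next state), i.e. $P(s' \mid s, a) = \langle \phi(s, a, s'), \theta^* \rangle$, and an algorithm whose name ends in VTR (value-targeted regression). The toy sketch below illustrates that model under stated assumptions. It is not the authors' UCRL2-VTR. It omits the Bernstein-type bonus and the optimistic planning step, and it uses plain (unweighted) ridge regression. All helper names (`make_linear_mixture_mdp`, `value_feature`, `vtr_ridge`, `regret_order`) and the random mixture-of-kernels construction are hypothetical.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): S states, A actions, d mixture components.
# Linear mixture MDP: P(s' | s, a) = <phi(s, a, s'), theta>, where each phi[..., i] is a
# valid transition kernel and theta lies on the simplex, so P is a valid kernel too.

def make_linear_mixture_mdp(S, A, d, rng):
    """Return phi with shape (S, A, S, d) and a true theta with shape (d,)."""
    phi = rng.dirichlet(np.ones(S), size=(S, A, d))  # shape (S, A, d, S): d random kernels
    phi = np.transpose(phi, (0, 1, 3, 2))             # reorder to (S, A, S, d)
    theta = rng.dirichlet(np.ones(d))                 # mixture weights on the simplex
    return phi, theta

def value_feature(phi, s, a, V):
    """phi_V(s, a) = sum_{s'} phi(s, a, s') V(s'): the regressor used in value-targeted regression."""
    return phi[s, a].T @ V                            # (d, S) @ (S,) -> (d,)

def vtr_ridge(phi, transitions, V, lam=1.0):
    """Unweighted ridge estimate of theta from observed (s, a, s_next) samples.

    The target V(s_next) has conditional mean <phi_V(s, a), theta>, so regressing
    it on phi_V(s, a) is a well-specified linear regression in only d dimensions.
    """
    d = phi.shape[-1]
    gram = lam * np.eye(d)                            # regularized Gram matrix
    rhs = np.zeros(d)
    for s, a, s_next in transitions:
        x = value_feature(phi, s, a, V)
        gram += np.outer(x, x)
        rhs += x * V[s_next]
    return np.linalg.solve(gram, rhs)

def regret_order(d, D, T):
    """The d * sqrt(D * T) rate stated in the abstract, with log factors dropped."""
    return d * np.sqrt(D * T)

rng = np.random.default_rng(0)
S, A, d = 6, 2, 3
phi, theta = make_linear_mixture_mdp(S, A, d, rng)
P = phi @ theta                                       # true kernel, shape (S, A, S)
V = rng.uniform(0.0, 1.0, size=S)                     # a fixed value function used as the target

# Collect transitions with (s, a) drawn uniformly at random (a simplification; no exploration).
transitions = []
for _ in range(20000):
    s, a = rng.integers(S), rng.integers(A)
    s_next = rng.choice(S, p=P[s, a])
    transitions.append((s, a, s_next))

theta_hat = vtr_ridge(phi, transitions, V)
pred = np.einsum("xasd,d,s->xa", phi, theta_hat, V)   # estimated E[V(s') | s, a]
true = np.einsum("xasd,d,s->xa", phi, theta, V)       # exact E[V(s') | s, a]
print("max error in E[V(s') | s, a]:", np.abs(pred - true).max())
print("rate d*sqrt(D*T) for d=4, D=4, T=100:", regret_order(4, 4, 100))
```

The design point this illustrates is that VTR regresses the scalar $V(s')$ rather than the next-state indicator. The estimation problem therefore stays $d$-dimensional however large the state space is, and this is the property that lets a regret bound depend on $d$ instead of the number of states.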