Matches in SemOpenAlex for { <https://semopenalex.org/work/W3131233483> ?p ?o ?g. }
- W3131233483 abstract "We study the reinforcement learning for finite-horizon episodic Markov decision processes with adversarial reward and full information feedback, where the unknown transition probability function is a linear function of a given feature mapping. We propose an optimistic policy optimization algorithm with Bernstein bonus and show that it can achieve $\tilde{O}(dH\sqrt{T})$ regret, where $H$ is the length of the episode, $T$ is the number of interaction with the MDP and $d$ is the dimension of the feature mapping. Furthermore, we also prove a matching lower bound of $\tilde{\Omega}(dH\sqrt{T})$ up to logarithmic factors. To the best of our knowledge, this is the first computationally efficient, nearly minimax optimal algorithm for adversarial Markov decision processes with linear function approximation." @default.
- W3131233483 created "2021-03-01" @default.
- W3131233483 creator A5051448391 @default.
- W3131233483 creator A5057534674 @default.
- W3131233483 creator A5080386620 @default.
- W3131233483 date "2021-02-17" @default.
- W3131233483 modified "2023-09-27" @default.
- W3131233483 title "Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation." @default.
- W3131233483 cites W100039866 @default.
- W3131233483 cites W107583932 @default.
- W3131233483 cites W1564755532 @default.
- W3131233483 cites W1570963478 @default.
- W3131233483 cites W1575592356 @default.
- W3131233483 cites W1771410628 @default.
- W3131233483 cites W2074680702 @default.
- W3131233483 cites W2119567691 @default.
- W3131233483 cites W2119717200 @default.
- W3131233483 cites W2119738618 @default.
- W3131233483 cites W2128477394 @default.
- W3131233483 cites W2130801532 @default.
- W3131233483 cites W2150234726 @default.
- W3131233483 cites W2155027007 @default.
- W3131233483 cites W2156211713 @default.
- W3131233483 cites W2157016390 @default.
- W3131233483 cites W2241126168 @default.
- W3131233483 cites W2545659366 @default.
- W3131233483 cites W2619268125 @default.
- W3131233483 cites W2736601468 @default.
- W3131233483 cites W2912399346 @default.
- W3131233483 cites W2945496654 @default.
- W3131233483 cites W2946284958 @default.
- W3131233483 cites W2949608212 @default.
- W3131233483 cites W2956123884 @default.
- W3131233483 cites W2963049774 @default.
- W3131233483 cites W2963866298 @default.
- W3131233483 cites W2965004202 @default.
- W3131233483 cites W2970355847 @default.
- W3131233483 cites W2971085818 @default.
- W3131233483 cites W2981322253 @default.
- W3131233483 cites W2991929641 @default.
- W3131233483 cites W3004970331 @default.
- W3131233483 cites W3008250277 @default.
- W3131233483 cites W3029753614 @default.
- W3131233483 cites W3034871777 @default.
- W3131233483 cites W3035273634 @default.
- W3131233483 cites W3035759338 @default.
- W3131233483 cites W3036849408 @default.
- W3131233483 cites W3037341018 @default.
- W3131233483 cites W3046395471 @default.
- W3131233483 cites W3107747824 @default.
- W3131233483 cites W3111437863 @default.
- W3131233483 cites W3129154373 @default.
- W3131233483 cites W2161853425 @default.
- W3131233483 hasPublicationYear "2021" @default.
- W3131233483 type Work @default.
- W3131233483 sameAs 3131233483 @default.
- W3131233483 citedByCount "1" @default.
- W3131233483 countsByYear W31312334832021 @default.
- W3131233483 crossrefType "posted-content" @default.
- W3131233483 hasAuthorship W3131233483A5051448391 @default.
- W3131233483 hasAuthorship W3131233483A5057534674 @default.
- W3131233483 hasAuthorship W3131233483A5080386620 @default.
- W3131233483 hasConcept C105795698 @default.
- W3131233483 hasConcept C106189395 @default.
- W3131233483 hasConcept C114614502 @default.
- W3131233483 hasConcept C118615104 @default.
- W3131233483 hasConcept C124101348 @default.
- W3131233483 hasConcept C126255220 @default.
- W3131233483 hasConcept C134306372 @default.
- W3131233483 hasConcept C14036430 @default.
- W3131233483 hasConcept C149728462 @default.
- W3131233483 hasConcept C154945302 @default.
- W3131233483 hasConcept C159886148 @default.
- W3131233483 hasConcept C165064840 @default.
- W3131233483 hasConcept C33676613 @default.
- W3131233483 hasConcept C33923547 @default.
- W3131233483 hasConcept C37736160 @default.
- W3131233483 hasConcept C39927690 @default.
- W3131233483 hasConcept C41008148 @default.
- W3131233483 hasConcept C50817715 @default.
- W3131233483 hasConcept C78458016 @default.
- W3131233483 hasConcept C86803240 @default.
- W3131233483 hasConcept C87117476 @default.
- W3131233483 hasConcept C97541855 @default.
- W3131233483 hasConceptScore W3131233483C105795698 @default.
- W3131233483 hasConceptScore W3131233483C106189395 @default.
- W3131233483 hasConceptScore W3131233483C114614502 @default.
- W3131233483 hasConceptScore W3131233483C118615104 @default.
- W3131233483 hasConceptScore W3131233483C124101348 @default.
- W3131233483 hasConceptScore W3131233483C126255220 @default.
- W3131233483 hasConceptScore W3131233483C134306372 @default.
- W3131233483 hasConceptScore W3131233483C14036430 @default.
- W3131233483 hasConceptScore W3131233483C149728462 @default.
- W3131233483 hasConceptScore W3131233483C154945302 @default.
- W3131233483 hasConceptScore W3131233483C159886148 @default.
- W3131233483 hasConceptScore W3131233483C165064840 @default.
- W3131233483 hasConceptScore W3131233483C33676613 @default.
- W3131233483 hasConceptScore W3131233483C33923547 @default.
- W3131233483 hasConceptScore W3131233483C37736160 @default.
- W3131233483 hasConceptScore W3131233483C39927690 @default.