Matches in SemOpenAlex for { <https://semopenalex.org/work/W2188623979> ?p ?o ?g. }
Showing items 1 to 91 of 91 with 100 items per page.
- W2188623979 abstract "We consider the learning problem under an online Markov decision process (MDP), which is aimed at learning the time-dependent decision-making policy of an agent that minimizes the regret — the difference from the best fixed policy. The difficulty of online MDP learning is that the reward function changes over time. In this paper, we show that a simple online policy gradient algorithm achieves regret O(√T) for T steps under a certain concavity assumption and O(logT) under a strong concavity assumption. To the best of our knowledge, this is the first work to give an online MDP algorithm that can handle continuous state, action, and parameter spaces with guarantee. We also illustrate the behavior of the online policy gradient method through experiments." @default.
- W2188623979 created "2016-06-24" @default.
- W2188623979 creator A5018051479 @default.
- W2188623979 creator A5025048437 @default.
- W2188623979 creator A5063407210 @default.
- W2188623979 creator A5072009881 @default.
- W2188623979 date "2014-01-01" @default.
- W2188623979 modified "2023-09-27" @default.
- W2188623979 title "An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions" @default.
- W2188623979 cites W2074680702 @default.
- W2188623979 cites W2119717200 @default.
- W2188623979 cites W2127107099 @default.
- W2188623979 cites W2129160848 @default.
- W2188623979 cites W2156211713 @default.
- W2188623979 cites W4214717370 @default.
- W2188623979 doi "https://doi.org/10.1007/978-3-662-44851-9_23" @default.
- W2188623979 hasPublicationYear "2014" @default.
- W2188623979 type Work @default.
- W2188623979 sameAs 2188623979 @default.
- W2188623979 citedByCount "1" @default.
- W2188623979 countsByYear W21886239792016 @default.
- W2188623979 crossrefType "book-chapter" @default.
- W2188623979 hasAuthorship W2188623979A5018051479 @default.
- W2188623979 hasAuthorship W2188623979A5025048437 @default.
- W2188623979 hasAuthorship W2188623979A5063407210 @default.
- W2188623979 hasAuthorship W2188623979A5072009881 @default.
- W2188623979 hasConcept C105795698 @default.
- W2188623979 hasConcept C106189395 @default.
- W2188623979 hasConcept C111919701 @default.
- W2188623979 hasConcept C11413529 @default.
- W2188623979 hasConcept C115680565 @default.
- W2188623979 hasConcept C119857082 @default.
- W2188623979 hasConcept C126255220 @default.
- W2188623979 hasConcept C136764020 @default.
- W2188623979 hasConcept C154945302 @default.
- W2188623979 hasConcept C159886148 @default.
- W2188623979 hasConcept C188116033 @default.
- W2188623979 hasConcept C196921405 @default.
- W2188623979 hasConcept C2986087404 @default.
- W2188623979 hasConcept C33923547 @default.
- W2188623979 hasConcept C41008148 @default.
- W2188623979 hasConcept C50817715 @default.
- W2188623979 hasConcept C97541855 @default.
- W2188623979 hasConcept C98045186 @default.
- W2188623979 hasConcept C98763669 @default.
- W2188623979 hasConceptScore W2188623979C105795698 @default.
- W2188623979 hasConceptScore W2188623979C106189395 @default.
- W2188623979 hasConceptScore W2188623979C111919701 @default.
- W2188623979 hasConceptScore W2188623979C11413529 @default.
- W2188623979 hasConceptScore W2188623979C115680565 @default.
- W2188623979 hasConceptScore W2188623979C119857082 @default.
- W2188623979 hasConceptScore W2188623979C126255220 @default.
- W2188623979 hasConceptScore W2188623979C136764020 @default.
- W2188623979 hasConceptScore W2188623979C154945302 @default.
- W2188623979 hasConceptScore W2188623979C159886148 @default.
- W2188623979 hasConceptScore W2188623979C188116033 @default.
- W2188623979 hasConceptScore W2188623979C196921405 @default.
- W2188623979 hasConceptScore W2188623979C2986087404 @default.
- W2188623979 hasConceptScore W2188623979C33923547 @default.
- W2188623979 hasConceptScore W2188623979C41008148 @default.
- W2188623979 hasConceptScore W2188623979C50817715 @default.
- W2188623979 hasConceptScore W2188623979C97541855 @default.
- W2188623979 hasConceptScore W2188623979C98045186 @default.
- W2188623979 hasConceptScore W2188623979C98763669 @default.
- W2188623979 hasLocation W21886239791 @default.
- W2188623979 hasOpenAccess W2188623979 @default.
- W2188623979 hasPrimaryLocation W21886239791 @default.
- W2188623979 hasRelatedWork W1515425343 @default.
- W2188623979 hasRelatedWork W1531725372 @default.
- W2188623979 hasRelatedWork W1587845729 @default.
- W2188623979 hasRelatedWork W1824315332 @default.
- W2188623979 hasRelatedWork W191780540 @default.
- W2188623979 hasRelatedWork W1964650011 @default.
- W2188623979 hasRelatedWork W199494732 @default.
- W2188623979 hasRelatedWork W2012045703 @default.
- W2188623979 hasRelatedWork W2141892028 @default.
- W2188623979 hasRelatedWork W2149943599 @default.
- W2188623979 hasRelatedWork W2321325624 @default.
- W2188623979 hasRelatedWork W2559970185 @default.
- W2188623979 hasRelatedWork W2744215939 @default.
- W2188623979 hasRelatedWork W2768394776 @default.
- W2188623979 hasRelatedWork W3004876891 @default.
- W2188623979 hasRelatedWork W3004976379 @default.
- W2188623979 hasRelatedWork W3006087719 @default.
- W2188623979 hasRelatedWork W3045247449 @default.
- W2188623979 hasRelatedWork W3105876063 @default.
- W2188623979 hasRelatedWork W70930922 @default.
- W2188623979 isParatext "false" @default.
- W2188623979 isRetracted "false" @default.
- W2188623979 magId "2188623979" @default.
- W2188623979 workType "book-chapter" @default.
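
The rows above all follow one shape: a subject, a predicate, an object, and a graph name after `@`. As a minimal sketch, assuming that shape holds for every row, the listing can be parsed into tuples and grouped by predicate. The names `LINE_RE`, `parse_line`, and `group_by_predicate` are hypothetical helpers written for this illustration; they are not part of SemOpenAlex or any library.

```python
import re
from collections import defaultdict

# Assumed row shape (inferred from the listing, not from a published spec):
#   - <subject> <predicate> <object> @<graph>.
LINE_RE = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(.+?)\s+@(\S+)\.$')


def parse_line(line):
    """Return (subject, predicate, object, graph), or None for non-row lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Literal values are wrapped in double quotes; identifiers are bare.
    if len(obj) >= 2 and obj[0] == '"' and obj[-1] == '"':
        obj = obj[1:-1]
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect object values under each predicate, skipping unparseable lines."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            grouped[parsed[1]].append(parsed[2])
    return dict(grouped)


# A few rows copied from the listing, plus one header line that is skipped.
sample = [
    '- W2188623979 date "2014-01-01" @default.',
    '- W2188623979 cites W2074680702 @default.',
    '- W2188623979 cites W2119717200 @default.',
    '- W2188623979 type Work @default.',
    'Showing items 1 to 91 of',
]
grouped = group_by_predicate(sample)
print(grouped)
```

Grouping by predicate makes repeated properties such as `cites`, `hasConcept`, and `hasRelatedWork` easy to count or compare. The sketch strips only the outer quotes, so a literal containing escaped quotes would need extra handling.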