Matches in SemOpenAlex for { <https://semopenalex.org/work/W3094564300> ?p ?o ?g. }
- W3094564300 abstract "We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs. The method is closely related to the classic Relative Entropy Policy Search (REPS) algorithm of Peters et al. (2010), with the key difference that our method introduces a Q-function that enables efficient exact model-free implementation. The main feature of our algorithm (called QREPS) is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error. We provide a practical saddle-point optimization method for minimizing this loss function and provide an error-propagation analysis that relates the quality of the individual updates to the performance of the output policy. Finally, we demonstrate the effectiveness of our method on a range of benchmark problems." @default.
- W3094564300 created "2020-10-29" @default.
- W3094564300 creator A5003040843 @default.
- W3094564300 creator A5038306778 @default.
- W3094564300 creator A5045489410 @default.
- W3094564300 creator A5077167635 @default.
- W3094564300 date "2020-10-21" @default.
- W3094564300 modified "2023-10-18" @default.
- W3094564300 title "Logistic Q-Learning" @default.
- W3094564300 cites W1499669280 @default.
- W3094564300 cites W1564755532 @default.
- W3094564300 cites W1570963478 @default.
- W3094564300 cites W1575592356 @default.
- W3094564300 cites W166862392 @default.
- W3094564300 cites W1745373831 @default.
- W3094564300 cites W1771410628 @default.
- W3094564300 cites W1889629917 @default.
- W3094564300 cites W1941248864 @default.
- W3094564300 cites W1964535365 @default.
- W3094564300 cites W1988526405 @default.
- W3094564300 cites W1994616650 @default.
- W3094564300 cites W2012587148 @default.
- W3094564300 cites W2016384870 @default.
- W3094564300 cites W2038497950 @default.
- W3094564300 cites W2040766536 @default.
- W3094564300 cites W2098774185 @default.
- W3094564300 cites W2119567691 @default.
- W3094564300 cites W2121863487 @default.
- W3094564300 cites W2130304665 @default.
- W3094564300 cites W2130801532 @default.
- W3094564300 cites W2145339207 @default.
- W3094564300 cites W2150234726 @default.
- W3094564300 cites W2153874061 @default.
- W3094564300 cites W2160698719 @default.
- W3094564300 cites W2170233819 @default.
- W3094564300 cites W2222696761 @default.
- W3094564300 cites W2605818517 @default.
- W3094564300 cites W2619268125 @default.
- W3094564300 cites W2736601468 @default.
- W3094564300 cites W2787938642 @default.
- W3094564300 cites W2806985155 @default.
- W3094564300 cites W2899771611 @default.
- W3094564300 cites W2907626093 @default.
- W3094564300 cites W2914920107 @default.
- W3094564300 cites W2945496654 @default.
- W3094564300 cites W2945624305 @default.
- W3094564300 cites W2949608212 @default.
- W3094564300 cites W2956123884 @default.
- W3094564300 cites W2962902376 @default.
- W3094564300 cites W2963325394 @default.
- W3094564300 cites W2963477884 @default.
- W3094564300 cites W2963884015 @default.
- W3094564300 cites W2964043796 @default.
- W3094564300 cites W2964121744 @default.
- W3094564300 cites W2995181668 @default.
- W3094564300 cites W3014137283 @default.
- W3094564300 cites W3029753614 @default.
- W3094564300 cites W3034335560 @default.
- W3094564300 cites W3034871777 @default.
- W3094564300 cites W3037435714 @default.
- W3094564300 cites W3038258253 @default.
- W3094564300 cites W3046395471 @default.
- W3094564300 cites W3046626913 @default.
- W3094564300 cites W3048357344 @default.
- W3094564300 cites W3091123329 @default.
- W3094564300 cites W3099050578 @default.
- W3094564300 cites W3101487584 @default.
- W3094564300 cites W3101679384 @default.
- W3094564300 cites W3101940057 @default.
- W3094564300 cites W3102715494 @default.
- W3094564300 cites W3157563228 @default.
- W3094564300 cites W568673721 @default.
- W3094564300 hasPublicationYear "2020" @default.
- W3094564300 type Work @default.
- W3094564300 sameAs 3094564300 @default.
- W3094564300 citedByCount "1" @default.
- W3094564300 countsByYear W30945643002021 @default.
- W3094564300 crossrefType "posted-content" @default.
- W3094564300 hasAuthorship W3094564300A5003040843 @default.
- W3094564300 hasAuthorship W3094564300A5038306778 @default.
- W3094564300 hasAuthorship W3094564300A5045489410 @default.
- W3094564300 hasAuthorship W3094564300A5077167635 @default.
- W3094564300 hasConcept C105795698 @default.
- W3094564300 hasConcept C106301342 @default.
- W3094564300 hasConcept C112680207 @default.
- W3094564300 hasConcept C11413529 @default.
- W3094564300 hasConcept C121332964 @default.
- W3094564300 hasConcept C126255220 @default.
- W3094564300 hasConcept C13280743 @default.
- W3094564300 hasConcept C139945424 @default.
- W3094564300 hasConcept C14036430 @default.
- W3094564300 hasConcept C145446738 @default.
- W3094564300 hasConcept C154945302 @default.
- W3094564300 hasConcept C159985019 @default.
- W3094564300 hasConcept C185798385 @default.
- W3094564300 hasConcept C192562407 @default.
- W3094564300 hasConcept C204323151 @default.
- W3094564300 hasConcept C205649164 @default.
- W3094564300 hasConcept C2524010 @default.
- W3094564300 hasConcept C26517878 @default.