Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386649063> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W4386649063 abstract "The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation. The algorithms and theory rest on a relaxation of a dual of Manne's celebrated linear programming characterization of optimal control. The first set of contributions concerns properties of the relaxation, described as a deterministic convex program: we identify conditions for a bounded solution, and a significant relationship between the solution to the new convex program and the solution to standard Q-learning. The second set of contributions concerns algorithm design and analysis: (i) A direct model-free method for approximating the convex program for Q-learning shares properties with its ideal. In particular, a bounded solution is ensured subject to a simple property of the basis functions; (ii) The proposed algorithms are convergent, and new techniques are introduced to obtain the rate of convergence in a mean-square sense; (iii) The approach can be generalized to a range of performance criteria, and it is found that variance can be reduced by considering 'relative' dynamic programming equations; (iv) The theory is illustrated with an application to a classical inventory control problem." @default.
- W4386649063 created "2023-09-13" @default.
- W4386649063 creator A5047988825 @default.
- W4386649063 creator A5080310406 @default.
- W4386649063 date "2023-09-10" @default.
- W4386649063 modified "2023-10-18" @default.
- W4386649063 title "Convex Q Learning in a Stochastic Environment: Extended Version" @default.
- W4386649063 doi "https://doi.org/10.48550/arxiv.2309.05105" @default.
- W4386649063 hasPublicationYear "2023" @default.
- W4386649063 type Work @default.
- W4386649063 citedByCount "0" @default.
- W4386649063 crossrefType "posted-content" @default.
- W4386649063 hasAuthorship W4386649063A5047988825 @default.
- W4386649063 hasAuthorship W4386649063A5080310406 @default.
- W4386649063 hasBestOaLocation W43866490631 @default.
- W4386649063 hasConcept C112680207 @default.
- W4386649063 hasConcept C12108790 @default.
- W4386649063 hasConcept C126255220 @default.
- W4386649063 hasConcept C134306372 @default.
- W4386649063 hasConcept C145446738 @default.
- W4386649063 hasConcept C15744967 @default.
- W4386649063 hasConcept C157972887 @default.
- W4386649063 hasConcept C159985019 @default.
- W4386649063 hasConcept C162324750 @default.
- W4386649063 hasConcept C192562407 @default.
- W4386649063 hasConcept C204323151 @default.
- W4386649063 hasConcept C2524010 @default.
- W4386649063 hasConcept C2776029896 @default.
- W4386649063 hasConcept C2777303404 @default.
- W4386649063 hasConcept C28826006 @default.
- W4386649063 hasConcept C33923547 @default.
- W4386649063 hasConcept C34388435 @default.
- W4386649063 hasConcept C41008148 @default.
- W4386649063 hasConcept C50522688 @default.
- W4386649063 hasConcept C77805123 @default.
- W4386649063 hasConceptScore W4386649063C112680207 @default.
- W4386649063 hasConceptScore W4386649063C12108790 @default.
- W4386649063 hasConceptScore W4386649063C126255220 @default.
- W4386649063 hasConceptScore W4386649063C134306372 @default.
- W4386649063 hasConceptScore W4386649063C145446738 @default.
- W4386649063 hasConceptScore W4386649063C15744967 @default.
- W4386649063 hasConceptScore W4386649063C157972887 @default.
- W4386649063 hasConceptScore W4386649063C159985019 @default.
- W4386649063 hasConceptScore W4386649063C162324750 @default.
- W4386649063 hasConceptScore W4386649063C192562407 @default.
- W4386649063 hasConceptScore W4386649063C204323151 @default.
- W4386649063 hasConceptScore W4386649063C2524010 @default.
- W4386649063 hasConceptScore W4386649063C2776029896 @default.
- W4386649063 hasConceptScore W4386649063C2777303404 @default.
- W4386649063 hasConceptScore W4386649063C28826006 @default.
- W4386649063 hasConceptScore W4386649063C33923547 @default.
- W4386649063 hasConceptScore W4386649063C34388435 @default.
- W4386649063 hasConceptScore W4386649063C41008148 @default.
- W4386649063 hasConceptScore W4386649063C50522688 @default.
- W4386649063 hasConceptScore W4386649063C77805123 @default.
- W4386649063 hasLocation W43866490631 @default.
- W4386649063 hasOpenAccess W4386649063 @default.
- W4386649063 hasPrimaryLocation W43866490631 @default.
- W4386649063 hasRelatedWork W1995757014 @default.
- W4386649063 hasRelatedWork W2018944617 @default.
- W4386649063 hasRelatedWork W2081395119 @default.
- W4386649063 hasRelatedWork W2589063463 @default.
- W4386649063 hasRelatedWork W2620834727 @default.
- W4386649063 hasRelatedWork W2811008754 @default.
- W4386649063 hasRelatedWork W2953109382 @default.
- W4386649063 hasRelatedWork W3039239075 @default.
- W4386649063 hasRelatedWork W4297899479 @default.
- W4386649063 hasRelatedWork W4300055019 @default.
- W4386649063 isParatext "false" @default.
- W4386649063 isRetracted "false" @default.
- W4386649063 workType "article" @default.