Matches in SemOpenAlex for { <https://semopenalex.org/work/W3199422992> ?p ?o ?g. }
- W3199422992 abstract "We consider the problem of tabular infinite horizon concave utility reinforcement learning (CURL) with convex constraints. Various learning applications with constraints, such as robotics, do not allow for policies that can violate constraints. To this end, we propose a model-based learning algorithm that achieves zero constraint violations. To obtain this result, we assume that the concave objective and the convex constraints have a solution interior to the set of feasible occupation measures. We then solve a tighter optimization problem to ensure that the constraints are never violated despite the imprecise model knowledge and model stochasticity. We also propose a novel Bellman error based analysis for tabular infinite-horizon setups which allows to analyse stochastic policies. Combining the Bellman error based analysis and tighter optimization equation, for $T$ interactions with the environment, we obtain a regret guarantee for objective which grows as $\tilde{O}(1/\sqrt{T})$, excluding other factors." @default.
- W3199422992 created "2021-09-27" @default.
- W3199422992 creator A5051677938 @default.
- W3199422992 creator A5064822688 @default.
- W3199422992 creator A5075904309 @default.
- W3199422992 date "2021-09-12" @default.
- W3199422992 modified "2023-09-27" @default.
- W3199422992 title "Concave Utility Reinforcement Learning with Zero-Constraint Violations." @default.
- W3199422992 cites W1518931405 @default.
- W3199422992 cites W1575592356 @default.
- W3199422992 cites W1850488217 @default.
- W3199422992 cites W2119567691 @default.
- W3199422992 cites W2121863487 @default.
- W3199422992 cites W2145296449 @default.
- W3199422992 cites W2145339207 @default.
- W3199422992 cites W2160690679 @default.
- W3199422992 cites W21934178 @default.
- W3199422992 cites W2293415235 @default.
- W3199422992 cites W2736601468 @default.
- W3199422992 cites W2768908787 @default.
- W3199422992 cites W2769648743 @default.
- W3199422992 cites W2822752092 @default.
- W3199422992 cites W2902298341 @default.
- W3199422992 cites W2962734844 @default.
- W3199422992 cites W2962803570 @default.
- W3199422992 cites W2962902376 @default.
- W3199422992 cites W2963049774 @default.
- W3199422992 cites W2963092340 @default.
- W3199422992 cites W2963326510 @default.
- W3199422992 cites W2963767098 @default.
- W3199422992 cites W2964222567 @default.
- W3199422992 cites W2971604276 @default.
- W3199422992 cites W2979211489 @default.
- W3199422992 cites W2990118529 @default.
- W3199422992 cites W3008910712 @default.
- W3199422992 cites W3009922106 @default.
- W3199422992 cites W3028821797 @default.
- W3199422992 cites W3034608738 @default.
- W3199422992 cites W3035939465 @default.
- W3199422992 cites W3098983831 @default.
- W3199422992 cites W3101517963 @default.
- W3199422992 cites W3105702366 @default.
- W3199422992 cites W3128962298 @default.
- W3199422992 cites W3136466665 @default.
- W3199422992 cites W3158548275 @default.
- W3199422992 cites W3169315127 @default.
- W3199422992 cites W3171291125 @default.
- W3199422992 cites W3174110104 @default.
- W3199422992 hasPublicationYear "2021" @default.
- W3199422992 type Work @default.
- W3199422992 sameAs 3199422992 @default.
- W3199422992 citedByCount "0" @default.
- W3199422992 crossrefType "posted-content" @default.
- W3199422992 hasAuthorship W3199422992A5051677938 @default.
- W3199422992 hasAuthorship W3199422992A5064822688 @default.
- W3199422992 hasAuthorship W3199422992A5075904309 @default.
- W3199422992 hasConcept C112680207 @default.
- W3199422992 hasConcept C119857082 @default.
- W3199422992 hasConcept C126255220 @default.
- W3199422992 hasConcept C138885662 @default.
- W3199422992 hasConcept C154945302 @default.
- W3199422992 hasConcept C159176650 @default.
- W3199422992 hasConcept C177264268 @default.
- W3199422992 hasConcept C199360897 @default.
- W3199422992 hasConcept C2524010 @default.
- W3199422992 hasConcept C2776036281 @default.
- W3199422992 hasConcept C2780813799 @default.
- W3199422992 hasConcept C33923547 @default.
- W3199422992 hasConcept C41008148 @default.
- W3199422992 hasConcept C41895202 @default.
- W3199422992 hasConcept C50817715 @default.
- W3199422992 hasConcept C97541855 @default.
- W3199422992 hasConceptScore W3199422992C112680207 @default.
- W3199422992 hasConceptScore W3199422992C119857082 @default.
- W3199422992 hasConceptScore W3199422992C126255220 @default.
- W3199422992 hasConceptScore W3199422992C138885662 @default.
- W3199422992 hasConceptScore W3199422992C154945302 @default.
- W3199422992 hasConceptScore W3199422992C159176650 @default.
- W3199422992 hasConceptScore W3199422992C177264268 @default.
- W3199422992 hasConceptScore W3199422992C199360897 @default.
- W3199422992 hasConceptScore W3199422992C2524010 @default.
- W3199422992 hasConceptScore W3199422992C2776036281 @default.
- W3199422992 hasConceptScore W3199422992C2780813799 @default.
- W3199422992 hasConceptScore W3199422992C33923547 @default.
- W3199422992 hasConceptScore W3199422992C41008148 @default.
- W3199422992 hasConceptScore W3199422992C41895202 @default.
- W3199422992 hasConceptScore W3199422992C50817715 @default.
- W3199422992 hasConceptScore W3199422992C97541855 @default.
- W3199422992 hasLocation W31994229921 @default.
- W3199422992 hasOpenAccess W3199422992 @default.
- W3199422992 hasPrimaryLocation W31994229921 @default.
- W3199422992 hasRelatedWork W2062167629 @default.
- W3199422992 hasRelatedWork W2080717363 @default.
- W3199422992 hasRelatedWork W2122656899 @default.
- W3199422992 hasRelatedWork W2289655975 @default.
- W3199422992 hasRelatedWork W2735622867 @default.
- W3199422992 hasRelatedWork W2796829859 @default.
- W3199422992 hasRelatedWork W2889185350 @default.
- W3199422992 hasRelatedWork W2889634868 @default.
- W3199422992 hasRelatedWork W2947843463 @default.