Matches in SemOpenAlex for { <https://semopenalex.org/work/W4319323297> ?p ?o ?g. }
Showing items 1 to 75 of 75 with 100 items per page.
- W4319323297 abstract "Many policy-based reinforcement learning (RL) algorithms can be viewed as instantiations of approximate policy iteration (PI), i.e., where policy improvement and policy evaluation are both performed approximately. In applications where the average reward objective is the meaningful performance metric, discounted reward formulations are often used with the discount factor being close to $1,$ which is equivalent to making the expected horizon very large. However, the corresponding theoretical bounds for error performance scale with the square of the horizon. Thus, even after dividing the total reward by the length of the horizon, the corresponding performance bounds for average reward problems go to infinity. Therefore, an open problem has been to obtain meaningful performance bounds for approximate PI and RL algorithms for the average-reward setting. In this paper, we solve this open problem by obtaining the first finite-time error bounds for average-reward MDPs, and show that the asymptotic error goes to zero in the limit as policy evaluation and policy improvement errors go to zero." @default.
- W4319323297 created "2023-02-08" @default.
- W4319323297 creator A5001386822 @default.
- W4319323297 creator A5013561789 @default.
- W4319323297 creator A5089114577 @default.
- W4319323297 date "2023-02-02" @default.
- W4319323297 modified "2023-09-27" @default.
- W4319323297 title "Performance Bounds for Policy-Based Average Reward Reinforcement Learning Algorithms" @default.
- W4319323297 doi "https://doi.org/10.48550/arxiv.2302.01450" @default.
- W4319323297 hasPublicationYear "2023" @default.
- W4319323297 type Work @default.
- W4319323297 citedByCount "0" @default.
- W4319323297 crossrefType "posted-content" @default.
- W4319323297 hasAuthorship W4319323297A5001386822 @default.
- W4319323297 hasAuthorship W4319323297A5013561789 @default.
- W4319323297 hasAuthorship W4319323297A5089114577 @default.
- W4319323297 hasBestOaLocation W43193232971 @default.
- W4319323297 hasConcept C10138342 @default.
- W4319323297 hasConcept C11413529 @default.
- W4319323297 hasConcept C126255220 @default.
- W4319323297 hasConcept C134306372 @default.
- W4319323297 hasConcept C138885662 @default.
- W4319323297 hasConcept C151201525 @default.
- W4319323297 hasConcept C154945302 @default.
- W4319323297 hasConcept C159176650 @default.
- W4319323297 hasConcept C162324750 @default.
- W4319323297 hasConcept C176217482 @default.
- W4319323297 hasConcept C187736073 @default.
- W4319323297 hasConcept C21547014 @default.
- W4319323297 hasConcept C2524010 @default.
- W4319323297 hasConcept C2780813799 @default.
- W4319323297 hasConcept C2780898871 @default.
- W4319323297 hasConcept C33923547 @default.
- W4319323297 hasConcept C41008148 @default.
- W4319323297 hasConcept C41895202 @default.
- W4319323297 hasConcept C6177178 @default.
- W4319323297 hasConcept C7321624 @default.
- W4319323297 hasConcept C97541855 @default.
- W4319323297 hasConceptScore W4319323297C10138342 @default.
- W4319323297 hasConceptScore W4319323297C11413529 @default.
- W4319323297 hasConceptScore W4319323297C126255220 @default.
- W4319323297 hasConceptScore W4319323297C134306372 @default.
- W4319323297 hasConceptScore W4319323297C138885662 @default.
- W4319323297 hasConceptScore W4319323297C151201525 @default.
- W4319323297 hasConceptScore W4319323297C154945302 @default.
- W4319323297 hasConceptScore W4319323297C159176650 @default.
- W4319323297 hasConceptScore W4319323297C162324750 @default.
- W4319323297 hasConceptScore W4319323297C176217482 @default.
- W4319323297 hasConceptScore W4319323297C187736073 @default.
- W4319323297 hasConceptScore W4319323297C21547014 @default.
- W4319323297 hasConceptScore W4319323297C2524010 @default.
- W4319323297 hasConceptScore W4319323297C2780813799 @default.
- W4319323297 hasConceptScore W4319323297C2780898871 @default.
- W4319323297 hasConceptScore W4319323297C33923547 @default.
- W4319323297 hasConceptScore W4319323297C41008148 @default.
- W4319323297 hasConceptScore W4319323297C41895202 @default.
- W4319323297 hasConceptScore W4319323297C6177178 @default.
- W4319323297 hasConceptScore W4319323297C7321624 @default.
- W4319323297 hasConceptScore W4319323297C97541855 @default.
- W4319323297 hasLocation W43193232971 @default.
- W4319323297 hasOpenAccess W4319323297 @default.
- W4319323297 hasPrimaryLocation W43193232971 @default.
- W4319323297 hasRelatedWork W1549353711 @default.
- W4319323297 hasRelatedWork W2361111626 @default.
- W4319323297 hasRelatedWork W2363168345 @default.
- W4319323297 hasRelatedWork W2949618782 @default.
- W4319323297 hasRelatedWork W3124126274 @default.
- W4319323297 hasRelatedWork W4280532363 @default.
- W4319323297 hasRelatedWork W4281635306 @default.
- W4319323297 hasRelatedWork W4285791936 @default.
- W4319323297 hasRelatedWork W4304144362 @default.
- W4319323297 hasRelatedWork W4319323297 @default.
- W4319323297 isParatext "false" @default.
- W4319323297 isRetracted "false" @default.
- W4319323297 workType "article" @default.
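
The scaling argument in the abstract of this record can be made concrete with a short worked calculation. This is only a sketch under assumed notation that does not appear in the record: $\gamma$ is the discount factor, $H$ the expected horizon, $\epsilon$ a generic approximation error, and $c$ an unspecified constant. The quadratic form simply restates the abstract's description of the discounted bounds; it is not a bound quoted from the paper.

```latex
\begin{align*}
  H &= \frac{1}{1-\gamma}
    && \text{(expected horizon; } H \to \infty \text{ as } \gamma \to 1\text{)} \\
  \text{discounted error bound} &\approx c\,\epsilon\,H^{2}
    = \frac{c\,\epsilon}{(1-\gamma)^{2}}
    && \text{(scales with the square of the horizon)} \\
  \frac{c\,\epsilon\,H^{2}}{H} &= c\,\epsilon\,H \;\longrightarrow\; \infty
    \quad \text{as } \gamma \to 1
    && \text{(per-step bound still diverges)}
\end{align*}
```

Under these assumptions, normalizing by the horizon removes only one factor of $H$, which is why the abstract describes the resulting average-reward bounds as going to infinity.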