Matches in SemOpenAlex for { <https://semopenalex.org/work/W4200424214> ?p ?o ?g. }
- W4200424214 endingPage "793" @default.
- W4200424214 startingPage "765" @default.
- W4200424214 abstract "Reinforcement learning is a general technique that allows an agent to learn an optimal policy and interact with an environment in sequential decision-making problems. The goodness of a policy is measured by its value function starting from some initial state. The focus of this paper was to construct confidence intervals (CIs) for a policy’s value in infinite horizon settings where the number of decision points diverges to infinity. We propose to model the action-value state function (Q-function) associated with a policy based on series/sieve method to derive its confidence interval. When the target policy depends on the observed data as well, we propose a SequentiAl Value Evaluation (SAVE) method to recursively update the estimated policy and its value estimator. As long as either the number of trajectories or the number of decision points diverges to infinity, we show that the proposed CI achieves nominal coverage even in cases where the optimal policy is not unique. Simulation studies are conducted to back up our theoretical findings. We apply the proposed method to a dataset from mobile health studies and find that reinforcement learning algorithms could help improve patient’s health status. A Python implementation of the proposed procedure is available at https://github.com/shengzhang37/SAVE." @default.
- W4200424214 created "2021-12-31" @default.
- W4200424214 creator A5010775608 @default.
- W4200424214 creator A5025970743 @default.
- W4200424214 creator A5059203764 @default.
- W4200424214 creator A5089012283 @default.
- W4200424214 date "2021-12-22" @default.
- W4200424214 modified "2023-10-12" @default.
- W4200424214 title "Statistical Inference of the Value Function for Reinforcement Learning in Infinite-Horizon Settings" @default.
- W4200424214 cites W166862392 @default.
- W4200424214 cites W1996437515 @default.
- W4200424214 cites W2038146650 @default.
- W4200424214 cites W2043919728 @default.
- W4200424214 cites W2044974898 @default.
- W4200424214 cites W2046262211 @default.
- W4200424214 cites W2076051662 @default.
- W4200424214 cites W2100967164 @default.
- W4200424214 cites W2116034036 @default.
- W4200424214 cites W2124267516 @default.
- W4200424214 cites W2145339207 @default.
- W4200424214 cites W2165612525 @default.
- W4200424214 cites W2207219039 @default.
- W4200424214 cites W2257979135 @default.
- W4200424214 cites W2273088453 @default.
- W4200424214 cites W2291713832 @default.
- W4200424214 cites W2334782222 @default.
- W4200424214 cites W2411613392 @default.
- W4200424214 cites W2788125442 @default.
- W4200424214 cites W2802230479 @default.
- W4200424214 cites W2802737765 @default.
- W4200424214 cites W2808810245 @default.
- W4200424214 cites W2889748820 @default.
- W4200424214 cites W2910754418 @default.
- W4200424214 cites W2963735256 @default.
- W4200424214 cites W3098635105 @default.
- W4200424214 cites W3106017199 @default.
- W4200424214 doi "https://doi.org/10.1111/rssb.12465" @default.
- W4200424214 hasPublicationYear "2021" @default.
- W4200424214 type Work @default.
- W4200424214 citedByCount "5" @default.
- W4200424214 countsByYear W42004242142022 @default.
- W4200424214 countsByYear W42004242142023 @default.
- W4200424214 crossrefType "journal-article" @default.
- W4200424214 hasAuthorship W4200424214A5010775608 @default.
- W4200424214 hasAuthorship W4200424214A5025970743 @default.
- W4200424214 hasAuthorship W4200424214A5059203764 @default.
- W4200424214 hasAuthorship W4200424214A5089012283 @default.
- W4200424214 hasBestOaLocation W42004242142 @default.
- W4200424214 hasConcept C105795698 @default.
- W4200424214 hasConcept C106189395 @default.
- W4200424214 hasConcept C119857082 @default.
- W4200424214 hasConcept C126255220 @default.
- W4200424214 hasConcept C14036430 @default.
- W4200424214 hasConcept C14646407 @default.
- W4200424214 hasConcept C154945302 @default.
- W4200424214 hasConcept C159886148 @default.
- W4200424214 hasConcept C185429906 @default.
- W4200424214 hasConcept C2776214188 @default.
- W4200424214 hasConcept C2776291640 @default.
- W4200424214 hasConcept C33923547 @default.
- W4200424214 hasConcept C41008148 @default.
- W4200424214 hasConcept C78458016 @default.
- W4200424214 hasConcept C86803240 @default.
- W4200424214 hasConcept C97541855 @default.
- W4200424214 hasConceptScore W4200424214C105795698 @default.
- W4200424214 hasConceptScore W4200424214C106189395 @default.
- W4200424214 hasConceptScore W4200424214C119857082 @default.
- W4200424214 hasConceptScore W4200424214C126255220 @default.
- W4200424214 hasConceptScore W4200424214C14036430 @default.
- W4200424214 hasConceptScore W4200424214C14646407 @default.
- W4200424214 hasConceptScore W4200424214C154945302 @default.
- W4200424214 hasConceptScore W4200424214C159886148 @default.
- W4200424214 hasConceptScore W4200424214C185429906 @default.
- W4200424214 hasConceptScore W4200424214C2776214188 @default.
- W4200424214 hasConceptScore W4200424214C2776291640 @default.
- W4200424214 hasConceptScore W4200424214C33923547 @default.
- W4200424214 hasConceptScore W4200424214C41008148 @default.
- W4200424214 hasConceptScore W4200424214C78458016 @default.
- W4200424214 hasConceptScore W4200424214C86803240 @default.
- W4200424214 hasConceptScore W4200424214C97541855 @default.
- W4200424214 hasFunder F4320306076 @default.
- W4200424214 hasIssue "3" @default.
- W4200424214 hasLocation W42004242141 @default.
- W4200424214 hasLocation W42004242142 @default.
- W4200424214 hasLocation W42004242143 @default.
- W4200424214 hasOpenAccess W4200424214 @default.
- W4200424214 hasPrimaryLocation W42004242141 @default.
- W4200424214 hasRelatedWork W2152670157 @default.
- W4200424214 hasRelatedWork W2172425052 @default.
- W4200424214 hasRelatedWork W2341346307 @default.
- W4200424214 hasRelatedWork W2373808749 @default.
- W4200424214 hasRelatedWork W2386410636 @default.
- W4200424214 hasRelatedWork W2808418668 @default.
- W4200424214 hasRelatedWork W3045510440 @default.
- W4200424214 hasRelatedWork W3115089987 @default.
- W4200424214 hasRelatedWork W3213838085 @default.
- W4200424214 hasRelatedWork W4308702637 @default.
- W4200424214 hasVolume "84" @default.