Matches in SemOpenAlex for { <https://semopenalex.org/work/W4320560898> ?p ?o ?g. }
Showing items 1 to 65 of 65 with 100 items per page.
- W4320560898 abstract "This paper aims at presenting a new application of information geometry to reinforcement learning focusing on dynamic treatment resumes. In a standard framework of reinforcement learning, a Q-function is defined as the conditional expectation of a reward given a state and an action for a single-stage situation. We introduce an equivalence relation, called the policy equivalence, in the space of all the Q-functions. A class of information divergence is defined in the Q-function space for every stage. The main objective is to propose an estimator of the optimal policy function by a method of minimum information divergence based on a dataset of trajectories. In particular, we discuss the $\gamma$-power divergence that is shown to have an advantageous property such that the $\gamma$-power divergence between policy-equivalent Q-functions vanishes. This property essentially works to seek the optimal policy, which is discussed in a framework of a semiparametric model for the Q-function. The specific choices of power index $\gamma$ give interesting relationships of the value function, and the geometric and harmonic means of the Q-function. A numerical experiment demonstrates the performance of the minimum $\gamma$-power divergence method in the context of dynamic treatment regimes." @default.
- W4320560898 created "2023-02-15" @default.
- W4320560898 creator A5060169206 @default.
- W4320560898 date "2022-11-16" @default.
- W4320560898 modified "2023-09-28" @default.
- W4320560898 title "Minimum information divergence of Q-functions for dynamic treatment resumes" @default.
- W4320560898 doi "https://doi.org/10.48550/arxiv.2211.08741" @default.
- W4320560898 hasPublicationYear "2022" @default.
- W4320560898 type Work @default.
- W4320560898 citedByCount "0" @default.
- W4320560898 crossrefType "posted-content" @default.
- W4320560898 hasAuthorship W4320560898A5060169206 @default.
- W4320560898 hasBestOaLocation W43205608981 @default.
- W4320560898 hasConcept C105795698 @default.
- W4320560898 hasConcept C118615104 @default.
- W4320560898 hasConcept C126255220 @default.
- W4320560898 hasConcept C138885662 @default.
- W4320560898 hasConcept C14036430 @default.
- W4320560898 hasConcept C151730666 @default.
- W4320560898 hasConcept C154945302 @default.
- W4320560898 hasConcept C185429906 @default.
- W4320560898 hasConcept C207390915 @default.
- W4320560898 hasConcept C2779343474 @default.
- W4320560898 hasConcept C2780069185 @default.
- W4320560898 hasConcept C28826006 @default.
- W4320560898 hasConcept C33923547 @default.
- W4320560898 hasConcept C41008148 @default.
- W4320560898 hasConcept C41895202 @default.
- W4320560898 hasConcept C78458016 @default.
- W4320560898 hasConcept C86803240 @default.
- W4320560898 hasConcept C97541855 @default.
- W4320560898 hasConceptScore W4320560898C105795698 @default.
- W4320560898 hasConceptScore W4320560898C118615104 @default.
- W4320560898 hasConceptScore W4320560898C126255220 @default.
- W4320560898 hasConceptScore W4320560898C138885662 @default.
- W4320560898 hasConceptScore W4320560898C14036430 @default.
- W4320560898 hasConceptScore W4320560898C151730666 @default.
- W4320560898 hasConceptScore W4320560898C154945302 @default.
- W4320560898 hasConceptScore W4320560898C185429906 @default.
- W4320560898 hasConceptScore W4320560898C207390915 @default.
- W4320560898 hasConceptScore W4320560898C2779343474 @default.
- W4320560898 hasConceptScore W4320560898C2780069185 @default.
- W4320560898 hasConceptScore W4320560898C28826006 @default.
- W4320560898 hasConceptScore W4320560898C33923547 @default.
- W4320560898 hasConceptScore W4320560898C41008148 @default.
- W4320560898 hasConceptScore W4320560898C41895202 @default.
- W4320560898 hasConceptScore W4320560898C78458016 @default.
- W4320560898 hasConceptScore W4320560898C86803240 @default.
- W4320560898 hasConceptScore W4320560898C97541855 @default.
- W4320560898 hasLocation W43205608981 @default.
- W4320560898 hasOpenAccess W4320560898 @default.
- W4320560898 hasPrimaryLocation W43205608981 @default.
- W4320560898 hasRelatedWork W1576985819 @default.
- W4320560898 hasRelatedWork W1968589176 @default.
- W4320560898 hasRelatedWork W1999189895 @default.
- W4320560898 hasRelatedWork W2034640422 @default.
- W4320560898 hasRelatedWork W2152704622 @default.
- W4320560898 hasRelatedWork W2160184181 @default.
- W4320560898 hasRelatedWork W2166944917 @default.
- W4320560898 hasRelatedWork W2397896019 @default.
- W4320560898 hasRelatedWork W3090443406 @default.
- W4320560898 hasRelatedWork W4289791743 @default.
- W4320560898 isParatext "false" @default.
- W4320560898 isRetracted "false" @default.
- W4320560898 workType "article" @default.