Matches in SemOpenAlex for { <https://semopenalex.org/work/W1621708194> ?p ?o ?g. }
- W1621708194 abstract "In this theoretical paper we are concerned with the problem of learning a value function by a smooth general function approximator, to solve a deterministic episodic control problem in a large continuous state space. It is shown that learning the gradient of the value-function at every point along a trajectory generated by a greedy policy is a sufficient condition for the trajectory to be locally extremal, and often locally optimal, and we argue that this brings greater efficiency to value-function learning. This contrasts to traditional value-function learning in which the value-function must be learnt over the whole of state space. It is also proven that policy-gradient learning applied to a greedy policy on a value-function produces a weight update equivalent to a value-gradient weight update, which provides a surprising connection between these two alternative paradigms of reinforcement learning, and a convergence proof for control problems with a value function represented by a general smooth function approximator." @default.
- W1621708194 created "2016-06-24" @default.
- W1621708194 creator A5031842436 @default.
- W1621708194 creator A5074388845 @default.
- W1621708194 date "2011-01-02" @default.
- W1621708194 modified "2023-10-01" @default.
- W1621708194 title "The Local Optimality of Reinforcement Learning by Value Gradients, and its Relationship to Policy Gradient Learning" @default.
- W1621708194 cites W1550922999 @default.
- W1621708194 cites W1646707810 @default.
- W1621708194 cites W1665890689 @default.
- W1621708194 cites W1949804828 @default.
- W1621708194 cites W2006903949 @default.
- W1621708194 cites W2088176067 @default.
- W1621708194 cites W2100677568 @default.
- W1621708194 cites W2113501460 @default.
- W1621708194 cites W2119717200 @default.
- W1621708194 cites W2121863487 @default.
- W1621708194 cites W2127107099 @default.
- W1621708194 cites W2139418546 @default.
- W1621708194 cites W2140219596 @default.
- W1621708194 cites W2143803054 @default.
- W1621708194 cites W2150355110 @default.
- W1621708194 cites W2155027007 @default.
- W1621708194 cites W2341171179 @default.
- W1621708194 cites W3011120880 @default.
- W1621708194 cites W3105500761 @default.
- W1621708194 hasPublicationYear "2011" @default.
- W1621708194 type Work @default.
- W1621708194 sameAs 1621708194 @default.
- W1621708194 citedByCount "4" @default.
- W1621708194 countsByYear W16217081942012 @default.
- W1621708194 countsByYear W16217081942013 @default.
- W1621708194 crossrefType "posted-content" @default.
- W1621708194 hasAuthorship W1621708194A5031842436 @default.
- W1621708194 hasAuthorship W1621708194A5074388845 @default.
- W1621708194 hasConcept C105795698 @default.
- W1621708194 hasConcept C106189395 @default.
- W1621708194 hasConcept C119857082 @default.
- W1621708194 hasConcept C121332964 @default.
- W1621708194 hasConcept C126255220 @default.
- W1621708194 hasConcept C1276947 @default.
- W1621708194 hasConcept C13662910 @default.
- W1621708194 hasConcept C14036430 @default.
- W1621708194 hasConcept C14646407 @default.
- W1621708194 hasConcept C153258448 @default.
- W1621708194 hasConcept C154945302 @default.
- W1621708194 hasConcept C159886148 @default.
- W1621708194 hasConcept C162324750 @default.
- W1621708194 hasConcept C188116033 @default.
- W1621708194 hasConcept C2776291640 @default.
- W1621708194 hasConcept C2777303404 @default.
- W1621708194 hasConcept C33923547 @default.
- W1621708194 hasConcept C41008148 @default.
- W1621708194 hasConcept C50522688 @default.
- W1621708194 hasConcept C50644808 @default.
- W1621708194 hasConcept C72434380 @default.
- W1621708194 hasConcept C78458016 @default.
- W1621708194 hasConcept C86803240 @default.
- W1621708194 hasConcept C97541855 @default.
- W1621708194 hasConceptScore W1621708194C105795698 @default.
- W1621708194 hasConceptScore W1621708194C106189395 @default.
- W1621708194 hasConceptScore W1621708194C119857082 @default.
- W1621708194 hasConceptScore W1621708194C121332964 @default.
- W1621708194 hasConceptScore W1621708194C126255220 @default.
- W1621708194 hasConceptScore W1621708194C1276947 @default.
- W1621708194 hasConceptScore W1621708194C13662910 @default.
- W1621708194 hasConceptScore W1621708194C14036430 @default.
- W1621708194 hasConceptScore W1621708194C14646407 @default.
- W1621708194 hasConceptScore W1621708194C153258448 @default.
- W1621708194 hasConceptScore W1621708194C154945302 @default.
- W1621708194 hasConceptScore W1621708194C159886148 @default.
- W1621708194 hasConceptScore W1621708194C162324750 @default.
- W1621708194 hasConceptScore W1621708194C188116033 @default.
- W1621708194 hasConceptScore W1621708194C2776291640 @default.
- W1621708194 hasConceptScore W1621708194C2777303404 @default.
- W1621708194 hasConceptScore W1621708194C33923547 @default.
- W1621708194 hasConceptScore W1621708194C41008148 @default.
- W1621708194 hasConceptScore W1621708194C50522688 @default.
- W1621708194 hasConceptScore W1621708194C50644808 @default.
- W1621708194 hasConceptScore W1621708194C72434380 @default.
- W1621708194 hasConceptScore W1621708194C78458016 @default.
- W1621708194 hasConceptScore W1621708194C86803240 @default.
- W1621708194 hasConceptScore W1621708194C97541855 @default.
- W1621708194 hasLocation W16217081941 @default.
- W1621708194 hasOpenAccess W1621708194 @default.
- W1621708194 hasPrimaryLocation W16217081941 @default.
- W1621708194 hasRelatedWork W1552148478 @default.
- W1621708194 hasRelatedWork W1665890689 @default.
- W1621708194 hasRelatedWork W1828381662 @default.
- W1621708194 hasRelatedWork W1971934487 @default.
- W1621708194 hasRelatedWork W1983518308 @default.
- W1621708194 hasRelatedWork W2031067035 @default.
- W1621708194 hasRelatedWork W2050838777 @default.
- W1621708194 hasRelatedWork W2100677568 @default.
- W1621708194 hasRelatedWork W2113501460 @default.
- W1621708194 hasRelatedWork W2120968583 @default.
- W1621708194 hasRelatedWork W2121863487 @default.
- W1621708194 hasRelatedWork W2132787074 @default.
- W1621708194 hasRelatedWork W2136064843 @default.
- W1621708194 hasRelatedWork W2154549708 @default.