Matches in SemOpenAlex for { <https://semopenalex.org/work/W2293432107> ?p ?o ?g. }
Showing items 1 to 82 of 82 with 100 items per page.
- W2293432107 abstract "Off-policy learning refers to the problem of learning the value function of a way of behaving, or policy, while following a different policy. Gradient-based off-policy learning algorithms, such as GTD and TDC/GQ, converge even when using function approximation and incremental updates. However, they have been developed for the case of a fixed behavior policy. In control problems, one would like to adapt the behavior policy over time to become more greedy with respect to the existing value function. In this paper, we present the first gradient-based learning algorithms for this problem, which rely on the framework of policy gradient in order to modify the behavior policy. We present derivations of the algorithms, a convergence theorem, and empirical evidence showing that they compare favorably to existing approaches." @default.
- W2293432107 created "2016-06-24" @default.
- W2293432107 creator A5006414380 @default.
- W2293432107 creator A5065836447 @default.
- W2293432107 date "2015-12-13" @default.
- W2293432107 modified "2023-09-27" @default.
- W2293432107 title "Policy Gradient Methods for Off-policy Control." @default.
- W2293432107 cites W1646707810 @default.
- W2293432107 cites W2075268401 @default.
- W2293432107 cites W2100677568 @default.
- W2293432107 cites W2137466452 @default.
- W2293432107 cites W2155027007 @default.
- W2293432107 cites W2312609093 @default.
- W2293432107 hasPublicationYear "2015" @default.
- W2293432107 type Work @default.
- W2293432107 sameAs 2293432107 @default.
- W2293432107 citedByCount "4" @default.
- W2293432107 countsByYear W22934321072016 @default.
- W2293432107 countsByYear W22934321072019 @default.
- W2293432107 countsByYear W22934321072021 @default.
- W2293432107 crossrefType "posted-content" @default.
- W2293432107 hasAuthorship W2293432107A5006414380 @default.
- W2293432107 hasAuthorship W2293432107A5065836447 @default.
- W2293432107 hasConcept C10138342 @default.
- W2293432107 hasConcept C119857082 @default.
- W2293432107 hasConcept C126255220 @default.
- W2293432107 hasConcept C139719470 @default.
- W2293432107 hasConcept C14036430 @default.
- W2293432107 hasConcept C14646407 @default.
- W2293432107 hasConcept C154945302 @default.
- W2293432107 hasConcept C162324750 @default.
- W2293432107 hasConcept C182306322 @default.
- W2293432107 hasConcept C2775924081 @default.
- W2293432107 hasConcept C2776291640 @default.
- W2293432107 hasConcept C2777303404 @default.
- W2293432107 hasConcept C33923547 @default.
- W2293432107 hasConcept C41008148 @default.
- W2293432107 hasConcept C78458016 @default.
- W2293432107 hasConcept C86803240 @default.
- W2293432107 hasConceptScore W2293432107C10138342 @default.
- W2293432107 hasConceptScore W2293432107C119857082 @default.
- W2293432107 hasConceptScore W2293432107C126255220 @default.
- W2293432107 hasConceptScore W2293432107C139719470 @default.
- W2293432107 hasConceptScore W2293432107C14036430 @default.
- W2293432107 hasConceptScore W2293432107C14646407 @default.
- W2293432107 hasConceptScore W2293432107C154945302 @default.
- W2293432107 hasConceptScore W2293432107C162324750 @default.
- W2293432107 hasConceptScore W2293432107C182306322 @default.
- W2293432107 hasConceptScore W2293432107C2775924081 @default.
- W2293432107 hasConceptScore W2293432107C2776291640 @default.
- W2293432107 hasConceptScore W2293432107C2777303404 @default.
- W2293432107 hasConceptScore W2293432107C33923547 @default.
- W2293432107 hasConceptScore W2293432107C41008148 @default.
- W2293432107 hasConceptScore W2293432107C78458016 @default.
- W2293432107 hasConceptScore W2293432107C86803240 @default.
- W2293432107 hasLocation W22934321071 @default.
- W2293432107 hasOpenAccess W2293432107 @default.
- W2293432107 hasPrimaryLocation W22934321071 @default.
- W2293432107 hasRelatedWork W1757796397 @default.
- W2293432107 hasRelatedWork W1951936954 @default.
- W2293432107 hasRelatedWork W2105675791 @default.
- W2293432107 hasRelatedWork W2186769294 @default.
- W2293432107 hasRelatedWork W2411690432 @default.
- W2293432107 hasRelatedWork W2511462892 @default.
- W2293432107 hasRelatedWork W2616964725 @default.
- W2293432107 hasRelatedWork W2784825028 @default.
- W2293432107 hasRelatedWork W2788366696 @default.
- W2293432107 hasRelatedWork W2963744705 @default.
- W2293432107 hasRelatedWork W2964986650 @default.
- W2293432107 hasRelatedWork W2971587637 @default.
- W2293432107 hasRelatedWork W2980404711 @default.
- W2293432107 hasRelatedWork W2995081787 @default.
- W2293432107 hasRelatedWork W3035928649 @default.
- W2293432107 hasRelatedWork W3038733325 @default.
- W2293432107 hasRelatedWork W3107926684 @default.
- W2293432107 hasRelatedWork W3121094130 @default.
- W2293432107 hasRelatedWork W3130449816 @default.
- W2293432107 hasRelatedWork W3212744543 @default.
- W2293432107 isParatext "false" @default.
- W2293432107 isRetracted "false" @default.
- W2293432107 magId "2293432107" @default.
- W2293432107 workType "article" @default.
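The listing above is a flat dump of RDF statements about one work, one statement per line. This is a minimal Python sketch for turning such lines back into structured data. It assumes every line follows the `- <subject> <predicate> <object> @<graph>.` shape seen on this page. That shape is an assumption about this rendered listing, not a documented SemOpenAlex format. The helpers `parse_listing` and `group_by_predicate` are hypothetical names introduced only for illustration.

```python
import re
from collections import defaultdict

# Assumed line shape (taken from this rendered listing, not an official format):
#   - <subject> <predicate> <object> @<graph>.
# The object is either a double-quoted literal (which may contain spaces)
# or a single bare token such as W1646707810.
LINE_RE = re.compile(r'^-\s+(\S+)\s+(\S+)\s+("(?:[^"\\]|\\.)*"|\S+)\s+@(\S+)\.$')


def parse_listing(lines):
    """Parse listing lines into (subject, predicate, object, graph) tuples.

    Lines that do not match the assumed pattern (e.g. page headers) are skipped.
    """
    triples = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        subj, pred, obj, graph = match.groups()
        if len(obj) >= 2 and obj.startswith('"') and obj.endswith('"'):
            obj = obj[1:-1]  # strip the quotes around a literal value
        triples.append((subj, pred, obj, graph))
    return triples


def group_by_predicate(triples):
    """Collect objects per predicate, preserving listing order."""
    grouped = defaultdict(list)
    for _subj, pred, obj, _graph in triples:
        grouped[pred].append(obj)
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        '- W2293432107 title "Policy Gradient Methods for Off-policy Control." @default.',
        '- W2293432107 cites W1646707810 @default.',
        '- W2293432107 cites W2075268401 @default.',
        '- W2293432107 citedByCount "4" @default.',
    ]
    by_pred = group_by_predicate(parse_listing(sample))
    print(len(by_pred["cites"]))  # 2
    print(by_pred["title"][0])    # Policy Gradient Methods for Off-policy Control.
```

If the same grouping is applied to all 82 lines above, it should produce, for example, 6 `cites` objects, 16 `hasConcept` objects and 20 `hasRelatedWork` objects. Parsing rendered text is fragile, though. For anything beyond a quick look, querying the underlying RDF data directly (for example with SPARQL) is the more robust choice.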