Matches in SemOpenAlex for { <https://semopenalex.org/work/W3198568807> ?p ?o ?g. }
Showing items 1 to 94 of 94 with 100 items per page.
- W3198568807 abstract "Deterministic policies demonstrate substantial empirical success over their stochastic counterparts as they remove a level of randomness in Policy Gradient (PG) methods when applied to stochastic search problems involving Markov decision processes. However, current implementations require the use of state-action value ($Q$-function) approximators, also known as critics, to obtain estimates of the associated policy-reward gradient. In this work, we propose the use of two-point stochastic evaluations to obtain gradient estimates of a smoothed $Q$-function surrogate, constructed by evaluating pairs of the $Q$-function at low-dimensional, randomized initial action perturbations. This procedure lifts the dependence on a critic and restores true model-free policy learning, and with provable algorithmic stability. In fact, our finite complexity bounds improve upon existing results by up to 2 orders of magnitude in terms of iteration complexity, and by up to 3/2 orders of magnitude in terms of sample complexity. Simulation results on an agent navigation problem showcase the effectiveness of our proposed algorithm in a practical setting, as well." @default.
- W3198568807 created "2021-09-13" @default.
- W3198568807 creator A5029243115 @default.
- W3198568807 creator A5055323351 @default.
- W3198568807 creator A5078862959 @default.
- W3198568807 creator A5091493591 @default.
- W3198568807 date "2021-07-12" @default.
- W3198568807 modified "2023-09-26" @default.
- W3198568807 title "Actor-only Deterministic Policy Gradient via Zeroth-order Gradient Oracles in Action Space" @default.
- W3198568807 cites W1771410628 @default.
- W3198568807 cites W1777239053 @default.
- W3198568807 cites W1977655452 @default.
- W3198568807 cites W2070315235 @default.
- W3198568807 cites W2108682071 @default.
- W3198568807 cites W2149479912 @default.
- W3198568807 cites W2155027007 @default.
- W3198568807 cites W2165150801 @default.
- W3198568807 cites W2171830216 @default.
- W3198568807 cites W2173248099 @default.
- W3198568807 cites W2575705757 @default.
- W3198568807 cites W2611484353 @default.
- W3198568807 cites W2696558042 @default.
- W3198568807 cites W2771144749 @default.
- W3198568807 cites W2787938642 @default.
- W3198568807 cites W2883094398 @default.
- W3198568807 cites W2886474253 @default.
- W3198568807 cites W2912541037 @default.
- W3198568807 cites W2951915386 @default.
- W3198568807 cites W2981237928 @default.
- W3198568807 cites W2983476703 @default.
- W3198568807 cites W2996611387 @default.
- W3198568807 cites W2998481680 @default.
- W3198568807 cites W3034675169 @default.
- W3198568807 cites W3092621452 @default.
- W3198568807 cites W3136903997 @default.
- W3198568807 doi "https://doi.org/10.1109/isit45174.2021.9518023" @default.
- W3198568807 hasPublicationYear "2021" @default.
- W3198568807 type Work @default.
- W3198568807 sameAs 3198568807 @default.
- W3198568807 citedByCount "0" @default.
- W3198568807 crossrefType "proceedings-article" @default.
- W3198568807 hasAuthorship W3198568807A5029243115 @default.
- W3198568807 hasAuthorship W3198568807A5055323351 @default.
- W3198568807 hasAuthorship W3198568807A5078862959 @default.
- W3198568807 hasAuthorship W3198568807A5091493591 @default.
- W3198568807 hasConcept C105795698 @default.
- W3198568807 hasConcept C106189395 @default.
- W3198568807 hasConcept C112972136 @default.
- W3198568807 hasConcept C11413529 @default.
- W3198568807 hasConcept C119857082 @default.
- W3198568807 hasConcept C121332964 @default.
- W3198568807 hasConcept C125112378 @default.
- W3198568807 hasConcept C126255220 @default.
- W3198568807 hasConcept C14036430 @default.
- W3198568807 hasConcept C159886148 @default.
- W3198568807 hasConcept C2780791683 @default.
- W3198568807 hasConcept C33923547 @default.
- W3198568807 hasConcept C41008148 @default.
- W3198568807 hasConcept C62520636 @default.
- W3198568807 hasConcept C78458016 @default.
- W3198568807 hasConcept C86803240 @default.
- W3198568807 hasConceptScore W3198568807C105795698 @default.
- W3198568807 hasConceptScore W3198568807C106189395 @default.
- W3198568807 hasConceptScore W3198568807C112972136 @default.
- W3198568807 hasConceptScore W3198568807C11413529 @default.
- W3198568807 hasConceptScore W3198568807C119857082 @default.
- W3198568807 hasConceptScore W3198568807C121332964 @default.
- W3198568807 hasConceptScore W3198568807C125112378 @default.
- W3198568807 hasConceptScore W3198568807C126255220 @default.
- W3198568807 hasConceptScore W3198568807C14036430 @default.
- W3198568807 hasConceptScore W3198568807C159886148 @default.
- W3198568807 hasConceptScore W3198568807C2780791683 @default.
- W3198568807 hasConceptScore W3198568807C33923547 @default.
- W3198568807 hasConceptScore W3198568807C41008148 @default.
- W3198568807 hasConceptScore W3198568807C62520636 @default.
- W3198568807 hasConceptScore W3198568807C78458016 @default.
- W3198568807 hasConceptScore W3198568807C86803240 @default.
- W3198568807 hasLocation W31985688071 @default.
- W3198568807 hasOpenAccess W3198568807 @default.
- W3198568807 hasPrimaryLocation W31985688071 @default.
- W3198568807 hasRelatedWork W1990452411 @default.
- W3198568807 hasRelatedWork W1994682696 @default.
- W3198568807 hasRelatedWork W1995828591 @default.
- W3198568807 hasRelatedWork W1996326480 @default.
- W3198568807 hasRelatedWork W2096989594 @default.
- W3198568807 hasRelatedWork W2156021013 @default.
- W3198568807 hasRelatedWork W2156992384 @default.
- W3198568807 hasRelatedWork W2161367706 @default.
- W3198568807 hasRelatedWork W2743281190 @default.
- W3198568807 hasRelatedWork W3013781205 @default.
- W3198568807 isParatext "false" @default.
- W3198568807 isRetracted "false" @default.
- W3198568807 magId "3198568807" @default.
- W3198568807 workType "article" @default.
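
The abstract above mentions "two-point stochastic evaluations" of the $Q$-function at randomized action perturbations. The following is a minimal Python sketch of a generic two-point zeroth-order gradient estimator, not the paper's algorithm. The Gaussian perturbation, the smoothing radius `mu`, the function names, and the toy quadratic `toy_q` standing in for a $Q$-function at a fixed state are all assumptions made for illustration.

```python
import numpy as np


def two_point_gradient(q, action, mu, rng, num_pairs=1):
    """Average of two-point estimates of the gradient of q at `action`.

    Each pair evaluates q at action + mu*u and action - mu*u for a random
    Gaussian direction u (an assumed choice of perturbation), which gives
    an unbiased estimate of the gradient of the Gaussian-smoothed
    surrogate E[q(action + mu*u)].
    """
    action = np.asarray(action, dtype=float)
    grad = np.zeros_like(action)
    for _ in range(num_pairs):
        u = rng.standard_normal(action.shape)
        diff = q(action + mu * u) - q(action - mu * u)
        grad += diff / (2.0 * mu) * u
    return grad / num_pairs


# Hypothetical stand-in for a Q-function at a fixed state: a concave
# quadratic whose maximizer is `best_action`.
best_action = np.array([1.0, -1.0])


def toy_q(action):
    return -np.sum((action - best_action) ** 2)


rng = np.random.default_rng(0)
a0 = np.array([2.0, -3.0])
estimate = two_point_gradient(toy_q, a0, mu=0.1, rng=rng, num_pairs=20000)
true_gradient = -2.0 * (a0 - best_action)  # analytic gradient of toy_q
```

For this quadratic the smoothed surrogate has the same gradient as `toy_q` itself, so `estimate` should approach `true_gradient` as `num_pairs` grows. For a general function the estimator targets the gradient of the smoothed surrogate, which differs from the true gradient by a bias that depends on `mu`.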