Matches in SemOpenAlex for { <https://semopenalex.org/work/W3044692912> ?p ?o ?g. }
Showing items 1 to 95 of 95 with 100 items per page.
- W3044692912 abstract "Deep Q-learning algorithms often suffer from poor gradient estimations with an excessive variance, resulting in unstable training and poor sampling efficiency. Stochastic variance-reduced gradient methods such as SVRG have been applied to reduce the estimation variance (Zhao et al. 2019). However, due to the online instance generation nature of reinforcement learning, directly applying SVRG to deep Q-learning is facing the problem of the inaccurate estimation of the anchor points, which dramatically limits the potentials of SVRG. To address this issue and inspired by the recursive gradient variance reduction algorithm SARAH (Nguyen et al. 2017), this paper proposes to introduce the recursive framework for updating the stochastic gradient estimates in deep Q-learning, achieving a novel algorithm called SRG-DQN. Unlike the SVRG-based algorithms, SRG-DQN designs a recursive update of the stochastic gradient estimate. The parameter update is along an accumulated direction using the past stochastic gradient information, and therefore can get rid of the estimation of the full gradients as the anchors. Additionally, SRG-DQN involves the Adam process for further accelerating the training process. Theoretical analysis and the experimental results on well-known reinforcement learning tasks demonstrate the efficiency and effectiveness of the proposed SRG-DQN algorithm." @default.
- W3044692912 created "2020-07-29" @default.
- W3044692912 creator A5000427235 @default.
- W3044692912 creator A5002318539 @default.
- W3044692912 creator A5004503144 @default.
- W3044692912 creator A5004548666 @default.
- W3044692912 creator A5007537187 @default.
- W3044692912 creator A5028218428 @default.
- W3044692912 creator A5028609507 @default.
- W3044692912 date "2020-07-24" @default.
- W3044692912 modified "2023-09-22" @default.
- W3044692912 title "Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient" @default.
- W3044692912 cites W1522301498 @default.
- W3044692912 cites W1757796397 @default.
- W3044692912 cites W1778935443 @default.
- W3044692912 cites W2105875671 @default.
- W3044692912 cites W2107438106 @default.
- W3044692912 cites W2145339207 @default.
- W3044692912 cites W2155968351 @default.
- W3044692912 cites W2173564293 @default.
- W3044692912 cites W2201581102 @default.
- W3044692912 cites W2594203335 @default.
- W3044692912 cites W2596758708 @default.
- W3044692912 cites W2724169821 @default.
- W3044692912 cites W2767133776 @default.
- W3044692912 cites W2942057781 @default.
- W3044692912 cites W2946584621 @default.
- W3044692912 cites W2948815070 @default.
- W3044692912 cites W2951368390 @default.
- W3044692912 cites W2952215077 @default.
- W3044692912 cites W2962696654 @default.
- W3044692912 cites W2963457007 @default.
- W3044692912 cites W2963572325 @default.
- W3044692912 cites W2964291307 @default.
- W3044692912 cites W2980125442 @default.
- W3044692912 doi "https://doi.org/10.48550/arxiv.2007.12817" @default.
- W3044692912 hasPublicationYear "2020" @default.
- W3044692912 type Work @default.
- W3044692912 sameAs 3044692912 @default.
- W3044692912 citedByCount "3" @default.
- W3044692912 countsByYear W30446929122021 @default.
- W3044692912 crossrefType "posted-content" @default.
- W3044692912 hasAuthorship W3044692912A5000427235 @default.
- W3044692912 hasAuthorship W3044692912A5002318539 @default.
- W3044692912 hasAuthorship W3044692912A5004503144 @default.
- W3044692912 hasAuthorship W3044692912A5004548666 @default.
- W3044692912 hasAuthorship W3044692912A5007537187 @default.
- W3044692912 hasAuthorship W3044692912A5028218428 @default.
- W3044692912 hasAuthorship W3044692912A5028609507 @default.
- W3044692912 hasBestOaLocation W30446929121 @default.
- W3044692912 hasConcept C111335779 @default.
- W3044692912 hasConcept C111919701 @default.
- W3044692912 hasConcept C11413529 @default.
- W3044692912 hasConcept C119857082 @default.
- W3044692912 hasConcept C121955636 @default.
- W3044692912 hasConcept C144133560 @default.
- W3044692912 hasConcept C154945302 @default.
- W3044692912 hasConcept C196083921 @default.
- W3044692912 hasConcept C2524010 @default.
- W3044692912 hasConcept C33923547 @default.
- W3044692912 hasConcept C41008148 @default.
- W3044692912 hasConcept C62644790 @default.
- W3044692912 hasConcept C97541855 @default.
- W3044692912 hasConcept C98045186 @default.
- W3044692912 hasConceptScore W3044692912C111335779 @default.
- W3044692912 hasConceptScore W3044692912C111919701 @default.
- W3044692912 hasConceptScore W3044692912C11413529 @default.
- W3044692912 hasConceptScore W3044692912C119857082 @default.
- W3044692912 hasConceptScore W3044692912C121955636 @default.
- W3044692912 hasConceptScore W3044692912C144133560 @default.
- W3044692912 hasConceptScore W3044692912C154945302 @default.
- W3044692912 hasConceptScore W3044692912C196083921 @default.
- W3044692912 hasConceptScore W3044692912C2524010 @default.
- W3044692912 hasConceptScore W3044692912C33923547 @default.
- W3044692912 hasConceptScore W3044692912C41008148 @default.
- W3044692912 hasConceptScore W3044692912C62644790 @default.
- W3044692912 hasConceptScore W3044692912C97541855 @default.
- W3044692912 hasConceptScore W3044692912C98045186 @default.
- W3044692912 hasLocation W30446929121 @default.
- W3044692912 hasOpenAccess W3044692912 @default.
- W3044692912 hasPrimaryLocation W30446929121 @default.
- W3044692912 hasRelatedWork W1576418712 @default.
- W3044692912 hasRelatedWork W1980968898 @default.
- W3044692912 hasRelatedWork W1981992409 @default.
- W3044692912 hasRelatedWork W1995242492 @default.
- W3044692912 hasRelatedWork W1997989683 @default.
- W3044692912 hasRelatedWork W2039050075 @default.
- W3044692912 hasRelatedWork W2084271566 @default.
- W3044692912 hasRelatedWork W2767133776 @default.
- W3044692912 hasRelatedWork W2886060011 @default.
- W3044692912 hasRelatedWork W3022038857 @default.
- W3044692912 isParatext "false" @default.
- W3044692912 isRetracted "false" @default.
- W3044692912 magId "3044692912" @default.
- W3044692912 workType "article" @default.
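
The abstract in this record describes a recursive update of the stochastic gradient estimate, inspired by SARAH (Nguyen et al. 2017). A minimal sketch of that generic SARAH-style recursion on a toy scalar least-squares problem is given below. It is not the paper's SRG-DQN algorithm: there is no Q-network, replay buffer, or Adam step, and the recursion here starts from a full-batch gradient as in plain SARAH, whereas the abstract states that SRG-DQN avoids estimating full gradients as anchors. The names `grad_i`, `sarah_estimate`, and `sarah_epoch`, the loss, and all hyperparameters are illustrative assumptions.

```python
import random


def grad_i(w, a, b):
    """Gradient of the toy per-sample loss 0.5 * (a*w - b)**2 with respect to w."""
    return a * (a * w - b)


def sarah_estimate(v_prev, w, w_prev, sample):
    """Recursive estimate: the same sample's gradient at the current and the
    previous iterate, added to the previous estimate (SARAH-style recursion)."""
    a, b = sample
    return grad_i(w, a, b) - grad_i(w_prev, a, b) + v_prev


def sarah_epoch(w0, data, lr=0.05, inner_steps=20, seed=0):
    """One outer epoch of the sketch (assumption: plain SARAH initialisation)."""
    rng = random.Random(seed)
    # Start the recursion from a full-batch gradient at w0.
    v = sum(grad_i(w0, a, b) for a, b in data) / len(data)
    w_prev, w = w0, w0 - lr * v
    for _ in range(inner_steps):
        sample = rng.choice(data)
        # Update the estimate recursively, then step along the accumulated direction.
        v = sarah_estimate(v, w, w_prev, sample)
        w_prev, w = w, w - lr * v
    return w


if __name__ == "__main__":
    # Hypothetical data: every sample is consistent with w = 2.
    data = [(1.0, 2.0), (1.5, 3.0), (2.0, 4.0)]
    w = 0.0
    for epoch in range(10):
        w = sarah_epoch(w, data, seed=epoch)
    print(w)
```

Unlike an SVRG-style estimate, which corrects each stochastic gradient against a fixed anchor point, the recursion in `sarah_estimate` only needs the previous iterate and the previous estimate, which corresponds to the "accumulated direction using the past stochastic gradient information" mentioned in the abstract.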