Matches in SemOpenAlex for { <https://semopenalex.org/work/W3103474976> ?p ?o ?g. }
- W3103474976 abstract "Model-free reinforcement learning (RL), in particular Q-learning, is widely used to learn optimal policies for a variety of planning and control problems. However, when the underlying state-transition dynamics are stochastic and high-dimensional, Q-learning requires a large amount of data and incurs a prohibitively high computational cost. In this paper, we introduce Hamiltonian Q-Learning, a data-efficient modification of the Q-learning approach, which adopts an importance-sampling based technique for computing the Q function. To exploit the stochastic structure of the state-transition dynamics, we employ Hamiltonian Monte Carlo to update Q function estimates by approximating the expected future rewards using Q values associated with a subset of next states. Further, to exploit the latent low-rank structure of the dynamic system, Hamiltonian Q-Learning uses a matrix completion algorithm to reconstruct the updated Q function from Q value updates over a much smaller subset of state-action pairs. By providing an efficient way to apply Q-learning in stochastic, high-dimensional problems, the proposed approach broadens the scope of RL algorithms for real-world applications, including classical control tasks and environmental monitoring." @default.
- W3103474976 created "2020-11-23" @default.
- W3103474976 creator A5012957727 @default.
- W3103474976 creator A5032304399 @default.
- W3103474976 creator A5032507309 @default.
- W3103474976 creator A5048043105 @default.
- W3103474976 date "2021-05-04" @default.
- W3103474976 modified "2023-09-27" @default.
- W3103474976 title "Hamiltonian Q-Learning: Leveraging Importance-sampling for Data Efficient RL" @default.
- W3103474976 cites W1511694993 @default.
- W3103474976 cites W1573860137 @default.
- W3103474976 cites W1591803298 @default.
- W3103474976 cites W1596195796 @default.
- W3103474976 cites W1977655452 @default.
- W3103474976 cites W1999912147 @default.
- W3103474976 cites W2020677283 @default.
- W3103474976 cites W2025800439 @default.
- W3103474976 cites W2028599032 @default.
- W3103474976 cites W2033057584 @default.
- W3103474976 cites W2038891887 @default.
- W3103474976 cites W2047071281 @default.
- W3103474976 cites W2060204507 @default.
- W3103474976 cites W2121863487 @default.
- W3103474976 cites W2130913800 @default.
- W3103474976 cites W2140135625 @default.
- W3103474976 cites W2145339207 @default.
- W3103474976 cites W2165485253 @default.
- W3103474976 cites W2203207606 @default.
- W3103474976 cites W2478027467 @default.
- W3103474976 cites W2539193981 @default.
- W3103474976 cites W2546571074 @default.
- W3103474976 cites W2751621024 @default.
- W3103474976 cites W2754985060 @default.
- W3103474976 cites W2766447205 @default.
- W3103474976 cites W2774977638 @default.
- W3103474976 cites W2785424877 @default.
- W3103474976 cites W2789256807 @default.
- W3103474976 cites W2798989470 @default.
- W3103474976 cites W2810541051 @default.
- W3103474976 cites W2900545699 @default.
- W3103474976 cites W2962703949 @default.
- W3103474976 cites W2962821147 @default.
- W3103474976 cites W2963641140 @default.
- W3103474976 cites W2963789045 @default.
- W3103474976 cites W2963809569 @default.
- W3103474976 cites W2971204130 @default.
- W3103474976 cites W2971213186 @default.
- W3103474976 cites W2995290757 @default.
- W3103474976 cites W3011120880 @default.
- W3103474976 cites W3029509103 @default.
- W3103474976 cites W3035067130 @default.
- W3103474976 cites W3106222965 @default.
- W3103474976 cites W3117266674 @default.
- W3103474976 hasPublicationYear "2021" @default.
- W3103474976 type Work @default.
- W3103474976 sameAs 3103474976 @default.
- W3103474976 citedByCount "0" @default.
- W3103474976 crossrefType "journal-article" @default.
- W3103474976 hasAuthorship W3103474976A5012957727 @default.
- W3103474976 hasAuthorship W3103474976A5032304399 @default.
- W3103474976 hasAuthorship W3103474976A5032507309 @default.
- W3103474976 hasAuthorship W3103474976A5048043105 @default.
- W3103474976 hasConcept C105795698 @default.
- W3103474976 hasConcept C119857082 @default.
- W3103474976 hasConcept C126255220 @default.
- W3103474976 hasConcept C130787639 @default.
- W3103474976 hasConcept C14646407 @default.
- W3103474976 hasConcept C154945302 @default.
- W3103474976 hasConcept C165696696 @default.
- W3103474976 hasConcept C188116033 @default.
- W3103474976 hasConcept C19499675 @default.
- W3103474976 hasConcept C33923547 @default.
- W3103474976 hasConcept C38652104 @default.
- W3103474976 hasConcept C41008148 @default.
- W3103474976 hasConcept C49555168 @default.
- W3103474976 hasConcept C52740198 @default.
- W3103474976 hasConcept C80444323 @default.
- W3103474976 hasConcept C97541855 @default.
- W3103474976 hasConcept C98763669 @default.
- W3103474976 hasConceptScore W3103474976C105795698 @default.
- W3103474976 hasConceptScore W3103474976C119857082 @default.
- W3103474976 hasConceptScore W3103474976C126255220 @default.
- W3103474976 hasConceptScore W3103474976C130787639 @default.
- W3103474976 hasConceptScore W3103474976C14646407 @default.
- W3103474976 hasConceptScore W3103474976C154945302 @default.
- W3103474976 hasConceptScore W3103474976C165696696 @default.
- W3103474976 hasConceptScore W3103474976C188116033 @default.
- W3103474976 hasConceptScore W3103474976C19499675 @default.
- W3103474976 hasConceptScore W3103474976C33923547 @default.
- W3103474976 hasConceptScore W3103474976C38652104 @default.
- W3103474976 hasConceptScore W3103474976C41008148 @default.
- W3103474976 hasConceptScore W3103474976C49555168 @default.
- W3103474976 hasConceptScore W3103474976C52740198 @default.
- W3103474976 hasConceptScore W3103474976C80444323 @default.
- W3103474976 hasConceptScore W3103474976C97541855 @default.
- W3103474976 hasConceptScore W3103474976C98763669 @default.
- W3103474976 hasLocation W31034749761 @default.
- W3103474976 hasOpenAccess W3103474976 @default.
- W3103474976 hasPrimaryLocation W31034749761 @default.
- W3103474976 hasRelatedWork W15354828 @default.