Matches in SemOpenAlex for { <https://semopenalex.org/work/W3204117917> ?p ?o ?g. }
Showing items 1 to 84 of 84 with 100 items per page.
- W3204117917 endingPage "1733" @default.
- W3204117917 startingPage "1713" @default.
- W3204117917 abstract "Neural networks are effective function approximators, but hard to train in the reinforcement learning (RL) context mainly because samples are correlated. In complex problems, a neural RL approach is often able to learn a better solution than tabular RL, but generally takes longer. This paper proposes two methods, Discrete-to-Deep Supervised Policy Learning (D2D-SPL) and Discrete-to-Deep Supervised Q-value Learning (D2D-SQL), whose objective is to acquire the generalisability of a neural network at a cost nearer to that of a tabular method. Both methods combine RL and supervised learning (SL) and are based on the idea that a fast-learning tabular method can generate off-policy data to accelerate learning in neural RL. D2D-SPL uses the data to train a classifier which is then used as a controller for the RL problem. D2D-SQL uses the data to initialise a neural network which is then allowed to continue learning using another RL method. We demonstrate the viability of our algorithms with Cartpole, Lunar Lander and an aircraft manoeuvring problem, three continuous-space environments with low-dimensional state variables. Both methods learn at least 38% faster than baseline methods and yield policies that outperform them." @default.
- W3204117917 created "2021-10-11" @default.
- W3204117917 creator A5002443369 @default.
- W3204117917 creator A5012932239 @default.
- W3204117917 creator A5013402425 @default.
- W3204117917 creator A5032749222 @default.
- W3204117917 creator A5080095540 @default.
- W3204117917 date "2021-10-01" @default.
- W3204117917 modified "2023-10-18" @default.
- W3204117917 title "Discrete-to-deep reinforcement learning methods" @default.
- W3204117917 cites W16011919 @default.
- W3204117917 cites W166862392 @default.
- W3204117917 cites W1973334977 @default.
- W3204117917 cites W2012612381 @default.
- W3204117917 cites W2020633882 @default.
- W3204117917 cites W2045212778 @default.
- W3204117917 cites W2075110355 @default.
- W3204117917 cites W2126909264 @default.
- W3204117917 cites W2145339207 @default.
- W3204117917 cites W2341882963 @default.
- W3204117917 cites W2746553466 @default.
- W3204117917 cites W2891076394 @default.
- W3204117917 cites W2897664140 @default.
- W3204117917 cites W2963561234 @default.
- W3204117917 cites W2972705364 @default.
- W3204117917 cites W2989916339 @default.
- W3204117917 cites W3103262232 @default.
- W3204117917 cites W3139866229 @default.
- W3204117917 doi "https://doi.org/10.1007/s00521-021-06270-6" @default.
- W3204117917 hasPublicationYear "2021" @default.
- W3204117917 type Work @default.
- W3204117917 sameAs 3204117917 @default.
- W3204117917 citedByCount "1" @default.
- W3204117917 countsByYear W32041179172022 @default.
- W3204117917 crossrefType "journal-article" @default.
- W3204117917 hasAuthorship W3204117917A5002443369 @default.
- W3204117917 hasAuthorship W3204117917A5012932239 @default.
- W3204117917 hasAuthorship W3204117917A5013402425 @default.
- W3204117917 hasAuthorship W3204117917A5032749222 @default.
- W3204117917 hasAuthorship W3204117917A5080095540 @default.
- W3204117917 hasBestOaLocation W32041179172 @default.
- W3204117917 hasConcept C119857082 @default.
- W3204117917 hasConcept C136389625 @default.
- W3204117917 hasConcept C151730666 @default.
- W3204117917 hasConcept C154945302 @default.
- W3204117917 hasConcept C2779343474 @default.
- W3204117917 hasConcept C41008148 @default.
- W3204117917 hasConcept C50644808 @default.
- W3204117917 hasConcept C86803240 @default.
- W3204117917 hasConcept C95623464 @default.
- W3204117917 hasConcept C97541855 @default.
- W3204117917 hasConceptScore W3204117917C119857082 @default.
- W3204117917 hasConceptScore W3204117917C136389625 @default.
- W3204117917 hasConceptScore W3204117917C151730666 @default.
- W3204117917 hasConceptScore W3204117917C154945302 @default.
- W3204117917 hasConceptScore W3204117917C2779343474 @default.
- W3204117917 hasConceptScore W3204117917C41008148 @default.
- W3204117917 hasConceptScore W3204117917C50644808 @default.
- W3204117917 hasConceptScore W3204117917C86803240 @default.
- W3204117917 hasConceptScore W3204117917C95623464 @default.
- W3204117917 hasConceptScore W3204117917C97541855 @default.
- W3204117917 hasFunder F4320335334 @default.
- W3204117917 hasIssue "3" @default.
- W3204117917 hasLocation W32041179171 @default.
- W3204117917 hasLocation W32041179172 @default.
- W3204117917 hasOpenAccess W3204117917 @default.
- W3204117917 hasPrimaryLocation W32041179171 @default.
- W3204117917 hasRelatedWork W2556319748 @default.
- W3204117917 hasRelatedWork W2961085424 @default.
- W3204117917 hasRelatedWork W3022038857 @default.
- W3204117917 hasRelatedWork W3162567751 @default.
- W3204117917 hasRelatedWork W3200179079 @default.
- W3204117917 hasRelatedWork W3210156800 @default.
- W3204117917 hasRelatedWork W4226172683 @default.
- W3204117917 hasRelatedWork W4249229055 @default.
- W3204117917 hasRelatedWork W4319083788 @default.
- W3204117917 hasRelatedWork W1629725936 @default.
- W3204117917 hasVolume "34" @default.
- W3204117917 isParatext "false" @default.
- W3204117917 isRetracted "false" @default.
- W3204117917 magId "3204117917" @default.
- W3204117917 workType "article" @default.