Matches in SemOpenAlex for { <https://semopenalex.org/work/W2903445514> ?p ?o ?g. }
- W2903445514 abstract "Deep reinforcement learning (DRL) has achieved great successes in recent years with the help of novel methods and higher compute power. However, there are still several challenges to be addressed such as convergence to locally optimal policies and long training times. In this paper, firstly, we augment Asynchronous Advantage Actor-Critic (A3C) method with a novel self-supervised auxiliary task, i.e. Terminal Prediction, measuring temporal closeness to terminal states, namely A3C-TP. Secondly, we propose a new framework where planning algorithms such as Monte Carlo tree search or other sources of (simulated) demonstrators can be integrated to asynchronous distributed DRL methods. Compared to vanilla A3C, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game." @default.
- W2903445514 created "2018-12-11" @default.
- W2903445514 creator A5005259414 @default.
- W2903445514 creator A5070914351 @default.
- W2903445514 creator A5071731471 @default.
- W2903445514 date "2018-11-30" @default.
- W2903445514 modified "2023-09-27" @default.
- W2903445514 title "Using Monte Carlo Tree Search as a Demonstrator within Asynchronous Deep RL" @default.
- W2903445514 cites W1625390266 @default.
- W2903445514 cites W1714211023 @default.
- W2903445514 cites W1980035368 @default.
- W2903445514 cites W2119717200 @default.
- W2903445514 cites W2126316555 @default.
- W2903445514 cites W2133552775 @default.
- W2903445514 cites W2145339207 @default.
- W2903445514 cites W2151210636 @default.
- W2903445514 cites W2156737235 @default.
- W2903445514 cites W2257979135 @default.
- W2903445514 cites W2415726935 @default.
- W2903445514 cites W2563830277 @default.
- W2903445514 cites W2573637607 @default.
- W2903445514 cites W2580175322 @default.
- W2903445514 cites W2618097077 @default.
- W2903445514 cites W2626804490 @default.
- W2903445514 cites W2729964169 @default.
- W2903445514 cites W2736601468 @default.
- W2903445514 cites W2747605198 @default.
- W2903445514 cites W2766447205 @default.
- W2903445514 cites W2778917778 @default.
- W2903445514 cites W2786036274 @default.
- W2903445514 cites W2788862220 @default.
- W2903445514 cites W2886265050 @default.
- W2903445514 cites W2889698555 @default.
- W2903445514 cites W2903552445 @default.
- W2903445514 cites W2904157920 @default.
- W2903445514 cites W2919115771 @default.
- W2903445514 cites W2949899112 @default.
- W2903445514 cites W2949980113 @default.
- W2903445514 cites W2950735232 @default.
- W2903445514 cites W2950872548 @default.
- W2903445514 cites W2962938178 @default.
- W2903445514 cites W2962957031 @default.
- W2903445514 cites W2963099939 @default.
- W2903445514 cites W2964043796 @default.
- W2903445514 cites W3100789280 @default.
- W2903445514 cites W745775011 @default.
- W2903445514 cites W2754799999 @default.
- W2903445514 hasPublicationYear "2018" @default.
- W2903445514 type Work @default.
- W2903445514 sameAs 2903445514 @default.
- W2903445514 citedByCount "4" @default.
- W2903445514 countsByYear W29034455142018 @default.
- W2903445514 countsByYear W29034455142019 @default.
- W2903445514 crossrefType "posted-content" @default.
- W2903445514 hasAuthorship W2903445514A5005259414 @default.
- W2903445514 hasAuthorship W2903445514A5070914351 @default.
- W2903445514 hasAuthorship W2903445514A5071731471 @default.
- W2903445514 hasConcept C105795698 @default.
- W2903445514 hasConcept C113174947 @default.
- W2903445514 hasConcept C119857082 @default.
- W2903445514 hasConcept C126255220 @default.
- W2903445514 hasConcept C127413603 @default.
- W2903445514 hasConcept C134306372 @default.
- W2903445514 hasConcept C151319957 @default.
- W2903445514 hasConcept C154945302 @default.
- W2903445514 hasConcept C162324750 @default.
- W2903445514 hasConcept C19499675 @default.
- W2903445514 hasConcept C201995342 @default.
- W2903445514 hasConcept C2777303404 @default.
- W2903445514 hasConcept C2779545769 @default.
- W2903445514 hasConcept C2779664074 @default.
- W2903445514 hasConcept C2780451532 @default.
- W2903445514 hasConcept C31258907 @default.
- W2903445514 hasConcept C33923547 @default.
- W2903445514 hasConcept C41008148 @default.
- W2903445514 hasConcept C46149586 @default.
- W2903445514 hasConcept C50522688 @default.
- W2903445514 hasConcept C80444323 @default.
- W2903445514 hasConcept C97541855 @default.
- W2903445514 hasConceptScore W2903445514C105795698 @default.
- W2903445514 hasConceptScore W2903445514C113174947 @default.
- W2903445514 hasConceptScore W2903445514C119857082 @default.
- W2903445514 hasConceptScore W2903445514C126255220 @default.
- W2903445514 hasConceptScore W2903445514C127413603 @default.
- W2903445514 hasConceptScore W2903445514C134306372 @default.
- W2903445514 hasConceptScore W2903445514C151319957 @default.
- W2903445514 hasConceptScore W2903445514C154945302 @default.
- W2903445514 hasConceptScore W2903445514C162324750 @default.
- W2903445514 hasConceptScore W2903445514C19499675 @default.
- W2903445514 hasConceptScore W2903445514C201995342 @default.
- W2903445514 hasConceptScore W2903445514C2777303404 @default.
- W2903445514 hasConceptScore W2903445514C2779545769 @default.
- W2903445514 hasConceptScore W2903445514C2779664074 @default.
- W2903445514 hasConceptScore W2903445514C2780451532 @default.
- W2903445514 hasConceptScore W2903445514C31258907 @default.
- W2903445514 hasConceptScore W2903445514C33923547 @default.
- W2903445514 hasConceptScore W2903445514C41008148 @default.
- W2903445514 hasConceptScore W2903445514C46149586 @default.
- W2903445514 hasConceptScore W2903445514C50522688 @default.
- W2903445514 hasConceptScore W2903445514C80444323 @default.