Matches in SemOpenAlex for { <https://semopenalex.org/work/W3174408055> ?p ?o ?g. }
Showing items 1 to 87 of 87 with 100 items per page.
- W3174408055 abstract "We propose a novel algorithm named Expert Q-learning. Expert Q-learning is inspired by Dueling Q-learning and aims to incorporate ideas from semi-supervised learning into reinforcement learning by splitting Q-values into state values and action advantages. Unlike Generative Adversarial Imitation Learning and Deep Q-Learning from Demonstrations, the offline expert we use only predicts the value of a state as one of {-1, 0, 1}, indicating whether the state is bad, neutral or good. An expert network was designed in addition to the Q-network; it is updated after each regular offline minibatch update whenever the expert example buffer is not empty. The Q-network plays the role of the advantage function only during the update. Our algorithm also keeps asynchronous copies of the Q-network and the expert network, and predicts the target values in the same manner as Double Q-learning. We compared our algorithm on the game of Othello with the state-of-the-art Q-learning algorithm, a combination of Double Q-learning and Dueling Q-learning. The results showed that Expert Q-learning was indeed useful and more resistant to the overestimation bias of Q-learning. The baseline Q-learning algorithm exhibited unstable and suboptimal behavior, especially when playing against a stochastic player, whereas Expert Q-learning demonstrated more robust performance with higher scores. Expert Q-learning without examples also achieved better results than the baseline algorithm when trained and tested against a fixed player. On the other hand, Expert Q-learning without examples cannot win against the baseline Q-learning algorithm in direct game competitions, even though it also reduces the overestimation bias." @default.
- W3174408055 created "2021-07-05" @default.
- W3174408055 creator A5002394922 @default.
- W3174408055 creator A5032770006 @default.
- W3174408055 creator A5072161268 @default.
- W3174408055 creator A5090446312 @default.
- W3174408055 date "2021-06-28" @default.
- W3174408055 modified "2023-09-28" @default.
- W3174408055 title "Expert Q-learning: Deep Q-learning With State Values From Expert Examples" @default.
- W3174408055 cites W169931978 @default.
- W3174408055 cites W1757796397 @default.
- W3174408055 cites W2051228319 @default.
- W3174408055 cites W2061562262 @default.
- W3174408055 cites W2070469928 @default.
- W3174408055 cites W2098774185 @default.
- W3174408055 cites W2099471712 @default.
- W3174408055 cites W2107726111 @default.
- W3174408055 cites W2132994929 @default.
- W3174408055 cites W2134089414 @default.
- W3174408055 cites W2168405694 @default.
- W3174408055 cites W2173520492 @default.
- W3174408055 cites W2173564293 @default.
- W3174408055 cites W2257979135 @default.
- W3174408055 cites W2902907165 @default.
- W3174408055 cites W2943152387 @default.
- W3174408055 cites W2963014947 @default.
- W3174408055 cites W2963277051 @default.
- W3174408055 cites W2963376229 @default.
- W3174408055 cites W2982316857 @default.
- W3174408055 cites W3035160371 @default.
- W3174408055 cites W3105874611 @default.
- W3174408055 cites W3131944163 @default.
- W3174408055 cites W3139377883 @default.
- W3174408055 cites W51508254 @default.
- W3174408055 cites W1599347336 @default.
- W3174408055 hasPublicationYear "2021" @default.
- W3174408055 type Work @default.
- W3174408055 sameAs 3174408055 @default.
- W3174408055 citedByCount "0" @default.
- W3174408055 crossrefType "posted-content" @default.
- W3174408055 hasAuthorship W3174408055A5002394922 @default.
- W3174408055 hasAuthorship W3174408055A5032770006 @default.
- W3174408055 hasAuthorship W3174408055A5072161268 @default.
- W3174408055 hasAuthorship W3174408055A5090446312 @default.
- W3174408055 hasConcept C108583219 @default.
- W3174408055 hasConcept C119857082 @default.
- W3174408055 hasConcept C154945302 @default.
- W3174408055 hasConcept C188116033 @default.
- W3174408055 hasConcept C24138899 @default.
- W3174408055 hasConcept C41008148 @default.
- W3174408055 hasConcept C58973888 @default.
- W3174408055 hasConcept C97541855 @default.
- W3174408055 hasConceptScore W3174408055C108583219 @default.
- W3174408055 hasConceptScore W3174408055C119857082 @default.
- W3174408055 hasConceptScore W3174408055C154945302 @default.
- W3174408055 hasConceptScore W3174408055C188116033 @default.
- W3174408055 hasConceptScore W3174408055C24138899 @default.
- W3174408055 hasConceptScore W3174408055C41008148 @default.
- W3174408055 hasConceptScore W3174408055C58973888 @default.
- W3174408055 hasConceptScore W3174408055C97541855 @default.
- W3174408055 hasLocation W31744080551 @default.
- W3174408055 hasOpenAccess W3174408055 @default.
- W3174408055 hasPrimaryLocation W31744080551 @default.
- W3174408055 hasRelatedWork W114528961 @default.
- W3174408055 hasRelatedWork W1812824539 @default.
- W3174408055 hasRelatedWork W2030191131 @default.
- W3174408055 hasRelatedWork W2180887124 @default.
- W3174408055 hasRelatedWork W2340989005 @default.
- W3174408055 hasRelatedWork W2348266642 @default.
- W3174408055 hasRelatedWork W2356358603 @default.
- W3174408055 hasRelatedWork W2365363856 @default.
- W3174408055 hasRelatedWork W2549225575 @default.
- W3174408055 hasRelatedWork W2890169813 @default.
- W3174408055 hasRelatedWork W2891315147 @default.
- W3174408055 hasRelatedWork W2895095767 @default.
- W3174408055 hasRelatedWork W2981452746 @default.
- W3174408055 hasRelatedWork W2982339063 @default.
- W3174408055 hasRelatedWork W2998557984 @default.
- W3174408055 hasRelatedWork W300947857 @default.
- W3174408055 hasRelatedWork W3107195816 @default.
- W3174408055 hasRelatedWork W3195322794 @default.
- W3174408055 hasRelatedWork W3203001245 @default.
- W3174408055 hasRelatedWork W3203418126 @default.
- W3174408055 isParatext "false" @default.
- W3174408055 isRetracted "false" @default.
- W3174408055 magId "3174408055" @default.
- W3174408055 workType "article" @default.
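
The abstract above describes Q-values split into a state value (from the expert network) and action advantages (from the Q-network), with targets predicted in the manner of Double Q-learning. The following is a minimal sketch of that idea, not the authors' implementation. The function names, the mean-subtracted advantage (borrowed from the usual dueling formulation), the discount factor `gamma`, and the plain lists standing in for network outputs are all assumptions made for illustration.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine a state value with per-action advantages.

    Assumption: advantages are centred on their mean, as in the usual
    dueling aggregation; the record above does not state the exact form.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + (a - mean_adv) for a in advantages]


def double_q_target(
    reward: float,
    gamma: float,
    done: bool,
    online_next_q: List[float],
    target_next_q: List[float],
) -> float:
    """Double Q-learning style target.

    The online estimates choose the next action; the asynchronous copy
    (target estimates) evaluates that action.
    """
    if done:
        return reward
    best_action = max(range(len(online_next_q)), key=online_next_q.__getitem__)
    return reward + gamma * target_next_q[best_action]


# Hypothetical numbers: the expert labels the next state "good" (+1),
# and the Q-network supplies three action advantages.
online_q = dueling_q_values(1.0, [0.5, -0.5, 0.0])
target_q = dueling_q_values(1.0, [-0.8, -0.1, -0.6])
target = double_q_target(0.0, 0.9, False, online_q, target_q)
```

Here the expert's state value is in {-1, 0, 1}, as the abstract states; everything else in the example values is made up. Separating action selection from action evaluation is what Double Q-learning uses to reduce overestimation bias.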