Matches in SemOpenAlex for { <https://semopenalex.org/work/W2592496730> ?p ?o ?g. }
Showing items 1 to 64 of 64, with 100 items per page.
- W2592496730 endingPage "1481" @default.
- W2592496730 startingPage "1470" @default.
- W2592496730 abstract "A model-based reinforcement learning (RL) method that interleaves direct and indirect learning to update ${Q}$ functions is proposed. The environment is approximated by a virtual model that predicts the transition to the next state and the reward of the domain. This virtual model is used to train ${Q}$ functions and thereby accelerate policy learning. Lookup-table methods are usually used to establish such environmental models, but they require tremendous amounts of experience to enumerate the responses of the environment. In this paper, a stochastic model learning method based on tree structures is presented. To model the transition probability, an online clustering method equips the model learning method with the ability to evaluate transition probabilities. Using the virtual model, the RL method produces simulated experience in the indirect-learning stage. Since simulated transitions and backups are focused most usefully by working backward from the state-action pair whose estimated ${Q}$ value has changed significantly, the useful one-step backups are actions that lead directly into a state whose value has already changed markedly. This, however, may induce false positives; that is, a backup state may be invalid, such as an absorbing or terminal state, especially because the changes of ${Q}$ values at the planning stage must still be fed back for ranking even though they stem from simulated experience and are possibly erroneous. When the agent is drawn to generate simulated experience around such absorbing states, learning efficiency deteriorates. This paper proposes three detection methods to solve this problem and thereby speed up policy learning. The effectiveness and generality of the method are demonstrated in three numerical simulations. The simulation results show that the training rate of our method is clearly improved." @default.
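The abstract describes the standard Dyna-Q loop: direct learning from real transitions, a learned model of the environment, and indirect learning (planning) from simulated experience drawn from that model. The following is a minimal illustrative sketch of that loop on a hypothetical deterministic chain MDP; it uses a simple lookup-table model and uniform replay, not the paper's tree-based stochastic model or its backward-prioritized backups.

```python
import random

def dyna_q(n_states=6, n_actions=2, episodes=60, planning_steps=20,
           alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q on a small deterministic chain MDP (illustrative only).

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Entering the rightmost state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    model = {}  # (s, a) -> (r, s'): the learned "virtual model"

    def step(s, a):
        s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        r = 1.0 if s2 == n_states - 1 else 0.0
        return r, s2, s2 == n_states - 1

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection with random tie-breaking
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(Q[s])
                a = rng.choice([i for i in range(n_actions) if Q[s][i] == best])
            r, s2, done = step(s, a)
            # direct learning: one-step Q-learning update from real experience
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            model[(s, a)] = (r, s2)  # record the transition in the model
            # indirect learning: replay simulated experience from the model
            for _ in range(planning_steps):
                (ps, pa), (pr, ps2) = rng.choice(list(model.items()))
                Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2]) - Q[ps][pa])
            s = s2
    return Q
```

Because planning replays stored transitions many times per real step, reward information discovered at the terminal state propagates back through the chain far faster than with direct learning alone; the paper's contribution is to focus those planning backups backward from states whose values changed, while detecting invalid (absorbing) backup states.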
- W2592496730 created "2017-03-16" @default.
- W2592496730 creator A5001081733 @default.
- W2592496730 creator A5061189209 @default.
- W2592496730 creator A5064655192 @default.
- W2592496730 creator A5082113600 @default.
- W2592496730 date "2018-09-01" @default.
- W2592496730 modified "2023-09-27" @default.
- W2592496730 title "Model Learning for Multistep Backward Prediction in Dyna-${Q}$ Learning" @default.
- W2592496730 cites W1491843047 @default.
- W2592496730 cites W1975987482 @default.
- W2592496730 cites W1987725948 @default.
- W2592496730 cites W1988707322 @default.
- W2592496730 cites W1989962452 @default.
- W2592496730 cites W1997733123 @default.
- W2592496730 cites W1997880753 @default.
- W2592496730 cites W2078196735 @default.
- W2592496730 cites W2096001037 @default.
- W2592496730 cites W2097082695 @default.
- W2592496730 cites W2107726111 @default.
- W2592496730 cites W2151250975 @default.
- W2592496730 cites W2155044741 @default.
- W2592496730 cites W32403112 @default.
- W2592496730 cites W4240328096 @default.
- W2592496730 cites W4245108548 @default.
- W2592496730 doi "https://doi.org/10.1109/tsmc.2017.2671848" @default.
- W2592496730 hasPublicationYear "2018" @default.
- W2592496730 type Work @default.
- W2592496730 sameAs 2592496730 @default.
- W2592496730 citedByCount "6" @default.
- W2592496730 countsByYear W25924967302019 @default.
- W2592496730 countsByYear W25924967302021 @default.
- W2592496730 countsByYear W25924967302022 @default.
- W2592496730 countsByYear W25924967302023 @default.
- W2592496730 crossrefType "journal-article" @default.
- W2592496730 hasAuthorship W2592496730A5001081733 @default.
- W2592496730 hasAuthorship W2592496730A5061189209 @default.
- W2592496730 hasAuthorship W2592496730A5064655192 @default.
- W2592496730 hasAuthorship W2592496730A5082113600 @default.
- W2592496730 hasConcept C154945302 @default.
- W2592496730 hasConcept C41008148 @default.
- W2592496730 hasConceptScore W2592496730C154945302 @default.
- W2592496730 hasConceptScore W2592496730C41008148 @default.
- W2592496730 hasIssue "9" @default.
- W2592496730 hasLocation W25924967301 @default.
- W2592496730 hasOpenAccess W2592496730 @default.
- W2592496730 hasPrimaryLocation W25924967301 @default.
- W2592496730 hasRelatedWork W2049775471 @default.
- W2592496730 hasRelatedWork W2093578348 @default.
- W2592496730 hasRelatedWork W2350741829 @default.
- W2592496730 hasRelatedWork W2358668433 @default.
- W2592496730 hasRelatedWork W2376932109 @default.
- W2592496730 hasRelatedWork W2382290278 @default.
- W2592496730 hasRelatedWork W2390279801 @default.
- W2592496730 hasRelatedWork W2748952813 @default.
- W2592496730 hasRelatedWork W2899084033 @default.
- W2592496730 hasRelatedWork W3004735627 @default.
- W2592496730 hasVolume "48" @default.
- W2592496730 isParatext "false" @default.
- W2592496730 isRetracted "false" @default.
- W2592496730 magId "2592496730" @default.
- W2592496730 workType "article" @default.