Matches in SemOpenAlex for { <https://semopenalex.org/work/W4385819655> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W4385819655 endingPage "13" @default.
- W4385819655 startingPage "1" @default.
- W4385819655 abstract "Artificial Intelligence (AI) has seen several breakthroughs in some perfect- and imperfect-information games, such as Go, Texas Hold'em, and StarCraft II. However, the Chinese poker game, DouDiZhu, presents new challenges for AI systems to overcome, including inferring imperfect information, training with sparse rewards, and handling a large state-action space. This article describes our proposed DouDiZhu AI system, RARSMSDou, based on Deep Reinforcement Learning (DRL) algorithms, which combines Proximal Policy Optimization (PPO), Relative Advantage Reward Shaping with Minimum Splits (RARSMS), and Deep Monte-Carlo (DMC) into a self-play framework. In RARSMSDou, we propose RARSMS as a novel intrinsic reward to guide the training of PPO in a sparse-reward environment. We treat the imperfect information as observable information and feed it into the critic network of PPO, and we propose abstract actions to simplify the large action space (27,472 actions) to a low-dimensional action space (309 actions, comprising 189 specific actions and 120 abstract actions), which is output by the policy network of PPO. When the policy is an abstract action, DMC (DouZeroX) maps this abstract action to its specific action as a policy for training or execution. We compare the performance of RARSMSDou with its four variants (PPO, PPO+RARSMS, PPO+DMC, DMC (DouZeroX)) and five state-of-the-art DouDiZhu AI programs. The experimental results show that after 30 days of self-play and training, RARSMSDou outperforms its variants and DouZero (with a WP of 0.582 and an ADP of 0.414), which is the best DouDiZhu baseline." @default.
- W4385819655 created "2023-08-15" @default.
- W4385819655 creator A5025745463 @default.
- W4385819655 creator A5076053789 @default.
- W4385819655 date "2023-01-01" @default.
- W4385819655 modified "2023-09-26" @default.
- W4385819655 title "RARSMSDou: Master the Game of DouDiZhu With Deep Reinforcement Learning Algorithms" @default.
- W4385819655 doi "https://doi.org/10.1109/tetci.2023.3303251" @default.
- W4385819655 hasPublicationYear "2023" @default.
- W4385819655 type Work @default.
- W4385819655 citedByCount "0" @default.
- W4385819655 crossrefType "journal-article" @default.
- W4385819655 hasAuthorship W4385819655A5025745463 @default.
- W4385819655 hasAuthorship W4385819655A5076053789 @default.
- W4385819655 hasConcept C105795698 @default.
- W4385819655 hasConcept C111919701 @default.
- W4385819655 hasConcept C11413529 @default.
- W4385819655 hasConcept C119857082 @default.
- W4385819655 hasConcept C121332964 @default.
- W4385819655 hasConcept C123676819 @default.
- W4385819655 hasConcept C138885662 @default.
- W4385819655 hasConcept C144237770 @default.
- W4385819655 hasConcept C154945302 @default.
- W4385819655 hasConcept C2778572836 @default.
- W4385819655 hasConcept C2780310539 @default.
- W4385819655 hasConcept C2780791683 @default.
- W4385819655 hasConcept C33923547 @default.
- W4385819655 hasConcept C41008148 @default.
- W4385819655 hasConcept C41895202 @default.
- W4385819655 hasConcept C48103436 @default.
- W4385819655 hasConcept C62520636 @default.
- W4385819655 hasConcept C72434380 @default.
- W4385819655 hasConcept C97541855 @default.
- W4385819655 hasConceptScore W4385819655C105795698 @default.
- W4385819655 hasConceptScore W4385819655C111919701 @default.
- W4385819655 hasConceptScore W4385819655C11413529 @default.
- W4385819655 hasConceptScore W4385819655C119857082 @default.
- W4385819655 hasConceptScore W4385819655C121332964 @default.
- W4385819655 hasConceptScore W4385819655C123676819 @default.
- W4385819655 hasConceptScore W4385819655C138885662 @default.
- W4385819655 hasConceptScore W4385819655C144237770 @default.
- W4385819655 hasConceptScore W4385819655C154945302 @default.
- W4385819655 hasConceptScore W4385819655C2778572836 @default.
- W4385819655 hasConceptScore W4385819655C2780310539 @default.
- W4385819655 hasConceptScore W4385819655C2780791683 @default.
- W4385819655 hasConceptScore W4385819655C33923547 @default.
- W4385819655 hasConceptScore W4385819655C41008148 @default.
- W4385819655 hasConceptScore W4385819655C41895202 @default.
- W4385819655 hasConceptScore W4385819655C48103436 @default.
- W4385819655 hasConceptScore W4385819655C62520636 @default.
- W4385819655 hasConceptScore W4385819655C72434380 @default.
- W4385819655 hasConceptScore W4385819655C97541855 @default.
- W4385819655 hasFunder F4320321147 @default.
- W4385819655 hasLocation W43858196551 @default.
- W4385819655 hasOpenAccess W4385819655 @default.
- W4385819655 hasPrimaryLocation W43858196551 @default.
- W4385819655 hasRelatedWork W2094557321 @default.
- W4385819655 hasRelatedWork W2395852064 @default.
- W4385819655 hasRelatedWork W3044994704 @default.
- W4385819655 hasRelatedWork W3103643887 @default.
- W4385819655 hasRelatedWork W3173185086 @default.
- W4385819655 hasRelatedWork W4205489146 @default.
- W4385819655 hasRelatedWork W4240895364 @default.
- W4385819655 hasRelatedWork W4281661226 @default.
- W4385819655 hasRelatedWork W4319083788 @default.
- W4385819655 hasRelatedWork W4362598698 @default.
- W4385819655 isParatext "false" @default.
- W4385819655 isRetracted "false" @default.
- W4385819655 workType "article" @default.
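
The rows above all follow one shape: subject, predicate, object, then the graph name. As a minimal sketch, assuming that shape holds for every row, the listing could be parsed into tuples and grouped by predicate as follows. The pattern is inferred from this page only, not from any SemOpenAlex specification, and the names `ROW`, `parse_row`, and `group_by_predicate` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed row shape, inferred from the listing above:
#   - SUBJECT PREDICATE OBJECT @GRAPH.
# where OBJECT is either a double-quoted literal or a bare identifier.
ROW = re.compile(r'^- (\S+) (\S+) (?:"(.*)"|(\S+)) @(\S+)\.$')


def parse_row(line):
    """Return (subject, predicate, object, graph), or None if the line is not a row."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    subject, predicate, literal, resource, graph = m.groups()
    obj = literal if literal is not None else resource
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect object values per predicate, skipping lines that are not rows."""
    grouped = defaultdict(list)
    for line in lines:
        row = parse_row(line)
        if row is not None:
            grouped[row[1]].append(row[2])
    return dict(grouped)


# A few rows copied from the listing above.
sample = [
    '- W4385819655 startingPage "1" @default.',
    '- W4385819655 creator A5025745463 @default.',
    '- W4385819655 creator A5076053789 @default.',
    '- W4385819655 hasPublicationYear "2023" @default.',
]
grouped = group_by_predicate(sample)
print(grouped["creator"])
```

Grouping by predicate makes repeated properties such as `creator`, `hasConcept`, and `hasRelatedWork` easy to read as lists. Header lines such as the query pattern and the paging line are skipped because they do not match the assumed row shape.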