Matches in SemOpenAlex for { <https://semopenalex.org/work/W2898895787> ?p ?o ?g. }
- W2898895787 abstract "When observing the actions of others, humans make inferences about why they acted as they did, and what this implies about the world; humans also use the fact that their actions will be interpreted in this manner, allowing them to act informatively and thereby communicate efficiently with others. Although learning algorithms have recently achieved superhuman performance in a number of two-player, zero-sum games, scalable multi-agent reinforcement learning algorithms that can discover effective strategies and conventions in complex, partially observable settings have proven elusive. We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment. BAD introduces a new Markov decision process, the public belief MDP, in which the action space consists of all deterministic partial policies, and exploits the fact that an agent acting only on this public belief state can still learn to use its private information if the action space is augmented to be over all partial policies mapping private information into environment actions. The Bayesian update is closely related to the theory of mind reasoning that humans carry out when observing others' actions. We first validate BAD on a proof-of-principle two-step matrix game, where it outperforms policy gradient methods; we then evaluate BAD on the challenging, cooperative partial-information card game Hanabi, where, in the two-player setting, it surpasses all previously published learning and hand-coded approaches, establishing a new state of the art." @default.
- W2898895787 created "2018-11-09" @default.
- W2898895787 creator A5006947993 @default.
- W2898895787 creator A5018555885 @default.
- W2898895787 creator A5031035504 @default.
- W2898895787 creator A5056879203 @default.
- W2898895787 creator A5059094093 @default.
- W2898895787 creator A5081163135 @default.
- W2898895787 creator A5083771180 @default.
- W2898895787 creator A5089479740 @default.
- W2898895787 date "2018-11-04" @default.
- W2898895787 modified "2023-10-16" @default.
- W2898895787 title "Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning" @default.
- W2898895787 cites W1275367343 @default.
- W2898895787 cites W1542941925 @default.
- W2898895787 cites W1934021597 @default.
- W2898895787 cites W1991888757 @default.
- W2898895787 cites W1993979041 @default.
- W2898895787 cites W2103581399 @default.
- W2898895787 cites W2149551746 @default.
- W2898895787 cites W2168359464 @default.
- W2898895787 cites W2255045308 @default.
- W2898895787 cites W2264742718 @default.
- W2898895787 cites W2294388386 @default.
- W2898895787 cites W2315975839 @default.
- W2898895787 cites W2395575420 @default.
- W2898895787 cites W2574978968 @default.
- W2898895787 cites W2594035753 @default.
- W2898895787 cites W2623431351 @default.
- W2898895787 cites W2739364177 @default.
- W2898895787 cites W2765782498 @default.
- W2898895787 cites W2773381986 @default.
- W2898895787 cites W2776831310 @default.
- W2898895787 cites W2786036274 @default.
- W2898895787 cites W2810602713 @default.
- W2898895787 cites W2913781869 @default.
- W2898895787 cites W2951213811 @default.
- W2898895787 cites W2962938168 @default.
- W2898895787 cites W2963000099 @default.
- W2898895787 cites W2963407617 @default.
- W2898895787 cites W2964338167 @default.
- W2898895787 cites W2770298516 @default.
- W2898895787 hasPublicationYear "2018" @default.
- W2898895787 type Work @default.
- W2898895787 sameAs 2898895787 @default.
- W2898895787 citedByCount "14" @default.
- W2898895787 countsByYear W28988957872018 @default.
- W2898895787 countsByYear W28988957872019 @default.
- W2898895787 countsByYear W28988957872020 @default.
- W2898895787 countsByYear W28988957872021 @default.
- W2898895787 crossrefType "posted-content" @default.
- W2898895787 hasAuthorship W2898895787A5006947993 @default.
- W2898895787 hasAuthorship W2898895787A5018555885 @default.
- W2898895787 hasAuthorship W2898895787A5031035504 @default.
- W2898895787 hasAuthorship W2898895787A5056879203 @default.
- W2898895787 hasAuthorship W2898895787A5059094093 @default.
- W2898895787 hasAuthorship W2898895787A5081163135 @default.
- W2898895787 hasAuthorship W2898895787A5083771180 @default.
- W2898895787 hasAuthorship W2898895787A5089479740 @default.
- W2898895787 hasConcept C105795698 @default.
- W2898895787 hasConcept C106189395 @default.
- W2898895787 hasConcept C107673813 @default.
- W2898895787 hasConcept C119857082 @default.
- W2898895787 hasConcept C121332964 @default.
- W2898895787 hasConcept C154945302 @default.
- W2898895787 hasConcept C159886148 @default.
- W2898895787 hasConcept C2780791683 @default.
- W2898895787 hasConcept C33923547 @default.
- W2898895787 hasConcept C41008148 @default.
- W2898895787 hasConcept C62520636 @default.
- W2898895787 hasConcept C72434380 @default.
- W2898895787 hasConcept C97541855 @default.
- W2898895787 hasConceptScore W2898895787C105795698 @default.
- W2898895787 hasConceptScore W2898895787C106189395 @default.
- W2898895787 hasConceptScore W2898895787C107673813 @default.
- W2898895787 hasConceptScore W2898895787C119857082 @default.
- W2898895787 hasConceptScore W2898895787C121332964 @default.
- W2898895787 hasConceptScore W2898895787C154945302 @default.
- W2898895787 hasConceptScore W2898895787C159886148 @default.
- W2898895787 hasConceptScore W2898895787C2780791683 @default.
- W2898895787 hasConceptScore W2898895787C33923547 @default.
- W2898895787 hasConceptScore W2898895787C41008148 @default.
- W2898895787 hasConceptScore W2898895787C62520636 @default.
- W2898895787 hasConceptScore W2898895787C72434380 @default.
- W2898895787 hasConceptScore W2898895787C97541855 @default.
- W2898895787 hasLocation W28988957871 @default.
- W2898895787 hasOpenAccess W2898895787 @default.
- W2898895787 hasPrimaryLocation W28988957871 @default.
- W2898895787 hasRelatedWork W1480196266 @default.
- W2898895787 hasRelatedWork W1578228498 @default.
- W2898895787 hasRelatedWork W2003784190 @default.
- W2898895787 hasRelatedWork W2127244197 @default.
- W2898895787 hasRelatedWork W2285910602 @default.
- W2898895787 hasRelatedWork W2368504874 @default.
- W2898895787 hasRelatedWork W2395575420 @default.
- W2898895787 hasRelatedWork W2410288126 @default.
- W2898895787 hasRelatedWork W2621642732 @default.
- W2898895787 hasRelatedWork W2913781869 @default.
- W2898895787 hasRelatedWork W2950472486 @default.
- W2898895787 hasRelatedWork W2953029891 @default.
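
The abstract in this record describes a public belief that is updated on the actions agents take, where each action is chosen by a deterministic partial policy mapping private information to environment actions. The toy sketch below illustrates that idea with an exact Bayes update over a tiny discrete set of private features. It is not the paper's method, whose update is approximate. The function name `update_public_belief`, the feature labels, the action names, and the numbers are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable

Belief = Dict[Hashable, float]


def update_public_belief(
    belief: Belief,
    partial_policy: Dict[Hashable, Hashable],
    observed_action: Hashable,
) -> Belief:
    """Exact Bayes update of a public belief over private features.

    With a deterministic partial policy pi, the likelihood of the observed
    action a given private feature f is 1 if pi(f) == a and 0 otherwise, so
    P(f | a, pi) is proportional to 1[pi(f) == a] * P(f).
    """
    # Keep only the private features consistent with the observed action.
    consistent = {
        f: p for f, p in belief.items() if partial_policy[f] == observed_action
    }
    total = sum(consistent.values())
    if total == 0:
        raise ValueError("observed action has zero probability under the belief")
    # Renormalise so the posterior sums to one.
    return {f: p / total for f, p in consistent.items()}


# Hypothetical example: three possible private features and a partial policy
# that every observer knows (it is public), even though the feature is not.
prior = {"red": 0.5, "blue": 0.25, "green": 0.25}
policy = {"red": "hint", "blue": "play", "green": "hint"}

posterior = update_public_belief(prior, policy, "hint")
print(posterior)
```

Observing `"hint"` rules out `"blue"`, because the policy would have produced `"play"` for it, and the remaining probability mass is renormalised over `"red"` and `"green"`. This is the sense in which an action taken under a known partial policy carries information about the acting agent's private state.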