Matches in SemOpenAlex for { <https://semopenalex.org/work/W3177652717> ?p ?o ?g. }
- W3177652717 abstract "Reinforcement learning (RL) in partially observable, fully cooperative multi-agent settings (Dec-POMDPs) can in principle be used to address many real-world challenges such as controlling a swarm of rescue robots or a team of quadcopters. However, Dec-POMDPs are significantly harder to solve than single-agent problems, with the former being NEXP-complete and the latter, MDPs, being just P-complete. Hence, current RL algorithms for Dec-POMDPs suffer from poor sample complexity, which greatly reduces their applicability to practical problems where environment interaction is costly. Our key insight is that using just a polynomial number of samples, one can learn a centralized model that generalizes across different policies. We can then optimize the policy within the learned model instead of the true system, without requiring additional environment interactions. We also learn a centralized exploration policy within our model that learns to collect additional data in state-action regions with high model uncertainty. We empirically evaluate the proposed model-based algorithm, MARCO, in three cooperative communication tasks, where it improves sample efficiency by up to 20x. Finally, to investigate the theoretical sample complexity, we adapt an existing model-based method for tabular MDPs to Dec-POMDPs, and prove that it achieves polynomial sample complexity." @default.
- W3177652717 created "2021-07-19" @default.
- W3177652717 creator A5030505775 @default.
- W3177652717 creator A5039131712 @default.
- W3177652717 creator A5059094093 @default.
- W3177652717 creator A5061193324 @default.
- W3177652717 date "2021-07-13" @default.
- W3177652717 modified "2023-09-26" @default.
- W3177652717 title "Centralized Model and Exploration Policy for Multi-Agent RL" @default.
- W3177652717 cites W1491843047 @default.
- W3177652717 cites W1641379095 @default.
- W3177652717 cites W1980035368 @default.
- W3177652717 cites W2004391169 @default.
- W3177652717 cites W2032100464 @default.
- W3177652717 cites W2088956500 @default.
- W3177652717 cites W2089593831 @default.
- W3177652717 cites W2120678009 @default.
- W3177652717 cites W2121092017 @default.
- W3177652717 cites W2145339207 @default.
- W3177652717 cites W2154533441 @default.
- W3177652717 cites W2158782408 @default.
- W3177652717 cites W2192625879 @default.
- W3177652717 cites W2292533394 @default.
- W3177652717 cites W2395575420 @default.
- W3177652717 cites W2623431351 @default.
- W3177652717 cites W2626637010 @default.
- W3177652717 cites W2747213132 @default.
- W3177652717 cites W2774354230 @default.
- W3177652717 cites W2785389871 @default.
- W3177652717 cites W2810754397 @default.
- W3177652717 cites W2913781869 @default.
- W3177652717 cites W2951984055 @default.
- W3177652717 cites W2962938168 @default.
- W3177652717 cites W2963658727 @default.
- W3177652717 cites W2964204672 @default.
- W3177652717 cites W2992977009 @default.
- W3177652717 cites W3028766998 @default.
- W3177652717 cites W3034973310 @default.
- W3177652717 cites W3042871037 @default.
- W3177652717 cites W3093287223 @default.
- W3177652717 cites W3103780890 @default.
- W3177652717 cites W3104860527 @default.
- W3177652717 cites W3113994363 @default.
- W3177652717 cites W3118210634 @default.
- W3177652717 cites W3135457936 @default.
- W3177652717 cites W3158227719 @default.
- W3177652717 cites W3208476402 @default.
- W3177652717 doi "https://doi.org/10.48550/arxiv.2107.06434" @default.
- W3177652717 hasPublicationYear "2021" @default.
- W3177652717 type Work @default.
- W3177652717 sameAs 3177652717 @default.
- W3177652717 citedByCount "0" @default.
- W3177652717 crossrefType "posted-content" @default.
- W3177652717 hasAuthorship W3177652717A5030505775 @default.
- W3177652717 hasAuthorship W3177652717A5039131712 @default.
- W3177652717 hasAuthorship W3177652717A5059094093 @default.
- W3177652717 hasAuthorship W3177652717A5061193324 @default.
- W3177652717 hasBestOaLocation W31776527171 @default.
- W3177652717 hasConcept C120314980 @default.
- W3177652717 hasConcept C121332964 @default.
- W3177652717 hasConcept C126255220 @default.
- W3177652717 hasConcept C134306372 @default.
- W3177652717 hasConcept C154945302 @default.
- W3177652717 hasConcept C185592680 @default.
- W3177652717 hasConcept C198531522 @default.
- W3177652717 hasConcept C26517878 @default.
- W3177652717 hasConcept C2778445095 @default.
- W3177652717 hasConcept C32848918 @default.
- W3177652717 hasConcept C33923547 @default.
- W3177652717 hasConcept C38652104 @default.
- W3177652717 hasConcept C41008148 @default.
- W3177652717 hasConcept C43617362 @default.
- W3177652717 hasConcept C62520636 @default.
- W3177652717 hasConcept C90119067 @default.
- W3177652717 hasConcept C90509273 @default.
- W3177652717 hasConcept C97541855 @default.
- W3177652717 hasConceptScore W3177652717C120314980 @default.
- W3177652717 hasConceptScore W3177652717C121332964 @default.
- W3177652717 hasConceptScore W3177652717C126255220 @default.
- W3177652717 hasConceptScore W3177652717C134306372 @default.
- W3177652717 hasConceptScore W3177652717C154945302 @default.
- W3177652717 hasConceptScore W3177652717C185592680 @default.
- W3177652717 hasConceptScore W3177652717C198531522 @default.
- W3177652717 hasConceptScore W3177652717C26517878 @default.
- W3177652717 hasConceptScore W3177652717C2778445095 @default.
- W3177652717 hasConceptScore W3177652717C32848918 @default.
- W3177652717 hasConceptScore W3177652717C33923547 @default.
- W3177652717 hasConceptScore W3177652717C38652104 @default.
- W3177652717 hasConceptScore W3177652717C41008148 @default.
- W3177652717 hasConceptScore W3177652717C43617362 @default.
- W3177652717 hasConceptScore W3177652717C62520636 @default.
- W3177652717 hasConceptScore W3177652717C90119067 @default.
- W3177652717 hasConceptScore W3177652717C90509273 @default.
- W3177652717 hasConceptScore W3177652717C97541855 @default.
- W3177652717 hasLocation W31776527171 @default.
- W3177652717 hasOpenAccess W3177652717 @default.
- W3177652717 hasPrimaryLocation W31776527171 @default.
- W3177652717 hasRelatedWork W1515308897 @default.
- W3177652717 hasRelatedWork W2973438361 @default.
- W3177652717 hasRelatedWork W3135457936 @default.
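
The abstract above describes learning a centralized model from a limited number of samples and steering exploration toward state-action regions where that model is uncertain. The sketch below illustrates this idea in a tabular setting only. It is not the MARCO implementation: the class name `CentralizedTabularModel`, its methods, and the count-based uncertainty measure `1 / sqrt(1 + visits)` are all assumptions chosen for illustration, and the record itself does not specify how model uncertainty is measured.

```python
from collections import defaultdict
from itertools import product
from math import sqrt


class CentralizedTabularModel:
    """Hypothetical count-based model over (state, joint_action) pairs."""

    def __init__(self, n_agents, n_actions):
        # A joint action holds one action index per agent.
        self.joint_actions = list(product(range(n_actions), repeat=n_agents))
        # counts[(state, joint_action)][next_state] -> observed transitions
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, state, joint_action, next_state):
        """Record one observed transition from the real environment."""
        self.counts[(state, joint_action)][next_state] += 1

    def visits(self, state, joint_action):
        """Number of times this (state, joint_action) pair was observed."""
        return sum(self.counts.get((state, joint_action), {}).values())

    def uncertainty(self, state, joint_action):
        """Assumed proxy for model uncertainty: shrinks as visits grow."""
        return 1.0 / sqrt(1 + self.visits(state, joint_action))

    def predict(self, state, joint_action):
        """Empirical next-state distribution; empty dict if never visited."""
        seen = self.counts.get((state, joint_action), {})
        total = sum(seen.values())
        return {s: c / total for s, c in seen.items()}

    def explore(self, state):
        """Centralized exploration: pick the least-visited joint action."""
        return max(self.joint_actions,
                   key=lambda a: self.uncertainty(state, a))
```

In use, one would alternate between calling `update` on transitions collected by `explore` and optimizing the agents' policies against `predict` rather than the true environment. Ties in `explore` are broken by the enumeration order of joint actions, a simplification that a learned exploration policy would not need.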