Matches in SemOpenAlex for { <https://semopenalex.org/work/W3123636359> ?p ?o ?g. }
Showing items 1 to 96 of 96 with 100 items per page.
- W3123636359 abstract "Recent advances in multi-agent reinforcement learning have been largely limited in training one model from scratch for every new task. The limitation is due to the restricted model architecture related to fixed input and output dimensions. This hinders the experience accumulation and transfer of the learned agent over tasks with diverse levels of difficulty (e.g. 3 vs 3 or 5 vs 6 multi-agent games). In this paper, we make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing one single architecture to fit tasks with the requirement of different observation and action configurations. Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy by decoupling the policy distribution from the intertwined input observation with an importance weight measured by the merits of the self-attention mechanism. Compared to a standard transformer block, the proposed model, named as Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable. UPDeT is general enough to be plugged into any multi-agent reinforcement learning pipeline and equip them with strong generalization abilities that enables the handling of multiple tasks at a time. Extensive experiments on large-scale SMAC multi-agent competitive games demonstrate that the proposed UPDeT-based multi-agent reinforcement learning achieves significant results relative to state-of-the-art approaches, demonstrating advantageous transfer capability in terms of both performance and training speed (10 times faster)." @default.
- W3123636359 created "2021-02-01" @default.
- W3123636359 creator A5027233873 @default.
- W3123636359 creator A5034967388 @default.
- W3123636359 creator A5047878798 @default.
- W3123636359 creator A5068949474 @default.
- W3123636359 date "2021-01-20" @default.
- W3123636359 modified "2023-09-27" @default.
- W3123636359 title "UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers" @default.
- W3123636359 cites W110451278 @default.
- W3123636359 cites W1457482454 @default.
- W3123636359 cites W1641379095 @default.
- W3123636359 cites W1924770834 @default.
- W3123636359 cites W2064675550 @default.
- W3123636359 cites W2096001037 @default.
- W3123636359 cites W2097381042 @default.
- W3123636359 cites W2121092017 @default.
- W3123636359 cites W2145339207 @default.
- W3123636359 cites W2174786457 @default.
- W3123636359 cites W2342840547 @default.
- W3123636359 cites W2413794162 @default.
- W3123636359 cites W2617547828 @default.
- W3123636359 cites W2626637010 @default.
- W3123636359 cites W2747213132 @default.
- W3123636359 cites W2756196406 @default.
- W3123636359 cites W2771201900 @default.
- W3123636359 cites W2785315072 @default.
- W3123636359 cites W2895675617 @default.
- W3123636359 cites W2921955147 @default.
- W3123636359 cites W2946606218 @default.
- W3123636359 cites W2949600457 @default.
- W3123636359 cites W2951682727 @default.
- W3123636359 cites W2951984055 @default.
- W3123636359 cites W2963091558 @default.
- W3123636359 cites W2963403868 @default.
- W3123636359 cites W2963890729 @default.
- W3123636359 cites W2970272688 @default.
- W3123636359 cites W2970514967 @default.
- W3123636359 cites W2982316857 @default.
- W3123636359 cites W2997536466 @default.
- W3123636359 cites W3033481391 @default.
- W3123636359 cites W3039208705 @default.
- W3123636359 cites W3093287223 @default.
- W3123636359 cites W3094349299 @default.
- W3123636359 doi "https://doi.org/10.48550/arxiv.2101.08001" @default.
- W3123636359 hasPublicationYear "2021" @default.
- W3123636359 type Work @default.
- W3123636359 sameAs 3123636359 @default.
- W3123636359 citedByCount "3" @default.
- W3123636359 countsByYear W31236363592021 @default.
- W3123636359 crossrefType "posted-content" @default.
- W3123636359 hasAuthorship W3123636359A5027233873 @default.
- W3123636359 hasAuthorship W3123636359A5034967388 @default.
- W3123636359 hasAuthorship W3123636359A5047878798 @default.
- W3123636359 hasAuthorship W3123636359A5068949474 @default.
- W3123636359 hasBestOaLocation W31236363591 @default.
- W3123636359 hasConcept C119599485 @default.
- W3123636359 hasConcept C127413603 @default.
- W3123636359 hasConcept C133731056 @default.
- W3123636359 hasConcept C154945302 @default.
- W3123636359 hasConcept C165801399 @default.
- W3123636359 hasConcept C205606062 @default.
- W3123636359 hasConcept C41008148 @default.
- W3123636359 hasConcept C66322947 @default.
- W3123636359 hasConcept C66938386 @default.
- W3123636359 hasConcept C67203356 @default.
- W3123636359 hasConcept C97541855 @default.
- W3123636359 hasConceptScore W3123636359C119599485 @default.
- W3123636359 hasConceptScore W3123636359C127413603 @default.
- W3123636359 hasConceptScore W3123636359C133731056 @default.
- W3123636359 hasConceptScore W3123636359C154945302 @default.
- W3123636359 hasConceptScore W3123636359C165801399 @default.
- W3123636359 hasConceptScore W3123636359C205606062 @default.
- W3123636359 hasConceptScore W3123636359C41008148 @default.
- W3123636359 hasConceptScore W3123636359C66322947 @default.
- W3123636359 hasConceptScore W3123636359C66938386 @default.
- W3123636359 hasConceptScore W3123636359C67203356 @default.
- W3123636359 hasConceptScore W3123636359C97541855 @default.
- W3123636359 hasLocation W31236363591 @default.
- W3123636359 hasLocation W31236363592 @default.
- W3123636359 hasOpenAccess W3123636359 @default.
- W3123636359 hasPrimaryLocation W31236363591 @default.
- W3123636359 hasRelatedWork W2923653485 @default.
- W3123636359 hasRelatedWork W2952472710 @default.
- W3123636359 hasRelatedWork W2957776456 @default.
- W3123636359 hasRelatedWork W3005560120 @default.
- W3123636359 hasRelatedWork W3037422413 @default.
- W3123636359 hasRelatedWork W4206669594 @default.
- W3123636359 hasRelatedWork W4224287422 @default.
- W3123636359 hasRelatedWork W4255994452 @default.
- W3123636359 hasRelatedWork W4319773215 @default.
- W3123636359 hasRelatedWork W4361026739 @default.
- W3123636359 isParatext "false" @default.
- W3123636359 isRetracted "false" @default.
- W3123636359 magId "3123636359" @default.
- W3123636359 workType "article" @default.