Matches in SemOpenAlex for { <https://semopenalex.org/work/W89123072> ?p ?o ?g. }
Showing items 1 to 96 of 96 with 100 items per page.
- W89123072 endingPage "34" @default.
- W89123072 startingPage "23" @default.
- W89123072 abstract "We investigate repeated matrix games with stochastic players as a microcosm for studying dynamic, multi-agent interactions using the Stochastic Direct Reinforcement (SDR) policy gradient algorithm. SDR is a generalization of Recurrent Reinforcement Learning (RRL) that supports stochastic policies. Unlike other RL algorithms, SDR and RRL use recurrent policy gradients to properly address temporal credit assignment resulting from recurrent structure. Our main goals in this paper are to (1) distinguish recurrent memory from standard, non-recurrent memory for policy gradient RL, (2) compare SDR with Q-type learning methods for simple games, (3) distinguish reactive from endogenous dynamical agent behavior and (4) explore the use of recurrent learning for interacting, dynamic agents. We find that SDR players learn much faster and hence outperform recently proposed Q-type learners for the simple game Rock, Paper, Scissors (RPS). With more complex, dynamic SDR players and opponents, we demonstrate that recurrent representations and SDR’s recurrent policy gradients yield better performance than non-recurrent players. For the Iterated Prisoner's Dilemma, we show that non-recurrent SDR agents learn only to defect (Nash equilibrium), while SDR agents with recurrent gradients can learn a variety of interesting behaviors, including cooperation." @default.
- W89123072 created "2016-06-24" @default.
- W89123072 creator A5036144965 @default.
- W89123072 creator A5049793936 @default.
- W89123072 creator A5056861449 @default.
- W89123072 creator A5073841545 @default.
- W89123072 date "2004-01-01" @default.
- W89123072 modified "2023-10-18" @default.
- W89123072 title "Stochastic Direct Reinforcement: Application to Simple Games with Recurrence." @default.
- W89123072 cites W1513468570 @default.
- W89123072 cites W1514115759 @default.
- W89123072 cites W1542941925 @default.
- W89123072 cites W1579312135 @default.
- W89123072 cites W1594297126 @default.
- W89123072 cites W1601974704 @default.
- W89123072 cites W1607392272 @default.
- W89123072 cites W1977051850 @default.
- W89123072 cites W2014748701 @default.
- W89123072 cites W2018042870 @default.
- W89123072 cites W2053616263 @default.
- W89123072 cites W2097498347 @default.
- W89123072 cites W2104602264 @default.
- W89123072 cites W2119717200 @default.
- W89123072 cites W2120327309 @default.
- W89123072 cites W2145397675 @default.
- W89123072 cites W2159813604 @default.
- W89123072 cites W2169015875 @default.
- W89123072 cites W2345941287 @default.
- W89123072 hasPublicationYear "2004" @default.
- W89123072 type Work @default.
- W89123072 sameAs 89123072 @default.
- W89123072 citedByCount "10" @default.
- W89123072 crossrefType "proceedings-article" @default.
- W89123072 hasAuthorship W89123072A5036144965 @default.
- W89123072 hasAuthorship W89123072A5049793936 @default.
- W89123072 hasAuthorship W89123072A5056861449 @default.
- W89123072 hasAuthorship W89123072A5073841545 @default.
- W89123072 hasConcept C111472728 @default.
- W89123072 hasConcept C126255220 @default.
- W89123072 hasConcept C134306372 @default.
- W89123072 hasConcept C138885662 @default.
- W89123072 hasConcept C140479938 @default.
- W89123072 hasConcept C144237770 @default.
- W89123072 hasConcept C147168706 @default.
- W89123072 hasConcept C154945302 @default.
- W89123072 hasConcept C177148314 @default.
- W89123072 hasConcept C2780586882 @default.
- W89123072 hasConcept C33923547 @default.
- W89123072 hasConcept C41008148 @default.
- W89123072 hasConcept C46814582 @default.
- W89123072 hasConcept C50644808 @default.
- W89123072 hasConcept C97541855 @default.
- W89123072 hasConceptScore W89123072C111472728 @default.
- W89123072 hasConceptScore W89123072C126255220 @default.
- W89123072 hasConceptScore W89123072C134306372 @default.
- W89123072 hasConceptScore W89123072C138885662 @default.
- W89123072 hasConceptScore W89123072C140479938 @default.
- W89123072 hasConceptScore W89123072C144237770 @default.
- W89123072 hasConceptScore W89123072C147168706 @default.
- W89123072 hasConceptScore W89123072C154945302 @default.
- W89123072 hasConceptScore W89123072C177148314 @default.
- W89123072 hasConceptScore W89123072C2780586882 @default.
- W89123072 hasConceptScore W89123072C33923547 @default.
- W89123072 hasConceptScore W89123072C41008148 @default.
- W89123072 hasConceptScore W89123072C46814582 @default.
- W89123072 hasConceptScore W89123072C50644808 @default.
- W89123072 hasConceptScore W89123072C97541855 @default.
- W89123072 hasLocation W891230721 @default.
- W89123072 hasOpenAccess W89123072 @default.
- W89123072 hasPrimaryLocation W891230721 @default.
- W89123072 hasRelatedWork W1515851193 @default.
- W89123072 hasRelatedWork W1538558539 @default.
- W89123072 hasRelatedWork W1592847719 @default.
- W89123072 hasRelatedWork W1964886424 @default.
- W89123072 hasRelatedWork W1973039793 @default.
- W89123072 hasRelatedWork W1977261815 @default.
- W89123072 hasRelatedWork W2082244867 @default.
- W89123072 hasRelatedWork W2087486726 @default.
- W89123072 hasRelatedWork W2121863487 @default.
- W89123072 hasRelatedWork W2126172312 @default.
- W89123072 hasRelatedWork W2164823136 @default.
- W89123072 hasRelatedWork W2249082738 @default.
- W89123072 hasRelatedWork W2289194510 @default.
- W89123072 hasRelatedWork W2783154620 @default.
- W89123072 hasRelatedWork W2910246453 @default.
- W89123072 hasRelatedWork W2921843114 @default.
- W89123072 hasRelatedWork W2941095974 @default.
- W89123072 hasRelatedWork W3093099641 @default.
- W89123072 hasRelatedWork W3105688066 @default.
- W89123072 hasRelatedWork W3207544809 @default.
- W89123072 isParatext "false" @default.
- W89123072 isRetracted "false" @default.
- W89123072 magId "89123072" @default.
- W89123072 workType "article" @default.
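The abstract contrasts SDR's recurrent policy gradients with simpler, non-recurrent learners on Rock, Paper, Scissors. For readers unfamiliar with policy gradients on matrix games, the following is a minimal sketch of the non-recurrent baseline idea only: a stochastic softmax policy trained with a plain REINFORCE-style gradient against a fixed, biased opponent. It is not the authors' SDR algorithm (it has no recurrent inputs and no recurrent gradient terms), and the opponent mix, learning rate, step count, and the names `payoff`, `softmax`, and `train` are illustrative assumptions.

```python
import math
import random

# Actions: 0 = Rock, 1 = Paper, 2 = Scissors.
# Row-player payoff: +1 win, 0 tie, -1 loss.
def payoff(a: int, b: int) -> int:
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1

def softmax(theta: list[float]) -> list[float]:
    # Subtract the max for numerical stability.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def train(opponent_probs: list[float], steps: int = 5000,
          lr: float = 0.05, seed: int = 0) -> list[float]:
    """Hypothetical non-recurrent REINFORCE learner (NOT SDR).

    The policy has no memory of past actions, so its gradient has
    no recurrent terms; it only adapts to a fixed mixed opponent.
    """
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]  # one logit per action
    for _ in range(steps):
        pi = softmax(theta)
        a = rng.choices(range(3), weights=pi)[0]             # learner samples
        b = rng.choices(range(3), weights=opponent_probs)[0]  # opponent samples
        r = payoff(a, b)
        # For a softmax policy, d log pi(a) / d theta_k = 1[k == a] - pi_k.
        for k in range(3):
            grad_log = (1.0 if k == a else 0.0) - pi[k]
            theta[k] += lr * r * grad_log
    return softmax(theta)

if __name__ == "__main__":
    # Assumed opponent that over-plays Rock; the best response is Paper.
    print(train([0.6, 0.2, 0.2]))
```

Against this assumed opponent the expected payoffs are 0 for Rock, 0.4 for Paper, and -0.4 for Scissors, so the learned policy should concentrate on Paper. Because the policy is memoryless, it can only exploit a static bias; capturing dynamic opponents is where the abstract argues recurrent representations and recurrent gradients matter.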