Matches in SemOpenAlex for { <https://semopenalex.org/work/W2144672231> ?p ?o ?g. }
- W2144672231 endingPage "1274" @default.
- W2144672231 startingPage "1260" @default.
- W2144672231 abstract "We apply diffusion strategies to develop a fully-distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment. The algorithm can also be applied to off-policy learning, meaning that the agents can predict the response to a behavior different from the actual policies they are following. The proposed distributed strategy is efficient, with linear complexity in both computation time and memory footprint. We provide a mean-square-error performance analysis and establish convergence under constant step-size updates, which endow the network with continuous learning capabilities. The results show a clear gain from cooperation: when the individual agents can estimate the solution, cooperation increases stability and reduces bias and variance of the prediction error; but, more importantly, the network is able to approach the optimal solution even when none of the individual agents can (e.g., when the individual behavior policies restrict each agent to sample a small portion of the state space)." @default.
- W2144672231 created "2016-06-24" @default.
- W2144672231 creator A5008317106 @default.
- W2144672231 creator A5031787043 @default.
- W2144672231 creator A5088339142 @default.
- W2144672231 creator A5091103447 @default.
- W2144672231 date "2015-05-01" @default.
- W2144672231 modified "2023-10-01" @default.
- W2144672231 title "Distributed Policy Evaluation Under Multiple Behavior Strategies" @default.
- W2144672231 cites W1646707810 @default.
- W2144672231 cites W1863534978 @default.
- W2144672231 cites W1918371733 @default.
- W2144672231 cites W1971048324 @default.
- W2144672231 cites W1998172110 @default.
- W2144672231 cites W2007208291 @default.
- W2144672231 cites W2024307985 @default.
- W2144672231 cites W2029080014 @default.
- W2144672231 cites W2042664989 @default.
- W2144672231 cites W2044212084 @default.
- W2144672231 cites W2056182476 @default.
- W2144672231 cites W2071983464 @default.
- W2144672231 cites W2072054128 @default.
- W2144672231 cites W2086502731 @default.
- W2144672231 cites W2107396783 @default.
- W2144672231 cites W2118776392 @default.
- W2144672231 cites W2119596571 @default.
- W2144672231 cites W2121820607 @default.
- W2144672231 cites W2123183226 @default.
- W2144672231 cites W2139418546 @default.
- W2144672231 cites W2141870784 @default.
- W2144672231 cites W2153267861 @default.
- W2144672231 cites W2153368486 @default.
- W2144672231 cites W2154834860 @default.
- W2144672231 cites W2159752377 @default.
- W2144672231 cites W2334782222 @default.
- W2144672231 cites W2963157432 @default.
- W2144672231 cites W4238850922 @default.
- W2144672231 cites W4239240501 @default.
- W2144672231 cites W4250589301 @default.
- W2144672231 cites W5385147 @default.
- W2144672231 doi "https://doi.org/10.1109/tac.2014.2368731" @default.
- W2144672231 hasPublicationYear "2015" @default.
- W2144672231 type Work @default.
- W2144672231 sameAs 2144672231 @default.
- W2144672231 citedByCount "87" @default.
- W2144672231 countsByYear W21446722312013 @default.
- W2144672231 countsByYear W21446722312014 @default.
- W2144672231 countsByYear W21446722312015 @default.
- W2144672231 countsByYear W21446722312016 @default.
- W2144672231 countsByYear W21446722312017 @default.
- W2144672231 countsByYear W21446722312018 @default.
- W2144672231 countsByYear W21446722312019 @default.
- W2144672231 countsByYear W21446722312020 @default.
- W2144672231 countsByYear W21446722312021 @default.
- W2144672231 countsByYear W21446722312022 @default.
- W2144672231 countsByYear W21446722312023 @default.
- W2144672231 crossrefType "journal-article" @default.
- W2144672231 hasAuthorship W2144672231A5008317106 @default.
- W2144672231 hasAuthorship W2144672231A5031787043 @default.
- W2144672231 hasAuthorship W2144672231A5088339142 @default.
- W2144672231 hasAuthorship W2144672231A5091103447 @default.
- W2144672231 hasBestOaLocation W21446722312 @default.
- W2144672231 hasConcept C105795698 @default.
- W2144672231 hasConcept C112972136 @default.
- W2144672231 hasConcept C11413529 @default.
- W2144672231 hasConcept C119857082 @default.
- W2144672231 hasConcept C121955636 @default.
- W2144672231 hasConcept C126255220 @default.
- W2144672231 hasConcept C144133560 @default.
- W2144672231 hasConcept C154945302 @default.
- W2144672231 hasConcept C162324750 @default.
- W2144672231 hasConcept C196083921 @default.
- W2144672231 hasConcept C199360897 @default.
- W2144672231 hasConcept C2777027219 @default.
- W2144672231 hasConcept C2777303404 @default.
- W2144672231 hasConcept C33923547 @default.
- W2144672231 hasConcept C41008148 @default.
- W2144672231 hasConcept C45374587 @default.
- W2144672231 hasConcept C50522688 @default.
- W2144672231 hasConcept C72434380 @default.
- W2144672231 hasConcept C97541855 @default.
- W2144672231 hasConceptScore W2144672231C105795698 @default.
- W2144672231 hasConceptScore W2144672231C112972136 @default.
- W2144672231 hasConceptScore W2144672231C11413529 @default.
- W2144672231 hasConceptScore W2144672231C119857082 @default.
- W2144672231 hasConceptScore W2144672231C121955636 @default.
- W2144672231 hasConceptScore W2144672231C126255220 @default.
- W2144672231 hasConceptScore W2144672231C144133560 @default.
- W2144672231 hasConceptScore W2144672231C154945302 @default.
- W2144672231 hasConceptScore W2144672231C162324750 @default.
- W2144672231 hasConceptScore W2144672231C196083921 @default.
- W2144672231 hasConceptScore W2144672231C199360897 @default.
- W2144672231 hasConceptScore W2144672231C2777027219 @default.
- W2144672231 hasConceptScore W2144672231C2777303404 @default.
- W2144672231 hasConceptScore W2144672231C33923547 @default.
- W2144672231 hasConceptScore W2144672231C41008148 @default.
- W2144672231 hasConceptScore W2144672231C45374587 @default.
- W2144672231 hasConceptScore W2144672231C50522688 @default.