Matches in SemOpenAlex for { <https://semopenalex.org/work/W3090448441> ?p ?o ?g. }
Showing items 1 to 75 of 75, with 100 items per page.
- W3090448441 abstract "Author(s): Cassano, Lucas | Advisor(s): Sayed, Ali H | Abstract: Reinforcement learning (RL) is a powerful machine learning paradigm that studies the interaction of a single agent with an unknown environment. A plethora of applications fit into the RL framework; however, in many cases of interest, a team of agents needs to interact with the environment and with each other to achieve a common goal. This is the object of study of collaborative multi-agent RL (MARL). Several challenges arise when considering collaborative MARL. One of these challenges is decentralization. In many cases, due to design constraints, it is undesirable or inconvenient to constantly relay data between agents and a centralized location. Therefore, fully distributed solutions become preferable. The first part of this dissertation addresses the challenge of designing fully decentralized MARL algorithms. We consider two problems: policy evaluation and policy optimization. In the policy evaluation problem, the objective is to estimate the performance of a target team policy in a particular environment. This problem has been studied before for the case with streaming data; however, in most implementations the target policy is evaluated using a finite data set. For this case, existing algorithms guarantee convergence at a sub-linear rate. In this dissertation we introduce Fast Diffusion for Policy Evaluation (FDPE), an algorithm that converges at a linear rate for the finite data set case. We then consider the policy optimization problem, where the objective is for all agents to learn an optimal team policy. This problem has also been studied recently; however, existing solutions are data-inefficient and converge to Nash equilibria (whose performance can be catastrophically bad) as opposed to team-optimal policies. For this case we introduce the Diffusion for Team Policy Optimization (DTPO) algorithm. DTPO is more data-efficient than previous algorithms and does not converge to Nash equilibria. For both of these cases, we provide experimental studies that show the effectiveness of the proposed methods. Another challenge that arises in collaborative MARL, which is orthogonal to the decentralization problem, is that of scalability. The number of parameters that need to be estimated when full team policies are learned grows exponentially with the number of agents. Hence, algorithms that learn joint team policies quickly become intractable. A solution to this problem is for each agent to learn an individual policy such that the resulting joint team policy is optimal. This problem has been the object of much research lately. However, most solution methods are data-inefficient and often make unrealistic assumptions that greatly limit the applicability of these approaches. To address this problem we introduce Logical Team Q-learning (LTQL), an algorithm that learns factored policies in a data-efficient manner and is applicable to any cooperative MARL environment. We show that LTQL outperforms previous methods in a challenging predator-prey task. Another challenge is that of efficient exploration. This is a problem in both the single-agent and multi-agent settings, although in MARL it becomes more severe due to the larger state-action space. The challenge of deriving policies that are efficient at exploring the state space has been addressed in many recent works. However, most of these approaches rely on heuristics and, more importantly, they treat the problem of exploring the state space separately from that of learning an optimal policy (even though the two are related, since the state space is explored to collect data with which to learn an optimal policy). To address this challenge, we introduce the Information Seeking Learner (ISL), an algorithm that displays state-of-the-art performance on difficult exploration benchmarks. The value of our work on exploration is that we take a fundamentally different approach from previous works: as opposed to earlier methods, we consider the problem of exploring the state space and learning an optimal policy jointly. The main insight of our approach is that in RL, obtaining point estimates of the quantities of interest is not sufficient; confidence bound estimates are also necessary." @default.
- W3090448441 created "2020-10-08" @default.
- W3090448441 creator A5064217777 @default.
- W3090448441 date "2020-01-01" @default.
- W3090448441 modified "2023-09-27" @default.
- W3090448441 title "Teamwork and Exploration in Reinforcement Learning" @default.
- W3090448441 hasPublicationYear "2020" @default.
- W3090448441 type Work @default.
- W3090448441 sameAs 3090448441 @default.
- W3090448441 citedByCount "0" @default.
- W3090448441 crossrefType "journal-article" @default.
- W3090448441 hasAuthorship W3090448441A5064217777 @default.
- W3090448441 hasConcept C111226992 @default.
- W3090448441 hasConcept C115903868 @default.
- W3090448441 hasConcept C126255220 @default.
- W3090448441 hasConcept C127413603 @default.
- W3090448441 hasConcept C154945302 @default.
- W3090448441 hasConcept C162324750 @default.
- W3090448441 hasConcept C177264268 @default.
- W3090448441 hasConcept C17744445 @default.
- W3090448441 hasConcept C199360897 @default.
- W3090448441 hasConcept C199539241 @default.
- W3090448441 hasConcept C26713055 @default.
- W3090448441 hasConcept C2777303404 @default.
- W3090448441 hasConcept C33923547 @default.
- W3090448441 hasConcept C41008148 @default.
- W3090448441 hasConcept C42475967 @default.
- W3090448441 hasConcept C50522688 @default.
- W3090448441 hasConcept C539667460 @default.
- W3090448441 hasConcept C97541855 @default.
- W3090448441 hasConceptScore W3090448441C111226992 @default.
- W3090448441 hasConceptScore W3090448441C115903868 @default.
- W3090448441 hasConceptScore W3090448441C126255220 @default.
- W3090448441 hasConceptScore W3090448441C127413603 @default.
- W3090448441 hasConceptScore W3090448441C154945302 @default.
- W3090448441 hasConceptScore W3090448441C162324750 @default.
- W3090448441 hasConceptScore W3090448441C177264268 @default.
- W3090448441 hasConceptScore W3090448441C17744445 @default.
- W3090448441 hasConceptScore W3090448441C199360897 @default.
- W3090448441 hasConceptScore W3090448441C199539241 @default.
- W3090448441 hasConceptScore W3090448441C26713055 @default.
- W3090448441 hasConceptScore W3090448441C2777303404 @default.
- W3090448441 hasConceptScore W3090448441C33923547 @default.
- W3090448441 hasConceptScore W3090448441C41008148 @default.
- W3090448441 hasConceptScore W3090448441C42475967 @default.
- W3090448441 hasConceptScore W3090448441C50522688 @default.
- W3090448441 hasConceptScore W3090448441C539667460 @default.
- W3090448441 hasConceptScore W3090448441C97541855 @default.
- W3090448441 hasLocation W30904484411 @default.
- W3090448441 hasOpenAccess W3090448441 @default.
- W3090448441 hasPrimaryLocation W30904484411 @default.
- W3090448441 hasRelatedWork W142858861 @default.
- W3090448441 hasRelatedWork W1486271120 @default.
- W3090448441 hasRelatedWork W149447503 @default.
- W3090448441 hasRelatedWork W1505837856 @default.
- W3090448441 hasRelatedWork W2122036682 @default.
- W3090448441 hasRelatedWork W2287095650 @default.
- W3090448441 hasRelatedWork W2524179627 @default.
- W3090448441 hasRelatedWork W2787477569 @default.
- W3090448441 hasRelatedWork W2799385102 @default.
- W3090448441 hasRelatedWork W2890211540 @default.
- W3090448441 hasRelatedWork W2902854091 @default.
- W3090448441 hasRelatedWork W2952580032 @default.
- W3090448441 hasRelatedWork W2962127055 @default.
- W3090448441 hasRelatedWork W2963482581 @default.
- W3090448441 hasRelatedWork W2970355847 @default.
- W3090448441 hasRelatedWork W2990216309 @default.
- W3090448441 hasRelatedWork W3011584947 @default.
- W3090448441 hasRelatedWork W3197524599 @default.
- W3090448441 hasRelatedWork W50830905 @default.
- W3090448441 hasRelatedWork W637379431 @default.
- W3090448441 isParatext "false" @default.
- W3090448441 isRetracted "false" @default.
- W3090448441 magId "3090448441" @default.
- W3090448441 workType "article" @default.
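The abstract above contrasts existing sub-linear-rate methods with FDPE's linear rate for policy evaluation on a finite data set. The sketch below is not FDPE; it is a minimal, hypothetical illustration of the underlying problem only: estimating a state-value function from a fixed batch of transitions with tabular TD(0). The environment size, step size, and dataset format are assumptions made for illustration.

```python
import numpy as np

# Minimal sketch (not FDPE): tabular TD(0) policy evaluation on a fixed,
# finite batch of transitions. All sizes and hyperparameters are illustrative.
num_states = 5          # assumed small tabular environment
gamma = 0.9             # discount factor
alpha = 0.1             # step size
rng = np.random.default_rng(0)

# Hypothetical finite data set of (state, reward, next_state) tuples collected
# under the target team policy.
dataset = [(int(s), float(rng.normal()), (int(s) + 1) % num_states)
           for s in rng.integers(0, num_states, size=200)]

V = np.zeros(num_states)    # point estimate of the value function
for _ in range(50):         # repeated sweeps over the same finite batch
    for s, r, s_next in dataset:
        td_error = r + gamma * V[s_next] - V[s]
        V[s] += alpha * td_error

print("estimated state values:", np.round(V, 3))
```

Sweep-based stochastic updates of this kind are the sort of finite-data-set procedure for which, per the abstract, existing algorithms only guarantee sub-linear convergence; FDPE is presented as closing that gap with a linear rate.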
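The abstract's closing insight is that point estimates alone are not sufficient for efficient exploration and that confidence bound estimates are also needed. The sketch below is not the ISL algorithm; it is a generic, hypothetical illustration of that idea: tabular Q-learning in which action selection adds a count-based confidence bonus to the point estimate (a UCB-style rule). The chain environment, bonus constant, and hyperparameters are assumptions.

```python
import numpy as np

# Generic sketch (not ISL): Q-learning that explores by acting on an upper
# confidence bound (point estimate + count-based bonus) rather than the point
# estimate alone. Environment and constants are illustrative.
num_states, num_actions = 10, 2
gamma, alpha, bonus_c = 0.95, 0.1, 1.0

Q = np.zeros((num_states, num_actions))        # point estimates
counts = np.ones((num_states, num_actions))    # visit counts (start at 1 to avoid /0)

def select_action(s, t):
    # Confidence bonus shrinks as a state-action pair is visited more often.
    bonus = bonus_c * np.sqrt(np.log(t + 1) / counts[s])
    return int(np.argmax(Q[s] + bonus))

def step(s, a):
    # Hypothetical environment: a chain whose far end pays off.
    s_next = min(s + 1, num_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == num_states - 1 else 0.0
    return s_next, r

s = 0
for t in range(5000):
    a = select_action(s, t)
    s_next, r = step(s, a)
    counts[s, a] += 1
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print("greedy policy:", Q.argmax(axis=1))
```

The design point this toy example shares with the abstract's argument is that exploration and value learning are driven by the same update: the agent acts on its uncertainty about the quantities it is estimating, rather than treating exploration as a separate heuristic.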