Matches in SemOpenAlex for { <https://semopenalex.org/work/W3037924899> ?p ?o ?g. }
- W3037924899 abstract "Despite the fact that deep reinforcement learning (RL) has surpassed human-level performances in various tasks, it still has several fundamental challenges. First, most RL methods require intensive data from the exploration of the environment to achieve satisfactory performance. Second, the use of neural networks in RL renders it hard to interpret the internals of the system in a way that humans can understand. To address these two challenges, we propose a framework that enables an RL agent to reason over its exploration process and distill high-level knowledge for effectively guiding its future explorations. Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm. We prove that in episodic RL, a finite reward automaton can express any non-Markovian bounded reward functions with finitely many reward values and approximate any non-Markovian bounded reward function (with infinitely many reward values) with arbitrary precision. We also provide a lower bound for the episode length such that the proposed RL approach almost surely converges to an optimal policy in the limit. We test this approach on two RL environments with non-Markovian reward functions, choosing a variety of tasks with increasing complexity for each environment. We compare our algorithm with the state-of-the-art RL algorithms for non-Markovian reward functions, such as Joint Inference of Reward machines and Policies for RL (JIRP), Learning Reward Machine (LRM), and Proximal Policy Optimization (PPO2). Our results show that our algorithm converges to an optimal policy faster than other baseline methods." @default.
- W3037924899 created "2020-07-02" @default.
- W3037924899 creator A5013789785 @default.
- W3037924899 creator A5035661649 @default.
- W3037924899 creator A5064701353 @default.
- W3037924899 creator A5068441112 @default.
- W3037924899 creator A5089731681 @default.
- W3037924899 date "2020-06-28" @default.
- W3037924899 modified "2023-09-27" @default.
- W3037924899 title "Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples" @default.
- W3037924899 cites W1494687478 @default.
- W3037924899 cites W1573881032 @default.
- W3037924899 cites W1989445634 @default.
- W3037924899 cites W2003250112 @default.
- W3037924899 cites W2339807279 @default.
- W3037924899 cites W2426537415 @default.
- W3037924899 cites W2474788619 @default.
- W3037924899 cites W2524638160 @default.
- W3037924899 cites W2531762107 @default.
- W3037924899 cites W2553882142 @default.
- W3037924899 cites W2566678492 @default.
- W3037924899 cites W2567705466 @default.
- W3037924899 cites W2604855040 @default.
- W3037924899 cites W2736601468 @default.
- W3037924899 cites W2741248519 @default.
- W3037924899 cites W2804948070 @default.
- W3037924899 cites W2807161818 @default.
- W3037924899 cites W2807270760 @default.
- W3037924899 cites W2808386811 @default.
- W3037924899 cites W2809461852 @default.
- W3037924899 cites W2887933419 @default.
- W3037924899 cites W2889713213 @default.
- W3037924899 cites W2963575966 @default.
- W3037924899 cites W2963778636 @default.
- W3037924899 cites W2964263874 @default.
- W3037924899 cites W2970673985 @default.
- W3037924899 cites W2972698871 @default.
- W3037924899 cites W2996875168 @default.
- W3037924899 cites W2997022832 @default.
- W3037924899 cites W3037755290 @default.
- W3037924899 cites W3100743289 @default.
- W3037924899 cites W3124474911 @default.
- W3037924899 cites W3166485685 @default.
- W3037924899 cites W3173218700 @default.
- W3037924899 cites W3195923140 @default.
- W3037924899 cites W3208221999 @default.
- W3037924899 doi "https://doi.org/10.48550/arxiv.2006.15714" @default.
- W3037924899 hasPublicationYear "2020" @default.
- W3037924899 type Work @default.
- W3037924899 sameAs 3037924899 @default.
- W3037924899 citedByCount "1" @default.
- W3037924899 countsByYear W30379248992021 @default.
- W3037924899 crossrefType "posted-content" @default.
- W3037924899 hasAuthorship W3037924899A5013789785 @default.
- W3037924899 hasAuthorship W3037924899A5035661649 @default.
- W3037924899 hasAuthorship W3037924899A5064701353 @default.
- W3037924899 hasAuthorship W3037924899A5068441112 @default.
- W3037924899 hasAuthorship W3037924899A5089731681 @default.
- W3037924899 hasBestOaLocation W30379248991 @default.
- W3037924899 hasConcept C11413529 @default.
- W3037924899 hasConcept C118615104 @default.
- W3037924899 hasConcept C134306372 @default.
- W3037924899 hasConcept C136197465 @default.
- W3037924899 hasConcept C154945302 @default.
- W3037924899 hasConcept C162838799 @default.
- W3037924899 hasConcept C167822520 @default.
- W3037924899 hasConcept C2776214188 @default.
- W3037924899 hasConcept C33923547 @default.
- W3037924899 hasConcept C34388435 @default.
- W3037924899 hasConcept C41008148 @default.
- W3037924899 hasConcept C80444323 @default.
- W3037924899 hasConcept C97541855 @default.
- W3037924899 hasConceptScore W3037924899C11413529 @default.
- W3037924899 hasConceptScore W3037924899C118615104 @default.
- W3037924899 hasConceptScore W3037924899C134306372 @default.
- W3037924899 hasConceptScore W3037924899C136197465 @default.
- W3037924899 hasConceptScore W3037924899C154945302 @default.
- W3037924899 hasConceptScore W3037924899C162838799 @default.
- W3037924899 hasConceptScore W3037924899C167822520 @default.
- W3037924899 hasConceptScore W3037924899C2776214188 @default.
- W3037924899 hasConceptScore W3037924899C33923547 @default.
- W3037924899 hasConceptScore W3037924899C34388435 @default.
- W3037924899 hasConceptScore W3037924899C41008148 @default.
- W3037924899 hasConceptScore W3037924899C80444323 @default.
- W3037924899 hasConceptScore W3037924899C97541855 @default.
- W3037924899 hasLocation W30379248991 @default.
- W3037924899 hasOpenAccess W3037924899 @default.
- W3037924899 hasPrimaryLocation W30379248991 @default.
- W3037924899 hasRelatedWork W1550626198 @default.
- W3037924899 hasRelatedWork W1598914363 @default.
- W3037924899 hasRelatedWork W1655879455 @default.
- W3037924899 hasRelatedWork W1699259381 @default.
- W3037924899 hasRelatedWork W1748450182 @default.
- W3037924899 hasRelatedWork W1893208258 @default.
- W3037924899 hasRelatedWork W2514527293 @default.
- W3037924899 hasRelatedWork W263936402 @default.
- W3037924899 hasRelatedWork W3037924899 @default.
- W3037924899 hasRelatedWork W3102251987 @default.
- W3037924899 isParatext "false" @default.
- W3037924899 isRetracted "false" @default.