Matches in SemOpenAlex for { <https://semopenalex.org/work/W4362683553> ?p ?o ?g. }
Showing items 1 to 98 of 98 with 100 items per page.
- W4362683553 endingPage "15" @default.
- W4362683553 startingPage "1" @default.
- W4362683553 abstract "Reinforcement learning (RL) still suffers from the problem of sample inefficiency and struggles with the exploration issue, particularly in situations with long-delayed rewards, sparse rewards, and deep local optimum. Recently, learning from demonstration (LfD) paradigm was proposed to tackle this problem. However, these methods usually require a large number of demonstrations. In this study, we present a sample efficient teacher-advice mechanism with Gaussian process (TAG) by leveraging a few expert demonstrations. In TAG, a teacher model is built to provide both an advice action and its associated confidence value. Then, a guided policy is formulated to guide the agent in the exploration phase via the defined criteria. Through the TAG mechanism, the agent is capable of exploring the environment more intentionally. Moreover, with the confidence value, the guided policy can guide the agent precisely. Also, due to the strong generalization ability of Gaussian process, the teacher model can utilize the demonstrations more effectively. Therefore, substantial improvement in performance and sample efficiency can be attained. Considerable experiments on sparse reward environments demonstrate that the TAG mechanism can help typical RL algorithms achieve significant performance gains. In addition, the TAG mechanism with soft actor-critic algorithm (TAG-SAC) attains the state-of-the-art performance over other LfD counterparts on several delayed reward and complicated continuous control environments." @default.
- W4362683553 created "2023-04-08" @default.
- W4362683553 creator A5012712688 @default.
- W4362683553 creator A5021632916 @default.
- W4362683553 creator A5027883397 @default.
- W4362683553 creator A5037942759 @default.
- W4362683553 creator A5045224722 @default.
- W4362683553 creator A5077595112 @default.
- W4362683553 creator A5085028455 @default.
- W4362683553 creator A5086548631 @default.
- W4362683553 date "2023-01-01" @default.
- W4362683553 modified "2023-10-14" @default.
- W4362683553 title "TAG: Teacher-Advice Mechanism With Gaussian Process for Reinforcement Learning" @default.
- W4362683553 doi "https://doi.org/10.1109/tnnls.2023.3262956" @default.
- W4362683553 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/37023165" @default.
- W4362683553 hasPublicationYear "2023" @default.
- W4362683553 type Work @default.
- W4362683553 citedByCount "0" @default.
- W4362683553 crossrefType "journal-article" @default.
- W4362683553 hasAuthorship W4362683553A5012712688 @default.
- W4362683553 hasAuthorship W4362683553A5021632916 @default.
- W4362683553 hasAuthorship W4362683553A5027883397 @default.
- W4362683553 hasAuthorship W4362683553A5037942759 @default.
- W4362683553 hasAuthorship W4362683553A5045224722 @default.
- W4362683553 hasAuthorship W4362683553A5077595112 @default.
- W4362683553 hasAuthorship W4362683553A5085028455 @default.
- W4362683553 hasAuthorship W4362683553A5086548631 @default.
- W4362683553 hasConcept C111472728 @default.
- W4362683553 hasConcept C111919701 @default.
- W4362683553 hasConcept C119857082 @default.
- W4362683553 hasConcept C121332964 @default.
- W4362683553 hasConcept C134306372 @default.
- W4362683553 hasConcept C138885662 @default.
- W4362683553 hasConcept C154945302 @default.
- W4362683553 hasConcept C162324750 @default.
- W4362683553 hasConcept C163716315 @default.
- W4362683553 hasConcept C175444787 @default.
- W4362683553 hasConcept C177148314 @default.
- W4362683553 hasConcept C185592680 @default.
- W4362683553 hasConcept C198531522 @default.
- W4362683553 hasConcept C199360897 @default.
- W4362683553 hasConcept C2776291640 @default.
- W4362683553 hasConcept C2778869765 @default.
- W4362683553 hasConcept C2779955035 @default.
- W4362683553 hasConcept C33923547 @default.
- W4362683553 hasConcept C41008148 @default.
- W4362683553 hasConcept C43617362 @default.
- W4362683553 hasConcept C61326573 @default.
- W4362683553 hasConcept C62520636 @default.
- W4362683553 hasConcept C89611455 @default.
- W4362683553 hasConcept C97541855 @default.
- W4362683553 hasConcept C98045186 @default.
- W4362683553 hasConceptScore W4362683553C111472728 @default.
- W4362683553 hasConceptScore W4362683553C111919701 @default.
- W4362683553 hasConceptScore W4362683553C119857082 @default.
- W4362683553 hasConceptScore W4362683553C121332964 @default.
- W4362683553 hasConceptScore W4362683553C134306372 @default.
- W4362683553 hasConceptScore W4362683553C138885662 @default.
- W4362683553 hasConceptScore W4362683553C154945302 @default.
- W4362683553 hasConceptScore W4362683553C162324750 @default.
- W4362683553 hasConceptScore W4362683553C163716315 @default.
- W4362683553 hasConceptScore W4362683553C175444787 @default.
- W4362683553 hasConceptScore W4362683553C177148314 @default.
- W4362683553 hasConceptScore W4362683553C185592680 @default.
- W4362683553 hasConceptScore W4362683553C198531522 @default.
- W4362683553 hasConceptScore W4362683553C199360897 @default.
- W4362683553 hasConceptScore W4362683553C2776291640 @default.
- W4362683553 hasConceptScore W4362683553C2778869765 @default.
- W4362683553 hasConceptScore W4362683553C2779955035 @default.
- W4362683553 hasConceptScore W4362683553C33923547 @default.
- W4362683553 hasConceptScore W4362683553C41008148 @default.
- W4362683553 hasConceptScore W4362683553C43617362 @default.
- W4362683553 hasConceptScore W4362683553C61326573 @default.
- W4362683553 hasConceptScore W4362683553C62520636 @default.
- W4362683553 hasConceptScore W4362683553C89611455 @default.
- W4362683553 hasConceptScore W4362683553C97541855 @default.
- W4362683553 hasConceptScore W4362683553C98045186 @default.
- W4362683553 hasFunder F4320321001 @default.
- W4362683553 hasFunder F4320329791 @default.
- W4362683553 hasLocation W43626835531 @default.
- W4362683553 hasLocation W43626835532 @default.
- W4362683553 hasOpenAccess W4362683553 @default.
- W4362683553 hasPrimaryLocation W43626835531 @default.
- W4362683553 hasRelatedWork W2101651834 @default.
- W4362683553 hasRelatedWork W2739429571 @default.
- W4362683553 hasRelatedWork W2959276766 @default.
- W4362683553 hasRelatedWork W2961085424 @default.
- W4362683553 hasRelatedWork W2963749556 @default.
- W4362683553 hasRelatedWork W3074294383 @default.
- W4362683553 hasRelatedWork W4206669594 @default.
- W4362683553 hasRelatedWork W4287827094 @default.
- W4362683553 hasRelatedWork W4319083788 @default.
- W4362683553 hasRelatedWork W4377293004 @default.
- W4362683553 isParatext "false" @default.
- W4362683553 isRetracted "false" @default.
- W4362683553 workType "article" @default.