Matches in SemOpenAlex for { <https://semopenalex.org/work/W4366332462> ?p ?o ?g. }
Showing items 1 to 72 of 72, with 100 items per page.
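The triple pattern in the header can be reproduced programmatically. Below is a minimal sketch, assuming the public SemOpenAlex SPARQL endpoint at https://semopenalex.org/sparql and the Python SPARQLWrapper package; since every row in this listing is tagged @default, the triples sit in the default graph and no GRAPH clause is needed.

```python
# Minimal sketch: list all predicate/object pairs for the work W4366332462.
# Assumptions: the SemOpenAlex endpoint URL below and SPARQLWrapper installed
# (pip install sparqlwrapper).
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://semopenalex.org/sparql"
QUERY = """
SELECT ?p ?o WHERE {
  <https://semopenalex.org/work/W4366332462> ?p ?o .
}
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# Each binding corresponds to one "W4366332462 <predicate> <object>" row below.
for row in results["results"]["bindings"]:
    print(row["p"]["value"], row["o"]["value"])
```

Each printed row corresponds to one of the 72 items listed below.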
- W4366332462 endingPage "14" @default.
- W4366332462 startingPage "1" @default.
- W4366332462 abstract "Reinforcement learning (RL) with sparse and deceptive rewards is a significant challenge because nonzero rewards are rarely obtained, and hence, the gradient calculated by the agent can be stochastic and without valid information. Recent work demonstrates that using memory buffers of previous experiences can lead to a more efficient learning process. However, existing methods usually require these experiences to be successful and may overly exploit them, which can cause the agent to adopt suboptimal behaviors. This study develops an approach that exploits diverse past trajectories for faster and more efficient online RL, even if these trajectories are suboptimal or not highly rewarded. The proposed algorithm merges a policy improvement step with an additional policy exploration step by using offline demonstration data. The main contribution of this study is that by regarding diverse past trajectories as guidance, instead of imitating them, our method directs its policy to follow and expand past trajectories, while still being able to learn without rewards and gradually approach optimality. Furthermore, a novel diversity measurement is introduced to maintain the diversity of the team and regulate exploration. The proposed algorithm is evaluated on a series of discrete and continuous control tasks with sparse and deceptive rewards. In comparison with the existing RL methods, the experimental results indicate that our proposed algorithm is significantly better than the baseline methods in terms of diverse exploration and avoiding local optima." @default.
- W4366332462 created "2023-04-20" @default.
- W4366332462 creator A5032417256 @default.
- W4366332462 creator A5051021582 @default.
- W4366332462 creator A5059360884 @default.
- W4366332462 creator A5072743477 @default.
- W4366332462 date "2023-04-18" @default.
- W4366332462 modified "2023-09-30" @default.
- W4366332462 title "Learning Diverse Policies with Soft Self-Generated Guidance" @default.
- W4366332462 cites W2145339207 @default.
- W4366332462 cites W2257979135 @default.
- W4366332462 cites W2595056910 @default.
- W4366332462 cites W2788862220 @default.
- W4366332462 cites W2966128956 @default.
- W4366332462 doi "https://doi.org/10.1155/2023/4705291" @default.
- W4366332462 hasPublicationYear "2023" @default.
- W4366332462 type Work @default.
- W4366332462 citedByCount "0" @default.
- W4366332462 crossrefType "journal-article" @default.
- W4366332462 hasAuthorship W4366332462A5032417256 @default.
- W4366332462 hasAuthorship W4366332462A5051021582 @default.
- W4366332462 hasAuthorship W4366332462A5059360884 @default.
- W4366332462 hasAuthorship W4366332462A5072743477 @default.
- W4366332462 hasBestOaLocation W43663324621 @default.
- W4366332462 hasConcept C111919701 @default.
- W4366332462 hasConcept C119857082 @default.
- W4366332462 hasConcept C126255220 @default.
- W4366332462 hasConcept C141934464 @default.
- W4366332462 hasConcept C144024400 @default.
- W4366332462 hasConcept C154945302 @default.
- W4366332462 hasConcept C165696696 @default.
- W4366332462 hasConcept C19165224 @default.
- W4366332462 hasConcept C2781316041 @default.
- W4366332462 hasConcept C33923547 @default.
- W4366332462 hasConcept C38652104 @default.
- W4366332462 hasConcept C41008148 @default.
- W4366332462 hasConcept C97541855 @default.
- W4366332462 hasConcept C98045186 @default.
- W4366332462 hasConceptScore W4366332462C111919701 @default.
- W4366332462 hasConceptScore W4366332462C119857082 @default.
- W4366332462 hasConceptScore W4366332462C126255220 @default.
- W4366332462 hasConceptScore W4366332462C141934464 @default.
- W4366332462 hasConceptScore W4366332462C144024400 @default.
- W4366332462 hasConceptScore W4366332462C154945302 @default.
- W4366332462 hasConceptScore W4366332462C165696696 @default.
- W4366332462 hasConceptScore W4366332462C19165224 @default.
- W4366332462 hasConceptScore W4366332462C2781316041 @default.
- W4366332462 hasConceptScore W4366332462C33923547 @default.
- W4366332462 hasConceptScore W4366332462C38652104 @default.
- W4366332462 hasConceptScore W4366332462C41008148 @default.
- W4366332462 hasConceptScore W4366332462C97541855 @default.
- W4366332462 hasConceptScore W4366332462C98045186 @default.
- W4366332462 hasFunder F4320335777 @default.
- W4366332462 hasLocation W43663324621 @default.
- W4366332462 hasOpenAccess W4366332462 @default.
- W4366332462 hasPrimaryLocation W43663324621 @default.
- W4366332462 hasRelatedWork W1527191935 @default.
- W4366332462 hasRelatedWork W2020491719 @default.
- W4366332462 hasRelatedWork W2923653485 @default.
- W4366332462 hasRelatedWork W2957776456 @default.
- W4366332462 hasRelatedWork W2964604098 @default.
- W4366332462 hasRelatedWork W2997512100 @default.
- W4366332462 hasRelatedWork W3022038857 @default.
- W4366332462 hasRelatedWork W4319083788 @default.
- W4366332462 hasRelatedWork W4319773215 @default.
- W4366332462 hasRelatedWork W4361026739 @default.
- W4366332462 hasVolume "2023" @default.
- W4366332462 isParatext "false" @default.
- W4366332462 isRetracted "false" @default.
- W4366332462 workType "article" @default.
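The listing above mixes bibliographic links (creator, cites, hasRelatedWork) with derived annotations (hasConcept, hasConceptScore). As a hypothetical follow-up under the same assumptions as the sketch above, the query below resolves the cites links to the cited works and, where available, their titles; because the listing shows only local predicate names, the query matches on those names rather than hard-coding a namespace.

```python
# Hypothetical follow-up sketch: fetch the works cited by W4366332462.
# The exact predicate IRIs are not shown in the listing, so we filter on the
# local names "cites" and "title" instead of assuming a specific namespace.
from SPARQLWrapper import SPARQLWrapper, JSON

CITES_QUERY = """
SELECT ?cited ?title WHERE {
  <https://semopenalex.org/work/W4366332462> ?p ?cited .
  FILTER(STRENDS(STR(?p), "cites"))
  OPTIONAL { ?cited ?p2 ?title . FILTER(STRENDS(STR(?p2), "title")) }
}
"""

sparql = SPARQLWrapper("https://semopenalex.org/sparql")
sparql.setQuery(CITES_QUERY)
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["cited"]["value"],
          row.get("title", {}).get("value", "(no title found)"))
```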