Matches in SemOpenAlex for { <https://semopenalex.org/work/W3196617346> ?p ?o ?g. }
- W3196617346 abstract "Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimal Shaping Algorithm (ROSA), an automated RS framework in which the shaping-reward function is constructed in a novel Markov game between two agents. A reward-shaping agent (Shaper) uses switching controls to determine which states to add shaping rewards and their optimal values while the other agent (Controller) learns the optimal policy for the task using these shaped rewards. We prove that ROSA, which easily adopts existing RL algorithms, learns to construct a shaping-reward function that is tailored to the task thus ensuring efficient convergence to high performance policies. We demonstrate ROSA's congenial properties in three carefully designed experiments and show its superior performance against state-of-the-art RS algorithms in challenging sparse reward environments." @default.
- W3196617346 created "2021-09-13" @default.
- W3196617346 creator A5001411588 @default.
- W3196617346 creator A5002080576 @default.
- W3196617346 creator A5006156363 @default.
- W3196617346 creator A5011090841 @default.
- W3196617346 creator A5019611350 @default.
- W3196617346 creator A5024230752 @default.
- W3196617346 creator A5034601483 @default.
- W3196617346 creator A5071355722 @default.
- W3196617346 creator A5071666892 @default.
- W3196617346 creator A5087452149 @default.
- W3196617346 creator A5090073634 @default.
- W3196617346 date "2021-03-16" @default.
- W3196617346 modified "2023-09-26" @default.
- W3196617346 title "Learning to Shape Rewards using a Game of Switching Controls." @default.
- W3196617346 cites W119236796 @default.
- W3196617346 cites W1520597402 @default.
- W3196617346 cites W1528676759 @default.
- W3196617346 cites W1576580777 @default.
- W3196617346 cites W1710476689 @default.
- W3196617346 cites W1777239053 @default.
- W3196617346 cites W1988526405 @default.
- W3196617346 cites W2005808873 @default.
- W3196617346 cites W2006655145 @default.
- W3196617346 cites W2057913812 @default.
- W3196617346 cites W2095564494 @default.
- W3196617346 cites W2121863487 @default.
- W3196617346 cites W2141436719 @default.
- W3196617346 cites W2143435603 @default.
- W3196617346 cites W2151382427 @default.
- W3196617346 cites W2151416233 @default.
- W3196617346 cites W2156168464 @default.
- W3196617346 cites W2165131254 @default.
- W3196617346 cites W2202549229 @default.
- W3196617346 cites W2417786368 @default.
- W3196617346 cites W2481567506 @default.
- W3196617346 cites W2596982695 @default.
- W3196617346 cites W2614839826 @default.
- W3196617346 cites W2625366419 @default.
- W3196617346 cites W2736601468 @default.
- W3196617346 cites W2756196406 @default.
- W3196617346 cites W2795136206 @default.
- W3196617346 cites W2796835114 @default.
- W3196617346 cites W2883804791 @default.
- W3196617346 cites W2899205164 @default.
- W3196617346 cites W2902907165 @default.
- W3196617346 cites W2911385740 @default.
- W3196617346 cites W2911809292 @default.
- W3196617346 cites W2913350117 @default.
- W3196617346 cites W2938321354 @default.
- W3196617346 cites W2950530237 @default.
- W3196617346 cites W2984524734 @default.
- W3196617346 cites W2991046523 @default.
- W3196617346 cites W2996726407 @default.
- W3196617346 cites W2997072274 @default.
- W3196617346 cites W3028695364 @default.
- W3196617346 cites W3035569762 @default.
- W3196617346 cites W3089723243 @default.
- W3196617346 cites W3104193096 @default.
- W3196617346 cites W3107615218 @default.
- W3196617346 cites W3110981250 @default.
- W3196617346 cites W3138460475 @default.
- W3196617346 cites W3139306708 @default.
- W3196617346 cites W3152777888 @default.
- W3196617346 cites W779494576 @default.
- W3196617346 hasPublicationYear "2021" @default.
- W3196617346 type Work @default.
- W3196617346 sameAs 3196617346 @default.
- W3196617346 citedByCount "0" @default.
- W3196617346 crossrefType "posted-content" @default.
- W3196617346 hasAuthorship W3196617346A5001411588 @default.
- W3196617346 hasAuthorship W3196617346A5002080576 @default.
- W3196617346 hasAuthorship W3196617346A5006156363 @default.
- W3196617346 hasAuthorship W3196617346A5011090841 @default.
- W3196617346 hasAuthorship W3196617346A5019611350 @default.
- W3196617346 hasAuthorship W3196617346A5024230752 @default.
- W3196617346 hasAuthorship W3196617346A5034601483 @default.
- W3196617346 hasAuthorship W3196617346A5071355722 @default.
- W3196617346 hasAuthorship W3196617346A5071666892 @default.
- W3196617346 hasAuthorship W3196617346A5087452149 @default.
- W3196617346 hasAuthorship W3196617346A5090073634 @default.
- W3196617346 hasConcept C105795698 @default.
- W3196617346 hasConcept C106189395 @default.
- W3196617346 hasConcept C11413529 @default.
- W3196617346 hasConcept C119857082 @default.
- W3196617346 hasConcept C127413603 @default.
- W3196617346 hasConcept C14036430 @default.
- W3196617346 hasConcept C154945302 @default.
- W3196617346 hasConcept C159886148 @default.
- W3196617346 hasConcept C162324750 @default.
- W3196617346 hasConcept C199360897 @default.
- W3196617346 hasConcept C201995342 @default.
- W3196617346 hasConcept C2777303404 @default.
- W3196617346 hasConcept C2779436431 @default.
- W3196617346 hasConcept C2780451532 @default.
- W3196617346 hasConcept C2780801425 @default.
- W3196617346 hasConcept C33923547 @default.
- W3196617346 hasConcept C41008148 @default.
- W3196617346 hasConcept C48103436 @default.
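
The abstract recorded above describes a Shaper that uses switching controls to decide at which states a shaping reward is added, and a Controller that learns from the resulting shaped rewards. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes the classical potential-based shaping term gamma * phi(s') - phi(s), which the abstract does not specify. The names `shaped_reward`, `switch_on`, and `potential`, along with the chain environment with its goal at state 5, are hypothetical and chosen purely for illustration.

```python
from typing import Callable


def shaped_reward(
    env_reward: float,
    state: int,
    next_state: int,
    switch_on: bool,
    potential: Callable[[int], float],
    gamma: float = 0.99,
) -> float:
    """Return the reward a Controller would train on in this toy setup.

    switch_on stands in for the Shaper's switching decision at `state`:
    when it is False, the environment reward passes through unchanged.
    """
    if not switch_on:
        return env_reward
    # Assumed shaping form (potential-based); not taken from the source.
    bonus = gamma * potential(next_state) - potential(state)
    return env_reward + bonus


def potential(s: int) -> float:
    # Hypothetical potential: negative distance to a goal at state 5.
    return float(-abs(5 - s))


# One step from state 2 to state 3 with a sparse (zero) environment reward.
r_off = shaped_reward(0.0, 2, 3, switch_on=False, potential=potential, gamma=1.0)
r_on = shaped_reward(0.0, 2, 3, switch_on=True, potential=potential, gamma=1.0)
```

In this sketch the switch is a plain boolean supplied by the caller. In the framework the abstract describes, that decision and the shaping values are themselves learned by the Shaper agent, which this example does not attempt to model.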