Matches in SemOpenAlex for { <https://semopenalex.org/work/W3115706066> ?p ?o ?g. }
Showing items 1 to 98 of 98 with 100 items per page.
- W3115706066 abstract "Self-imitation learning is a Reinforcement Learning (RL) method that encourages actions whose returns were higher than expected, which helps in hard exploration and sparse reward problems. It was shown to improve the performance of on-policy actor-critic methods in several discrete control tasks. Nevertheless, applying self-imitation to the mostly action-value based off-policy RL methods is not straightforward. We propose SAIL, a novel generalization of self-imitation learning for off-policy RL, based on a modification of the Bellman optimality operator that we connect to Advantage Learning. Crucially, our method mitigates the problem of stale returns by choosing the most optimistic return estimate between the observed return and the current action-value for self-imitation. We demonstrate the empirical effectiveness of SAIL on the Arcade Learning Environment, with a focus on hard exploration games." @default.
- W3115706066 created "2021-01-05" @default.
- W3115706066 creator A5004267040 @default.
- W3115706066 creator A5065100569 @default.
- W3115706066 creator A5087706654 @default.
- W3115706066 date "2021-05-03" @default.
- W3115706066 modified "2023-10-13" @default.
- W3115706066 title "Self-Imitation Advantage Learning" @default.
- W3115706066 cites W106792269 @default.
- W3115706066 cites W1569296262 @default.
- W3115706066 cites W1757796397 @default.
- W3115706066 cites W1796544916 @default.
- W3115706066 cites W2121863487 @default.
- W3115706066 cites W2141559645 @default.
- W3115706066 cites W2145339207 @default.
- W3115706066 cites W2150459019 @default.
- W3115706066 cites W2155968351 @default.
- W3115706066 cites W2157864803 @default.
- W3115706066 cites W2173564293 @default.
- W3115706066 cites W2201581102 @default.
- W3115706066 cites W2334782222 @default.
- W3115706066 cites W2553109721 @default.
- W3115706066 cites W2561776174 @default.
- W3115706066 cites W2614839826 @default.
- W3115706066 cites W2733961795 @default.
- W3115706066 cites W2890148520 @default.
- W3115706066 cites W2902982219 @default.
- W3115706066 cites W2905342215 @default.
- W3115706066 cites W2908064123 @default.
- W3115706066 cites W2914261249 @default.
- W3115706066 cites W2942608247 @default.
- W3115706066 cites W2952412806 @default.
- W3115706066 cites W2962715211 @default.
- W3115706066 cites W2962878825 @default.
- W3115706066 cites W2962902376 @default.
- W3115706066 cites W2963276097 @default.
- W3115706066 cites W2963277051 @default.
- W3115706066 cites W2963403143 @default.
- W3115706066 cites W2963423916 @default.
- W3115706066 cites W2964001908 @default.
- W3115706066 cites W2964067469 @default.
- W3115706066 cites W2964121744 @default.
- W3115706066 cites W2964174623 @default.
- W3115706066 cites W2964185768 @default.
- W3115706066 cites W2964291307 @default.
- W3115706066 cites W2965435131 @default.
- W3115706066 cites W2970190219 @default.
- W3115706066 cites W2970868077 @default.
- W3115706066 cites W2971204130 @default.
- W3115706066 cites W2990181595 @default.
- W3115706066 cites W2996665523 @default.
- W3115706066 cites W2996695841 @default.
- W3115706066 cites W3013618273 @default.
- W3115706066 cites W3022566517 @default.
- W3115706066 cites W3035542676 @default.
- W3115706066 cites W3099050578 @default.
- W3115706066 cites W3103780890 @default.
- W3115706066 hasPublicationYear "2021" @default.
- W3115706066 type Work @default.
- W3115706066 sameAs 3115706066 @default.
- W3115706066 citedByCount "1" @default.
- W3115706066 countsByYear W31157060662021 @default.
- W3115706066 crossrefType "proceedings-article" @default.
- W3115706066 hasAuthorship W3115706066A5004267040 @default.
- W3115706066 hasAuthorship W3115706066A5065100569 @default.
- W3115706066 hasAuthorship W3115706066A5087706654 @default.
- W3115706066 hasBestOaLocation W31157060661 @default.
- W3115706066 hasConcept C107457646 @default.
- W3115706066 hasConcept C126388530 @default.
- W3115706066 hasConcept C154945302 @default.
- W3115706066 hasConcept C15744967 @default.
- W3115706066 hasConcept C41008148 @default.
- W3115706066 hasConcept C77805123 @default.
- W3115706066 hasConceptScore W3115706066C107457646 @default.
- W3115706066 hasConceptScore W3115706066C126388530 @default.
- W3115706066 hasConceptScore W3115706066C154945302 @default.
- W3115706066 hasConceptScore W3115706066C15744967 @default.
- W3115706066 hasConceptScore W3115706066C41008148 @default.
- W3115706066 hasConceptScore W3115706066C77805123 @default.
- W3115706066 hasLocation W31157060661 @default.
- W3115706066 hasLocation W31157060662 @default.
- W3115706066 hasLocation W31157060663 @default.
- W3115706066 hasOpenAccess W3115706066 @default.
- W3115706066 hasPrimaryLocation W31157060661 @default.
- W3115706066 hasRelatedWork W1531601525 @default.
- W3115706066 hasRelatedWork W2748952813 @default.
- W3115706066 hasRelatedWork W2758277628 @default.
- W3115706066 hasRelatedWork W2899084033 @default.
- W3115706066 hasRelatedWork W2948807893 @default.
- W3115706066 hasRelatedWork W3173606202 @default.
- W3115706066 hasRelatedWork W3183948672 @default.
- W3115706066 hasRelatedWork W4387497383 @default.
- W3115706066 hasRelatedWork W2778153218 @default.
- W3115706066 hasRelatedWork W3110381201 @default.
- W3115706066 isParatext "false" @default.
- W3115706066 isRetracted "false" @default.
- W3115706066 magId "3115706066" @default.
- W3115706066 workType "article" @default.
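
The abstract above describes SAIL as a modified Bellman optimality operator that uses the more optimistic of the observed return and the current action-value. The listing gives no formula, so the following is only a minimal sketch, assuming the usual Advantage Learning form (Bellman optimality target plus a scaled action-gap term) with the action-value in that term replaced by the optimistic estimate. The function name `sail_target` and the parameters `alpha` and `gamma` are hypothetical and are not taken from this record.

```python
def sail_target(reward, observed_return, q_s, action, q_next,
                gamma=0.99, alpha=0.9, done=False):
    """Sketch of a one-sample SAIL-style target (assumed form, not the authors' code).

    reward          -- immediate reward for taking `action` in state s
    observed_return -- discounted return actually observed from (s, action)
    q_s             -- list of current action-values Q(s, .)
    action          -- index of the action taken in s
    q_next          -- list of current action-values Q(s', .)
    """
    # Standard Bellman optimality bootstrap; no bootstrap on terminal transitions.
    bootstrap = 0.0 if done else gamma * max(q_next)

    # Most optimistic estimate between the observed return and the current
    # action-value; this guards against stale (outdated, too low) returns.
    optimistic = max(observed_return, q_s[action])

    # Advantage-Learning-style correction: gap between the optimistic estimate
    # and the greedy value in s (assumption: scaled by a coefficient alpha).
    gap = optimistic - max(q_s)

    return reward + bootstrap + alpha * gap


# A high observed return raises the target; a stale low return falls back to Q(s, a).
fresh = sail_target(1.0, 5.0, [2.0, 3.0], 0, [1.0, 4.0], gamma=0.5, alpha=0.5)
stale = sail_target(1.0, 0.0, [2.0, 3.0], 0, [1.0, 4.0], gamma=0.5, alpha=0.5)
```

Under this assumed form, setting `observed_return` below `q_s[action]` reduces the target to plain Advantage Learning, which is how the sketch reflects the abstract's remark about stale returns.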