Matches in SemOpenAlex for { <https://semopenalex.org/work/W3113190209> ?p ?o ?g. }
- W3113190209 abstract "Efficient exploration under sparse rewards remains a key challenge in deep reinforcement learning. To guide exploration, previous work makes extensive use of intrinsic reward (IR). There are many heuristics for IR, including visitation counts, curiosity, and state-difference. In this paper, we analyze the pros and cons of each method and propose the regulated difference of inverse visitation counts as a simple but effective criterion for IR. The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment. The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning. In comparison, the previous SoTA only solves 50% of the tasks. BeBold also achieves SoTA on multiple tasks in NetHack, a popular rogue-like game that contains more challenging procedurally-generated environments." @default.
- W3113190209 created "2020-12-21" @default.
- W3113190209 creator A5008951080 @default.
- W3113190209 creator A5025376043 @default.
- W3113190209 creator A5047285420 @default.
- W3113190209 creator A5049093671 @default.
- W3113190209 creator A5066028215 @default.
- W3113190209 creator A5072427753 @default.
- W3113190209 creator A5084821923 @default.
- W3113190209 date "2020-12-15" @default.
- W3113190209 modified "2023-10-16" @default.
- W3113190209 title "BeBold: Exploration Beyond the Boundary of Explored Regions" @default.
- W3113190209 cites W1554318634 @default.
- W3113190209 cites W1786044565 @default.
- W3113190209 cites W1966213627 @default.
- W3113190209 cites W2034806191 @default.
- W3113190209 cites W2107667896 @default.
- W3113190209 cites W2116386744 @default.
- W3113190209 cites W2145339207 @default.
- W3113190209 cites W2257979135 @default.
- W3113190209 cites W2296073425 @default.
- W3113190209 cites W2556477470 @default.
- W3113190209 cites W2574978968 @default.
- W3113190209 cites W2593766708 @default.
- W3113190209 cites W2596982695 @default.
- W3113190209 cites W2614839826 @default.
- W3113190209 cites W2623491082 @default.
- W3113190209 cites W2724169821 @default.
- W3113190209 cites W2737215407 @default.
- W3113190209 cites W2744921630 @default.
- W3113190209 cites W2761873684 @default.
- W3113190209 cites W2766447205 @default.
- W3113190209 cites W2785542505 @default.
- W3113190209 cites W2806193267 @default.
- W3113190209 cites W2885550588 @default.
- W3113190209 cites W2899205164 @default.
- W3113190209 cites W2902907165 @default.
- W3113190209 cites W2913854057 @default.
- W3113190209 cites W2914261249 @default.
- W3113190209 cites W2914431475 @default.
- W3113190209 cites W2922388521 @default.
- W3113190209 cites W2948199691 @default.
- W3113190209 cites W2949682451 @default.
- W3113190209 cites W2950872548 @default.
- W3113190209 cites W2953100042 @default.
- W3113190209 cites W2953326529 @default.
- W3113190209 cites W2962730405 @default.
- W3113190209 cites W2963276097 @default.
- W3113190209 cites W2963438456 @default.
- W3113190209 cites W2963639957 @default.
- W3113190209 cites W2963761387 @default.
- W3113190209 cites W2963820385 @default.
- W3113190209 cites W2963938771 @default.
- W3113190209 cites W2964001908 @default.
- W3113190209 cites W2964062135 @default.
- W3113190209 cites W2964161785 @default.
- W3113190209 cites W2970384648 @default.
- W3113190209 cites W2972758308 @default.
- W3113190209 cites W2974778612 @default.
- W3113190209 cites W2982316857 @default.
- W3113190209 cites W3006178546 @default.
- W3113190209 cites W3006773485 @default.
- W3113190209 cites W3013618273 @default.
- W3113190209 cites W3036282537 @default.
- W3113190209 cites W3037871539 @default.
- W3113190209 cites W3129322645 @default.
- W3113190209 cites W779494576 @default.
- W3113190209 doi "https://doi.org/10.48550/arxiv.2012.08621" @default.
- W3113190209 hasPublicationYear "2020" @default.
- W3113190209 type Work @default.
- W3113190209 sameAs 3113190209 @default.
- W3113190209 citedByCount "9" @default.
- W3113190209 countsByYear W31131902092020 @default.
- W3113190209 countsByYear W31131902092021 @default.
- W3113190209 crossrefType "posted-content" @default.
- W3113190209 hasAuthorship W3113190209A5008951080 @default.
- W3113190209 hasAuthorship W3113190209A5025376043 @default.
- W3113190209 hasAuthorship W3113190209A5047285420 @default.
- W3113190209 hasAuthorship W3113190209A5049093671 @default.
- W3113190209 hasAuthorship W3113190209A5066028215 @default.
- W3113190209 hasAuthorship W3113190209A5072427753 @default.
- W3113190209 hasAuthorship W3113190209A5084821923 @default.
- W3113190209 hasBestOaLocation W31131902091 @default.
- W3113190209 hasConcept C111472728 @default.
- W3113190209 hasConcept C111919701 @default.
- W3113190209 hasConcept C119857082 @default.
- W3113190209 hasConcept C126255220 @default.
- W3113190209 hasConcept C127705205 @default.
- W3113190209 hasConcept C134306372 @default.
- W3113190209 hasConcept C138885662 @default.
- W3113190209 hasConcept C154945302 @default.
- W3113190209 hasConcept C15744967 @default.
- W3113190209 hasConcept C2780586882 @default.
- W3113190209 hasConcept C33435437 @default.
- W3113190209 hasConcept C33923547 @default.
- W3113190209 hasConcept C41008148 @default.
- W3113190209 hasConcept C62354387 @default.
- W3113190209 hasConcept C77805123 @default.
- W3113190209 hasConcept C97541855 @default.
- W3113190209 hasConceptScore W3113190209C111472728 @default.