Matches in SemOpenAlex for { <https://semopenalex.org/work/W2911648082> ?p ?o ?g. }
Showing items 1 to 99 of 99 with 100 items per page.
- W2911648082 abstract "Agents acting in real-world scenarios often have constraints such as finite budgets or daily job performance targets. While repeated (episodic) tasks can be solved with existing RL algorithms, methods need to be extended if the repetition depends on performance. Recent work has introduced a distributional perspective on reinforcement learning, providing a model of episodic returns. Inspired by these results we contribute the new budget- and risk-aware distributional reinforcement learning (BRAD-RL) algorithm that bootstraps from the C51 distributional output and then uses value iteration to estimate the value of starting an episode with a certain amount of budget. With this strategy we can make budget-wise action selection within each episode and maximize the return across episodes. Experiments in a grid-world domain highlight the benefits of our algorithm, maximizing discounted future returns when low cumulative performance may terminate repetition." @default.
- W2911648082 created "2019-02-21" @default.
- W2911648082 creator A5005259414 @default.
- W2911648082 creator A5023761610 @default.
- W2911648082 creator A5031124856 @default.
- W2911648082 creator A5035687300 @default.
- W2911648082 creator A5069752991 @default.
- W2911648082 date "2018-01-01" @default.
- W2911648082 modified "2023-09-27" @default.
- W2911648082 title "Learning on a Budget Using Distributional RL" @default.
- W2911648082 cites W134786152 @default.
- W2911648082 cites W15411808 @default.
- W2911648082 cites W1585575029 @default.
- W2911648082 cites W1845972764 @default.
- W2911648082 cites W1853859481 @default.
- W2911648082 cites W2076337359 @default.
- W2911648082 cites W2121863487 @default.
- W2911648082 cites W2141203641 @default.
- W2911648082 cites W2768908787 @default.
- W2911648082 cites W2774636080 @default.
- W2911648082 cites W2798546505 @default.
- W2911648082 cites W2798750840 @default.
- W2911648082 cites W2952720101 @default.
- W2911648082 cites W2953318193 @default.
- W2911648082 cites W2963423916 @default.
- W2911648082 cites W3084824572 @default.
- W2911648082 hasPublicationYear "2018" @default.
- W2911648082 type Work @default.
- W2911648082 sameAs 2911648082 @default.
- W2911648082 citedByCount "2" @default.
- W2911648082 countsByYear W29116480822018 @default.
- W2911648082 countsByYear W29116480822020 @default.
- W2911648082 crossrefType "journal-article" @default.
- W2911648082 hasAuthorship W2911648082A5005259414 @default.
- W2911648082 hasAuthorship W2911648082A5023761610 @default.
- W2911648082 hasAuthorship W2911648082A5031124856 @default.
- W2911648082 hasAuthorship W2911648082A5035687300 @default.
- W2911648082 hasAuthorship W2911648082A5069752991 @default.
- W2911648082 hasConcept C119857082 @default.
- W2911648082 hasConcept C126255220 @default.
- W2911648082 hasConcept C12713177 @default.
- W2911648082 hasConcept C138885662 @default.
- W2911648082 hasConcept C14646407 @default.
- W2911648082 hasConcept C154945302 @default.
- W2911648082 hasConcept C162324750 @default.
- W2911648082 hasConcept C175444787 @default.
- W2911648082 hasConcept C187691185 @default.
- W2911648082 hasConcept C2524010 @default.
- W2911648082 hasConcept C2776141515 @default.
- W2911648082 hasConcept C2776291640 @default.
- W2911648082 hasConcept C33923547 @default.
- W2911648082 hasConcept C41008148 @default.
- W2911648082 hasConcept C41895202 @default.
- W2911648082 hasConcept C8505890 @default.
- W2911648082 hasConcept C97541855 @default.
- W2911648082 hasConceptScore W2911648082C119857082 @default.
- W2911648082 hasConceptScore W2911648082C126255220 @default.
- W2911648082 hasConceptScore W2911648082C12713177 @default.
- W2911648082 hasConceptScore W2911648082C138885662 @default.
- W2911648082 hasConceptScore W2911648082C14646407 @default.
- W2911648082 hasConceptScore W2911648082C154945302 @default.
- W2911648082 hasConceptScore W2911648082C162324750 @default.
- W2911648082 hasConceptScore W2911648082C175444787 @default.
- W2911648082 hasConceptScore W2911648082C187691185 @default.
- W2911648082 hasConceptScore W2911648082C2524010 @default.
- W2911648082 hasConceptScore W2911648082C2776141515 @default.
- W2911648082 hasConceptScore W2911648082C2776291640 @default.
- W2911648082 hasConceptScore W2911648082C33923547 @default.
- W2911648082 hasConceptScore W2911648082C41008148 @default.
- W2911648082 hasConceptScore W2911648082C41895202 @default.
- W2911648082 hasConceptScore W2911648082C8505890 @default.
- W2911648082 hasConceptScore W2911648082C97541855 @default.
- W2911648082 hasLocation W29116480821 @default.
- W2911648082 hasOpenAccess W2911648082 @default.
- W2911648082 hasPrimaryLocation W29116480821 @default.
- W2911648082 hasRelatedWork W1965436038 @default.
- W2911648082 hasRelatedWork W2078244692 @default.
- W2911648082 hasRelatedWork W2188090365 @default.
- W2911648082 hasRelatedWork W2253338700 @default.
- W2911648082 hasRelatedWork W2325963248 @default.
- W2911648082 hasRelatedWork W2896408693 @default.
- W2911648082 hasRelatedWork W2902743955 @default.
- W2911648082 hasRelatedWork W2912131985 @default.
- W2911648082 hasRelatedWork W2963501401 @default.
- W2911648082 hasRelatedWork W2965585060 @default.
- W2911648082 hasRelatedWork W3006151906 @default.
- W2911648082 hasRelatedWork W3010151722 @default.
- W2911648082 hasRelatedWork W3092081675 @default.
- W2911648082 hasRelatedWork W3112563010 @default.
- W2911648082 hasRelatedWork W3121832314 @default.
- W2911648082 hasRelatedWork W3124902342 @default.
- W2911648082 hasRelatedWork W3125400992 @default.
- W2911648082 hasRelatedWork W3147729530 @default.
- W2911648082 hasRelatedWork W3157409643 @default.
- W2911648082 hasRelatedWork W3184471030 @default.
- W2911648082 isParatext "false" @default.
- W2911648082 isRetracted "false" @default.
- W2911648082 magId "2911648082" @default.
- W2911648082 workType "article" @default.