Matches in SemOpenAlex for { <https://semopenalex.org/work/W3170371439> ?p ?o ?g. }
- W3170371439 endingPage "7894" @default.
- W3170371439 startingPage "7886" @default.
- W3170371439 abstract "The shortcomings of maximum likelihood estimation in the context of model-based reinforcement learning have been highlighted by an increasing number of papers. When the model class is misspecified or has a limited representational capacity, model parameters with high likelihood might not necessarily result in high performance of the agent on a downstream control task. To alleviate this problem, we propose an end-to-end approach for model learning which directly optimizes the expected returns using implicit differentiation. We treat a value function that satisfies the Bellman optimality operator induced by the model as an implicit function of model parameters and show how to differentiate the function. We provide theoretical and empirical evidence highlighting the benefits of our approach in the model misspecification regime compared to likelihood-based methods." @default.
- W3170371439 created "2021-06-22" @default.
- W3170371439 creator A5039961228 @default.
- W3170371439 creator A5067918843 @default.
- W3170371439 creator A5070953294 @default.
- W3170371439 creator A5086833270 @default.
- W3170371439 date "2022-06-28" @default.
- W3170371439 modified "2023-09-30" @default.
- W3170371439 title "Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation" @default.
- W3170371439 cites W1521930086 @default.
- W3170371439 cites W1542343170 @default.
- W3170371439 cites W1665214252 @default.
- W3170371439 cites W1965878388 @default.
- W3170371439 cites W1966360666 @default.
- W3170371439 cites W1980035368 @default.
- W3170371439 cites W1984454144 @default.
- W3170371439 cites W1985658808 @default.
- W3170371439 cites W2001733650 @default.
- W3170371439 cites W2002428251 @default.
- W3170371439 cites W2011301426 @default.
- W3170371439 cites W2015667537 @default.
- W3170371439 cites W2045374782 @default.
- W3170371439 cites W2052329972 @default.
- W3170371439 cites W2062422869 @default.
- W3170371439 cites W2075970677 @default.
- W3170371439 cites W2078226168 @default.
- W3170371439 cites W2081947709 @default.
- W3170371439 cites W2091565802 @default.
- W3170371439 cites W2098774185 @default.
- W3170371439 cites W2102773314 @default.
- W3170371439 cites W2110114082 @default.
- W3170371439 cites W2115211925 @default.
- W3170371439 cites W2115597380 @default.
- W3170371439 cites W2121863487 @default.
- W3170371439 cites W2141559645 @default.
- W3170371439 cites W2145339207 @default.
- W3170371439 cites W2146292423 @default.
- W3170371439 cites W2146989110 @default.
- W3170371439 cites W2155027007 @default.
- W3170371439 cites W2158782408 @default.
- W3170371439 cites W2163602945 @default.
- W3170371439 cites W2165428239 @default.
- W3170371439 cites W2168565265 @default.
- W3170371439 cites W2258731934 @default.
- W3170371439 cites W2320680700 @default.
- W3170371439 cites W2397240726 @default.
- W3170371439 cites W2472803348 @default.
- W3170371439 cites W2489939061 @default.
- W3170371439 cites W2569188995 @default.
- W3170371439 cites W2604763608 @default.
- W3170371439 cites W2626325961 @default.
- W3170371439 cites W2626747984 @default.
- W3170371439 cites W2799151646 @default.
- W3170371439 cites W2900152462 @default.
- W3170371439 cites W2904246096 @default.
- W3170371439 cites W2912779783 @default.
- W3170371439 cites W2920362155 @default.
- W3170371439 cites W2963414638 @default.
- W3170371439 cites W2963923407 @default.
- W3170371439 cites W2963960193 @default.
- W3170371439 cites W2970277495 @default.
- W3170371439 cites W2970697704 @default.
- W3170371439 cites W2970900903 @default.
- W3170371439 cites W2972448932 @default.
- W3170371439 cites W2981018396 @default.
- W3170371439 cites W2999415063 @default.
- W3170371439 cites W3006331850 @default.
- W3170371439 cites W3007008575 @default.
- W3170371439 cites W3017367030 @default.
- W3170371439 cites W3029753614 @default.
- W3170371439 cites W3034999548 @default.
- W3170371439 cites W3037120332 @default.
- W3170371439 cites W3100810159 @default.
- W3170371439 cites W3101192004 @default.
- W3170371439 cites W3103780890 @default.
- W3170371439 cites W3118210634 @default.
- W3170371439 cites W3122690883 @default.
- W3170371439 cites W3154938165 @default.
- W3170371439 cites W56535507 @default.
- W3170371439 cites W3005347330 @default.
- W3170371439 doi "https://doi.org/10.1609/aaai.v36i7.20758" @default.
- W3170371439 hasPublicationYear "2022" @default.
- W3170371439 type Work @default.
- W3170371439 sameAs 3170371439 @default.
- W3170371439 citedByCount "1" @default.
- W3170371439 countsByYear W31703714392022 @default.
- W3170371439 crossrefType "journal-article" @default.
- W3170371439 hasAuthorship W3170371439A5039961228 @default.
- W3170371439 hasAuthorship W3170371439A5067918843 @default.
- W3170371439 hasAuthorship W3170371439A5070953294 @default.
- W3170371439 hasAuthorship W3170371439A5086833270 @default.
- W3170371439 hasBestOaLocation W31703714391 @default.
- W3170371439 hasConcept C105795698 @default.
- W3170371439 hasConcept C11413529 @default.
- W3170371439 hasConcept C119857082 @default.
- W3170371439 hasConcept C126255220 @default.
- W3170371439 hasConcept C14036430 @default.
- W3170371439 hasConcept C14646407 @default.