Matches in SemOpenAlex for { <https://semopenalex.org/work/W3122336430> ?p ?o ?g. }
- W3122336430 abstract "Model-Predictive Control (MPC) is a powerful tool for controlling complex, real-world systems that uses a model to make predictions about future behavior. For each state encountered, MPC solves an online optimization problem to choose a control action that will minimize future cost. This is a surprisingly effective strategy, but real-time performance requirements warrant the use of simple models. If the model is not sufficiently accurate, then the resulting controller can be biased, limiting performance. We present a framework for improving on MPC with model-free reinforcement learning (RL). The key insight is to view MPC as constructing a series of local Q-function approximations. We show that by using a parameter λ, similar to the trace decay parameter in TD(λ), we can systematically trade off learned value estimates against the local Q-function approximations. We present a theoretical analysis that shows how error from inaccurate models in MPC and value function estimation in RL can be balanced. We further propose an algorithm that changes λ over time to reduce the dependence on MPC as our estimates of the value function improve, and test the efficacy of our approach on challenging high-dimensional manipulation tasks with biased models in simulation. We demonstrate that our approach can obtain performance comparable with MPC with access to true dynamics even under severe model bias and is more sample efficient as compared to model-free RL." @default.
- W3122336430 created "2021-02-01" @default.
- W3122336430 creator A5017774267 @default.
- W3122336430 creator A5052552981 @default.
- W3122336430 creator A5057995939 @default.
- W3122336430 date "2021-05-03" @default.
- W3122336430 modified "2023-09-27" @default.
- W3122336430 title "Blending MPC & Value Function Approximation for Efficient Reinforcement Learning" @default.
- W3122336430 cites W1191599655 @default.
- W3122336430 cites W1747856733 @default.
- W3122336430 cites W1967821692 @default.
- W3122336430 cites W1996625075 @default.
- W3122336430 cites W2003132288 @default.
- W3122336430 cites W2134491302 @default.
- W3122336430 cites W2257979135 @default.
- W3122336430 cites W2410617946 @default.
- W3122336430 cites W2489939061 @default.
- W3122336430 cites W2736601468 @default.
- W3122336430 cites W2738778707 @default.
- W3122336430 cites W2772709170 @default.
- W3122336430 cites W2805805280 @default.
- W3122336430 cites W2898917980 @default.
- W3122336430 cites W2904246096 @default.
- W3122336430 cites W2915577923 @default.
- W3122336430 cites W2962872206 @default.
- W3122336430 cites W2963411833 @default.
- W3122336430 cites W2963642149 @default.
- W3122336430 cites W2963820385 @default.
- W3122336430 cites W2963906246 @default.
- W3122336430 cites W2963960193 @default.
- W3122336430 cites W2964121744 @default.
- W3122336430 cites W2964349150 @default.
- W3122336430 cites W2997046053 @default.
- W3122336430 cites W3032727894 @default.
- W3122336430 cites W3107706751 @default.
- W3122336430 cites W3108175278 @default.
- W3122336430 hasPublicationYear "2021" @default.
- W3122336430 type Work @default.
- W3122336430 sameAs 3122336430 @default.
- W3122336430 citedByCount "5" @default.
- W3122336430 countsByYear W31223364302021 @default.
- W3122336430 countsByYear W31223364302023 @default.
- W3122336430 crossrefType "proceedings-article" @default.
- W3122336430 hasAuthorship W3122336430A5017774267 @default.
- W3122336430 hasAuthorship W3122336430A5052552981 @default.
- W3122336430 hasAuthorship W3122336430A5057995939 @default.
- W3122336430 hasConcept C112972136 @default.
- W3122336430 hasConcept C119857082 @default.
- W3122336430 hasConcept C126255220 @default.
- W3122336430 hasConcept C138885662 @default.
- W3122336430 hasConcept C14036430 @default.
- W3122336430 hasConcept C14646407 @default.
- W3122336430 hasConcept C154945302 @default.
- W3122336430 hasConcept C172205157 @default.
- W3122336430 hasConcept C203479927 @default.
- W3122336430 hasConcept C2775924081 @default.
- W3122336430 hasConcept C2776291640 @default.
- W3122336430 hasConcept C33923547 @default.
- W3122336430 hasConcept C41008148 @default.
- W3122336430 hasConcept C41895202 @default.
- W3122336430 hasConcept C47446073 @default.
- W3122336430 hasConcept C50644808 @default.
- W3122336430 hasConcept C6557445 @default.
- W3122336430 hasConcept C75291252 @default.
- W3122336430 hasConcept C78458016 @default.
- W3122336430 hasConcept C86803240 @default.
- W3122336430 hasConcept C91873725 @default.
- W3122336430 hasConcept C97541855 @default.
- W3122336430 hasConceptScore W3122336430C112972136 @default.
- W3122336430 hasConceptScore W3122336430C119857082 @default.
- W3122336430 hasConceptScore W3122336430C126255220 @default.
- W3122336430 hasConceptScore W3122336430C138885662 @default.
- W3122336430 hasConceptScore W3122336430C14036430 @default.
- W3122336430 hasConceptScore W3122336430C14646407 @default.
- W3122336430 hasConceptScore W3122336430C154945302 @default.
- W3122336430 hasConceptScore W3122336430C172205157 @default.
- W3122336430 hasConceptScore W3122336430C203479927 @default.
- W3122336430 hasConceptScore W3122336430C2775924081 @default.
- W3122336430 hasConceptScore W3122336430C2776291640 @default.
- W3122336430 hasConceptScore W3122336430C33923547 @default.
- W3122336430 hasConceptScore W3122336430C41008148 @default.
- W3122336430 hasConceptScore W3122336430C41895202 @default.
- W3122336430 hasConceptScore W3122336430C47446073 @default.
- W3122336430 hasConceptScore W3122336430C50644808 @default.
- W3122336430 hasConceptScore W3122336430C6557445 @default.
- W3122336430 hasConceptScore W3122336430C75291252 @default.
- W3122336430 hasConceptScore W3122336430C78458016 @default.
- W3122336430 hasConceptScore W3122336430C86803240 @default.
- W3122336430 hasConceptScore W3122336430C91873725 @default.
- W3122336430 hasConceptScore W3122336430C97541855 @default.
- W3122336430 hasLocation W31223364301 @default.
- W3122336430 hasOpenAccess W3122336430 @default.
- W3122336430 hasPrimaryLocation W31223364301 @default.
- W3122336430 hasRelatedWork W1529382682 @default.
- W3122336430 hasRelatedWork W2018666763 @default.
- W3122336430 hasRelatedWork W2019214786 @default.
- W3122336430 hasRelatedWork W2088164783 @default.
- W3122336430 hasRelatedWork W215225477 @default.
- W3122336430 hasRelatedWork W2278171066 @default.
- W3122336430 hasRelatedWork W2278618761 @default.