Matches in SemOpenAlex for { <https://semopenalex.org/work/W3201820699> ?p ?o ?g. }
- W3201820699 endingPage "15" @default.
- W3201820699 startingPage "1" @default.
- W3201820699 abstract "Solving the Hamilton-Jacobi-Bellman equation is important in many domains including control, robotics and economics. Especially for continuous control, solving this differential equation and its extension, the Hamilton-Jacobi-Isaacs equation, is important as it yields the optimal policy that achieves the maximum reward on a given task. In the case of the Hamilton-Jacobi-Isaacs equation, which includes an adversary controlling the environment and minimizing the reward, the obtained policy is also robust to perturbations of the dynamics. In this paper we propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI). These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems to derive the optimal policy and optimal adversary in closed form. This analytic expression simplifies the differential equations and enables us to solve for the optimal value function using value iteration for continuous actions and states as well as the adversarial case. Notably, the resulting algorithms do not require discretization of states or actions. We apply the resulting algorithms to the Furuta pendulum and cartpole. We show that both algorithms obtain the optimal policy. The robustness Sim2Real experiments on the physical systems show that the policies successfully achieve the task in the real world. When changing the masses of the pendulum, we observe that robust value iteration is more robust than both deep reinforcement learning and the non-robust version of the algorithm. Videos of the experiments are shown at https://sites.google.com/view/rfvi." @default.
- W3201820699 created "2021-10-11" @default.
- W3201820699 creator A5014707300 @default.
- W3201820699 creator A5036260775 @default.
- W3201820699 creator A5042151011 @default.
- W3201820699 creator A5061193324 @default.
- W3201820699 creator A5071367253 @default.
- W3201820699 creator A5088706700 @default.
- W3201820699 date "2022-01-01" @default.
- W3201820699 modified "2023-10-02" @default.
- W3201820699 title "Continuous-Time Fitted Value Iteration for Robust Policies" @default.
- W3201820699 cites W1191599655 @default.
- W3201820699 cites W134786152 @default.
- W3201820699 cites W1515851193 @default.
- W3201820699 cites W1542941925 @default.
- W3201820699 cites W1558902221 @default.
- W3201820699 cites W1574514837 @default.
- W3201820699 cites W1578936488 @default.
- W3201820699 cites W1581842928 @default.
- W3201820699 cites W1589216054 @default.
- W3201820699 cites W1646707810 @default.
- W3201820699 cites W166862392 @default.
- W3201820699 cites W1714604093 @default.
- W3201820699 cites W1892947258 @default.
- W3201820699 cites W1965878388 @default.
- W3201820699 cites W1977237536 @default.
- W3201820699 cites W1980801308 @default.
- W3201820699 cites W1989855774 @default.
- W3201820699 cites W1990437501 @default.
- W3201820699 cites W2009533501 @default.
- W3201820699 cites W2042882799 @default.
- W3201820699 cites W2043962331 @default.
- W3201820699 cites W2103626435 @default.
- W3201820699 cites W2108984317 @default.
- W3201820699 cites W2113501460 @default.
- W3201820699 cites W2117355432 @default.
- W3201820699 cites W2119567691 @default.
- W3201820699 cites W2120346334 @default.
- W3201820699 cites W2125074935 @default.
- W3201820699 cites W2132397872 @default.
- W3201820699 cites W2145060720 @default.
- W3201820699 cites W2145339207 @default.
- W3201820699 cites W2160284799 @default.
- W3201820699 cites W2167940675 @default.
- W3201820699 cites W2169982856 @default.
- W3201820699 cites W2173248099 @default.
- W3201820699 cites W2197493596 @default.
- W3201820699 cites W2239929133 @default.
- W3201820699 cites W2290354866 @default.
- W3201820699 cites W2296319761 @default.
- W3201820699 cites W2341171179 @default.
- W3201820699 cites W2343609400 @default.
- W3201820699 cites W2602963933 @default.
- W3201820699 cites W2618318883 @default.
- W3201820699 cites W2736601468 @default.
- W3201820699 cites W2773525213 @default.
- W3201820699 cites W2773691349 @default.
- W3201820699 cites W2789824229 @default.
- W3201820699 cites W2810785043 @default.
- W3201820699 cites W287592399 @default.
- W3201820699 cites W2898758998 @default.
- W3201820699 cites W2945924974 @default.
- W3201820699 cites W2952337742 @default.
- W3201820699 cites W2952981100 @default.
- W3201820699 cites W2962730452 @default.
- W3201820699 cites W2962896691 @default.
- W3201820699 cites W2962902376 @default.
- W3201820699 cites W2963684914 @default.
- W3201820699 cites W2963906246 @default.
- W3201820699 cites W2964040381 @default.
- W3201820699 cites W2964686099 @default.
- W3201820699 cites W2968116426 @default.
- W3201820699 cites W2970362693 @default.
- W3201820699 cites W2970795963 @default.
- W3201820699 cites W2970971581 @default.
- W3201820699 cites W2975293371 @default.
- W3201820699 cites W2990747716 @default.
- W3201820699 cites W2998544442 @default.
- W3201820699 cites W3021796787 @default.
- W3201820699 cites W3031711947 @default.
- W3201820699 cites W3037024314 @default.
- W3201820699 cites W3082784051 @default.
- W3201820699 cites W3096740192 @default.
- W3201820699 cites W3098237412 @default.
- W3201820699 cites W3099878876 @default.
- W3201820699 cites W3109395707 @default.
- W3201820699 cites W3126300335 @default.
- W3201820699 cites W3130843035 @default.
- W3201820699 cites W3131920644 @default.
- W3201820699 cites W3164428799 @default.
- W3201820699 cites W3167204845 @default.
- W3201820699 cites W3173331501 @default.
- W3201820699 cites W3204973825 @default.
- W3201820699 cites W565431999 @default.
- W3201820699 cites W580164506 @default.
- W3201820699 doi "https://doi.org/10.1109/tpami.2022.3215769" @default.
- W3201820699 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/36260585" @default.
- W3201820699 hasPublicationYear "2022" @default.