Matches in SemOpenAlex for { <https://semopenalex.org/work/W4377010149> ?p ?o ?g. }
Showing items 1 to 59 of 59, with 100 items per page.
- W4377010149 abstract "In this paper, we study distributionally robust offline reinforcement learning (robust offline RL), which seeks to find an optimal policy purely from an offline dataset that can perform well in perturbed environments. Specifically, we propose a generic algorithm framework called Doubly Pessimistic Model-based Policy Optimization ($P^2MPO$), which features a novel combination of a flexible model estimation subroutine and a doubly pessimistic policy optimization step. Notably, the double pessimism principle is crucial to overcome the distributional shifts incurred by (i) the mismatch between the behavior policy and the target policies; and (ii) the perturbation of the nominal model. Under certain accuracy conditions on the model estimation subroutine, we prove that $P^2MPO$ is sample-efficient with robust partial coverage data, which only requires the offline data to have good coverage of the distributions induced by the optimal robust policy and the perturbed models around the nominal model. By tailoring specific model estimation subroutines for concrete examples of RMDPs, including tabular RMDPs, factored RMDPs, kernel and neural RMDPs, we prove that $P^2MPO$ enjoys a $\tilde{\mathcal{O}}(n^{-1/2})$ convergence rate, where $n$ is the dataset size. We highlight that all these examples, except tabular RMDPs, are first identified and proven tractable by this work. Furthermore, we continue our study of robust offline RL in the robust Markov games (RMGs). By extending the double pessimism principle identified for single-agent RMDPs, we propose another algorithm framework that can efficiently find the robust Nash equilibria among players using only robust unilateral (partial) coverage data. To the best of our knowledge, this work proposes the first general learning principle -- double pessimism -- for robust offline RL and shows that it is provably efficient with general function approximation." @default.
- W4377010149 created "2023-05-19" @default.
- W4377010149 creator A5011147039 @default.
- W4377010149 creator A5019685128 @default.
- W4377010149 creator A5032405701 @default.
- W4377010149 creator A5037709424 @default.
- W4377010149 date "2023-05-16" @default.
- W4377010149 modified "2023-09-29" @default.
- W4377010149 title "Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage" @default.
- W4377010149 doi "https://doi.org/10.48550/arxiv.2305.09659" @default.
- W4377010149 hasPublicationYear "2023" @default.
- W4377010149 type Work @default.
- W4377010149 citedByCount "0" @default.
- W4377010149 crossrefType "posted-content" @default.
- W4377010149 hasAuthorship W4377010149A5011147039 @default.
- W4377010149 hasAuthorship W4377010149A5019685128 @default.
- W4377010149 hasAuthorship W4377010149A5032405701 @default.
- W4377010149 hasAuthorship W4377010149A5037709424 @default.
- W4377010149 hasBestOaLocation W43770101491 @default.
- W4377010149 hasConcept C104317684 @default.
- W4377010149 hasConcept C111919701 @default.
- W4377010149 hasConcept C11413529 @default.
- W4377010149 hasConcept C126255220 @default.
- W4377010149 hasConcept C154945302 @default.
- W4377010149 hasConcept C185592680 @default.
- W4377010149 hasConcept C33923547 @default.
- W4377010149 hasConcept C41008148 @default.
- W4377010149 hasConcept C55493867 @default.
- W4377010149 hasConcept C63479239 @default.
- W4377010149 hasConcept C96147967 @default.
- W4377010149 hasConcept C97541855 @default.
- W4377010149 hasConceptScore W4377010149C104317684 @default.
- W4377010149 hasConceptScore W4377010149C111919701 @default.
- W4377010149 hasConceptScore W4377010149C11413529 @default.
- W4377010149 hasConceptScore W4377010149C126255220 @default.
- W4377010149 hasConceptScore W4377010149C154945302 @default.
- W4377010149 hasConceptScore W4377010149C185592680 @default.
- W4377010149 hasConceptScore W4377010149C33923547 @default.
- W4377010149 hasConceptScore W4377010149C41008148 @default.
- W4377010149 hasConceptScore W4377010149C55493867 @default.
- W4377010149 hasConceptScore W4377010149C63479239 @default.
- W4377010149 hasConceptScore W4377010149C96147967 @default.
- W4377010149 hasConceptScore W4377010149C97541855 @default.
- W4377010149 hasLocation W43770101491 @default.
- W4377010149 hasOpenAccess W4377010149 @default.
- W4377010149 hasPrimaryLocation W43770101491 @default.
- W4377010149 hasRelatedWork W1500014405 @default.
- W4377010149 hasRelatedWork W1566603375 @default.
- W4377010149 hasRelatedWork W2083865705 @default.
- W4377010149 hasRelatedWork W2317555075 @default.
- W4377010149 hasRelatedWork W260766989 @default.
- W4377010149 hasRelatedWork W2959276766 @default.
- W4377010149 hasRelatedWork W3037422413 @default.
- W4377010149 hasRelatedWork W3139193008 @default.
- W4377010149 hasRelatedWork W4206669594 @default.
- W4377010149 hasRelatedWork W4295941380 @default.
- W4377010149 isParatext "false" @default.
- W4377010149 isRetracted "false" @default.
- W4377010149 workType "article" @default.
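The listing above is the result of the triple pattern shown at the top. A minimal sketch of retrieving the same bindings programmatically, assuming SemOpenAlex exposes a standard SPARQL 1.1 endpoint (the endpoint URL below is an assumption; the `?g` graph variable from the pattern is omitted for simplicity):

```python
# Sketch: fetch the triples for one work from a SemOpenAlex SPARQL endpoint.
# The endpoint URL is an assumption; verify it against the SemOpenAlex docs.
import json
import urllib.parse
import urllib.request

WORK_IRI = "https://semopenalex.org/work/W4377010149"
ENDPOINT = "https://semopenalex.org/sparql"  # assumed endpoint URL


def build_query(work_iri: str) -> str:
    """Build a SELECT query for the pattern { <work> ?p ?o . } shown above."""
    return f"SELECT ?p ?o WHERE {{ <{work_iri}> ?p ?o . }}"


def fetch_triples(work_iri: str, endpoint: str = ENDPOINT) -> list:
    """POST the query (SPARQL 1.1 Protocol) and return the JSON result bindings."""
    data = urllib.parse.urlencode({"query": build_query(work_iri)}).encode()
    req = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]
```

Each returned binding would correspond to one row of the listing above (one predicate/object pair for the work).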