Matches in SemOpenAlex for { <https://semopenalex.org/work/W2948204052> ?p ?o ?g. }
- W2948204052 abstract "In this work, we consider the problem of model selection for deep reinforcement learning (RL) in real-world environments. Typically, the performance of deep RL algorithms is evaluated via on-policy interactions with the target environment. However, comparing models in a real-world environment for the purposes of early stopping or hyperparameter tuning is costly and often practically infeasible. This leads us to examine off-policy policy evaluation (OPE) in such settings. We focus on OPE for value-based methods, which are of particular interest in deep RL, with applications like robotics, where off-policy algorithms based on Q-function estimation can often attain better sample complexity than direct policy optimization. Existing OPE metrics either rely on a model of the environment, or the use of importance sampling (IS) to correct for the data being off-policy. However, for high-dimensional observations, such as images, models of the environment can be difficult to fit and value-based methods can make IS hard to use or even ill-conditioned, especially when dealing with continuous action spaces. In this paper, we focus on the specific case of MDPs with continuous action spaces and sparse binary rewards, which is representative of many important real-world applications. We propose an alternative metric that relies on neither models nor IS, by framing OPE as a positive-unlabeled (PU) classification problem with the Q-function as the decision function. We experimentally show that this metric outperforms baselines on a number of tasks. Most importantly, it can reliably predict the relative performance of different policies in a number of generalization scenarios, including the transfer to the real world of policies trained in simulation for an image-based robotic manipulation task." @default.
- W2948204052 created "2019-06-14" @default.
- W2948204052 creator A5022032374 @default.
- W2948204052 creator A5026322200 @default.
- W2948204052 creator A5034642277 @default.
- W2948204052 creator A5053574266 @default.
- W2948204052 creator A5067133312 @default.
- W2948204052 creator A5079462322 @default.
- W2948204052 date "2019-06-04" @default.
- W2948204052 modified "2023-10-03" @default.
- W2948204052 title "Off-Policy Evaluation via Off-Policy Classification" @default.
- W2948204052 cites W1514587017 @default.
- W2948204052 cites W1575592356 @default.
- W2948204052 cites W1809653203 @default.
- W2948204052 cites W1978161072 @default.
- W2948204052 cites W2001947543 @default.
- W2948204052 cites W2006330826 @default.
- W2948204052 cites W2078483536 @default.
- W2948204052 cites W2089087399 @default.
- W2948204052 cites W2107741520 @default.
- W2948204052 cites W2108598243 @default.
- W2948204052 cites W2108692343 @default.
- W2948204052 cites W2142641780 @default.
- W2948204052 cites W2145339207 @default.
- W2948204052 cites W2155968351 @default.
- W2948204052 cites W2158782408 @default.
- W2948204052 cites W2173248099 @default.
- W2948204052 cites W2234859443 @default.
- W2948204052 cites W2273088453 @default.
- W2948204052 cites W2275802500 @default.
- W2948204052 cites W2767617058 @default.
- W2948204052 cites W2788575380 @default.
- W2948204052 cites W2796303840 @default.
- W2948204052 cites W2796979132 @default.
- W2948204052 cites W2809668646 @default.
- W2948204052 cites W2810785043 @default.
- W2948204052 cites W2903181768 @default.
- W2948204052 cites W2963341628 @default.
- W2948204052 cites W2963390419 @default.
- W2948204052 cites W2963403143 @default.
- W2948204052 cites W2963435596 @default.
- W2948204052 cites W2963882293 @default.
- W2948204052 cites W2964068481 @default.
- W2948204052 cites W2964118020 @default.
- W2948204052 cites W2964297722 @default.
- W2948204052 cites W2970705602 @default.
- W2948204052 cites W3098679278 @default.
- W2948204052 cites W2168551972 @default.
- W2948204052 cites W2963824530 @default.
- W2948204052 cites W3023932353 @default.
- W2948204052 hasPublicationYear "2019" @default.
- W2948204052 type Work @default.
- W2948204052 sameAs 2948204052 @default.
- W2948204052 citedByCount "3" @default.
- W2948204052 countsByYear W29482040522019 @default.
- W2948204052 countsByYear W29482040522021 @default.
- W2948204052 crossrefType "posted-content" @default.
- W2948204052 hasAuthorship W2948204052A5022032374 @default.
- W2948204052 hasAuthorship W2948204052A5026322200 @default.
- W2948204052 hasAuthorship W2948204052A5034642277 @default.
- W2948204052 hasAuthorship W2948204052A5053574266 @default.
- W2948204052 hasAuthorship W2948204052A5067133312 @default.
- W2948204052 hasAuthorship W2948204052A5079462322 @default.
- W2948204052 hasConcept C108583219 @default.
- W2948204052 hasConcept C119857082 @default.
- W2948204052 hasConcept C12267149 @default.
- W2948204052 hasConcept C126255220 @default.
- W2948204052 hasConcept C14036430 @default.
- W2948204052 hasConcept C14646407 @default.
- W2948204052 hasConcept C154945302 @default.
- W2948204052 hasConcept C162324750 @default.
- W2948204052 hasConcept C166109690 @default.
- W2948204052 hasConcept C169760540 @default.
- W2948204052 hasConcept C176217482 @default.
- W2948204052 hasConcept C187736073 @default.
- W2948204052 hasConcept C21547014 @default.
- W2948204052 hasConcept C26760741 @default.
- W2948204052 hasConcept C2780898871 @default.
- W2948204052 hasConcept C33923547 @default.
- W2948204052 hasConcept C41008148 @default.
- W2948204052 hasConcept C48372109 @default.
- W2948204052 hasConcept C50817715 @default.
- W2948204052 hasConcept C66905080 @default.
- W2948204052 hasConcept C73602740 @default.
- W2948204052 hasConcept C78458016 @default.
- W2948204052 hasConcept C8642999 @default.
- W2948204052 hasConcept C86803240 @default.
- W2948204052 hasConcept C94375191 @default.
- W2948204052 hasConcept C97541855 @default.
- W2948204052 hasConceptScore W2948204052C108583219 @default.
- W2948204052 hasConceptScore W2948204052C119857082 @default.
- W2948204052 hasConceptScore W2948204052C12267149 @default.
- W2948204052 hasConceptScore W2948204052C126255220 @default.
- W2948204052 hasConceptScore W2948204052C14036430 @default.
- W2948204052 hasConceptScore W2948204052C14646407 @default.
- W2948204052 hasConceptScore W2948204052C154945302 @default.
- W2948204052 hasConceptScore W2948204052C162324750 @default.
- W2948204052 hasConceptScore W2948204052C166109690 @default.
- W2948204052 hasConceptScore W2948204052C169760540 @default.
- W2948204052 hasConceptScore W2948204052C176217482 @default.