Matches in SemOpenAlex for { <https://semopenalex.org/work/W3037842334> ?p ?o ?g. }
- W3037842334 abstract "We consider the task of policy learning from an offline dataset generated by some behavior policy. We analyze the two most prominent families of algorithms for this task: policy optimization and Q-learning. We demonstrate that policy optimization suffers from two problems, overfitting and spurious minima, that do not appear in Q-learning or full-feedback problems (i.e. cost-sensitive classification). Specifically, we describe the phenomenon of 'bandit overfitting' in which an algorithm overfits based on the actions observed in the dataset, and show that it affects policy optimization but not Q-learning. Moreover, we show that the policy optimization objective suffers from spurious minima even with linear policies, whereas the Q-learning objective is convex for linear models. We empirically verify the existence of both problems in realistic datasets with neural network models." @default.
- W3037842334 created "2020-07-02" @default.
- W3037842334 creator A5005235934 @default.
- W3037842334 creator A5022202456 @default.
- W3037842334 creator A5037978510 @default.
- W3037842334 creator A5086430646 @default.
- W3037842334 date "2020-06-27" @default.
- W3037842334 modified "2023-09-26" @default.
- W3037842334 title "Overfitting and Optimization in Offline Policy Learning." @default.
- W3037842334 cites W123476658 @default.
- W3037842334 cites W1522301498 @default.
- W3037842334 cites W1530699444 @default.
- W3037842334 cites W1809653203 @default.
- W3037842334 cites W192920577 @default.
- W3037842334 cites W2020160576 @default.
- W3037842334 cites W2044523229 @default.
- W3037842334 cites W2101762657 @default.
- W3037842334 cites W2113651538 @default.
- W3037842334 cites W2121863487 @default.
- W3037842334 cites W2122124659 @default.
- W3037842334 cites W2145339207 @default.
- W3037842334 cites W2194775991 @default.
- W3037842334 cites W2201912979 @default.
- W3037842334 cites W2609169452 @default.
- W3037842334 cites W2768978543 @default.
- W3037842334 cites W2785875001 @default.
- W3037842334 cites W2787551658 @default.
- W3037842334 cites W2810365967 @default.
- W3037842334 cites W2810785043 @default.
- W3037842334 cites W2894604724 @default.
- W3037842334 cites W2899912485 @default.
- W3037842334 cites W2908670005 @default.
- W3037842334 cites W2936304709 @default.
- W3037842334 cites W2946146170 @default.
- W3037842334 cites W2950220847 @default.
- W3037842334 cites W2951403958 @default.
- W3037842334 cites W2952594289 @default.
- W3037842334 cites W2953334758 @default.
- W3037842334 cites W2962736281 @default.
- W3037842334 cites W2963323139 @default.
- W3037842334 cites W2963453204 @default.
- W3037842334 cites W2963534251 @default.
- W3037842334 cites W2963704132 @default.
- W3037842334 cites W2964270008 @default.
- W3037842334 cites W2970168598 @default.
- W3037842334 cites W2970971581 @default.
- W3037842334 cites W2971026276 @default.
- W3037842334 cites W2994081359 @default.
- W3037842334 cites W3002720540 @default.
- W3037842334 cites W3022566517 @default.
- W3037842334 cites W3118608800 @default.
- W3037842334 cites W3125697501 @default.
- W3037842334 cites W91593682 @default.
- W3037842334 hasPublicationYear "2020" @default.
- W3037842334 type Work @default.
- W3037842334 sameAs 3037842334 @default.
- W3037842334 citedByCount "0" @default.
- W3037842334 crossrefType "posted-content" @default.
- W3037842334 hasAuthorship W3037842334A5005235934 @default.
- W3037842334 hasAuthorship W3037842334A5022202456 @default.
- W3037842334 hasAuthorship W3037842334A5037978510 @default.
- W3037842334 hasAuthorship W3037842334A5086430646 @default.
- W3037842334 hasConcept C11413529 @default.
- W3037842334 hasConcept C119857082 @default.
- W3037842334 hasConcept C126255220 @default.
- W3037842334 hasConcept C134306372 @default.
- W3037842334 hasConcept C137836250 @default.
- W3037842334 hasConcept C154945302 @default.
- W3037842334 hasConcept C162324750 @default.
- W3037842334 hasConcept C186633575 @default.
- W3037842334 hasConcept C187736073 @default.
- W3037842334 hasConcept C22019652 @default.
- W3037842334 hasConcept C2780451532 @default.
- W3037842334 hasConcept C33923547 @default.
- W3037842334 hasConcept C41008148 @default.
- W3037842334 hasConcept C50644808 @default.
- W3037842334 hasConcept C97256817 @default.
- W3037842334 hasConceptScore W3037842334C11413529 @default.
- W3037842334 hasConceptScore W3037842334C119857082 @default.
- W3037842334 hasConceptScore W3037842334C126255220 @default.
- W3037842334 hasConceptScore W3037842334C134306372 @default.
- W3037842334 hasConceptScore W3037842334C137836250 @default.
- W3037842334 hasConceptScore W3037842334C154945302 @default.
- W3037842334 hasConceptScore W3037842334C162324750 @default.
- W3037842334 hasConceptScore W3037842334C186633575 @default.
- W3037842334 hasConceptScore W3037842334C187736073 @default.
- W3037842334 hasConceptScore W3037842334C22019652 @default.
- W3037842334 hasConceptScore W3037842334C2780451532 @default.
- W3037842334 hasConceptScore W3037842334C33923547 @default.
- W3037842334 hasConceptScore W3037842334C41008148 @default.
- W3037842334 hasConceptScore W3037842334C50644808 @default.
- W3037842334 hasConceptScore W3037842334C97256817 @default.
- W3037842334 hasLocation W30378423341 @default.
- W3037842334 hasOpenAccess W3037842334 @default.
- W3037842334 hasPrimaryLocation W30378423341 @default.
- W3037842334 hasRelatedWork W1660214635 @default.
- W3037842334 hasRelatedWork W1731608655 @default.
- W3037842334 hasRelatedWork W1859820673 @default.
- W3037842334 hasRelatedWork W2143203060 @default.
- W3037842334 hasRelatedWork W2150647453 @default.