Matches in SemOpenAlex for { <https://semopenalex.org/work/W3034525128> ?p ?o ?g. }
Showing items 1 to 82 of 82 with 100 items per page.
- W3034525128 endingPage "9668" @default.
- W3034525128 startingPage "9659" @default.
- W3034525128 abstract "We provide theoretical investigations into off-policy evaluation in reinforcement learning using function approximators for (marginalized) importance weights and value functions. Our contributions include: (1) A new estimator, MWL, that directly estimates importance ratios over the state-action distributions, removing the reliance on knowledge of the behavior policy as in prior work (Liu et al., 2018). (2) Another new estimator, MQL, obtained by swapping the roles of importance weights and value-functions in MWL. MQL has an intuitive interpretation of minimizing average Bellman errors and can be combined with MWL in a doubly robust manner. (3) Several additional results that offer further insights into these methods, including the sample complexity analyses of MWL and MQL, their asymptotic optimality in the tabular setting, how the learned importance weights depend on the choice of the discriminator class, and how our methods provide a unified view of some old and new algorithms in RL." @default.
- W3034525128 created "2020-06-19" @default.
- W3034525128 creator A5000145545 @default.
- W3034525128 creator A5000984081 @default.
- W3034525128 creator A5008181744 @default.
- W3034525128 date "2020-07-12" @default.
- W3034525128 modified "2023-09-23" @default.
- W3034525128 title "Minimax Weight and Q-Function Learning for Off-Policy Evaluation" @default.
- W3034525128 hasPublicationYear "2020" @default.
- W3034525128 type Work @default.
- W3034525128 sameAs 3034525128 @default.
- W3034525128 citedByCount "39" @default.
- W3034525128 countsByYear W30345251282019 @default.
- W3034525128 countsByYear W30345251282020 @default.
- W3034525128 countsByYear W30345251282021 @default.
- W3034525128 countsByYear W30345251282022 @default.
- W3034525128 crossrefType "proceedings-article" @default.
- W3034525128 hasAuthorship W3034525128A5000145545 @default.
- W3034525128 hasAuthorship W3034525128A5000984081 @default.
- W3034525128 hasAuthorship W3034525128A5008181744 @default.
- W3034525128 hasConcept C105795698 @default.
- W3034525128 hasConcept C126255220 @default.
- W3034525128 hasConcept C14036430 @default.
- W3034525128 hasConcept C14646407 @default.
- W3034525128 hasConcept C149728462 @default.
- W3034525128 hasConcept C154945302 @default.
- W3034525128 hasConcept C185429906 @default.
- W3034525128 hasConcept C2777212361 @default.
- W3034525128 hasConcept C2778445095 @default.
- W3034525128 hasConcept C2779803651 @default.
- W3034525128 hasConcept C33923547 @default.
- W3034525128 hasConcept C41008148 @default.
- W3034525128 hasConcept C76155785 @default.
- W3034525128 hasConcept C78458016 @default.
- W3034525128 hasConcept C86803240 @default.
- W3034525128 hasConcept C94915269 @default.
- W3034525128 hasConcept C97541855 @default.
- W3034525128 hasConceptScore W3034525128C105795698 @default.
- W3034525128 hasConceptScore W3034525128C126255220 @default.
- W3034525128 hasConceptScore W3034525128C14036430 @default.
- W3034525128 hasConceptScore W3034525128C14646407 @default.
- W3034525128 hasConceptScore W3034525128C149728462 @default.
- W3034525128 hasConceptScore W3034525128C154945302 @default.
- W3034525128 hasConceptScore W3034525128C185429906 @default.
- W3034525128 hasConceptScore W3034525128C2777212361 @default.
- W3034525128 hasConceptScore W3034525128C2778445095 @default.
- W3034525128 hasConceptScore W3034525128C2779803651 @default.
- W3034525128 hasConceptScore W3034525128C33923547 @default.
- W3034525128 hasConceptScore W3034525128C41008148 @default.
- W3034525128 hasConceptScore W3034525128C76155785 @default.
- W3034525128 hasConceptScore W3034525128C78458016 @default.
- W3034525128 hasConceptScore W3034525128C86803240 @default.
- W3034525128 hasConceptScore W3034525128C94915269 @default.
- W3034525128 hasConceptScore W3034525128C97541855 @default.
- W3034525128 hasOpenAccess W3034525128 @default.
- W3034525128 hasRelatedWork W1514587017 @default.
- W3034525128 hasRelatedWork W2104753538 @default.
- W3034525128 hasRelatedWork W2117355432 @default.
- W3034525128 hasRelatedWork W2119567691 @default.
- W3034525128 hasRelatedWork W2120346334 @default.
- W3034525128 hasRelatedWork W2121863487 @default.
- W3034525128 hasRelatedWork W2890022552 @default.
- W3034525128 hasRelatedWork W2945624305 @default.
- W3034525128 hasRelatedWork W2962785728 @default.
- W3034525128 hasRelatedWork W2962802563 @default.
- W3034525128 hasRelatedWork W2963704132 @default.
- W3034525128 hasRelatedWork W2964068481 @default.
- W3034525128 hasRelatedWork W2971026276 @default.
- W3034525128 hasRelatedWork W2981972696 @default.
- W3034525128 hasRelatedWork W2991522342 @default.
- W3034525128 hasRelatedWork W2993185773 @default.
- W3034525128 hasRelatedWork W2994709386 @default.
- W3034525128 hasRelatedWork W2994798986 @default.
- W3034525128 hasRelatedWork W2994838927 @default.
- W3034525128 hasRelatedWork W3022566517 @default.
- W3034525128 hasVolume "1" @default.
- W3034525128 isParatext "false" @default.
- W3034525128 isRetracted "false" @default.
- W3034525128 magId "3034525128" @default.
- W3034525128 workType "article" @default.