Matches in SemOpenAlex for { <https://semopenalex.org/work/W4313446032> ?p ?o ?g. }
Showing items 1 to 75 of 75 with 100 items per page.
- W4313446032 abstract "Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications. This is often because off-policy RL algorithms suffer from distributional shift, due to mismatch between dataset and the target policy, leading to high variance and over-estimation of value functions. In this work, we propose variance regularization for offline RL algorithms, using stationary distribution corrections. We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer. The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithms. We show that the regularizer leads to a lower bound to the offline policy optimization objective, which can help avoid over-estimation errors, and explains the benefits of our approach across a range of continuous control domains when compared to existing state-of-the-art algorithms." @default.
- W4313446032 created "2023-01-06" @default.
- W4313446032 creator A5000297059 @default.
- W4313446032 creator A5036366964 @default.
- W4313446032 creator A5048272675 @default.
- W4313446032 creator A5054850777 @default.
- W4313446032 creator A5061193324 @default.
- W4313446032 creator A5064794691 @default.
- W4313446032 creator A5065836447 @default.
- W4313446032 creator A5078210646 @default.
- W4313446032 creator A5079926596 @default.
- W4313446032 date "2022-12-29" @default.
- W4313446032 modified "2023-09-25" @default.
- W4313446032 title "Offline Policy Optimization in RL with Variance Regularizaton" @default.
- W4313446032 doi "https://doi.org/10.48550/arxiv.2212.14405" @default.
- W4313446032 hasPublicationYear "2022" @default.
- W4313446032 type Work @default.
- W4313446032 citedByCount "0" @default.
- W4313446032 crossrefType "posted-content" @default.
- W4313446032 hasAuthorship W4313446032A5000297059 @default.
- W4313446032 hasAuthorship W4313446032A5036366964 @default.
- W4313446032 hasAuthorship W4313446032A5048272675 @default.
- W4313446032 hasAuthorship W4313446032A5054850777 @default.
- W4313446032 hasAuthorship W4313446032A5061193324 @default.
- W4313446032 hasAuthorship W4313446032A5064794691 @default.
- W4313446032 hasAuthorship W4313446032A5065836447 @default.
- W4313446032 hasAuthorship W4313446032A5078210646 @default.
- W4313446032 hasAuthorship W4313446032A5079926596 @default.
- W4313446032 hasBestOaLocation W43134460321 @default.
- W4313446032 hasConcept C11413529 @default.
- W4313446032 hasConcept C121955636 @default.
- W4313446032 hasConcept C126255220 @default.
- W4313446032 hasConcept C144133560 @default.
- W4313446032 hasConcept C154945302 @default.
- W4313446032 hasConcept C159985019 @default.
- W4313446032 hasConcept C192562407 @default.
- W4313446032 hasConcept C196083921 @default.
- W4313446032 hasConcept C204323151 @default.
- W4313446032 hasConcept C26517878 @default.
- W4313446032 hasConcept C2776135515 @default.
- W4313446032 hasConcept C33923547 @default.
- W4313446032 hasConcept C38652104 @default.
- W4313446032 hasConcept C41008148 @default.
- W4313446032 hasConcept C97541855 @default.
- W4313446032 hasConceptScore W4313446032C11413529 @default.
- W4313446032 hasConceptScore W4313446032C121955636 @default.
- W4313446032 hasConceptScore W4313446032C126255220 @default.
- W4313446032 hasConceptScore W4313446032C144133560 @default.
- W4313446032 hasConceptScore W4313446032C154945302 @default.
- W4313446032 hasConceptScore W4313446032C159985019 @default.
- W4313446032 hasConceptScore W4313446032C192562407 @default.
- W4313446032 hasConceptScore W4313446032C196083921 @default.
- W4313446032 hasConceptScore W4313446032C204323151 @default.
- W4313446032 hasConceptScore W4313446032C26517878 @default.
- W4313446032 hasConceptScore W4313446032C2776135515 @default.
- W4313446032 hasConceptScore W4313446032C33923547 @default.
- W4313446032 hasConceptScore W4313446032C38652104 @default.
- W4313446032 hasConceptScore W4313446032C41008148 @default.
- W4313446032 hasConceptScore W4313446032C97541855 @default.
- W4313446032 hasLocation W43134460321 @default.
- W4313446032 hasOpenAccess W4313446032 @default.
- W4313446032 hasPrimaryLocation W43134460321 @default.
- W4313446032 hasRelatedWork W1562959674 @default.
- W4313446032 hasRelatedWork W2923653485 @default.
- W4313446032 hasRelatedWork W2946396478 @default.
- W4313446032 hasRelatedWork W2952472710 @default.
- W4313446032 hasRelatedWork W2953144887 @default.
- W4313446032 hasRelatedWork W2957776456 @default.
- W4313446032 hasRelatedWork W4206669594 @default.
- W4313446032 hasRelatedWork W4210912933 @default.
- W4313446032 hasRelatedWork W4224287422 @default.
- W4313446032 hasRelatedWork W4255994452 @default.
- W4313446032 isParatext "false" @default.
- W4313446032 isRetracted "false" @default.
- W4313446032 workType "article" @default.