Matches in SemOpenAlex for { <https://semopenalex.org/work/W4361230992> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W4361230992 abstract "Most offline reinforcement learning (RL) methods suffer from the trade-off between improving the policy to surpass the behavior policy and constraining the policy to limit the deviation from the behavior policy as computing $Q$-values using out-of-distribution (OOD) actions will suffer from errors due to distributional shift. The recently proposed In-sample Learning paradigm (i.e., IQL), which improves the policy by quantile regression using only data samples, shows great promise because it learns an optimal policy without querying the value function of any unseen actions. However, it remains unclear how this type of method handles the distributional shift in learning the value function. In this work, we make a key finding that the in-sample learning paradigm arises under the Implicit Value Regularization (IVR) framework. This gives a deeper understanding of why the in-sample learning paradigm works, i.e., it applies implicit value regularization to the policy. Based on the IVR framework, we further propose two practical algorithms, Sparse $Q$-learning (SQL) and Exponential $Q$-learning (EQL), which adopt the same value regularization used in existing works, but in a complete in-sample manner. Compared with IQL, we find that our algorithms introduce sparsity in learning the value function, making them more robust in noisy data regimes. We also verify the effectiveness of SQL and EQL on D4RL benchmark datasets and show the benefits of in-sample learning by comparing them with CQL in small data regimes." @default.
- W4361230992 created "2023-03-31" @default.
- W4361230992 creator A5046600167 @default.
- W4361230992 creator A5048272675 @default.
- W4361230992 creator A5052921346 @default.
- W4361230992 creator A5066768790 @default.
- W4361230992 creator A5078210646 @default.
- W4361230992 creator A5078581090 @default.
- W4361230992 creator A5088234990 @default.
- W4361230992 date "2023-03-28" @default.
- W4361230992 modified "2023-09-27" @default.
- W4361230992 title "Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization" @default.
- W4361230992 doi "https://doi.org/10.48550/arxiv.2303.15810" @default.
- W4361230992 hasPublicationYear "2023" @default.
- W4361230992 type Work @default.
- W4361230992 citedByCount "0" @default.
- W4361230992 crossrefType "posted-content" @default.
- W4361230992 hasAuthorship W4361230992A5046600167 @default.
- W4361230992 hasAuthorship W4361230992A5048272675 @default.
- W4361230992 hasAuthorship W4361230992A5052921346 @default.
- W4361230992 hasAuthorship W4361230992A5066768790 @default.
- W4361230992 hasAuthorship W4361230992A5078210646 @default.
- W4361230992 hasAuthorship W4361230992A5078581090 @default.
- W4361230992 hasAuthorship W4361230992A5088234990 @default.
- W4361230992 hasBestOaLocation W43612309921 @default.
- W4361230992 hasConcept C119857082 @default.
- W4361230992 hasConcept C126255220 @default.
- W4361230992 hasConcept C14036430 @default.
- W4361230992 hasConcept C14646407 @default.
- W4361230992 hasConcept C154945302 @default.
- W4361230992 hasConcept C185592680 @default.
- W4361230992 hasConcept C198531522 @default.
- W4361230992 hasConcept C2776135515 @default.
- W4361230992 hasConcept C33923547 @default.
- W4361230992 hasConcept C41008148 @default.
- W4361230992 hasConcept C43617362 @default.
- W4361230992 hasConcept C63817138 @default.
- W4361230992 hasConcept C78458016 @default.
- W4361230992 hasConcept C86803240 @default.
- W4361230992 hasConcept C97541855 @default.
- W4361230992 hasConceptScore W4361230992C119857082 @default.
- W4361230992 hasConceptScore W4361230992C126255220 @default.
- W4361230992 hasConceptScore W4361230992C14036430 @default.
- W4361230992 hasConceptScore W4361230992C14646407 @default.
- W4361230992 hasConceptScore W4361230992C154945302 @default.
- W4361230992 hasConceptScore W4361230992C185592680 @default.
- W4361230992 hasConceptScore W4361230992C198531522 @default.
- W4361230992 hasConceptScore W4361230992C2776135515 @default.
- W4361230992 hasConceptScore W4361230992C33923547 @default.
- W4361230992 hasConceptScore W4361230992C41008148 @default.
- W4361230992 hasConceptScore W4361230992C43617362 @default.
- W4361230992 hasConceptScore W4361230992C63817138 @default.
- W4361230992 hasConceptScore W4361230992C78458016 @default.
- W4361230992 hasConceptScore W4361230992C86803240 @default.
- W4361230992 hasConceptScore W4361230992C97541855 @default.
- W4361230992 hasLocation W43612309921 @default.
- W4361230992 hasOpenAccess W4361230992 @default.
- W4361230992 hasPrimaryLocation W43612309921 @default.
- W4361230992 hasRelatedWork W2155027007 @default.
- W4361230992 hasRelatedWork W2160284799 @default.
- W4361230992 hasRelatedWork W2624731731 @default.
- W4361230992 hasRelatedWork W2765302304 @default.
- W4361230992 hasRelatedWork W2950892788 @default.
- W4361230992 hasRelatedWork W2963680188 @default.
- W4361230992 hasRelatedWork W3022038857 @default.
- W4361230992 hasRelatedWork W3188220908 @default.
- W4361230992 hasRelatedWork W4289860834 @default.
- W4361230992 hasRelatedWork W4319083788 @default.
- W4361230992 isParatext "false" @default.
- W4361230992 isRetracted "false" @default.
- W4361230992 workType "article" @default.