Matches in SemOpenAlex for { <https://semopenalex.org/work/W4283016871> ?p ?o ?g. }
Showing items 1 to 73 of 73, with 100 items per page.
- W4283016871 abstract "Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to purely learning from static datasets, without interacting with the underlying environment during the learning process. A key challenge of offline RL is the instability of policy training, caused by the mismatch between the distribution of the offline data and the undiscounted stationary state-action distribution of the learned policy. To avoid the detrimental impact of distribution mismatch, we regularize the undiscounted stationary distribution of the current policy towards the offline data during the policy optimization process. Further, we train a dynamics model to both implement this regularization and better estimate the stationary distribution of the current policy, reducing the error induced by distribution mismatch. On a wide range of continuous-control offline RL datasets, our method achieves competitive performance, which validates our algorithm. The code is publicly available." @default.
- W4283016871 created "2022-06-18" @default.
- W4283016871 creator A5010394308 @default.
- W4283016871 creator A5036611759 @default.
- W4283016871 creator A5056212977 @default.
- W4283016871 creator A5067027007 @default.
- W4283016871 date "2022-06-14" @default.
- W4283016871 modified "2023-09-29" @default.
- W4283016871 title "Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning" @default.
- W4283016871 doi "https://doi.org/10.48550/arxiv.2206.07166" @default.
- W4283016871 hasPublicationYear "2022" @default.
- W4283016871 type Work @default.
- W4283016871 citedByCount "0" @default.
- W4283016871 crossrefType "posted-content" @default.
- W4283016871 hasAuthorship W4283016871A5010394308 @default.
- W4283016871 hasAuthorship W4283016871A5036611759 @default.
- W4283016871 hasAuthorship W4283016871A5056212977 @default.
- W4283016871 hasAuthorship W4283016871A5067027007 @default.
- W4283016871 hasBestOaLocation W42830168711 @default.
- W4283016871 hasConcept C110121322 @default.
- W4283016871 hasConcept C111919701 @default.
- W4283016871 hasConcept C112972136 @default.
- W4283016871 hasConcept C119857082 @default.
- W4283016871 hasConcept C126255220 @default.
- W4283016871 hasConcept C127413603 @default.
- W4283016871 hasConcept C134306372 @default.
- W4283016871 hasConcept C136764020 @default.
- W4283016871 hasConcept C146978453 @default.
- W4283016871 hasConcept C154945302 @default.
- W4283016871 hasConcept C204323151 @default.
- W4283016871 hasConcept C2776135515 @default.
- W4283016871 hasConcept C2780102126 @default.
- W4283016871 hasConcept C2780490138 @default.
- W4283016871 hasConcept C2986087404 @default.
- W4283016871 hasConcept C33923547 @default.
- W4283016871 hasConcept C41008148 @default.
- W4283016871 hasConcept C97541855 @default.
- W4283016871 hasConcept C98045186 @default.
- W4283016871 hasConceptScore W4283016871C110121322 @default.
- W4283016871 hasConceptScore W4283016871C111919701 @default.
- W4283016871 hasConceptScore W4283016871C112972136 @default.
- W4283016871 hasConceptScore W4283016871C119857082 @default.
- W4283016871 hasConceptScore W4283016871C126255220 @default.
- W4283016871 hasConceptScore W4283016871C127413603 @default.
- W4283016871 hasConceptScore W4283016871C134306372 @default.
- W4283016871 hasConceptScore W4283016871C136764020 @default.
- W4283016871 hasConceptScore W4283016871C146978453 @default.
- W4283016871 hasConceptScore W4283016871C154945302 @default.
- W4283016871 hasConceptScore W4283016871C204323151 @default.
- W4283016871 hasConceptScore W4283016871C2776135515 @default.
- W4283016871 hasConceptScore W4283016871C2780102126 @default.
- W4283016871 hasConceptScore W4283016871C2780490138 @default.
- W4283016871 hasConceptScore W4283016871C2986087404 @default.
- W4283016871 hasConceptScore W4283016871C33923547 @default.
- W4283016871 hasConceptScore W4283016871C41008148 @default.
- W4283016871 hasConceptScore W4283016871C97541855 @default.
- W4283016871 hasConceptScore W4283016871C98045186 @default.
- W4283016871 hasLocation W42830168711 @default.
- W4283016871 hasOpenAccess W4283016871 @default.
- W4283016871 hasPrimaryLocation W42830168711 @default.
- W4283016871 hasRelatedWork W3022038857 @default.
- W4283016871 hasRelatedWork W3034786558 @default.
- W4283016871 hasRelatedWork W4221145086 @default.
- W4283016871 hasRelatedWork W4225619808 @default.
- W4283016871 hasRelatedWork W4226283576 @default.
- W4283016871 hasRelatedWork W4283016871 @default.
- W4283016871 hasRelatedWork W4311991951 @default.
- W4283016871 hasRelatedWork W4318621078 @default.
- W4283016871 hasRelatedWork W4319083788 @default.
- W4283016871 hasRelatedWork W4226221094 @default.
- W4283016871 isParatext "false" @default.
- W4283016871 isRetracted "false" @default.
- W4283016871 workType "article" @default.
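The abstract's central quantity, the undiscounted stationary state-action distribution of a policy, and the idea of penalizing its mismatch with the offline data can be illustrated on a toy tabular MDP. The sketch below is not the authors' method. It assumes a known transition tensor `P` and reward table `R`, assumes the Markov chain induced by the policy is ergodic and aperiodic so that power iteration converges, and uses a KL divergence as a stand-in regularizer. All function names (`stationary_distribution`, `empirical_distribution`, `kl_divergence`, `regularized_objective`) are hypothetical. The paper itself targets continuous control and uses a learned dynamics model to implement the regularization.

```python
import numpy as np


def stationary_distribution(P, pi, iters=10_000, tol=1e-12):
    """Undiscounted stationary state-action distribution d_pi(s, a).

    P:  transition tensor of shape (S, A, S), P[s, a, t] = Pr(t | s, a)
    pi: policy of shape (S, A), pi[s, a] = Pr(a | s)
    Assumes the chain induced by pi is ergodic and aperiodic.
    """
    S, _ = pi.shape
    # State-to-state transition matrix under pi: T[s, t] = sum_a pi[s, a] * P[s, a, t]
    T = np.einsum("sa,sat->st", pi, P)
    d_s = np.full(S, 1.0 / S)
    for _ in range(iters):  # power iteration on the state marginal
        nxt = d_s @ T
        converged = np.abs(nxt - d_s).sum() < tol
        d_s = nxt
        if converged:
            break
    return d_s[:, None] * pi  # d(s, a) = d(s) * pi(a | s)


def empirical_distribution(pairs, S, A):
    """Empirical state-action distribution of an offline dataset of (s, a) pairs."""
    counts = np.zeros((S, A))
    for s, a in pairs:
        counts[s, a] += 1
    return counts / counts.sum()


def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) with additive smoothing so empty data cells do not give infinity."""
    p = p.ravel() + eps
    q = q.ravel() + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))


def regularized_objective(P, R, pi, d_data, lam=1.0):
    """Average reward under d_pi minus a penalty pulling d_pi toward the data."""
    d_pi = stationary_distribution(P, pi)
    avg_reward = float(np.sum(d_pi * R))
    return avg_reward - lam * kl_divergence(d_pi, d_data)


if __name__ == "__main__":
    # Toy 2-state, 2-action MDP (illustrative numbers, not from the paper).
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.7, 0.3], [0.1, 0.9]]])
    R = np.array([[0.0, 1.0], [1.0, 0.0]])
    pi = np.full((2, 2), 0.5)
    d_data = empirical_distribution([(0, 0), (0, 1), (1, 0), (1, 1), (1, 0)], 2, 2)
    print(stationary_distribution(P, pi))
    print(regularized_objective(P, R, pi, d_data, lam=0.5))
```

Larger values of the hypothetical weight `lam` trade average reward for staying close to the state-action pairs that the offline dataset actually covers, which is the intuition behind regularizing toward the data distribution.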