Matches in SemOpenAlex for { <https://semopenalex.org/work/W3214059887> ?p ?o ?g. }
Showing items 1 to 86 of 86 with 100 items per page.
- W3214059887 abstract "Many of the challenges facing today's reinforcement learning (RL) algorithms, such as robustness, generalization, transfer, and computational efficiency, are closely related to compression. Prior work has convincingly argued why minimizing information is useful in the supervised learning setting, but standard RL algorithms lack an explicit mechanism for compression. The RL setting is unique because (1) its sequential nature allows an agent to use past information to avoid looking at future observations and (2) the agent can optimize its behavior to prefer states where decision making requires few bits. We take advantage of these properties to propose a method (RPC) for learning simple policies. This method brings together ideas from information bottlenecks, model-based RL, and bits-back coding into a simple and theoretically-justified algorithm. Our method jointly optimizes a latent-space model and policy to be self-consistent, such that the policy avoids states where the model is inaccurate. We demonstrate that our method achieves much tighter compression than prior methods, achieving up to 5x higher reward than a standard information bottleneck. We also demonstrate that our method learns policies that are more robust and generalize better to new tasks." @default.
- W3214059887 created "2021-11-22" @default.
- W3214059887 creator A5026322200 @default.
- W3214059887 creator A5035051008 @default.
- W3214059887 creator A5071983998 @default.
- W3214059887 date "2021-12-06" @default.
- W3214059887 modified "2023-09-24" @default.
- W3214059887 title "Robust Predictable Control" @default.
- W3214059887 hasPublicationYear "2021" @default.
- W3214059887 type Work @default.
- W3214059887 sameAs 3214059887 @default.
- W3214059887 citedByCount "0" @default.
- W3214059887 crossrefType "proceedings-article" @default.
- W3214059887 hasAuthorship W3214059887A5026322200 @default.
- W3214059887 hasAuthorship W3214059887A5035051008 @default.
- W3214059887 hasAuthorship W3214059887A5071983998 @default.
- W3214059887 hasConcept C104317684 @default.
- W3214059887 hasConcept C105795698 @default.
- W3214059887 hasConcept C111472728 @default.
- W3214059887 hasConcept C119857082 @default.
- W3214059887 hasConcept C134306372 @default.
- W3214059887 hasConcept C138885662 @default.
- W3214059887 hasConcept C149635348 @default.
- W3214059887 hasConcept C152139883 @default.
- W3214059887 hasConcept C154945302 @default.
- W3214059887 hasConcept C177148314 @default.
- W3214059887 hasConcept C179518139 @default.
- W3214059887 hasConcept C185592680 @default.
- W3214059887 hasConcept C2780513914 @default.
- W3214059887 hasConcept C2780586882 @default.
- W3214059887 hasConcept C33923547 @default.
- W3214059887 hasConcept C41008148 @default.
- W3214059887 hasConcept C55493867 @default.
- W3214059887 hasConcept C60008888 @default.
- W3214059887 hasConcept C63479239 @default.
- W3214059887 hasConcept C78548338 @default.
- W3214059887 hasConcept C97541855 @default.
- W3214059887 hasConceptScore W3214059887C104317684 @default.
- W3214059887 hasConceptScore W3214059887C105795698 @default.
- W3214059887 hasConceptScore W3214059887C111472728 @default.
- W3214059887 hasConceptScore W3214059887C119857082 @default.
- W3214059887 hasConceptScore W3214059887C134306372 @default.
- W3214059887 hasConceptScore W3214059887C138885662 @default.
- W3214059887 hasConceptScore W3214059887C149635348 @default.
- W3214059887 hasConceptScore W3214059887C152139883 @default.
- W3214059887 hasConceptScore W3214059887C154945302 @default.
- W3214059887 hasConceptScore W3214059887C177148314 @default.
- W3214059887 hasConceptScore W3214059887C179518139 @default.
- W3214059887 hasConceptScore W3214059887C185592680 @default.
- W3214059887 hasConceptScore W3214059887C2780513914 @default.
- W3214059887 hasConceptScore W3214059887C2780586882 @default.
- W3214059887 hasConceptScore W3214059887C33923547 @default.
- W3214059887 hasConceptScore W3214059887C41008148 @default.
- W3214059887 hasConceptScore W3214059887C55493867 @default.
- W3214059887 hasConceptScore W3214059887C60008888 @default.
- W3214059887 hasConceptScore W3214059887C63479239 @default.
- W3214059887 hasConceptScore W3214059887C78548338 @default.
- W3214059887 hasConceptScore W3214059887C97541855 @default.
- W3214059887 hasLocation W32140598871 @default.
- W3214059887 hasOpenAccess W3214059887 @default.
- W3214059887 hasPrimaryLocation W32140598871 @default.
- W3214059887 hasRelatedWork W142858861 @default.
- W3214059887 hasRelatedWork W2524179627 @default.
- W3214059887 hasRelatedWork W2616944917 @default.
- W3214059887 hasRelatedWork W2787477569 @default.
- W3214059887 hasRelatedWork W2890211540 @default.
- W3214059887 hasRelatedWork W2899670155 @default.
- W3214059887 hasRelatedWork W2902854091 @default.
- W3214059887 hasRelatedWork W2970355847 @default.
- W3214059887 hasRelatedWork W2990216309 @default.
- W3214059887 hasRelatedWork W2996201207 @default.
- W3214059887 hasRelatedWork W3004810722 @default.
- W3214059887 hasRelatedWork W3011584947 @default.
- W3214059887 hasRelatedWork W3037613053 @default.
- W3214059887 hasRelatedWork W3042380218 @default.
- W3214059887 hasRelatedWork W3090448441 @default.
- W3214059887 hasRelatedWork W3097059970 @default.
- W3214059887 hasRelatedWork W3178225342 @default.
- W3214059887 hasRelatedWork W3197524599 @default.
- W3214059887 hasRelatedWork W3208550664 @default.
- W3214059887 hasRelatedWork W50830905 @default.
- W3214059887 hasVolume "34" @default.
- W3214059887 isParatext "false" @default.
- W3214059887 isRetracted "false" @default.
- W3214059887 magId "3214059887" @default.
- W3214059887 workType "article" @default.