Matches in SemOpenAlex for { <https://semopenalex.org/work/W4312713068> ?p ?o ?g. }
Showing items 1 to 79 of 79 with 100 items per page.
- W4312713068 abstract "Model-based reinforcement learning algorithms can alleviate the low sample efficiency problem compared with model-free methods for control tasks. However, the learned policy's performance often lags behind the best model-free algorithms because of its weak exploration ability. Existing model-based reinforcement learning algorithms learn a policy by interacting with the learned world model and then use the learned policy to guide a new round of world model learning. Due to the policy's weak exploration ability, the learned world model has a large bias. As a result, the algorithm fails to learn the globally optimal policy on such a world model. This paper improves the learned world model by maximizing both the reward and the corresponding policy entropy in the framework of maximum entropy reinforcement learning. The effectiveness of applying the maximum entropy approach to model-based reinforcement learning is supported by the better performance of our algorithm on several complex MuJoCo and DeepMind Control Suite tasks." @default.
- W4312713068 created "2023-01-05" @default.
- W4312713068 creator A5004031758 @default.
- W4312713068 creator A5012509620 @default.
- W4312713068 creator A5032986088 @default.
- W4312713068 creator A5048965499 @default.
- W4312713068 date "2022-07-18" @default.
- W4312713068 modified "2023-10-18" @default.
- W4312713068 title "MaxEnt Dreamer: Maximum Entropy Reinforcement Learning with World Model" @default.
- W4312713068 cites W2145339207 @default.
- W4312713068 cites W2158782408 @default.
- W4312713068 cites W2742169147 @default.
- W4312713068 cites W2746553466 @default.
- W4312713068 cites W2787666871 @default.
- W4312713068 cites W2798494119 @default.
- W4312713068 cites W2888492136 @default.
- W4312713068 cites W2963403593 @default.
- W4312713068 cites W2963523627 @default.
- W4312713068 cites W3118210634 @default.
- W4312713068 cites W32403112 @default.
- W4312713068 doi "https://doi.org/10.1109/ijcnn55064.2022.9892381" @default.
- W4312713068 hasPublicationYear "2022" @default.
- W4312713068 type Work @default.
- W4312713068 citedByCount "0" @default.
- W4312713068 crossrefType "proceedings-article" @default.
- W4312713068 hasAuthorship W4312713068A5004031758 @default.
- W4312713068 hasAuthorship W4312713068A5012509620 @default.
- W4312713068 hasAuthorship W4312713068A5032986088 @default.
- W4312713068 hasAuthorship W4312713068A5048965499 @default.
- W4312713068 hasConcept C106301342 @default.
- W4312713068 hasConcept C119857082 @default.
- W4312713068 hasConcept C121332964 @default.
- W4312713068 hasConcept C126255220 @default.
- W4312713068 hasConcept C127413603 @default.
- W4312713068 hasConcept C154945302 @default.
- W4312713068 hasConcept C17744445 @default.
- W4312713068 hasConcept C196340769 @default.
- W4312713068 hasConcept C199539241 @default.
- W4312713068 hasConcept C33923547 @default.
- W4312713068 hasConcept C41008148 @default.
- W4312713068 hasConcept C62520636 @default.
- W4312713068 hasConcept C66938386 @default.
- W4312713068 hasConcept C67203356 @default.
- W4312713068 hasConcept C79581498 @default.
- W4312713068 hasConcept C9679016 @default.
- W4312713068 hasConcept C97541855 @default.
- W4312713068 hasConceptScore W4312713068C106301342 @default.
- W4312713068 hasConceptScore W4312713068C119857082 @default.
- W4312713068 hasConceptScore W4312713068C121332964 @default.
- W4312713068 hasConceptScore W4312713068C126255220 @default.
- W4312713068 hasConceptScore W4312713068C127413603 @default.
- W4312713068 hasConceptScore W4312713068C154945302 @default.
- W4312713068 hasConceptScore W4312713068C17744445 @default.
- W4312713068 hasConceptScore W4312713068C196340769 @default.
- W4312713068 hasConceptScore W4312713068C199539241 @default.
- W4312713068 hasConceptScore W4312713068C33923547 @default.
- W4312713068 hasConceptScore W4312713068C41008148 @default.
- W4312713068 hasConceptScore W4312713068C62520636 @default.
- W4312713068 hasConceptScore W4312713068C66938386 @default.
- W4312713068 hasConceptScore W4312713068C67203356 @default.
- W4312713068 hasConceptScore W4312713068C79581498 @default.
- W4312713068 hasConceptScore W4312713068C9679016 @default.
- W4312713068 hasConceptScore W4312713068C97541855 @default.
- W4312713068 hasLocation W43127130681 @default.
- W4312713068 hasOpenAccess W4312713068 @default.
- W4312713068 hasPrimaryLocation W43127130681 @default.
- W4312713068 hasRelatedWork W1564932097 @default.
- W4312713068 hasRelatedWork W172603552 @default.
- W4312713068 hasRelatedWork W2071035582 @default.
- W4312713068 hasRelatedWork W2105011545 @default.
- W4312713068 hasRelatedWork W3022038857 @default.
- W4312713068 hasRelatedWork W3035324854 @default.
- W4312713068 hasRelatedWork W3102693044 @default.
- W4312713068 hasRelatedWork W3211602134 @default.
- W4312713068 hasRelatedWork W4312713068 @default.
- W4312713068 hasRelatedWork W4319083788 @default.
- W4312713068 isParatext "false" @default.
- W4312713068 isRetracted "false" @default.
- W4312713068 workType "article" @default.
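
The abstract above describes maximizing both the reward and the corresponding policy entropy. As a rough illustration only, the following sketch computes an entropy-augmented discounted return for a discrete action distribution. It is not taken from the paper: the function names, the temperature `alpha`, the discount `gamma`, and the use of a discrete policy are all assumptions made for this example.

```python
import math


def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def maxent_return(rewards, action_probs, alpha=0.1, gamma=0.99):
    """Discounted sum of (reward + alpha * policy entropy) over a trajectory.

    rewards      -- list of per-step rewards
    action_probs -- list of per-step action distributions (one list per step)
    alpha        -- assumed entropy temperature (hypothetical default)
    gamma        -- assumed discount factor (hypothetical default)
    """
    total = 0.0
    for t, (reward, probs) in enumerate(zip(rewards, action_probs)):
        total += (gamma ** t) * (reward + alpha * policy_entropy(probs))
    return total


# Same rewards, but a uniform policy earns an entropy bonus
# that a deterministic policy does not.
rewards = [1.0, 1.0]
uniform = [[0.5, 0.5], [0.5, 0.5]]
greedy = [[1.0, 0.0], [1.0, 0.0]]
print(maxent_return(rewards, uniform) > maxent_return(rewards, greedy))
```

With `alpha` set to 0 the function reduces to the ordinary discounted return, so the entropy term acts purely as an exploration bonus in this sketch.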