Matches in SemOpenAlex for { <https://semopenalex.org/work/W3100810159> ?p ?o ?g. }
Showing items 1 to 78 of 78, with 100 items per page.
- W3100810159 endingPage "5552" @default.
- W3100810159 startingPage "5541" @default.
- W3100810159 abstract "Learning models of the environment from data is often viewed as an essential component to building intelligent reinforcement learning (RL) agents. The common practice is to separate the learning of the model from its use, by constructing a model of the environment's dynamics that correctly predicts the observed state transitions. In this paper we argue that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based planning. As our main contribution, we introduce the principle of value equivalence: two models are value equivalent with respect to a set of functions and policies if they yield the same Bellman updates. We propose a formulation of the model learning problem based on the value equivalence principle and analyze how the set of feasible solutions is impacted by the choice of policies and functions. Specifically, we show that, as we augment the set of policies and functions considered, the class of value equivalent models shrinks, until eventually collapsing to a single point corresponding to a model that perfectly describes the environment. In many problems, directly modelling state-to-state transitions may be both difficult and unnecessary. By leveraging the value-equivalence principle one may find simpler models without compromising performance, saving computation and memory. We illustrate the benefits of value-equivalent model learning with experiments comparing it against more traditional counterparts like maximum likelihood estimation. More generally, we argue that the principle of value equivalence underlies a number of recent empirical successes in RL, such as Value Iteration Networks, the Predictron, Value Prediction Networks, TreeQN, and MuZero, and provides a first theoretical underpinning of those results." @default.
- W3100810159 created "2020-11-23" @default.
- W3100810159 creator A5008592589 @default.
- W3100810159 creator A5045742904 @default.
- W3100810159 creator A5065366930 @default.
- W3100810159 creator A5091771290 @default.
- W3100810159 date "2020-01-01" @default.
- W3100810159 modified "2023-10-18" @default.
- W3100810159 title "The Value Equivalence Principle for Model-Based Reinforcement Learning" @default.
- W3100810159 hasPublicationYear "2020" @default.
- W3100810159 type Work @default.
- W3100810159 sameAs 3100810159 @default.
- W3100810159 citedByCount "9" @default.
- W3100810159 countsByYear W31008101592020 @default.
- W3100810159 countsByYear W31008101592021 @default.
- W3100810159 countsByYear W31008101592022 @default.
- W3100810159 countsByYear W31008101592023 @default.
- W3100810159 crossrefType "proceedings-article" @default.
- W3100810159 hasAuthorship W3100810159A5008592589 @default.
- W3100810159 hasAuthorship W3100810159A5045742904 @default.
- W3100810159 hasAuthorship W3100810159A5065366930 @default.
- W3100810159 hasAuthorship W3100810159A5091771290 @default.
- W3100810159 hasConcept C118615104 @default.
- W3100810159 hasConcept C119857082 @default.
- W3100810159 hasConcept C126255220 @default.
- W3100810159 hasConcept C14646407 @default.
- W3100810159 hasConcept C154945302 @default.
- W3100810159 hasConcept C177264268 @default.
- W3100810159 hasConcept C199360897 @default.
- W3100810159 hasConcept C2776291640 @default.
- W3100810159 hasConcept C2777044963 @default.
- W3100810159 hasConcept C2780069185 @default.
- W3100810159 hasConcept C33923547 @default.
- W3100810159 hasConcept C41008148 @default.
- W3100810159 hasConcept C97541855 @default.
- W3100810159 hasConceptScore W3100810159C118615104 @default.
- W3100810159 hasConceptScore W3100810159C119857082 @default.
- W3100810159 hasConceptScore W3100810159C126255220 @default.
- W3100810159 hasConceptScore W3100810159C14646407 @default.
- W3100810159 hasConceptScore W3100810159C154945302 @default.
- W3100810159 hasConceptScore W3100810159C177264268 @default.
- W3100810159 hasConceptScore W3100810159C199360897 @default.
- W3100810159 hasConceptScore W3100810159C2776291640 @default.
- W3100810159 hasConceptScore W3100810159C2777044963 @default.
- W3100810159 hasConceptScore W3100810159C2780069185 @default.
- W3100810159 hasConceptScore W3100810159C33923547 @default.
- W3100810159 hasConceptScore W3100810159C41008148 @default.
- W3100810159 hasConceptScore W3100810159C97541855 @default.
- W3100810159 hasLocation W31008101591 @default.
- W3100810159 hasOpenAccess W3100810159 @default.
- W3100810159 hasPrimaryLocation W31008101591 @default.
- W3100810159 hasRelatedWork W1714211023 @default.
- W3100810159 hasRelatedWork W1846953850 @default.
- W3100810159 hasRelatedWork W1980035368 @default.
- W3100810159 hasRelatedWork W2121863487 @default.
- W3100810159 hasRelatedWork W2511462892 @default.
- W3100810159 hasRelatedWork W2741475873 @default.
- W3100810159 hasRelatedWork W2890208753 @default.
- W3100810159 hasRelatedWork W2932752459 @default.
- W3100810159 hasRelatedWork W2946911039 @default.
- W3100810159 hasRelatedWork W2963794592 @default.
- W3100810159 hasRelatedWork W2970202659 @default.
- W3100810159 hasRelatedWork W2990389059 @default.
- W3100810159 hasRelatedWork W2995081787 @default.
- W3100810159 hasRelatedWork W3036581350 @default.
- W3100810159 hasRelatedWork W3038822267 @default.
- W3100810159 hasRelatedWork W3093384886 @default.
- W3100810159 hasRelatedWork W3098696050 @default.
- W3100810159 hasRelatedWork W3098853508 @default.
- W3100810159 hasRelatedWork W3103780890 @default.
- W3100810159 hasRelatedWork W3118210634 @default.
- W3100810159 hasVolume "33" @default.
- W3100810159 isParatext "false" @default.
- W3100810159 isRetracted "false" @default.
- W3100810159 magId "3100810159" @default.
- W3100810159 workType "article" @default.
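The abstract above defines value equivalence as equality of Bellman updates over a chosen set of policies and value functions, and notes that the equivalence class shrinks as that set grows. A minimal NumPy sketch of that check on a toy tabular MDP (the helper names and the two toy models below are illustrative, not from the paper):

```python
import numpy as np

def bellman_update(P, r, v, pi, gamma=0.9):
    """One Bellman update (T_pi v)(s) = r(s, pi(s)) + gamma * sum_s' P(s'|s, pi(s)) v(s')
    under the model (P, r), for a deterministic policy pi (a list of actions)."""
    n = len(v)
    return np.array([r[s, pi[s]] + gamma * P[s, pi[s]] @ v for s in range(n)])

def value_equivalent(model_a, model_b, policies, functions, gamma=0.9, tol=1e-9):
    """Two models are value equivalent w.r.t. the given policies and functions
    if they yield identical Bellman updates for every (policy, function) pair."""
    (Pa, ra), (Pb, rb) = model_a, model_b
    return all(
        np.allclose(bellman_update(Pa, ra, v, pi, gamma),
                    bellman_update(Pb, rb, v, pi, gamma), atol=tol)
        for pi in policies for v in functions
    )

# Two-state, one-action toy example: the models disagree on transition
# probabilities, yet agree on Bellman updates of the constant function,
# so they are value equivalent w.r.t. that restricted function set.
P1 = np.array([[[0.5, 0.5]], [[0.5, 0.5]]])   # P[s, a, s']
P2 = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])
r = np.zeros((2, 1))                           # shared reward model r[s, a]
pi = [0, 0]                                    # deterministic policy: action 0
v_const = np.ones(2)                           # constant value function
v_other = np.array([1.0, 0.0])                 # a non-constant function

print(value_equivalent((P1, r), (P2, r), [pi], [v_const]))           # True
print(value_equivalent((P1, r), (P2, r), [pi], [v_const, v_other]))  # False
```

Enlarging the function set from `[v_const]` to `[v_const, v_other]` breaks the equivalence, which mirrors the abstract's claim that the class of value-equivalent models shrinks as more policies and functions are considered.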