Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386408085> ?p ?o ?g. }
Showing items 1 to 75 of 75 with 100 items per page.
- W4386408085 endingPage "28767" @default.
- W4386408085 startingPage "28746" @default.
- W4386408085 abstract "Discount regularization, using a shorter planning horizon when calculating the optimal policy, is a popular choice to restrict planning to a less complex set of policies when estimating an MDP from sparse or noisy data (Jiang et al., 2015). It is commonly understood that discount regularization functions by de-emphasizing or ignoring delayed effects. In this paper, we reveal an alternate view of discount regularization that exposes unintended consequences. We demonstrate that planning under a lower discount factor produces an identical optimal policy to planning using any prior on the transition matrix that has the same distribution for all states and actions. In fact, it functions like a prior with stronger regularization on state-action pairs with more transition data. This leads to poor performance when the transition matrix is estimated from data sets with uneven amounts of data across state-action pairs. Our equivalence theorem leads to an explicit formula to set regularization parameters locally for individual state-action pairs rather than globally. We demonstrate the failures of discount regularization and how we remedy them using our state-action-specific method across simple empirical examples as well as a medical cancer simulator." @default.
- W4386408085 created "2023-09-05" @default.
- W4386408085 creator A5019933957 @default.
- W4386408085 creator A5022629804 @default.
- W4386408085 creator A5025828990 @default.
- W4386408085 creator A5038771285 @default.
- W4386408085 creator A5077652158 @default.
- W4386408085 date "2023-07-01" @default.
- W4386408085 modified "2023-10-14" @default.
- W4386408085 title "The Unintended Consequences of Discount Regularization: Improving Regularization in Certainty Equivalence Reinforcement Learning." @default.
- W4386408085 cites W2096976789 @default.
- W4386408085 cites W2138434918 @default.
- W4386408085 cites W2891520095 @default.
- W4386408085 cites W3011199275 @default.
- W4386408085 cites W3084947478 @default.
- W4386408085 cites W4286561193 @default.
- W4386408085 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/37662875" @default.
- W4386408085 hasPublicationYear "2023" @default.
- W4386408085 type Work @default.
- W4386408085 citedByCount "0" @default.
- W4386408085 crossrefType "journal-article" @default.
- W4386408085 hasAuthorship W4386408085A5019933957 @default.
- W4386408085 hasAuthorship W4386408085A5022629804 @default.
- W4386408085 hasAuthorship W4386408085A5025828990 @default.
- W4386408085 hasAuthorship W4386408085A5038771285 @default.
- W4386408085 hasAuthorship W4386408085A5077652158 @default.
- W4386408085 hasConcept C118615104 @default.
- W4386408085 hasConcept C126255220 @default.
- W4386408085 hasConcept C134306372 @default.
- W4386408085 hasConcept C135252773 @default.
- W4386408085 hasConcept C141718189 @default.
- W4386408085 hasConcept C149782125 @default.
- W4386408085 hasConcept C152442038 @default.
- W4386408085 hasConcept C154945302 @default.
- W4386408085 hasConcept C2776135515 @default.
- W4386408085 hasConcept C2780069185 @default.
- W4386408085 hasConcept C28761237 @default.
- W4386408085 hasConcept C33923547 @default.
- W4386408085 hasConcept C41008148 @default.
- W4386408085 hasConcept C79248915 @default.
- W4386408085 hasConcept C97541855 @default.
- W4386408085 hasConceptScore W4386408085C118615104 @default.
- W4386408085 hasConceptScore W4386408085C126255220 @default.
- W4386408085 hasConceptScore W4386408085C134306372 @default.
- W4386408085 hasConceptScore W4386408085C135252773 @default.
- W4386408085 hasConceptScore W4386408085C141718189 @default.
- W4386408085 hasConceptScore W4386408085C149782125 @default.
- W4386408085 hasConceptScore W4386408085C152442038 @default.
- W4386408085 hasConceptScore W4386408085C154945302 @default.
- W4386408085 hasConceptScore W4386408085C2776135515 @default.
- W4386408085 hasConceptScore W4386408085C2780069185 @default.
- W4386408085 hasConceptScore W4386408085C28761237 @default.
- W4386408085 hasConceptScore W4386408085C33923547 @default.
- W4386408085 hasConceptScore W4386408085C41008148 @default.
- W4386408085 hasConceptScore W4386408085C79248915 @default.
- W4386408085 hasConceptScore W4386408085C97541855 @default.
- W4386408085 hasLocation W43864080851 @default.
- W4386408085 hasOpenAccess W4386408085 @default.
- W4386408085 hasPrimaryLocation W43864080851 @default.
- W4386408085 hasRelatedWork W105745955 @default.
- W4386408085 hasRelatedWork W1481686068 @default.
- W4386408085 hasRelatedWork W2124464778 @default.
- W4386408085 hasRelatedWork W2164161365 @default.
- W4386408085 hasRelatedWork W2366023887 @default.
- W4386408085 hasRelatedWork W2805921369 @default.
- W4386408085 hasRelatedWork W2807240017 @default.
- W4386408085 hasRelatedWork W3213636572 @default.
- W4386408085 hasRelatedWork W4286858311 @default.
- W4386408085 hasRelatedWork W1678245947 @default.
- W4386408085 hasVolume "202" @default.
- W4386408085 isParatext "false" @default.
- W4386408085 isRetracted "false" @default.
- W4386408085 workType "article" @default.
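The abstract's central claim can be checked numerically on a toy tabular MDP. The claim is that planning with a lower discount factor gives the same optimal policy as planning with a prior that is shared by all state-action pairs but weighted more heavily where there is more data. The sketch below is an illustration under stated assumptions, not the paper's code. It assumes certainty-equivalence planning by value iteration. It also assumes a posterior-mean transition estimate that mixes the empirical counts `N` with a prior mean `p0` shared by every state-action pair, using the prior strengths `alpha[s, a] = n[s, a] * (gamma / gamma_low - 1)`. This choice of `alpha` is a derivation made for this example, not a formula quoted from the paper. It gives every state-action pair the same weight `gamma_low / gamma` on its empirical estimate, so the prior term contributes the same scalar to every Q-value. As a result, the two planners' Q-values should differ only by a constant, and their greedy policies should coincide. The counts, rewards, discount values, and the helper `value_iteration` are all hypothetical.

```python
import numpy as np


def value_iteration(T, R, gamma, iters=5000):
    """Greedy policy and Q-values from value iteration on a tabular MDP.

    T has shape (S, A, S) with rows summing to 1; R has shape (S, A).
    """
    V = np.zeros(T.shape[0])
    for _ in range(iters):
        # (S, A, S) @ (S,) -> (S, A): expected next-state value per (s, a)
        V = (R + gamma * T @ V).max(axis=1)
    Q = R + gamma * T @ V
    return Q.argmax(axis=1), Q


rng = np.random.default_rng(0)
S, A = 6, 3
gamma, gamma_low = 0.95, 0.7  # hypothetical "true" and regularized discounts

# Hypothetical transition counts with deliberately uneven totals per (s, a).
N = rng.integers(0, 5, size=(S, A, S)) * rng.integers(1, 20, size=(S, A, 1))
N[..., 0] += 1  # ensure every (s, a) has at least one observed transition
n = N.sum(axis=-1)  # (S, A) number of observed transitions
R = rng.normal(size=(S, A))  # hypothetical rewards
T_hat = N / n[..., None]  # maximum-likelihood transition estimate

# Discount regularization: plan on T_hat with the lower discount factor.
pi_discount, Q_discount = value_iteration(T_hat, R, gamma_low)

# Prior-based alternative: one prior mean p0 for all (s, a), with strength
# alpha proportional to n, so pairs with more data are regularized more.
p0 = rng.dirichlet(np.ones(S))
alpha = n * (gamma / gamma_low - 1.0)
T_tilde = (N + alpha[..., None] * p0) / (n + alpha)[..., None]
pi_prior, Q_prior = value_iteration(T_tilde, R, gamma)

print(np.array_equal(pi_discount, pi_prior))
print(Q_prior - Q_discount)  # approximately the same value in every entry
```

Because `alpha` grows with `n`, this particular prior places stronger regularization on state-action pairs with more transition data. The abstract identifies this as the source of poor performance when data are uneven, and contrasts it with setting the regularization locally for each state-action pair.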