Matches in SemOpenAlex for { <https://semopenalex.org/work/W2902037383> ?p ?o ?g. }
- W2902037383 abstract "We consider reinforcement learning (RL) in continuous time and study the problem of achieving the best trade-off between exploration of a black box environment and exploitation of current knowledge. We propose an entropy-regularized reward function involving the differential entropy of the distributions of actions, and motivate and devise an exploratory formulation for the feature dynamics that captures repetitive learning under exploration. The resulting optimization problem is a revitalization of the classical relaxed stochastic control. We carry out a complete analysis of the problem in the linear-quadratic (LQ) setting and deduce that the optimal feedback control distribution for balancing exploitation and exploration is Gaussian. This in turn interprets and justifies the widely adopted Gaussian exploration in RL, beyond its simplicity for sampling. Moreover, exploitation and exploration are captured, respectively and mutually exclusively, by the mean and variance of the Gaussian distribution. We also find that a more random environment contains more learning opportunities in the sense that less exploration is needed. We characterize the cost of exploration, which, for the LQ case, is shown to be proportional to the entropy regularization weight and inversely proportional to the discount rate. Finally, as the weight of exploration decays to zero, we prove the convergence of the solution of the entropy-regularized LQ problem to that of the classical LQ problem." @default.
- W2902037383 created "2018-12-11" @default.
- W2902037383 creator A5004405673 @default.
- W2902037383 creator A5057769019 @default.
- W2902037383 creator A5066116626 @default.
- W2902037383 date "2018-12-04" @default.
- W2902037383 modified "2023-09-26" @default.
- W2902037383 title "Exploration versus exploitation in reinforcement learning: a stochastic control approach" @default.
- W2902037383 cites W1505937442 @default.
- W2902037383 cites W1514171990 @default.
- W2902037383 cites W1580561995 @default.
- W2902037383 cites W1607065182 @default.
- W2902037383 cites W1988526405 @default.
- W2902037383 cites W2039522160 @default.
- W2902037383 cites W2041404167 @default.
- W2902037383 cites W2049110942 @default.
- W2902037383 cites W2066557929 @default.
- W2902037383 cites W2068098627 @default.
- W2902037383 cites W2098774185 @default.
- W2902037383 cites W2107726111 @default.
- W2902037383 cites W2113501460 @default.
- W2902037383 cites W2116659221 @default.
- W2902037383 cites W2121863487 @default.
- W2902037383 cites W2123447947 @default.
- W2902037383 cites W2129239159 @default.
- W2902037383 cites W2145339207 @default.
- W2902037383 cites W2149721706 @default.
- W2902037383 cites W2150339816 @default.
- W2902037383 cites W2163840227 @default.
- W2902037383 cites W2165131254 @default.
- W2902037383 cites W2167856595 @default.
- W2902037383 cites W2257979135 @default.
- W2902037383 cites W2593044849 @default.
- W2902037383 cites W2594103415 @default.
- W2902037383 cites W2623491082 @default.
- W2902037383 cites W2766447205 @default.
- W2902037383 cites W2781726626 @default.
- W2902037383 cites W2951222758 @default.
- W2902037383 cites W2952791429 @default.
- W2902037383 cites W2963267001 @default.
- W2902037383 cites W2963864421 @default.
- W2902037383 cites W2964161785 @default.
- W2902037383 doi "https://doi.org/10.48550/arxiv.1812.01552" @default.
- W2902037383 hasPublicationYear "2018" @default.
- W2902037383 type Work @default.
- W2902037383 sameAs 2902037383 @default.
- W2902037383 citedByCount "9" @default.
- W2902037383 countsByYear W29020373832019 @default.
- W2902037383 countsByYear W29020373832020 @default.
- W2902037383 countsByYear W29020373832021 @default.
- W2902037383 crossrefType "posted-content" @default.
- W2902037383 hasAuthorship W2902037383A5004405673 @default.
- W2902037383 hasAuthorship W2902037383A5057769019 @default.
- W2902037383 hasAuthorship W2902037383A5066116626 @default.
- W2902037383 hasBestOaLocation W29020373831 @default.
- W2902037383 hasConcept C106301342 @default.
- W2902037383 hasConcept C121332964 @default.
- W2902037383 hasConcept C126255220 @default.
- W2902037383 hasConcept C129844170 @default.
- W2902037383 hasConcept C154945302 @default.
- W2902037383 hasConcept C163716315 @default.
- W2902037383 hasConcept C2524010 @default.
- W2902037383 hasConcept C2776135515 @default.
- W2902037383 hasConcept C28826006 @default.
- W2902037383 hasConcept C33923547 @default.
- W2902037383 hasConcept C41008148 @default.
- W2902037383 hasConcept C62520636 @default.
- W2902037383 hasConcept C91575142 @default.
- W2902037383 hasConcept C97541855 @default.
- W2902037383 hasConceptScore W2902037383C106301342 @default.
- W2902037383 hasConceptScore W2902037383C121332964 @default.
- W2902037383 hasConceptScore W2902037383C126255220 @default.
- W2902037383 hasConceptScore W2902037383C129844170 @default.
- W2902037383 hasConceptScore W2902037383C154945302 @default.
- W2902037383 hasConceptScore W2902037383C163716315 @default.
- W2902037383 hasConceptScore W2902037383C2524010 @default.
- W2902037383 hasConceptScore W2902037383C2776135515 @default.
- W2902037383 hasConceptScore W2902037383C28826006 @default.
- W2902037383 hasConceptScore W2902037383C33923547 @default.
- W2902037383 hasConceptScore W2902037383C41008148 @default.
- W2902037383 hasConceptScore W2902037383C62520636 @default.
- W2902037383 hasConceptScore W2902037383C91575142 @default.
- W2902037383 hasConceptScore W2902037383C97541855 @default.
- W2902037383 hasLocation W29020373831 @default.
- W2902037383 hasOpenAccess W2902037383 @default.
- W2902037383 hasPrimaryLocation W29020373831 @default.
- W2902037383 hasRelatedWork W1655909868 @default.
- W2902037383 hasRelatedWork W1836611381 @default.
- W2902037383 hasRelatedWork W1996639363 @default.
- W2902037383 hasRelatedWork W2060032448 @default.
- W2902037383 hasRelatedWork W2610900671 @default.
- W2902037383 hasRelatedWork W2953159110 @default.
- W2902037383 hasRelatedWork W2963530777 @default.
- W2902037383 hasRelatedWork W3099634386 @default.
- W2902037383 hasRelatedWork W4287867006 @default.
- W2902037383 hasRelatedWork W4299881944 @default.
- W2902037383 isParatext "false" @default.
- W2902037383 isRetracted "false" @default.
- W2902037383 magId "2902037383" @default.
- W2902037383 workType "article" @default.
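The abstract states that the entropy-regularized optimal control distribution is Gaussian, with the mean carrying exploitation and the variance carrying exploration. A standard completing-the-square argument illustrates why a Gaussian appears. The sketch below is not the authors' derivation. It assumes, for illustration only, that the quantity maximized over actions u is a concave quadratic q(u) = -(a/2) u^2 + b u with a > 0, and that the regularized optimum takes the Gibbs form pi(u) proportional to exp(q(u)/lam), where lam is the entropy regularization weight. The function names gibbs_gaussian_policy, numeric_moments and gaussian_entropy are hypothetical.

```python
import math


def gibbs_gaussian_policy(a: float, b: float, lam: float) -> tuple[float, float]:
    """(mean, variance) of pi(u) ∝ exp(q(u)/lam) with q(u) = -(a/2) u^2 + b u.

    Assumption (illustrative, not from the paper): q is concave in u (a > 0).
    Completing the square gives q(u)/lam = -(a/(2*lam)) * (u - b/a)**2 + const,
    so pi is Gaussian with mean b/a and variance lam/a.
    """
    if a <= 0 or lam <= 0:
        raise ValueError("need a > 0 (concave in u) and lam > 0")
    return b / a, lam / a


def numeric_moments(a: float, b: float, lam: float,
                    lo: float = -20.0, hi: float = 20.0, n: int = 40001) -> tuple[float, float]:
    """Brute-force check: normalize exp(q(u)/lam) on a uniform grid and return its mean and variance."""
    h = (hi - lo) / (n - 1)
    us = [lo + i * h for i in range(n)]
    logw = [(-(a / 2) * u * u + b * u) / lam for u in us]
    m = max(logw)  # subtract the max exponent to avoid overflow in exp
    w = [math.exp(z - m) for z in logw]
    total = sum(w)
    mean = sum(wi * u for wi, u in zip(w, us)) / total
    var = sum(wi * (u - mean) ** 2 for wi, u in zip(w, us)) / total
    return mean, var


def gaussian_entropy(var: float) -> float:
    """Differential entropy of a 1-D Gaussian with variance var: 0.5 * ln(2*pi*e*var)."""
    return 0.5 * math.log(2 * math.pi * math.e * var)


if __name__ == "__main__":
    print(gibbs_gaussian_policy(2.0, 3.0, 0.5))  # closed form
    print(numeric_moments(2.0, 3.0, 0.5))        # grid approximation of the same moments
```

Under these assumptions the variance lam/a grows linearly with the regularization weight and does not depend on b. So if only b varies with the state (for example, linearly, as a feedback term would), the state enters through the mean alone. This mirrors, in a toy setting, the abstract's split of exploitation (mean) and exploration (variance).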