Matches in SemOpenAlex for { <https://semopenalex.org/work/W4321593961> ?p ?o ?g. }
Showing items 1 to 87 of 87 with 100 items per page.
- W4321593961 abstract "In this paper, we revisit the regret of undiscounted reinforcement learning in MDPs with a birth and death structure. Specifically, we consider a controlled queue with impatient jobs and the main objective is to optimize a trade-off between energy consumption and user-perceived performance. Within this setting, the \emph{diameter} $D$ of the MDP is $\Omega(S^S)$, where $S$ is the number of states. Therefore, the existing lower and upper bounds on the regret at time $T$, of order $O(\sqrt{DSAT})$ for MDPs with $S$ states and $A$ actions, may suggest that reinforcement learning is inefficient here. In our main result however, we exploit the structure of our MDPs to show that the regret of a slightly-tweaked version of the classical learning algorithm {\sc Ucrl2} is in fact upper bounded by $\tilde{\mathcal{O}}(\sqrt{E_2AT})$ where $E_2$ is related to the weighted second moment of the stationary measure of a reference policy. Importantly, $E_2$ is bounded independently of $S$. Thus, our bound is asymptotically independent of the number of states and of the diameter. This result is based on a careful study of the number of visits performed by the learning algorithm to the states of the MDP, which is highly non-uniform." @default.
- W4321593961 created "2023-02-24" @default.
- W4321593961 creator A5010122736 @default.
- W4321593961 creator A5073088518 @default.
- W4321593961 creator A5089464465 @default.
- W4321593961 date "2022-11-28" @default.
- W4321593961 modified "2023-09-27" @default.
- W4321593961 title "Reinforcement Learning in a Birth and Death Process: Breaking the Dependence on the State Space" @default.
- W4321593961 hasPublicationYear "2022" @default.
- W4321593961 type Work @default.
- W4321593961 citedByCount "0" @default.
- W4321593961 crossrefType "proceedings-article" @default.
- W4321593961 hasAuthorship W4321593961A5010122736 @default.
- W4321593961 hasAuthorship W4321593961A5073088518 @default.
- W4321593961 hasAuthorship W4321593961A5089464465 @default.
- W4321593961 hasBestOaLocation W43215939611 @default.
- W4321593961 hasConcept C10138342 @default.
- W4321593961 hasConcept C105795698 @default.
- W4321593961 hasConcept C106189395 @default.
- W4321593961 hasConcept C111919701 @default.
- W4321593961 hasConcept C11413529 @default.
- W4321593961 hasConcept C114614502 @default.
- W4321593961 hasConcept C118615104 @default.
- W4321593961 hasConcept C121332964 @default.
- W4321593961 hasConcept C134306372 @default.
- W4321593961 hasConcept C154945302 @default.
- W4321593961 hasConcept C159886148 @default.
- W4321593961 hasConcept C160403385 @default.
- W4321593961 hasConcept C162324750 @default.
- W4321593961 hasConcept C182306322 @default.
- W4321593961 hasConcept C199360897 @default.
- W4321593961 hasConcept C22684755 @default.
- W4321593961 hasConcept C2778572836 @default.
- W4321593961 hasConcept C2779557605 @default.
- W4321593961 hasConcept C33923547 @default.
- W4321593961 hasConcept C34388435 @default.
- W4321593961 hasConcept C41008148 @default.
- W4321593961 hasConcept C48103436 @default.
- W4321593961 hasConcept C50817715 @default.
- W4321593961 hasConcept C62520636 @default.
- W4321593961 hasConcept C72434380 @default.
- W4321593961 hasConcept C77553402 @default.
- W4321593961 hasConcept C97541855 @default.
- W4321593961 hasConceptScore W4321593961C10138342 @default.
- W4321593961 hasConceptScore W4321593961C105795698 @default.
- W4321593961 hasConceptScore W4321593961C106189395 @default.
- W4321593961 hasConceptScore W4321593961C111919701 @default.
- W4321593961 hasConceptScore W4321593961C11413529 @default.
- W4321593961 hasConceptScore W4321593961C114614502 @default.
- W4321593961 hasConceptScore W4321593961C118615104 @default.
- W4321593961 hasConceptScore W4321593961C121332964 @default.
- W4321593961 hasConceptScore W4321593961C134306372 @default.
- W4321593961 hasConceptScore W4321593961C154945302 @default.
- W4321593961 hasConceptScore W4321593961C159886148 @default.
- W4321593961 hasConceptScore W4321593961C160403385 @default.
- W4321593961 hasConceptScore W4321593961C162324750 @default.
- W4321593961 hasConceptScore W4321593961C182306322 @default.
- W4321593961 hasConceptScore W4321593961C199360897 @default.
- W4321593961 hasConceptScore W4321593961C22684755 @default.
- W4321593961 hasConceptScore W4321593961C2778572836 @default.
- W4321593961 hasConceptScore W4321593961C2779557605 @default.
- W4321593961 hasConceptScore W4321593961C33923547 @default.
- W4321593961 hasConceptScore W4321593961C34388435 @default.
- W4321593961 hasConceptScore W4321593961C41008148 @default.
- W4321593961 hasConceptScore W4321593961C48103436 @default.
- W4321593961 hasConceptScore W4321593961C50817715 @default.
- W4321593961 hasConceptScore W4321593961C62520636 @default.
- W4321593961 hasConceptScore W4321593961C72434380 @default.
- W4321593961 hasConceptScore W4321593961C77553402 @default.
- W4321593961 hasConceptScore W4321593961C97541855 @default.
- W4321593961 hasLocation W43215939611 @default.
- W4321593961 hasLocation W43215939612 @default.
- W4321593961 hasOpenAccess W4321593961 @default.
- W4321593961 hasPrimaryLocation W43215939611 @default.
- W4321593961 hasRelatedWork W2132908009 @default.
- W4321593961 hasRelatedWork W2465145931 @default.
- W4321593961 hasRelatedWork W2512014291 @default.
- W4321593961 hasRelatedWork W2803304514 @default.
- W4321593961 hasRelatedWork W2945119207 @default.
- W4321593961 hasRelatedWork W2966613056 @default.
- W4321593961 hasRelatedWork W2985982678 @default.
- W4321593961 hasRelatedWork W2992629954 @default.
- W4321593961 hasRelatedWork W3091875946 @default.
- W4321593961 hasRelatedWork W3092949629 @default.
- W4321593961 isParatext "false" @default.
- W4321593961 isRetracted "false" @default.
- W4321593961 workType "article" @default.