Matches in SemOpenAlex for { <https://semopenalex.org/work/W4287690869> ?p ?o ?g. }
Showing items 1 to 61 of 61 with 100 items per page.
- W4287690869 abstract "We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took. While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results. For example, in large problems, or when the interaction with the environment is brief, finding an optimal arm is infeasible, and regret-minimizing algorithms tend to over-explore. To overcome this issue, algorithms for such settings should instead focus on playing near-optimal arms. To this end, we suggest a new, more lenient, regret criterion that ignores suboptimality gaps smaller than some $\epsilon$. We then present a variant of the Thompson Sampling (TS) algorithm, called $\epsilon$-TS, and prove its asymptotic optimality in terms of the lenient regret. Importantly, we show that when the mean of the optimal arm is high enough, the lenient regret of $\epsilon$-TS is bounded by a constant. Finally, we show that $\epsilon$-TS can be applied to improve the performance when the agent knows a lower bound of the suboptimality gaps." @default.
- W4287690869 created "2022-07-26" @default.
- W4287690869 creator A5018784842 @default.
- W4287690869 creator A5036260775 @default.
- W4287690869 date "2020-08-10" @default.
- W4287690869 modified "2023-09-26" @default.
- W4287690869 title "Lenient Regret for Multi-Armed Bandits" @default.
- W4287690869 doi "https://doi.org/10.48550/arxiv.2008.03959" @default.
- W4287690869 hasPublicationYear "2020" @default.
- W4287690869 type Work @default.
- W4287690869 citedByCount "0" @default.
- W4287690869 crossrefType "posted-content" @default.
- W4287690869 hasAuthorship W4287690869A5018784842 @default.
- W4287690869 hasAuthorship W4287690869A5036260775 @default.
- W4287690869 hasBestOaLocation W42876908691 @default.
- W4287690869 hasConcept C119857082 @default.
- W4287690869 hasConcept C121332964 @default.
- W4287690869 hasConcept C126255220 @default.
- W4287690869 hasConcept C134306372 @default.
- W4287690869 hasConcept C144237770 @default.
- W4287690869 hasConcept C199360897 @default.
- W4287690869 hasConcept C2777027219 @default.
- W4287690869 hasConcept C2780791683 @default.
- W4287690869 hasConcept C33923547 @default.
- W4287690869 hasConcept C34388435 @default.
- W4287690869 hasConcept C41008148 @default.
- W4287690869 hasConcept C50817715 @default.
- W4287690869 hasConcept C62520636 @default.
- W4287690869 hasConcept C73602740 @default.
- W4287690869 hasConcept C77553402 @default.
- W4287690869 hasConceptScore W4287690869C119857082 @default.
- W4287690869 hasConceptScore W4287690869C121332964 @default.
- W4287690869 hasConceptScore W4287690869C126255220 @default.
- W4287690869 hasConceptScore W4287690869C134306372 @default.
- W4287690869 hasConceptScore W4287690869C144237770 @default.
- W4287690869 hasConceptScore W4287690869C199360897 @default.
- W4287690869 hasConceptScore W4287690869C2777027219 @default.
- W4287690869 hasConceptScore W4287690869C2780791683 @default.
- W4287690869 hasConceptScore W4287690869C33923547 @default.
- W4287690869 hasConceptScore W4287690869C34388435 @default.
- W4287690869 hasConceptScore W4287690869C41008148 @default.
- W4287690869 hasConceptScore W4287690869C50817715 @default.
- W4287690869 hasConceptScore W4287690869C62520636 @default.
- W4287690869 hasConceptScore W4287690869C73602740 @default.
- W4287690869 hasConceptScore W4287690869C77553402 @default.
- W4287690869 hasLocation W42876908691 @default.
- W4287690869 hasOpenAccess W4287690869 @default.
- W4287690869 hasPrimaryLocation W42876908691 @default.
- W4287690869 hasRelatedWork W1600255059 @default.
- W4287690869 hasRelatedWork W2139129601 @default.
- W4287690869 hasRelatedWork W2141645258 @default.
- W4287690869 hasRelatedWork W2738218455 @default.
- W4287690869 hasRelatedWork W2752599163 @default.
- W4287690869 hasRelatedWork W2951802169 @default.
- W4287690869 hasRelatedWork W2963382851 @default.
- W4287690869 hasRelatedWork W3002095816 @default.
- W4287690869 hasRelatedWork W4318620749 @default.
- W4287690869 hasRelatedWork W4367190832 @default.
- W4287690869 isParatext "false" @default.
- W4287690869 isRetracted "false" @default.
- W4287690869 workType "article" @default.
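
Illustrative note on the abstract above: the sketch below contrasts standard cumulative regret with one possible reading of a "lenient" regret that ignores suboptimality gaps no larger than epsilon. The thresholded-gap form, the function names, and the treatment of a gap exactly equal to epsilon are assumptions made here for illustration. They are not taken from the paper, whose exact definition may differ, and this is not the $\epsilon$-TS algorithm itself.

```python
from typing import Sequence


def standard_regret(means: Sequence[float], pulls: Sequence[int]) -> float:
    """Cumulative pseudo-regret: sum over pulls of the gap to the best arm's mean."""
    best = max(means)
    return sum(best - means[arm] for arm in pulls)


def lenient_regret(
    means: Sequence[float], pulls: Sequence[int], epsilon: float
) -> float:
    """Hypothetical lenient regret: gaps of at most epsilon count as zero.

    Assumption: a pull contributes its full gap only when that gap is
    strictly greater than epsilon.
    """
    best = max(means)
    total = 0.0
    for arm in pulls:
        gap = best - means[arm]
        if gap > epsilon:
            total += gap
    return total


if __name__ == "__main__":
    # Three arms with known means; arm 1 is near-optimal, arm 2 is clearly worse.
    means = [0.9, 0.85, 0.5]
    pulls = [0, 1, 1, 2]
    print("standard:", standard_regret(means, pulls))
    print("lenient :", lenient_regret(means, pulls, epsilon=0.1))
```

Under this reading, pulls of the near-optimal arm (gap 0.05) add to the standard regret but not to the lenient one, which is the behaviour the abstract motivates for short or large problems.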