Matches in SemOpenAlex for { <https://semopenalex.org/work/W3174540892> ?p ?o ?g. }
Showing items 1 to 95 of 95 with 100 items per page.
- W3174540892 abstract "Multi-action restless multi-armed bandits (RMABs) are a powerful framework for constrained resource allocation in which $N$ independent processes are managed. However, previous work only study the offline setting where problem dynamics are known. We address this restrictive assumption, designing the first algorithms for learning good policies for Multi-action RMABs online using combinations of Lagrangian relaxation and Q-learning. Our first approach, MAIQL, extends a method for Q-learning the Whittle index in binary-action RMABs to the multi-action setting. We derive a generalized update rule and convergence proof and establish that, under standard assumptions, MAIQL converges to the asymptotically optimal multi-action RMAB policy as $t \rightarrow \infty$. However, MAIQL relies on learning Q-functions and indexes on two timescales which leads to slow convergence and requires problem structure to perform well. Thus, we design a second algorithm, LPQL, which learns the well-performing and more general Lagrange policy for multi-action RMABs by learning to minimize the Lagrange bound through a variant of Q-learning. To ensure fast convergence, we take an approximation strategy that enables learning on a single timescale, then give a guarantee relating the approximation's precision to an upper bound of LPQL's return as $t \rightarrow \infty$. Finally, we show that our approaches always outperform baselines across multiple settings, including one derived from real-world medication adherence data." @default.
- W3174540892 created "2021-07-05" @default.
- W3174540892 creator A5000327528 @default.
- W3174540892 creator A5006107552 @default.
- W3174540892 creator A5071925175 @default.
- W3174540892 creator A5083581537 @default.
- W3174540892 date "2021-08-14" @default.
- W3174540892 modified "2023-10-17" @default.
- W3174540892 title "Q-Learning Lagrange Policies for Multi-Action Restless Bandits" @default.
- W3174540892 cites W1981704813 @default.
- W3174540892 cites W2003798390 @default.
- W3174540892 cites W2044069028 @default.
- W3174540892 cites W2044502527 @default.
- W3174540892 cites W2049216612 @default.
- W3174540892 cites W2056921512 @default.
- W3174540892 cites W2076556053 @default.
- W3174540892 cites W2115138651 @default.
- W3174540892 cites W2141515329 @default.
- W3174540892 cites W2154204727 @default.
- W3174540892 cites W2184204218 @default.
- W3174540892 cites W2809112007 @default.
- W3174540892 cites W2809254427 @default.
- W3174540892 cites W2883807213 @default.
- W3174540892 cites W3008966844 @default.
- W3174540892 cites W3011214669 @default.
- W3174540892 cites W3099991607 @default.
- W3174540892 cites W3103744592 @default.
- W3174540892 cites W3189801773 @default.
- W3174540892 doi "https://doi.org/10.1145/3447548.3467370" @default.
- W3174540892 hasPublicationYear "2021" @default.
- W3174540892 type Work @default.
- W3174540892 sameAs 3174540892 @default.
- W3174540892 citedByCount "1" @default.
- W3174540892 countsByYear W31745408922023 @default.
- W3174540892 crossrefType "proceedings-article" @default.
- W3174540892 hasAuthorship W3174540892A5000327528 @default.
- W3174540892 hasAuthorship W3174540892A5006107552 @default.
- W3174540892 hasAuthorship W3174540892A5071925175 @default.
- W3174540892 hasAuthorship W3174540892A5083581537 @default.
- W3174540892 hasBestOaLocation W31745408922 @default.
- W3174540892 hasConcept C11413529 @default.
- W3174540892 hasConcept C121332964 @default.
- W3174540892 hasConcept C126255220 @default.
- W3174540892 hasConcept C134306372 @default.
- W3174540892 hasConcept C154945302 @default.
- W3174540892 hasConcept C15744967 @default.
- W3174540892 hasConcept C162324750 @default.
- W3174540892 hasConcept C2776029896 @default.
- W3174540892 hasConcept C2777303404 @default.
- W3174540892 hasConcept C2780791683 @default.
- W3174540892 hasConcept C28826006 @default.
- W3174540892 hasConcept C33923547 @default.
- W3174540892 hasConcept C41008148 @default.
- W3174540892 hasConcept C50522688 @default.
- W3174540892 hasConcept C62520636 @default.
- W3174540892 hasConcept C77553402 @default.
- W3174540892 hasConcept C77805123 @default.
- W3174540892 hasConcept C91765299 @default.
- W3174540892 hasConceptScore W3174540892C11413529 @default.
- W3174540892 hasConceptScore W3174540892C121332964 @default.
- W3174540892 hasConceptScore W3174540892C126255220 @default.
- W3174540892 hasConceptScore W3174540892C134306372 @default.
- W3174540892 hasConceptScore W3174540892C154945302 @default.
- W3174540892 hasConceptScore W3174540892C15744967 @default.
- W3174540892 hasConceptScore W3174540892C162324750 @default.
- W3174540892 hasConceptScore W3174540892C2776029896 @default.
- W3174540892 hasConceptScore W3174540892C2777303404 @default.
- W3174540892 hasConceptScore W3174540892C2780791683 @default.
- W3174540892 hasConceptScore W3174540892C28826006 @default.
- W3174540892 hasConceptScore W3174540892C33923547 @default.
- W3174540892 hasConceptScore W3174540892C41008148 @default.
- W3174540892 hasConceptScore W3174540892C50522688 @default.
- W3174540892 hasConceptScore W3174540892C62520636 @default.
- W3174540892 hasConceptScore W3174540892C77553402 @default.
- W3174540892 hasConceptScore W3174540892C77805123 @default.
- W3174540892 hasConceptScore W3174540892C91765299 @default.
- W3174540892 hasLocation W31745408921 @default.
- W3174540892 hasLocation W31745408922 @default.
- W3174540892 hasLocation W31745408923 @default.
- W3174540892 hasOpenAccess W3174540892 @default.
- W3174540892 hasPrimaryLocation W31745408921 @default.
- W3174540892 hasRelatedWork W1992701192 @default.
- W3174540892 hasRelatedWork W2002102264 @default.
- W3174540892 hasRelatedWork W2075228635 @default.
- W3174540892 hasRelatedWork W2155100848 @default.
- W3174540892 hasRelatedWork W2780775897 @default.
- W3174540892 hasRelatedWork W3097196196 @default.
- W3174540892 hasRelatedWork W3149469165 @default.
- W3174540892 hasRelatedWork W3183857959 @default.
- W3174540892 hasRelatedWork W4237964977 @default.
- W3174540892 hasRelatedWork W4243844638 @default.
- W3174540892 isParatext "false" @default.
- W3174540892 isRetracted "false" @default.
- W3174540892 magId "3174540892" @default.
- W3174540892 workType "article" @default.