Matches in SemOpenAlex for { <https://semopenalex.org/work/W100039866> ?p ?o ?g. }
- W100039866 endingPage "243" @default.
- W100039866 startingPage "231" @default.
- W100039866 abstract "We consider a stochastic extension of the loop-free shortest path problem with adversarial rewards. In this episodic Markov decision problem an agent traverses through an acyclic graph with random transitions: at each step of an episode the agent chooses an action, receives some reward, and arrives at a random next state, where the reward and the distribution of the next state depend on the actual state and the chosen action. We consider the bandit situation when only the reward of the just visited state-action pair is revealed to the agent. For this problem we develop algorithms that perform asymptotically as well as the best stationary policy in hindsight. Assuming that all states are reachable with probability a>0 under all policies, we give an algorithm and prove that its regret is O(L^2 sqrt(T|A|)/a), where T is the number of episodes, A denotes the (finite) set of actions, and L is the length of the longest path in the graph. Variants of the algorithm are given that improve the dependence on the transition probabilities under specific conditions. The results are also extended to variations of the problem, including the case when the agent competes with time-varying policies." @default.
- W100039866 created "2016-06-24" @default.
- W100039866 creator A5041609088 @default.
- W100039866 creator A5069856068 @default.
- W100039866 creator A5077167635 @default.
- W100039866 date "2010-01-01" @default.
- W100039866 modified "2023-10-05" @default.
- W100039866 title "The Online Loop-free Stochastic Shortest-Path Problem." @default.
- W100039866 cites W1528133536 @default.
- W100039866 cites W1570963478 @default.
- W100039866 cites W1575658237 @default.
- W100039866 cites W1576452626 @default.
- W100039866 cites W1600338740 @default.
- W100039866 cites W1970041563 @default.
- W100039866 cites W1988790447 @default.
- W100039866 cites W2014482607 @default.
- W100039866 cites W2016041712 @default.
- W100039866 cites W2055639053 @default.
- W100039866 cites W2074680702 @default.
- W100039866 cites W2077902449 @default.
- W100039866 cites W2093825590 @default.
- W100039866 cites W2102179694 @default.
- W100039866 cites W2116067849 @default.
- W100039866 cites W2119567691 @default.
- W100039866 cites W2120745256 @default.
- W100039866 cites W2121863487 @default.
- W100039866 cites W2122187689 @default.
- W100039866 cites W2135829225 @default.
- W100039866 cites W2138637686 @default.
- W100039866 cites W2149988068 @default.
- W100039866 cites W2156211713 @default.
- W100039866 cites W2158455534 @default.
- W100039866 cites W2162926979 @default.
- W100039866 cites W2611627047 @default.
- W100039866 hasPublicationYear "2010" @default.
- W100039866 type Work @default.
- W100039866 sameAs 100039866 @default.
- W100039866 citedByCount "25" @default.
- W100039866 countsByYear W1000398662012 @default.
- W100039866 countsByYear W1000398662013 @default.
- W100039866 countsByYear W1000398662014 @default.
- W100039866 countsByYear W1000398662015 @default.
- W100039866 countsByYear W1000398662017 @default.
- W100039866 countsByYear W1000398662019 @default.
- W100039866 countsByYear W1000398662020 @default.
- W100039866 countsByYear W1000398662021 @default.
- W100039866 countsByYear W1000398662022 @default.
- W100039866 crossrefType "proceedings-article" @default.
- W100039866 hasAuthorship W100039866A5041609088 @default.
- W100039866 hasAuthorship W100039866A5069856068 @default.
- W100039866 hasAuthorship W100039866A5077167635 @default.
- W100039866 hasConcept C10347200 @default.
- W100039866 hasConcept C105795698 @default.
- W100039866 hasConcept C11413529 @default.
- W100039866 hasConcept C114614502 @default.
- W100039866 hasConcept C118615104 @default.
- W100039866 hasConcept C121332964 @default.
- W100039866 hasConcept C126255220 @default.
- W100039866 hasConcept C132525143 @default.
- W100039866 hasConcept C15744967 @default.
- W100039866 hasConcept C180747234 @default.
- W100039866 hasConcept C181789720 @default.
- W100039866 hasConcept C199360897 @default.
- W100039866 hasConcept C22590252 @default.
- W100039866 hasConcept C2777735758 @default.
- W100039866 hasConcept C2780791683 @default.
- W100039866 hasConcept C33923547 @default.
- W100039866 hasConcept C41008148 @default.
- W100039866 hasConcept C48103436 @default.
- W100039866 hasConcept C50817715 @default.
- W100039866 hasConcept C62520636 @default.
- W100039866 hasConceptScore W100039866C10347200 @default.
- W100039866 hasConceptScore W100039866C105795698 @default.
- W100039866 hasConceptScore W100039866C11413529 @default.
- W100039866 hasConceptScore W100039866C114614502 @default.
- W100039866 hasConceptScore W100039866C118615104 @default.
- W100039866 hasConceptScore W100039866C121332964 @default.
- W100039866 hasConceptScore W100039866C126255220 @default.
- W100039866 hasConceptScore W100039866C132525143 @default.
- W100039866 hasConceptScore W100039866C15744967 @default.
- W100039866 hasConceptScore W100039866C180747234 @default.
- W100039866 hasConceptScore W100039866C181789720 @default.
- W100039866 hasConceptScore W100039866C199360897 @default.
- W100039866 hasConceptScore W100039866C22590252 @default.
- W100039866 hasConceptScore W100039866C2777735758 @default.
- W100039866 hasConceptScore W100039866C2780791683 @default.
- W100039866 hasConceptScore W100039866C33923547 @default.
- W100039866 hasConceptScore W100039866C41008148 @default.
- W100039866 hasConceptScore W100039866C48103436 @default.
- W100039866 hasConceptScore W100039866C50817715 @default.
- W100039866 hasConceptScore W100039866C62520636 @default.
- W100039866 hasLocation W1000398661 @default.
- W100039866 hasOpenAccess W100039866 @default.
- W100039866 hasPrimaryLocation W1000398661 @default.
- W100039866 hasRelatedWork W1570963478 @default.
- W100039866 hasRelatedWork W1662803991 @default.
- W100039866 hasRelatedWork W1849095486 @default.
- W100039866 hasRelatedWork W1850488217 @default.