Matches in SemOpenAlex for { <https://semopenalex.org/work/W3048454540> ?p ?o ?g. }
- W3048454540 abstract "The standard feedback model of reinforcement learning requires revealing the reward of every visited state-action pair. However, in practice, it is often the case that such frequent feedback is not available. In this work, we take a first step towards relaxing this assumption and require a weaker form of feedback, which we refer to as trajectory feedback. Instead of observing the reward obtained after every action, we assume we only receive a score that represents the quality of the whole trajectory observed by the agent, namely, the sum of all rewards obtained over this trajectory. We extend reinforcement learning algorithms to this setting, based on least-squares estimation of the unknown reward, for both the known and unknown transition model cases, and study the performance of these algorithms by analyzing their regret. For cases where the transition model is unknown, we offer a hybrid optimistic-Thompson Sampling approach that results in a tractable algorithm." @default.
- W3048454540 created "2020-08-18" @default.
- W3048454540 creator A5018784842 @default.
- W3048454540 creator A5036260775 @default.
- W3048454540 creator A5090891199 @default.
- W3048454540 date "2020-08-13" @default.
- W3048454540 modified "2023-09-27" @default.
- W3048454540 title "Reinforcement Learning with Trajectory Feedback" @default.
- W3048454540 cites W1487320471 @default.
- W3048454540 cites W1518931405 @default.
- W3048454540 cites W1560153690 @default.
- W3048454540 cites W1850488217 @default.
- W3048454540 cites W1968018625 @default.
- W3048454540 cites W2019029889 @default.
- W3048454540 cites W2103715332 @default.
- W3048454540 cites W2111764152 @default.
- W3048454540 cites W2119567691 @default.
- W3048454540 cites W2119717200 @default.
- W3048454540 cites W2119738618 @default.
- W3048454540 cites W2145339207 @default.
- W3048454540 cites W2150234726 @default.
- W3048454540 cites W2160163723 @default.
- W3048454540 cites W2166253248 @default.
- W3048454540 cites W21934178 @default.
- W3048454540 cites W2518564545 @default.
- W3048454540 cites W2585536612 @default.
- W3048454540 cites W2766447205 @default.
- W3048454540 cites W2773557179 @default.
- W3048454540 cites W2907502549 @default.
- W3048454540 cites W2946284958 @default.
- W3048454540 cites W2963049774 @default.
- W3048454540 cites W2963158178 @default.
- W3048454540 cites W2963490519 @default.
- W3048454540 cites W2963582321 @default.
- W3048454540 cites W2963603291 @default.
- W3048454540 cites W2964054583 @default.
- W3048454540 cites W2964161785 @default.
- W3048454540 cites W2964299116 @default.
- W3048454540 cites W2970770768 @default.
- W3048454540 cites W2970870329 @default.
- W3048454540 cites W2971249033 @default.
- W3048454540 cites W2991935368 @default.
- W3048454540 cites W3037983390 @default.
- W3048454540 cites W3046626913 @default.
- W3048454540 cites W50486269 @default.
- W3048454540 hasPublicationYear "2020" @default.
- W3048454540 type Work @default.
- W3048454540 sameAs 3048454540 @default.
- W3048454540 citedByCount "0" @default.
- W3048454540 crossrefType "posted-content" @default.
- W3048454540 hasAuthorship W3048454540A5018784842 @default.
- W3048454540 hasAuthorship W3048454540A5036260775 @default.
- W3048454540 hasAuthorship W3048454540A5090891199 @default.
- W3048454540 hasConcept C11413529 @default.
- W3048454540 hasConcept C119857082 @default.
- W3048454540 hasConcept C121332964 @default.
- W3048454540 hasConcept C1276947 @default.
- W3048454540 hasConcept C13662910 @default.
- W3048454540 hasConcept C154945302 @default.
- W3048454540 hasConcept C15744967 @default.
- W3048454540 hasConcept C2780791683 @default.
- W3048454540 hasConcept C41008148 @default.
- W3048454540 hasConcept C48103436 @default.
- W3048454540 hasConcept C50817715 @default.
- W3048454540 hasConcept C62520636 @default.
- W3048454540 hasConcept C67203356 @default.
- W3048454540 hasConcept C77805123 @default.
- W3048454540 hasConcept C97541855 @default.
- W3048454540 hasConceptScore W3048454540C11413529 @default.
- W3048454540 hasConceptScore W3048454540C119857082 @default.
- W3048454540 hasConceptScore W3048454540C121332964 @default.
- W3048454540 hasConceptScore W3048454540C1276947 @default.
- W3048454540 hasConceptScore W3048454540C13662910 @default.
- W3048454540 hasConceptScore W3048454540C154945302 @default.
- W3048454540 hasConceptScore W3048454540C15744967 @default.
- W3048454540 hasConceptScore W3048454540C2780791683 @default.
- W3048454540 hasConceptScore W3048454540C41008148 @default.
- W3048454540 hasConceptScore W3048454540C48103436 @default.
- W3048454540 hasConceptScore W3048454540C50817715 @default.
- W3048454540 hasConceptScore W3048454540C62520636 @default.
- W3048454540 hasConceptScore W3048454540C67203356 @default.
- W3048454540 hasConceptScore W3048454540C77805123 @default.
- W3048454540 hasConceptScore W3048454540C97541855 @default.
- W3048454540 hasLocation W30484545401 @default.
- W3048454540 hasOpenAccess W3048454540 @default.
- W3048454540 hasPrimaryLocation W30484545401 @default.
- W3048454540 hasRelatedWork W1812824539 @default.
- W3048454540 hasRelatedWork W1982948368 @default.
- W3048454540 hasRelatedWork W1999874108 @default.
- W3048454540 hasRelatedWork W2057050711 @default.
- W3048454540 hasRelatedWork W2097031964 @default.
- W3048454540 hasRelatedWork W2109169869 @default.
- W3048454540 hasRelatedWork W2117626647 @default.
- W3048454540 hasRelatedWork W2435367904 @default.
- W3048454540 hasRelatedWork W2576787642 @default.
- W3048454540 hasRelatedWork W2890882194 @default.
- W3048454540 hasRelatedWork W2915060045 @default.
- W3048454540 hasRelatedWork W2945884844 @default.
- W3048454540 hasRelatedWork W3035599863 @default.
- W3048454540 hasRelatedWork W3107949129 @default.
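The abstract's key observation is that a trajectory score, being the sum of per-step rewards, is linear in the trajectory's state-action visit counts. Stacking several trajectories therefore gives a linear least-squares problem for the unknown reward vector. The sketch below illustrates only that estimation step under stated assumptions. It is not the authors' algorithm: it omits their exploration and regret machinery. It also assumes that state-action pairs are indexed 0..d-1, that scores are noise-free, and that a small ridge term (the hypothetical parameter reg) keeps the system well posed. A plain Gaussian-elimination solver stands in for a numerical library, and all function names are illustrative.

```python
from collections import Counter


def visit_counts(trajectory, num_pairs):
    """Feature vector of one trajectory: visit count of each state-action index (hypothetical encoding)."""
    counts = Counter(trajectory)
    return [float(counts.get(i, 0)) for i in range(num_pairs)]


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_rewards(trajectories, scores, num_pairs, reg=1e-6):
    """Ridge-regularised least squares: minimise sum_k (y_k - <n_k, r>)^2 + reg * ||r||^2.

    Solves the normal equations (N^T N + reg * I) r = N^T y, where row k of N
    holds the visit counts of trajectory k and y_k is its observed score.
    """
    feats = [visit_counts(t, num_pairs) for t in trajectories]
    gram = [[sum(f[i] * f[j] for f in feats) + (reg if i == j else 0.0)
             for j in range(num_pairs)] for i in range(num_pairs)]
    rhs = [sum(f[i] * y for f, y in zip(feats, scores)) for i in range(num_pairs)]
    return solve_linear(gram, rhs)


if __name__ == "__main__":
    true_r = [1.0, 0.0, 0.5]                               # hypothetical per-pair rewards
    trajs = [[0, 1], [1, 2], [0, 2], [0, 0, 2]]            # sequences of state-action indices
    scores = [sum(true_r[i] for i in t) for t in trajs]    # only trajectory-level feedback is observed
    print(estimate_rewards(trajs, scores, num_pairs=3))    # values close to true_r
```

The small ridge term keeps the normal equations positive definite even when some state-action pairs have not yet been visited. With enough linearly independent trajectories, as in the usage above, the estimate approaches the underlying per-pair rewards.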