Matches in SemOpenAlex for { <https://semopenalex.org/work/W2904232271> ?p ?o ?g. }
Showing items 1 to 80 of 80 with 100 items per page.
- W2904232271 endingPage "6119" @default.
- W2904232271 startingPage "6112" @default.
- W2904232271 abstract "Imitation learning has been widely used to speed up learning in novice agents, by allowing them to leverage existing data from experts. Allowing an agent to be influenced by external observations can benefit the learning process, but it also puts the agent at risk of following sub-optimal behaviours. In this paper, we study this problem in the context of bandits. More specifically, we consider that an agent (learner) is interacting with a bandit-style decision task, but can also observe a target policy interacting with the same environment. The learner observes only the target’s actions, not the rewards obtained. We introduce a new bandit optimism modifier that uses conditional optimism contingent on the actions of the target in order to guide the agent’s exploration. We analyze the effect of this modification on the well-known Upper Confidence Bound algorithm by proving that it preserves a regret upper-bound of order O(ln T), even in the presence of a very poor target, and we derive the dependency of the expected regret on the general target policy. We provide empirical results showing both great benefits as well as certain limitations inherent to observational learning in the multi-armed bandit setting. Experiments are conducted using targets satisfying theoretical assumptions with high probability, thus narrowing the gap between theory and application." @default.
- W2904232271 created "2018-12-22" @default.
- W2904232271 creator A5007600034 @default.
- W2904232271 creator A5065836447 @default.
- W2904232271 creator A5069876930 @default.
- W2904232271 date "2019-07-17" @default.
- W2904232271 modified "2023-10-16" @default.
- W2904232271 title "Leveraging Observations in Bandits: Between Risks and Benefits" @default.
- W2904232271 doi "https://doi.org/10.1609/aaai.v33i01.33016112" @default.
- W2904232271 hasPublicationYear "2019" @default.
- W2904232271 type Work @default.
- W2904232271 sameAs 2904232271 @default.
- W2904232271 citedByCount "4" @default.
- W2904232271 countsByYear W29042322712019 @default.
- W2904232271 countsByYear W29042322712020 @default.
- W2904232271 countsByYear W29042322712022 @default.
- W2904232271 crossrefType "journal-article" @default.
- W2904232271 hasAuthorship W2904232271A5007600034 @default.
- W2904232271 hasAuthorship W2904232271A5065836447 @default.
- W2904232271 hasAuthorship W2904232271A5069876930 @default.
- W2904232271 hasBestOaLocation W29042322711 @default.
- W2904232271 hasConcept C10138342 @default.
- W2904232271 hasConcept C119857082 @default.
- W2904232271 hasConcept C123197309 @default.
- W2904232271 hasConcept C151730666 @default.
- W2904232271 hasConcept C153083717 @default.
- W2904232271 hasConcept C154945302 @default.
- W2904232271 hasConcept C15744967 @default.
- W2904232271 hasConcept C162324750 @default.
- W2904232271 hasConcept C182306322 @default.
- W2904232271 hasConcept C187736073 @default.
- W2904232271 hasConcept C19768560 @default.
- W2904232271 hasConcept C204017024 @default.
- W2904232271 hasConcept C2779343474 @default.
- W2904232271 hasConcept C2780451532 @default.
- W2904232271 hasConcept C41008148 @default.
- W2904232271 hasConcept C50817715 @default.
- W2904232271 hasConcept C73602740 @default.
- W2904232271 hasConcept C77805123 @default.
- W2904232271 hasConcept C86803240 @default.
- W2904232271 hasConceptScore W2904232271C10138342 @default.
- W2904232271 hasConceptScore W2904232271C119857082 @default.
- W2904232271 hasConceptScore W2904232271C123197309 @default.
- W2904232271 hasConceptScore W2904232271C151730666 @default.
- W2904232271 hasConceptScore W2904232271C153083717 @default.
- W2904232271 hasConceptScore W2904232271C154945302 @default.
- W2904232271 hasConceptScore W2904232271C15744967 @default.
- W2904232271 hasConceptScore W2904232271C162324750 @default.
- W2904232271 hasConceptScore W2904232271C182306322 @default.
- W2904232271 hasConceptScore W2904232271C187736073 @default.
- W2904232271 hasConceptScore W2904232271C19768560 @default.
- W2904232271 hasConceptScore W2904232271C204017024 @default.
- W2904232271 hasConceptScore W2904232271C2779343474 @default.
- W2904232271 hasConceptScore W2904232271C2780451532 @default.
- W2904232271 hasConceptScore W2904232271C41008148 @default.
- W2904232271 hasConceptScore W2904232271C50817715 @default.
- W2904232271 hasConceptScore W2904232271C73602740 @default.
- W2904232271 hasConceptScore W2904232271C77805123 @default.
- W2904232271 hasConceptScore W2904232271C86803240 @default.
- W2904232271 hasIssue "01" @default.
- W2904232271 hasLocation W29042322711 @default.
- W2904232271 hasOpenAccess W2904232271 @default.
- W2904232271 hasPrimaryLocation W29042322711 @default.
- W2904232271 hasRelatedWork W1911551976 @default.
- W2904232271 hasRelatedWork W2613863488 @default.
- W2904232271 hasRelatedWork W2904232271 @default.
- W2904232271 hasRelatedWork W2945291696 @default.
- W2904232271 hasRelatedWork W2963432546 @default.
- W2904232271 hasRelatedWork W3152873672 @default.
- W2904232271 hasRelatedWork W3159366499 @default.
- W2904232271 hasRelatedWork W3202765922 @default.
- W2904232271 hasRelatedWork W4287078801 @default.
- W2904232271 hasRelatedWork W4289887588 @default.
- W2904232271 hasVolume "33" @default.
- W2904232271 isParatext "false" @default.
- W2904232271 isRetracted "false" @default.
- W2904232271 magId "2904232271" @default.
- W2904232271 workType "article" @default.