Matches in SemOpenAlex for { <https://semopenalex.org/work/W3042776265> ?p ?o ?g. }
Showing items 1 to 96 of 96 with 100 items per page.
- W3042776265 abstract "This is a short communication on a Lyapunov function argument for softmax in bandit problems. There are a number of excellent papers coming out using differential equations for policy gradient algorithms in reinforcement learning \cite{agarwal2019optimality,bhandari2019global,mei2020global}. We give a short argument that gives a regret bound for the soft-max ordinary differential equation for bandit problems. We derive a similar result for a different policy gradient algorithm, again for bandit problems. For this second algorithm, it is possible to prove regret bounds in the stochastic case \cite{DW20}. At the end, we summarize some ideas and issues on deriving stochastic regret bounds for policy gradients." @default.
- W3042776265 created "2020-07-23" @default.
- W3042776265 creator A5068542108 @default.
- W3042776265 date "2020-07-20" @default.
- W3042776265 modified "2023-09-27" @default.
- W3042776265 title "A Short Note on Soft-max and Policy Gradients in Bandits Problems." @default.
- W3042776265 cites W2948432982 @default.
- W3042776265 cites W3034426742 @default.
- W3042776265 cites W3042983647 @default.
- W3042776265 cites W3046626913 @default.
- W3042776265 hasPublicationYear "2020" @default.
- W3042776265 type Work @default.
- W3042776265 sameAs 3042776265 @default.
- W3042776265 citedByCount "1" @default.
- W3042776265 countsByYear W30427762652020 @default.
- W3042776265 crossrefType "posted-content" @default.
- W3042776265 hasAuthorship W3042776265A5068542108 @default.
- W3042776265 hasConcept C105795698 @default.
- W3042776265 hasConcept C121332964 @default.
- W3042776265 hasConcept C126255220 @default.
- W3042776265 hasConcept C134306372 @default.
- W3042776265 hasConcept C14036430 @default.
- W3042776265 hasConcept C144237770 @default.
- W3042776265 hasConcept C154945302 @default.
- W3042776265 hasConcept C158622935 @default.
- W3042776265 hasConcept C185592680 @default.
- W3042776265 hasConcept C188441871 @default.
- W3042776265 hasConcept C28826006 @default.
- W3042776265 hasConcept C33923547 @default.
- W3042776265 hasConcept C41008148 @default.
- W3042776265 hasConcept C50644808 @default.
- W3042776265 hasConcept C50817715 @default.
- W3042776265 hasConcept C51544822 @default.
- W3042776265 hasConcept C51955184 @default.
- W3042776265 hasConcept C55493867 @default.
- W3042776265 hasConcept C60640748 @default.
- W3042776265 hasConcept C62520636 @default.
- W3042776265 hasConcept C78045399 @default.
- W3042776265 hasConcept C78458016 @default.
- W3042776265 hasConcept C86803240 @default.
- W3042776265 hasConcept C93226319 @default.
- W3042776265 hasConcept C97355855 @default.
- W3042776265 hasConcept C98184364 @default.
- W3042776265 hasConceptScore W3042776265C105795698 @default.
- W3042776265 hasConceptScore W3042776265C121332964 @default.
- W3042776265 hasConceptScore W3042776265C126255220 @default.
- W3042776265 hasConceptScore W3042776265C134306372 @default.
- W3042776265 hasConceptScore W3042776265C14036430 @default.
- W3042776265 hasConceptScore W3042776265C144237770 @default.
- W3042776265 hasConceptScore W3042776265C154945302 @default.
- W3042776265 hasConceptScore W3042776265C158622935 @default.
- W3042776265 hasConceptScore W3042776265C185592680 @default.
- W3042776265 hasConceptScore W3042776265C188441871 @default.
- W3042776265 hasConceptScore W3042776265C28826006 @default.
- W3042776265 hasConceptScore W3042776265C33923547 @default.
- W3042776265 hasConceptScore W3042776265C41008148 @default.
- W3042776265 hasConceptScore W3042776265C50644808 @default.
- W3042776265 hasConceptScore W3042776265C50817715 @default.
- W3042776265 hasConceptScore W3042776265C51544822 @default.
- W3042776265 hasConceptScore W3042776265C51955184 @default.
- W3042776265 hasConceptScore W3042776265C55493867 @default.
- W3042776265 hasConceptScore W3042776265C60640748 @default.
- W3042776265 hasConceptScore W3042776265C62520636 @default.
- W3042776265 hasConceptScore W3042776265C78045399 @default.
- W3042776265 hasConceptScore W3042776265C78458016 @default.
- W3042776265 hasConceptScore W3042776265C86803240 @default.
- W3042776265 hasConceptScore W3042776265C93226319 @default.
- W3042776265 hasConceptScore W3042776265C97355855 @default.
- W3042776265 hasConceptScore W3042776265C98184364 @default.
- W3042776265 hasLocation W30427762651 @default.
- W3042776265 hasOpenAccess W3042776265 @default.
- W3042776265 hasPrimaryLocation W30427762651 @default.
- W3042776265 hasRelatedWork W1520324410 @default.
- W3042776265 hasRelatedWork W2006036413 @default.
- W3042776265 hasRelatedWork W2026850974 @default.
- W3042776265 hasRelatedWork W2082344166 @default.
- W3042776265 hasRelatedWork W2120323365 @default.
- W3042776265 hasRelatedWork W2149288161 @default.
- W3042776265 hasRelatedWork W2327350185 @default.
- W3042776265 hasRelatedWork W2565517532 @default.
- W3042776265 hasRelatedWork W2612769410 @default.
- W3042776265 hasRelatedWork W2912554695 @default.
- W3042776265 hasRelatedWork W2961360458 @default.
- W3042776265 hasRelatedWork W3011080757 @default.
- W3042776265 hasRelatedWork W3044451384 @default.
- W3042776265 hasRelatedWork W3099294802 @default.
- W3042776265 hasRelatedWork W3105702366 @default.
- W3042776265 hasRelatedWork W3125094489 @default.
- W3042776265 hasRelatedWork W3125516344 @default.
- W3042776265 hasRelatedWork W3137334160 @default.
- W3042776265 hasRelatedWork W31934288 @default.
- W3042776265 hasRelatedWork W2183348085 @default.
- W3042776265 isParatext "false" @default.
- W3042776265 isRetracted "false" @default.
- W3042776265 magId "3042776265" @default.
- W3042776265 workType "article" @default.