Matches in SemOpenAlex for { <https://semopenalex.org/work/W3031673669> ?p ?o ?g. }
Showing items 1 to 97 of 97 with 100 items per page.
- W3031673669 endingPage "75" @default.
- W3031673669 startingPage "68" @default.
- W3031673669 abstract "We consider the problem of estimating the policy and transition probability model of a Markov Decision Process from data (state, action, next state tuples). The transition probability and policy are assumed to be parametric functions of a sparse set of features associated with the tuples. We propose two regularized maximum likelihood estimation algorithms for learning the transition probability model and policy, respectively. An upper bound is established on the regret, which is the difference between the average reward of the estimated policy under the estimated transition probabilities and that of the original unknown policy under the true (unknown) transition probabilities. We provide a sample complexity result showing that we can achieve a low regret with a relatively small amount of training samples. We illustrate the theoretical results with a healthcare example and a robot navigation experiment." @default.
- W3031673669 created "2020-06-05" @default.
- W3031673669 creator A5054049845 @default.
- W3031673669 creator A5062549428 @default.
- W3031673669 creator A5075696701 @default.
- W3031673669 date "2021-01-01" @default.
- W3031673669 modified "2023-10-17" @default.
- W3031673669 title "Learning parametric policies and transition probability models of Markov decision processes from data" @default.
- W3031673669 cites W1965878388 @default.
- W3031673669 cites W1986014385 @default.
- W3031673669 cites W2044191104 @default.
- W3031673669 cites W2144018496 @default.
- W3031673669 cites W2155153696 @default.
- W3031673669 cites W2168565265 @default.
- W3031673669 cites W2560629614 @default.
- W3031673669 cites W2581566809 @default.
- W3031673669 doi "https://doi.org/10.1016/j.ejcon.2020.04.003" @default.
- W3031673669 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/7944408" @default.
- W3031673669 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/33716408" @default.
- W3031673669 hasPublicationYear "2021" @default.
- W3031673669 type Work @default.
- W3031673669 sameAs 3031673669 @default.
- W3031673669 citedByCount "2" @default.
- W3031673669 countsByYear W30316736692020 @default.
- W3031673669 countsByYear W30316736692021 @default.
- W3031673669 crossrefType "journal-article" @default.
- W3031673669 hasAuthorship W3031673669A5054049845 @default.
- W3031673669 hasAuthorship W3031673669A5062549428 @default.
- W3031673669 hasAuthorship W3031673669A5075696701 @default.
- W3031673669 hasBestOaLocation W30316736691 @default.
- W3031673669 hasConcept C104317684 @default.
- W3031673669 hasConcept C105795698 @default.
- W3031673669 hasConcept C106189395 @default.
- W3031673669 hasConcept C11413529 @default.
- W3031673669 hasConcept C117251300 @default.
- W3031673669 hasConcept C118615104 @default.
- W3031673669 hasConcept C118930307 @default.
- W3031673669 hasConcept C119857082 @default.
- W3031673669 hasConcept C126255220 @default.
- W3031673669 hasConcept C159886148 @default.
- W3031673669 hasConcept C17098449 @default.
- W3031673669 hasConcept C177264268 @default.
- W3031673669 hasConcept C185592680 @default.
- W3031673669 hasConcept C194232998 @default.
- W3031673669 hasConcept C199360897 @default.
- W3031673669 hasConcept C33923547 @default.
- W3031673669 hasConcept C41008148 @default.
- W3031673669 hasConcept C48103436 @default.
- W3031673669 hasConcept C50817715 @default.
- W3031673669 hasConcept C55493867 @default.
- W3031673669 hasConcept C98763669 @default.
- W3031673669 hasConceptScore W3031673669C104317684 @default.
- W3031673669 hasConceptScore W3031673669C105795698 @default.
- W3031673669 hasConceptScore W3031673669C106189395 @default.
- W3031673669 hasConceptScore W3031673669C11413529 @default.
- W3031673669 hasConceptScore W3031673669C117251300 @default.
- W3031673669 hasConceptScore W3031673669C118615104 @default.
- W3031673669 hasConceptScore W3031673669C118930307 @default.
- W3031673669 hasConceptScore W3031673669C119857082 @default.
- W3031673669 hasConceptScore W3031673669C126255220 @default.
- W3031673669 hasConceptScore W3031673669C159886148 @default.
- W3031673669 hasConceptScore W3031673669C17098449 @default.
- W3031673669 hasConceptScore W3031673669C177264268 @default.
- W3031673669 hasConceptScore W3031673669C185592680 @default.
- W3031673669 hasConceptScore W3031673669C194232998 @default.
- W3031673669 hasConceptScore W3031673669C199360897 @default.
- W3031673669 hasConceptScore W3031673669C33923547 @default.
- W3031673669 hasConceptScore W3031673669C41008148 @default.
- W3031673669 hasConceptScore W3031673669C48103436 @default.
- W3031673669 hasConceptScore W3031673669C50817715 @default.
- W3031673669 hasConceptScore W3031673669C55493867 @default.
- W3031673669 hasConceptScore W3031673669C98763669 @default.
- W3031673669 hasFunder F4320306076 @default.
- W3031673669 hasFunder F4320332161 @default.
- W3031673669 hasFunder F4320337345 @default.
- W3031673669 hasLocation W30316736691 @default.
- W3031673669 hasLocation W30316736692 @default.
- W3031673669 hasLocation W30316736693 @default.
- W3031673669 hasOpenAccess W3031673669 @default.
- W3031673669 hasPrimaryLocation W30316736691 @default.
- W3031673669 hasRelatedWork W102453 @default.
- W3031673669 hasRelatedWork W491107 @default.
- W3031673669 hasRelatedWork W5133103 @default.
- W3031673669 hasRelatedWork W5310384 @default.
- W3031673669 hasRelatedWork W5718419 @default.
- W3031673669 hasRelatedWork W5876636 @default.
- W3031673669 hasRelatedWork W7587899 @default.
- W3031673669 hasRelatedWork W8540740 @default.
- W3031673669 hasRelatedWork W8709591 @default.
- W3031673669 hasRelatedWork W9932698 @default.
- W3031673669 hasVolume "57" @default.
- W3031673669 isParatext "false" @default.
- W3031673669 isRetracted "false" @default.
- W3031673669 magId "3031673669" @default.
- W3031673669 workType "article" @default.
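
The abstract above defines regret as the difference between the average reward of the estimated policy under the estimated transition probabilities and that of the original policy under the true transition probabilities. The sketch below illustrates that definition on a small finite MDP. It is not the paper's algorithm: the function names, the list-based data layout, the power-iteration approximation of the stationary distribution, and the choice to report the absolute difference are all assumptions made for illustration.

```python
# Illustrative sketch only; all names and the data layout are hypothetical.
# policy[s][a]          : probability of taking action a in state s
# transitions[s][a][s2] : probability of moving from s to s2 under action a
# rewards[s][a]         : reward for taking action a in state s


def induced_chain(policy, transitions):
    """Markov chain over states obtained by following `policy`."""
    n = len(policy)
    chain = [[0.0] * n for _ in range(n)]
    for s in range(n):
        for a, p_a in enumerate(policy[s]):
            for s2 in range(n):
                chain[s][s2] += p_a * transitions[s][a][s2]
    return chain


def stationary_distribution(chain, iters=1000):
    """Approximate the stationary distribution by power iteration.

    Assumes the chain is ergodic, so the iteration converges.
    """
    n = len(chain)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[s] * chain[s][s2] for s in range(n)) for s2 in range(n)]
    return mu


def average_reward(policy, transitions, rewards):
    """Long-run average reward of `policy` under `transitions`."""
    mu = stationary_distribution(induced_chain(policy, transitions))
    return sum(
        mu[s] * sum(p_a * rewards[s][a] for a, p_a in enumerate(policy[s]))
        for s in range(len(policy))
    )


def regret(est_policy, est_transitions, true_policy, true_transitions, rewards):
    """Absolute gap between the two average rewards (sign convention assumed)."""
    estimated = average_reward(est_policy, est_transitions, rewards)
    original = average_reward(true_policy, true_transitions, rewards)
    return abs(original - estimated)


# Toy example: two states, two actions, reward 1 in state 0 and 0 in state 1.
uniform_policy = [[0.5, 0.5], [0.5, 0.5]]
rewards = [[1.0, 1.0], [0.0, 0.0]]
true_transitions = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
est_transitions = [[[0.75, 0.25], [0.75, 0.25]], [[0.75, 0.25], [0.75, 0.25]]]

gap = regret(uniform_policy, est_transitions, uniform_policy, true_transitions, rewards)
```

In this toy example the policy is the same on both sides, so the gap comes entirely from the error in the estimated transition probabilities. With a perfectly estimated model and policy the gap is zero.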