Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386722149> ?p ?o ?g. }
Showing items 1 to 73 of 73 with 100 items per page.
- W4386722149 abstract "We propose a framework for verifiable and compositional reinforcement learning (RL) in which a collection of RL subsystems, each of which learns to accomplish a separate subtask, are composed to achieve an overall task. The framework consists of a high-level model, represented as a parametric Markov decision process, which is used to plan and analyze compositions of subsystems, and of the collection of low-level subsystems themselves. The subsystems are implemented as deep RL agents operating under partial observability. By defining interfaces between the subsystems, the framework enables automatic decompositions of task specifications, e.g., reach a target set of states with a probability of at least 0.95, into individual subtask specifications, i.e. achieve the subsystem's exit conditions with at least some minimum probability, given that its entry conditions are met. This in turn allows for the independent training and testing of the subsystems. We present theoretical results guaranteeing that if each subsystem learns a policy satisfying its subtask specification, then their composition is guaranteed to satisfy the overall task specification. Conversely, if the subtask specifications cannot all be satisfied by the learned policies, we present a method, formulated as the problem of finding an optimal set of parameters in the high-level model, to automatically update the subtask specifications to account for the observed shortcomings. The result is an iterative procedure for defining subtask specifications, and for training the subsystems to meet them. Experimental results demonstrate the presented framework's novel capabilities in environments with both full and partial observability, discrete and continuous state and action spaces, as well as deterministic and stochastic dynamics." @default.
- W4386722149 created "2023-09-14" @default.
- W4386722149 creator A5016121500 @default.
- W4386722149 creator A5025744882 @default.
- W4386722149 creator A5050465036 @default.
- W4386722149 creator A5066859311 @default.
- W4386722149 creator A5068441112 @default.
- W4386722149 date "2023-09-09" @default.
- W4386722149 modified "2023-09-30" @default.
- W4386722149 title "Verifiable Reinforcement Learning Systems via Compositionality" @default.
- W4386722149 doi "https://doi.org/10.48550/arxiv.2309.06420" @default.
- W4386722149 hasPublicationYear "2023" @default.
- W4386722149 type Work @default.
- W4386722149 citedByCount "0" @default.
- W4386722149 crossrefType "posted-content" @default.
- W4386722149 hasAuthorship W4386722149A5016121500 @default.
- W4386722149 hasAuthorship W4386722149A5025744882 @default.
- W4386722149 hasAuthorship W4386722149A5050465036 @default.
- W4386722149 hasAuthorship W4386722149A5066859311 @default.
- W4386722149 hasAuthorship W4386722149A5068441112 @default.
- W4386722149 hasBestOaLocation W43867221491 @default.
- W4386722149 hasConcept C105795698 @default.
- W4386722149 hasConcept C106189395 @default.
- W4386722149 hasConcept C117251300 @default.
- W4386722149 hasConcept C121375916 @default.
- W4386722149 hasConcept C154945302 @default.
- W4386722149 hasConcept C159886148 @default.
- W4386722149 hasConcept C162324750 @default.
- W4386722149 hasConcept C177264268 @default.
- W4386722149 hasConcept C187736073 @default.
- W4386722149 hasConcept C199360897 @default.
- W4386722149 hasConcept C2780451532 @default.
- W4386722149 hasConcept C28826006 @default.
- W4386722149 hasConcept C33923547 @default.
- W4386722149 hasConcept C36299963 @default.
- W4386722149 hasConcept C41008148 @default.
- W4386722149 hasConcept C85847156 @default.
- W4386722149 hasConcept C97541855 @default.
- W4386722149 hasConcept C98045186 @default.
- W4386722149 hasConceptScore W4386722149C105795698 @default.
- W4386722149 hasConceptScore W4386722149C106189395 @default.
- W4386722149 hasConceptScore W4386722149C117251300 @default.
- W4386722149 hasConceptScore W4386722149C121375916 @default.
- W4386722149 hasConceptScore W4386722149C154945302 @default.
- W4386722149 hasConceptScore W4386722149C159886148 @default.
- W4386722149 hasConceptScore W4386722149C162324750 @default.
- W4386722149 hasConceptScore W4386722149C177264268 @default.
- W4386722149 hasConceptScore W4386722149C187736073 @default.
- W4386722149 hasConceptScore W4386722149C199360897 @default.
- W4386722149 hasConceptScore W4386722149C2780451532 @default.
- W4386722149 hasConceptScore W4386722149C28826006 @default.
- W4386722149 hasConceptScore W4386722149C33923547 @default.
- W4386722149 hasConceptScore W4386722149C36299963 @default.
- W4386722149 hasConceptScore W4386722149C41008148 @default.
- W4386722149 hasConceptScore W4386722149C85847156 @default.
- W4386722149 hasConceptScore W4386722149C97541855 @default.
- W4386722149 hasConceptScore W4386722149C98045186 @default.
- W4386722149 hasLocation W43867221491 @default.
- W4386722149 hasOpenAccess W4386722149 @default.
- W4386722149 hasPrimaryLocation W43867221491 @default.
- W4386722149 hasRelatedWork W1932117986 @default.
- W4386722149 hasRelatedWork W2802349643 @default.
- W4386722149 hasRelatedWork W2949964922 @default.
- W4386722149 hasRelatedWork W3036009608 @default.
- W4386722149 hasRelatedWork W3127085325 @default.
- W4386722149 hasRelatedWork W3171755056 @default.
- W4386722149 hasRelatedWork W3174896399 @default.
- W4386722149 hasRelatedWork W4283455536 @default.
- W4386722149 hasRelatedWork W4287123794 @default.
- W4386722149 hasRelatedWork W3101472277 @default.
- W4386722149 isParatext "false" @default.
- W4386722149 isRetracted "false" @default.
- W4386722149 workType "article" @default.
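
The compositional guarantee described in the abstract above can be illustrated with a small sketch. It assumes the simplest possible high-level model, a purely sequential chain of subsystems in which each subsystem's exit conditions are the next one's entry conditions; the abstract's actual high-level model is a parametric Markov decision process, which this does not reproduce. The function names and the example probabilities are hypothetical, and only the 0.95 threshold comes from the abstract.

```python
from math import prod


def chain_success_lower_bound(subtask_probs):
    """Lower bound on overall task success for a sequential chain.

    Assumption (not from the source): subsystems run strictly one after
    another, and each value is the minimum probability that a subsystem
    reaches its exit conditions given that its entry conditions are met.
    Under that assumption the chain succeeds with probability at least
    the product of the per-subtask bounds.
    """
    for p in subtask_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return prod(subtask_probs)


def meets_task_spec(subtask_probs, required=0.95):
    """Check whether the composed lower bound satisfies the overall task spec."""
    return chain_success_lower_bound(subtask_probs) >= required


# Hypothetical subtask specifications for three subsystems.
print(meets_task_spec([0.99, 0.98, 0.98]))
print(meets_task_spec([0.99, 0.97, 0.98]))
```

If the check fails, the abstract describes updating the subtask specifications by searching for better parameters in the high-level model. In this simplified chain, that would amount to choosing new per-subtask bounds whose product still reaches the required threshold.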