Matches in SemOpenAlex for { <https://semopenalex.org/work/W3204172010> ?p ?o ?g. }
- W3204172010 endingPage "5070" @default.
- W3204172010 startingPage "5055" @default.
- W3204172010 abstract "Reinforcement learning (RL) algorithms have been used to learn how to implement tasks in uncertain and partially unknown environments. In practice, environments are usually uncontrolled and may affect task performance in an adversarial way. In this article, we model the interaction between an RL agent and its potentially adversarial environment as a turn-based zero-sum stochastic game. The task requirements are represented both qualitatively as a subset of linear temporal logic (LTL) specifications, and quantitatively as a reward function. For each case in which the LTL specification is realizable and can be equivalently transformed into a deterministic Büchi automaton, we show that there always exists a memoryless almost-sure winning strategy that is $\varepsilon$-optimal for the discounted-sum objective for any arbitrary positive $\varepsilon$. We propose a probably approximately correct (PAC) learning algorithm that learns such a strategy efficiently in an online manner with a priori unknown reward functions and unknown transition distributions. To the best of our knowledge, this is the first result on PAC learning in stochastic games with independent quantitative and qualitative objectives." @default.
- W3204172010 created "2021-10-11" @default.
- W3204172010 creator A5042986768 @default.
- W3204172010 creator A5068441112 @default.
- W3204172010 date "2022-10-01" @default.
- W3204172010 modified "2023-09-27" @default.
- W3204172010 title "Probably Approximately Correct Learning in Adversarial Environments With Temporal Logic Specifications" @default.
- W3204172010 cites W1496590343 @default.
- W3204172010 cites W1521551618 @default.
- W3204172010 cites W1556387789 @default.
- W3204172010 cites W1667094241 @default.
- W3204172010 cites W1738154394 @default.
- W3204172010 cites W1964007137 @default.
- W3204172010 cites W1973039793 @default.
- W3204172010 cites W2005684815 @default.
- W3204172010 cites W2117927826 @default.
- W3204172010 cites W2206317714 @default.
- W3204172010 cites W2222789563 @default.
- W3204172010 cites W2293285880 @default.
- W3204172010 cites W2341066496 @default.
- W3204172010 cites W2792090629 @default.
- W3204172010 cites W2792346428 @default.
- W3204172010 cites W2895196950 @default.
- W3204172010 cites W2913325211 @default.
- W3204172010 cites W2931553127 @default.
- W3204172010 cites W2963575966 @default.
- W3204172010 cites W2963604565 @default.
- W3204172010 cites W2963778636 @default.
- W3204172010 cites W2986800043 @default.
- W3204172010 cites W3011250830 @default.
- W3204172010 cites W3090827750 @default.
- W3204172010 cites W4233413206 @default.
- W3204172010 cites W6269798 @default.
- W3204172010 cites W67698512 @default.
- W3204172010 doi "https://doi.org/10.1109/tac.2021.3115080" @default.
- W3204172010 hasPublicationYear "2022" @default.
- W3204172010 type Work @default.
- W3204172010 sameAs 3204172010 @default.
- W3204172010 citedByCount "0" @default.
- W3204172010 crossrefType "journal-article" @default.
- W3204172010 hasAuthorship W3204172010A5042986768 @default.
- W3204172010 hasAuthorship W3204172010A5068441112 @default.
- W3204172010 hasConcept C111472728 @default.
- W3204172010 hasConcept C112505250 @default.
- W3204172010 hasConcept C11413529 @default.
- W3204172010 hasConcept C118615104 @default.
- W3204172010 hasConcept C138885662 @default.
- W3204172010 hasConcept C14036430 @default.
- W3204172010 hasConcept C154945302 @default.
- W3204172010 hasConcept C162324750 @default.
- W3204172010 hasConcept C187736073 @default.
- W3204172010 hasConcept C2780451532 @default.
- W3204172010 hasConcept C33923547 @default.
- W3204172010 hasConcept C37736160 @default.
- W3204172010 hasConcept C41008148 @default.
- W3204172010 hasConcept C45357846 @default.
- W3204172010 hasConcept C75553542 @default.
- W3204172010 hasConcept C78458016 @default.
- W3204172010 hasConcept C80444323 @default.
- W3204172010 hasConcept C86803240 @default.
- W3204172010 hasConcept C94375191 @default.
- W3204172010 hasConcept C97541855 @default.
- W3204172010 hasConceptScore W3204172010C111472728 @default.
- W3204172010 hasConceptScore W3204172010C112505250 @default.
- W3204172010 hasConceptScore W3204172010C11413529 @default.
- W3204172010 hasConceptScore W3204172010C118615104 @default.
- W3204172010 hasConceptScore W3204172010C138885662 @default.
- W3204172010 hasConceptScore W3204172010C14036430 @default.
- W3204172010 hasConceptScore W3204172010C154945302 @default.
- W3204172010 hasConceptScore W3204172010C162324750 @default.
- W3204172010 hasConceptScore W3204172010C187736073 @default.
- W3204172010 hasConceptScore W3204172010C2780451532 @default.
- W3204172010 hasConceptScore W3204172010C33923547 @default.
- W3204172010 hasConceptScore W3204172010C37736160 @default.
- W3204172010 hasConceptScore W3204172010C41008148 @default.
- W3204172010 hasConceptScore W3204172010C45357846 @default.
- W3204172010 hasConceptScore W3204172010C75553542 @default.
- W3204172010 hasConceptScore W3204172010C78458016 @default.
- W3204172010 hasConceptScore W3204172010C80444323 @default.
- W3204172010 hasConceptScore W3204172010C86803240 @default.
- W3204172010 hasConceptScore W3204172010C94375191 @default.
- W3204172010 hasConceptScore W3204172010C97541855 @default.
- W3204172010 hasIssue "10" @default.
- W3204172010 hasLocation W32041720101 @default.
- W3204172010 hasOpenAccess W3204172010 @default.
- W3204172010 hasPrimaryLocation W32041720101 @default.
- W3204172010 hasRelatedWork W118443536 @default.
- W3204172010 hasRelatedWork W1519821135 @default.
- W3204172010 hasRelatedWork W1575028430 @default.
- W3204172010 hasRelatedWork W2022606606 @default.
- W3204172010 hasRelatedWork W2625142831 @default.
- W3204172010 hasRelatedWork W2734912394 @default.
- W3204172010 hasRelatedWork W2890179775 @default.
- W3204172010 hasRelatedWork W2891191051 @default.
- W3204172010 hasRelatedWork W2894720836 @default.
- W3204172010 hasRelatedWork W2902414214 @default.
- W3204172010 hasVolume "67" @default.
- W3204172010 isParatext "false" @default.