Matches in SemOpenAlex for { <https://semopenalex.org/work/W2287341859> ?p ?o ?g. }
Showing items 1 to 73 of 73, with 100 items per page.
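The listing below can be reproduced programmatically. A minimal sketch, assuming SemOpenAlex exposes a public SPARQL endpoint at `https://semopenalex.org/sparql` (the endpoint URL and its availability are assumptions, not stated in this listing):

```python
# Sketch: fetch all predicate/object pairs for this work from the
# SemOpenAlex SPARQL endpoint. The endpoint URL is an assumption.
import json
import urllib.parse
import urllib.request

WORK = "https://semopenalex.org/work/W2287341859"

# Same pattern as the header above: every ?p ?o for the work.
query = f"SELECT ?p ?o WHERE {{ <{WORK}> ?p ?o . }}"

def fetch_triples(endpoint="https://semopenalex.org/sparql"):
    """Send the query and return the JSON result bindings (needs network)."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    req = urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]
```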
- W2287341859 abstract "A reinforcement learning system with limited computational resources interacts with an unrestricted, unknown environment. Its goal is to maximize cumulative reward, to be obtained throughout its limited, unknown lifetime. System policy is an arbitrary modifiable algorithm mapping environmental inputs and internal states to outputs and new internal states. The problem is: in realistic, unknown environments, each policy modification process (PMP) occurring during system life may have unpredictable influence on environmental states, rewards and PMPs at any later time. Existing reinforcement learning algorithms cannot properly deal with this. Neither can naive exhaustive search among all policy candidates --- not even in case of very small search spaces. In fact, a reasonable way of measuring performance improvements in such general (but typical) situations is missing. I define such a measure based on the novel "reinforcement acceleration criterion" (RAC). At a given time, RAC is satisfied if the beginning of each completed PMP that computed a currently valid policy modification has been followed by long-term acceleration of average reinforcement intake (the computation time for later PMPs is taken into account). I present a method called "environment-independent reinforcement acceleration" (EIRA) which is guaranteed to achieve RAC. EIRA does neither care whether the system's policy allows for changing itself, nor whether there are multiple, interacting learning systems. Consequences are: (1) a sound theoretical framework for "meta-learning" (because the success of a PMP recursively depends on the success of all later PMPs, for which it is setting the stage). (2) A sound theoretical framework for multi-agent learning. The principles have been implemented (1) in a single system using an assembler-like programming language to modify its own policy, and (2) a system consisting of multiple agents, where each agent is in fact just a connection in a fully recurrent reinforcement learning neural net. A by-product of this research is a general reinforcement learning algorithm for such nets. Preliminary experiments illustrate the theory." @default.
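The RAC condition in the abstract can be paraphrased as a simple check: average reward intake measured from each successive valid PMP start time up to the present must keep increasing. A minimal illustrative sketch (function names and the exact strict-increase test are my assumptions; the paper's formal definition may differ, e.g. in how later PMP computation time is charged):

```python
# Illustrative sketch of the reinforcement acceleration criterion (RAC)
# described in the abstract. Not the paper's formal definition.

def rac_satisfied(checkpoints, total_reward, now):
    """checkpoints: start times of completed PMPs whose policy
    modifications are still valid, in chronological order.
    total_reward(t): cumulative reinforcement obtained up to time t.
    RAC (as paraphrased here): the average reward rate from each
    successive checkpoint to `now` must strictly increase, i.e. each
    valid PMP marks the start of long-term acceleration."""
    rates = [
        (total_reward(now) - total_reward(t)) / (now - t)
        for t in checkpoints
        if t < now
    ]
    return all(a < b for a, b in zip(rates, rates[1:]))
```

For example, with checkpoints at t = 0, 10, 20 and a reward rate that doubles after each one, the per-checkpoint averages rise and the check passes; with a constant reward rate, all averages are equal and it fails.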
- W2287341859 created "2016-06-24" @default.
- W2287341859 creator A5044629673 @default.
- W2287341859 date "1995-05-29" @default.
- W2287341859 modified "2023-09-27" @default.
- W2287341859 title "Environment-Independent Reinforcement Acceleration" @default.
- W2287341859 hasPublicationYear "1995" @default.
- W2287341859 type Work @default.
- W2287341859 sameAs 2287341859 @default.
- W2287341859 citedByCount "3" @default.
- W2287341859 crossrefType "journal-article" @default.
- W2287341859 hasAuthorship W2287341859A5044629673 @default.
- W2287341859 hasConcept C111919701 @default.
- W2287341859 hasConcept C117896860 @default.
- W2287341859 hasConcept C119857082 @default.
- W2287341859 hasConcept C121332964 @default.
- W2287341859 hasConcept C127413603 @default.
- W2287341859 hasConcept C154945302 @default.
- W2287341859 hasConcept C15744967 @default.
- W2287341859 hasConcept C201995342 @default.
- W2287341859 hasConcept C2780451532 @default.
- W2287341859 hasConcept C41008148 @default.
- W2287341859 hasConcept C61797465 @default.
- W2287341859 hasConcept C62520636 @default.
- W2287341859 hasConcept C67203356 @default.
- W2287341859 hasConcept C74650414 @default.
- W2287341859 hasConcept C77805123 @default.
- W2287341859 hasConcept C97541855 @default.
- W2287341859 hasConcept C98045186 @default.
- W2287341859 hasConceptScore W2287341859C111919701 @default.
- W2287341859 hasConceptScore W2287341859C117896860 @default.
- W2287341859 hasConceptScore W2287341859C119857082 @default.
- W2287341859 hasConceptScore W2287341859C121332964 @default.
- W2287341859 hasConceptScore W2287341859C127413603 @default.
- W2287341859 hasConceptScore W2287341859C154945302 @default.
- W2287341859 hasConceptScore W2287341859C15744967 @default.
- W2287341859 hasConceptScore W2287341859C201995342 @default.
- W2287341859 hasConceptScore W2287341859C2780451532 @default.
- W2287341859 hasConceptScore W2287341859C41008148 @default.
- W2287341859 hasConceptScore W2287341859C61797465 @default.
- W2287341859 hasConceptScore W2287341859C62520636 @default.
- W2287341859 hasConceptScore W2287341859C67203356 @default.
- W2287341859 hasConceptScore W2287341859C74650414 @default.
- W2287341859 hasConceptScore W2287341859C77805123 @default.
- W2287341859 hasConceptScore W2287341859C97541855 @default.
- W2287341859 hasConceptScore W2287341859C98045186 @default.
- W2287341859 hasLocation W22873418591 @default.
- W2287341859 hasOpenAccess W2287341859 @default.
- W2287341859 hasPrimaryLocation W22873418591 @default.
- W2287341859 hasRelatedWork W168445157 @default.
- W2287341859 hasRelatedWork W1712734637 @default.
- W2287341859 hasRelatedWork W176505340 @default.
- W2287341859 hasRelatedWork W2111660473 @default.
- W2287341859 hasRelatedWork W2117626647 @default.
- W2287341859 hasRelatedWork W2127532832 @default.
- W2287341859 hasRelatedWork W21657860 @default.
- W2287341859 hasRelatedWork W2515409829 @default.
- W2287341859 hasRelatedWork W2542999299 @default.
- W2287341859 hasRelatedWork W2892990871 @default.
- W2287341859 hasRelatedWork W2903892364 @default.
- W2287341859 hasRelatedWork W2915060045 @default.
- W2287341859 hasRelatedWork W2965749647 @default.
- W2287341859 hasRelatedWork W3014593416 @default.
- W2287341859 hasRelatedWork W3072315125 @default.
- W2287341859 hasRelatedWork W3101704277 @default.
- W2287341859 hasRelatedWork W3104240813 @default.
- W2287341859 hasRelatedWork W3106309118 @default.
- W2287341859 hasRelatedWork W3152815381 @default.
- W2287341859 hasRelatedWork W3551423 @default.
- W2287341859 isParatext "false" @default.
- W2287341859 isRetracted "false" @default.
- W2287341859 magId "2287341859" @default.
- W2287341859 workType "article" @default.