Matches in SemOpenAlex for { <https://semopenalex.org/work/W2068306051> ?p ?o ?g. }
- W2068306051 abstract "We propose a new method of preparing various policies that distinguishes main rewards from temporal rewards, aimed at interactive reinforcement learning in which reward functions are given incrementally from an initial state to the goal state. Shaping is the theoretical framework of interactive reinforcement learning. Most previous shaping research assumes a shaping reward function that is a monotonic distance function to the main goal and that is policy invariant. However, these assumptions do not hold in interactive reinforcement learning. To address this, it is necessary to distinguish main rewards, which are included in the expected optimal policy, from temporal rewards, which serve only to guide learning toward the optimal policy. This paper proposes a reward discrimination method for an interactive reinforcement learning agent. First, we introduce the concept of every-visit-optimality to define various policies. Then we present a method to search for various policies on an identified MDP model. Experiments comparing modified-PIA with our method evaluate the total search cost of acquiring various policies. In the experimental results, our method keeps the total search cost in check as the number of rewards increases. This suggests that our method is better suited than previous reinforcement learning methods to interactive reinforcement learning in which many rewards are added incrementally." @default.
- W2068306051 created "2016-06-24" @default.
- W2068306051 creator A5032242069 @default.
- W2068306051 creator A5040643695 @default.
- W2068306051 date "2006-01-01" @default.
- W2068306051 modified "2023-09-26" @default.
- W2068306051 title "Preparing various policies for interactive reinforcement learning for the SICE-ICASE International Joint Conference 2006 (SICE-ICCAS 2006)" @default.
- W2068306051 cites W2079247031 @default.
- W2068306051 cites W2132504164 @default.
- W2068306051 doi "https://doi.org/10.1109/sice.2006.315139" @default.
- W2068306051 hasPublicationYear "2006" @default.
- W2068306051 type Work @default.
- W2068306051 sameAs 2068306051 @default.
- W2068306051 citedByCount "11" @default.
- W2068306051 countsByYear W20683060512013 @default.
- W2068306051 countsByYear W20683060512015 @default.
- W2068306051 countsByYear W20683060512016 @default.
- W2068306051 countsByYear W20683060512018 @default.
- W2068306051 countsByYear W20683060512019 @default.
- W2068306051 countsByYear W20683060512022 @default.
- W2068306051 crossrefType "proceedings-article" @default.
- W2068306051 hasAuthorship W2068306051A5032242069 @default.
- W2068306051 hasAuthorship W2068306051A5040643695 @default.
- W2068306051 hasConcept C119857082 @default.
- W2068306051 hasConcept C14036430 @default.
- W2068306051 hasConcept C154945302 @default.
- W2068306051 hasConcept C15744967 @default.
- W2068306051 hasConcept C196340769 @default.
- W2068306051 hasConcept C2776716048 @default.
- W2068306051 hasConcept C2779436431 @default.
- W2068306051 hasConcept C41008148 @default.
- W2068306051 hasConcept C49774154 @default.
- W2068306051 hasConcept C67203356 @default.
- W2068306051 hasConcept C77805123 @default.
- W2068306051 hasConcept C78458016 @default.
- W2068306051 hasConcept C86803240 @default.
- W2068306051 hasConcept C97541855 @default.
- W2068306051 hasConceptScore W2068306051C119857082 @default.
- W2068306051 hasConceptScore W2068306051C14036430 @default.
- W2068306051 hasConceptScore W2068306051C154945302 @default.
- W2068306051 hasConceptScore W2068306051C15744967 @default.
- W2068306051 hasConceptScore W2068306051C196340769 @default.
- W2068306051 hasConceptScore W2068306051C2776716048 @default.
- W2068306051 hasConceptScore W2068306051C2779436431 @default.
- W2068306051 hasConceptScore W2068306051C41008148 @default.
- W2068306051 hasConceptScore W2068306051C49774154 @default.
- W2068306051 hasConceptScore W2068306051C67203356 @default.
- W2068306051 hasConceptScore W2068306051C77805123 @default.
- W2068306051 hasConceptScore W2068306051C78458016 @default.
- W2068306051 hasConceptScore W2068306051C86803240 @default.
- W2068306051 hasConceptScore W2068306051C97541855 @default.
- W2068306051 hasLocation W20683060511 @default.
- W2068306051 hasOpenAccess W2068306051 @default.
- W2068306051 hasPrimaryLocation W20683060511 @default.
- W2068306051 hasRelatedWork W1534480106 @default.
- W2068306051 hasRelatedWork W1564932097 @default.
- W2068306051 hasRelatedWork W172603552 @default.
- W2068306051 hasRelatedWork W2071035582 @default.
- W2068306051 hasRelatedWork W2105011545 @default.
- W2068306051 hasRelatedWork W2292896113 @default.
- W2068306051 hasRelatedWork W2920949972 @default.
- W2068306051 hasRelatedWork W3022038857 @default.
- W2068306051 hasRelatedWork W4308806426 @default.
- W2068306051 hasRelatedWork W4319083788 @default.
- W2068306051 isParatext "false" @default.
- W2068306051 isRetracted "false" @default.
- W2068306051 magId "2068306051" @default.
- W2068306051 workType "article" @default.