SemOpenAlex

Matches in SemOpenAlex for { <https://semopenalex.org/work/W4200440369> ?p ?o ?g. }
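The pattern above fixes the subject and leaves predicate, object, and graph open. As a rough sketch, the same lookup can be written as a SPARQL query string. The helper name `build_work_query` is hypothetical, and the query only covers the default graph, on the assumption that the `@default` marker in the listing denotes it. No endpoint is contacted here.

```python
def build_work_query(work_iri: str) -> str:
    """Build a SPARQL SELECT for every (predicate, object) of one subject.

    Hypothetical helper for illustration; it only assembles a string.
    """
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    forbidden = set('<>"{}|^`\\ ')
    if any(ch in forbidden for ch in work_iri):
        raise ValueError(f"not a usable IRI: {work_iri!r}")
    # A plain triple pattern matches the default graph only (assumption:
    # this is what the listing's "@default" column refers to).
    return f"SELECT ?p ?o WHERE {{ <{work_iri}> ?p ?o . }}"


if __name__ == "__main__":
    print(build_work_query("https://semopenalex.org/work/W4200440369"))
```

The string can then be passed to whichever SPARQL client is in use; the choice of client and endpoint is left open here.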

W4200440369 abstract "Human knowledge can be used in reinforcement learning (RL) to reduce the time the learning agent needs to achieve its goal. The TAMER (Training an Agent Manually via Evaluative Reinforcements) algorithm allows a human to provide a reward to an autonomous agent through a manual interface while watching the agent perform an action. Because the agent's policy is updated based on human rewards, it approximates how the human trainer gives rewards to the agent. For the policy update, events that occurred during learning are selected. While selecting events, the temporal distance from the event to the human reward is considered, so only events that occurred within a certain time interval before the human trainer gives a reward are selected. However, this approach of considering only the time factor demands a large number of human rewards for the policy. The high complexity of the policy update exhausts the human trainer while the policy is being improved. Therefore, we propose a new method of selecting events that considers the entropy of the distribution of Q-values in addition to the time factor. For the policy update in our proposed event selection method, we reuse events despite a long temporal distance from the human reward when their human reward is negative and their entropy (over the distribution of Q-values) is low. To compare the effectiveness of the proposed method with classic TAMER, we run an experiment with the policy initialized to an incorrect weight. The results show that the TAMER algorithm, using our proposed selection of events, efficiently improves the policy." @default.
W4200440369 created "2021-12-31" @default.
W4200440369 creator A5028220953 @default.
W4200440369 creator A5045038402 @default.
W4200440369 creator A5061812063 @default.
W4200440369 creator A5067007480 @default.
W4200440369 date "2021-10-20" @default.
W4200440369 modified "2023-09-29" @default.
W4200440369 title "An Efficient Policy Improvement in Human Interactive Learning Using Entropy" @default.
W4200440369 doi "https://doi.org/10.1109/ictc52510.2021.9620856" @default.
W4200440369 hasPublicationYear "2021" @default.
W4200440369 type Work @default.
W4200440369 citedByCount "0" @default.
W4200440369 crossrefType "proceedings-article" @default.
W4200440369 hasAuthorship W4200440369A5028220953 @default.
W4200440369 hasAuthorship W4200440369A5045038402 @default.
W4200440369 hasAuthorship W4200440369A5061812063 @default.
W4200440369 hasAuthorship W4200440369A5067007480 @default.
W4200440369 hasConcept C106301342 @default.
W4200440369 hasConcept C119857082 @default.
W4200440369 hasConcept C121332964 @default.
W4200440369 hasConcept C154945302 @default.
W4200440369 hasConcept C41008148 @default.
W4200440369 hasConcept C62520636 @default.
W4200440369 hasConceptScore W4200440369C106301342 @default.
W4200440369 hasConceptScore W4200440369C119857082 @default.
W4200440369 hasConceptScore W4200440369C121332964 @default.
W4200440369 hasConceptScore W4200440369C154945302 @default.
W4200440369 hasConceptScore W4200440369C41008148 @default.
W4200440369 hasConceptScore W4200440369C62520636 @default.
W4200440369 hasFunder F4320322006 @default.
W4200440369 hasFunder F4320323890 @default.
W4200440369 hasLocation W42004403691 @default.
W4200440369 hasOpenAccess W4200440369 @default.
W4200440369 hasPrimaryLocation W42004403691 @default.
W4200440369 hasRelatedWork W2961085424 @default.
W4200440369 hasRelatedWork W3046775127 @default.
W4200440369 hasRelatedWork W3107602296 @default.
W4200440369 hasRelatedWork W3170094116 @default.
W4200440369 hasRelatedWork W3209574120 @default.
W4200440369 hasRelatedWork W4205958290 @default.
W4200440369 hasRelatedWork W4286629047 @default.
W4200440369 hasRelatedWork W4306321456 @default.
W4200440369 hasRelatedWork W4306674287 @default.
W4200440369 hasRelatedWork W4224009465 @default.
W4200440369 isParatext "false" @default.
W4200440369 isRetracted "false" @default.
W4200440369 workType "article" @default.
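
The event selection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the distribution over Q-values is formed, so a softmax is assumed here, and the names `Event`, `q_entropy`, `select_events`, `window`, and `entropy_threshold` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Event:
    time: float                 # when the event occurred
    q_values: Sequence[float]   # Q-values over actions at that event


def q_entropy(q_values: Sequence[float]) -> float:
    """Shannon entropy (nats) of a softmax over the Q-values (assumed form)."""
    m = max(q_values)
    exps = [math.exp(q - m) for q in q_values]  # shift by max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_events(events: List[Event], reward: float, reward_time: float,
                  window: float, entropy_threshold: float) -> List[Event]:
    """Pick the events to use for a policy update after one human reward."""
    selected = []
    for event in events:
        delay = reward_time - event.time
        if delay < 0:
            continue  # the event happened after the reward was given
        # Time factor: the event falls in the interval just before the reward.
        in_window = delay <= window
        # Entropy factor: reuse older events when the reward is negative
        # and the Q-value distribution has low entropy.
        reusable = reward < 0 and q_entropy(event.q_values) <= entropy_threshold
        if in_window or reusable:
            selected.append(event)
    return selected


if __name__ == "__main__":
    history = [Event(0.0, [10.0, 0.0, 0.0]), Event(4.8, [1.0, 1.0, 1.0])]
    chosen = select_events(history, reward=-1.0, reward_time=5.0,
                           window=0.5, entropy_threshold=0.5)
    print([e.time for e in chosen])
```

With a negative reward, the old low-entropy event is kept alongside the recent one; with a positive reward, only the event inside the time window is selected.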