Matches in SemOpenAlex for { <https://semopenalex.org/work/W4363620163> ?p ?o ?g. }
Showing items 1 to 63 of 63 with 100 items per page.
- W4363620163 abstract "This paper examines a new measure of the exploration/exploitation trade-off in reinforcement learning (RL) called the occupancy information ratio (OIR). To this end, the paper derives the Information-Directed Actor-Critic (IDAC) algorithm for solving the OIR problem, provides an overview of the rich theory underlying IDAC and related OIR policy gradient methods, and experimentally investigates the advantages of such methods. The central contribution of this paper is to provide empirical evidence that, due to the form of the OIR objective, IDAC enjoys superior performance over vanilla RL methods in sparse-reward environments." @default.
- W4363620163 created "2023-04-11" @default.
- W4363620163 creator A5025896653 @default.
- W4363620163 creator A5049225160 @default.
- W4363620163 creator A5056599663 @default.
- W4363620163 date "2023-03-22" @default.
- W4363620163 modified "2023-10-18" @default.
- W4363620163 title "Information-Directed Policy Search in Sparse-Reward Settings via the Occupancy Information Ratio" @default.
- W4363620163 cites W2094387729 @default.
- W4363620163 cites W2145339207 @default.
- W4363620163 cites W2938421504 @default.
- W4363620163 cites W2963099939 @default.
- W4363620163 cites W3109546547 @default.
- W4363620163 doi "https://doi.org/10.1109/ciss56502.2023.10089655" @default.
- W4363620163 hasPublicationYear "2023" @default.
- W4363620163 type Work @default.
- W4363620163 citedByCount "0" @default.
- W4363620163 crossrefType "proceedings-article" @default.
- W4363620163 hasAuthorship W4363620163A5025896653 @default.
- W4363620163 hasAuthorship W4363620163A5049225160 @default.
- W4363620163 hasAuthorship W4363620163A5056599663 @default.
- W4363620163 hasConcept C105795698 @default.
- W4363620163 hasConcept C124101348 @default.
- W4363620163 hasConcept C126255220 @default.
- W4363620163 hasConcept C127413603 @default.
- W4363620163 hasConcept C154945302 @default.
- W4363620163 hasConcept C160331591 @default.
- W4363620163 hasConcept C170154142 @default.
- W4363620163 hasConcept C2780009758 @default.
- W4363620163 hasConcept C33923547 @default.
- W4363620163 hasConcept C41008148 @default.
- W4363620163 hasConcept C52622258 @default.
- W4363620163 hasConcept C97541855 @default.
- W4363620163 hasConceptScore W4363620163C105795698 @default.
- W4363620163 hasConceptScore W4363620163C124101348 @default.
- W4363620163 hasConceptScore W4363620163C126255220 @default.
- W4363620163 hasConceptScore W4363620163C127413603 @default.
- W4363620163 hasConceptScore W4363620163C154945302 @default.
- W4363620163 hasConceptScore W4363620163C160331591 @default.
- W4363620163 hasConceptScore W4363620163C170154142 @default.
- W4363620163 hasConceptScore W4363620163C2780009758 @default.
- W4363620163 hasConceptScore W4363620163C33923547 @default.
- W4363620163 hasConceptScore W4363620163C41008148 @default.
- W4363620163 hasConceptScore W4363620163C52622258 @default.
- W4363620163 hasConceptScore W4363620163C97541855 @default.
- W4363620163 hasFunder F4320306076 @default.
- W4363620163 hasFunder F4320338295 @default.
- W4363620163 hasLocation W43636201631 @default.
- W4363620163 hasOpenAccess W4363620163 @default.
- W4363620163 hasPrimaryLocation W43636201631 @default.
- W4363620163 hasRelatedWork W2923653485 @default.
- W4363620163 hasRelatedWork W2952472710 @default.
- W4363620163 hasRelatedWork W2957776456 @default.
- W4363620163 hasRelatedWork W2959276766 @default.
- W4363620163 hasRelatedWork W3005560120 @default.
- W4363620163 hasRelatedWork W3037422413 @default.
- W4363620163 hasRelatedWork W4206669594 @default.
- W4363620163 hasRelatedWork W4255994452 @default.
- W4363620163 hasRelatedWork W4295941380 @default.
- W4363620163 hasRelatedWork W4361026739 @default.
- W4363620163 isParatext "false" @default.
- W4363620163 isRetracted "false" @default.
- W4363620163 workType "article" @default.