Matches in SemOpenAlex for { <https://semopenalex.org/work/W2735649811> ?p ?o ?g. }
Showing items 1 to 86 of 86 with 100 items per page.
- W2735649811 endingPage "1724" @default.
- W2735649811 startingPage "1705" @default.
- W2735649811 abstract "To learn control policies in unknown environments, learning agents need to explore by trying actions deemed suboptimal. In prior work, such exploration is performed by either perturbing the actions at each time-step independently, or by perturbing policy parameters over an entire episode. Since both of these strategies have certain advantages, a more balanced trade-off could be beneficial. We introduce a unifying view on step-based and episode-based exploration that allows for such balanced trade-offs. This trade-off strategy can be used with various reinforcement learning algorithms. In this paper, we study this generalized exploration strategy in a policy gradient method and in relative entropy policy search. We evaluate the exploration strategy on four dynamical systems and compare the results to the established step-based and episode-based exploration strategies. Our results show that a more balanced trade-off can yield faster learning and better final policies, and illustrate some of the effects that cause these performance differences." @default.
- W2735649811 created "2017-07-21" @default.
- W2735649811 creator A5057277609 @default.
- W2735649811 creator A5073193182 @default.
- W2735649811 creator A5083076675 @default.
- W2735649811 date "2017-07-13" @default.
- W2735649811 modified "2023-10-14" @default.
- W2735649811 title "Generalized exploration in policy search" @default.
- W2735649811 cites W1499669280 @default.
- W2735649811 cites W1507087299 @default.
- W2735649811 cites W1952489873 @default.
- W2735649811 cites W1977655452 @default.
- W2735649811 cites W1984381895 @default.
- W2735649811 cites W2071444114 @default.
- W2735649811 cites W2106155860 @default.
- W2735649811 cites W2107726111 @default.
- W2735649811 cites W2109910161 @default.
- W2735649811 cites W2121517924 @default.
- W2735649811 cites W2137104525 @default.
- W2735649811 cites W2138309709 @default.
- W2735649811 cites W2138537392 @default.
- W2735649811 cites W2139053308 @default.
- W2735649811 cites W2142176084 @default.
- W2735649811 cites W2142916680 @default.
- W2735649811 cites W2154032554 @default.
- W2735649811 cites W2167647761 @default.
- W2735649811 cites W2498991332 @default.
- W2735649811 cites W3103182070 @default.
- W2735649811 cites W4250979948 @default.
- W2735649811 doi "https://doi.org/10.1007/s10994-017-5657-1" @default.
- W2735649811 hasPublicationYear "2017" @default.
- W2735649811 type Work @default.
- W2735649811 sameAs 2735649811 @default.
- W2735649811 citedByCount "11" @default.
- W2735649811 countsByYear W27356498112017 @default.
- W2735649811 countsByYear W27356498112020 @default.
- W2735649811 countsByYear W27356498112021 @default.
- W2735649811 countsByYear W27356498112022 @default.
- W2735649811 crossrefType "journal-article" @default.
- W2735649811 hasAuthorship W2735649811A5057277609 @default.
- W2735649811 hasAuthorship W2735649811A5073193182 @default.
- W2735649811 hasAuthorship W2735649811A5083076675 @default.
- W2735649811 hasBestOaLocation W27356498111 @default.
- W2735649811 hasConcept C106301342 @default.
- W2735649811 hasConcept C119857082 @default.
- W2735649811 hasConcept C121332964 @default.
- W2735649811 hasConcept C126255220 @default.
- W2735649811 hasConcept C154945302 @default.
- W2735649811 hasConcept C2779436431 @default.
- W2735649811 hasConcept C33923547 @default.
- W2735649811 hasConcept C41008148 @default.
- W2735649811 hasConcept C62520636 @default.
- W2735649811 hasConcept C97541855 @default.
- W2735649811 hasConceptScore W2735649811C106301342 @default.
- W2735649811 hasConceptScore W2735649811C119857082 @default.
- W2735649811 hasConceptScore W2735649811C121332964 @default.
- W2735649811 hasConceptScore W2735649811C126255220 @default.
- W2735649811 hasConceptScore W2735649811C154945302 @default.
- W2735649811 hasConceptScore W2735649811C2779436431 @default.
- W2735649811 hasConceptScore W2735649811C33923547 @default.
- W2735649811 hasConceptScore W2735649811C41008148 @default.
- W2735649811 hasConceptScore W2735649811C62520636 @default.
- W2735649811 hasConceptScore W2735649811C97541855 @default.
- W2735649811 hasFunder F4320334960 @default.
- W2735649811 hasIssue "9-10" @default.
- W2735649811 hasLocation W27356498111 @default.
- W2735649811 hasLocation W27356498112 @default.
- W2735649811 hasOpenAccess W2735649811 @default.
- W2735649811 hasPrimaryLocation W27356498111 @default.
- W2735649811 hasRelatedWork W260766989 @default.
- W2735649811 hasRelatedWork W2959276766 @default.
- W2735649811 hasRelatedWork W2961085424 @default.
- W2735649811 hasRelatedWork W3037422413 @default.
- W2735649811 hasRelatedWork W3111983280 @default.
- W2735649811 hasRelatedWork W3139193008 @default.
- W2735649811 hasRelatedWork W4206669594 @default.
- W2735649811 hasRelatedWork W4295941380 @default.
- W2735649811 hasRelatedWork W4306674287 @default.
- W2735649811 hasRelatedWork W4319083788 @default.
- W2735649811 hasVolume "106" @default.
- W2735649811 isParatext "false" @default.
- W2735649811 isRetracted "false" @default.
- W2735649811 magId "2735649811" @default.
- W2735649811 workType "article" @default.