Matches in SemOpenAlex for { <https://semopenalex.org/work/W4280490439> ?p ?o ?g. }
Showing items 1 to 56 of 56 with 100 items per page.
- W4280490439 abstract "Maximum Tsallis entropy (MTE) framework in reinforcement learning has gained popularity recently by virtue of its flexible modeling choices including the widely used Shannon entropy and sparse entropy. However, non-Shannon entropies suffer from approximation error and subsequent underperformance either due to its sensitivity or the lack of closed-form policy expression. To improve the tradeoff between flexibility and empirical performance, we propose to strengthen their error-robustness by enforcing implicit Kullback-Leibler (KL) regularization in MTE motivated by Munchausen DQN (MDQN). We do so by drawing connection between MDQN and advantage learning, by which MDQN is shown to fail on generalizing to the MTE framework. The proposed method Tsallis Advantage Learning (TAL) is verified on extensive experiments to not only significantly improve upon Tsallis-DQN for various non-closed-form Tsallis entropies, but also exhibits comparable performance to state-of-the-art maximum Shannon entropy algorithms." @default.
- W4280490439 created "2022-05-22" @default.
- W4280490439 creator A5015522949 @default.
- W4280490439 creator A5031054137 @default.
- W4280490439 creator A5032769690 @default.
- W4280490439 creator A5042074952 @default.
- W4280490439 date "2022-05-16" @default.
- W4280490439 modified "2023-09-27" @default.
- W4280490439 title "Enforcing KL Regularization in General Tsallis Entropy Reinforcement Learning via Advantage Learning" @default.
- W4280490439 doi "https://doi.org/10.48550/arxiv.2205.07885" @default.
- W4280490439 hasPublicationYear "2022" @default.
- W4280490439 type Work @default.
- W4280490439 citedByCount "0" @default.
- W4280490439 crossrefType "posted-content" @default.
- W4280490439 hasAuthorship W4280490439A5015522949 @default.
- W4280490439 hasAuthorship W4280490439A5031054137 @default.
- W4280490439 hasAuthorship W4280490439A5032769690 @default.
- W4280490439 hasAuthorship W4280490439A5042074952 @default.
- W4280490439 hasBestOaLocation W42804904391 @default.
- W4280490439 hasConcept C106301342 @default.
- W4280490439 hasConcept C117521176 @default.
- W4280490439 hasConcept C121332964 @default.
- W4280490439 hasConcept C153180895 @default.
- W4280490439 hasConcept C154945302 @default.
- W4280490439 hasConcept C171752962 @default.
- W4280490439 hasConcept C33923547 @default.
- W4280490439 hasConcept C41008148 @default.
- W4280490439 hasConcept C97355855 @default.
- W4280490439 hasConcept C97541855 @default.
- W4280490439 hasConceptScore W4280490439C106301342 @default.
- W4280490439 hasConceptScore W4280490439C117521176 @default.
- W4280490439 hasConceptScore W4280490439C121332964 @default.
- W4280490439 hasConceptScore W4280490439C153180895 @default.
- W4280490439 hasConceptScore W4280490439C154945302 @default.
- W4280490439 hasConceptScore W4280490439C171752962 @default.
- W4280490439 hasConceptScore W4280490439C33923547 @default.
- W4280490439 hasConceptScore W4280490439C41008148 @default.
- W4280490439 hasConceptScore W4280490439C97355855 @default.
- W4280490439 hasConceptScore W4280490439C97541855 @default.
- W4280490439 hasLocation W42804904391 @default.
- W4280490439 hasLocation W42804904392 @default.
- W4280490439 hasOpenAccess W4280490439 @default.
- W4280490439 hasPrimaryLocation W42804904391 @default.
- W4280490439 hasRelatedWork W1690866424 @default.
- W4280490439 hasRelatedWork W1806343411 @default.
- W4280490439 hasRelatedWork W1977649504 @default.
- W4280490439 hasRelatedWork W1979804848 @default.
- W4280490439 hasRelatedWork W2013307082 @default.
- W4280490439 hasRelatedWork W2015333754 @default.
- W4280490439 hasRelatedWork W3044212476 @default.
- W4280490439 hasRelatedWork W4312226159 @default.
- W4280490439 hasRelatedWork W4380880413 @default.
- W4280490439 hasRelatedWork W2188424775 @default.
- W4280490439 isParatext "false" @default.
- W4280490439 isRetracted "false" @default.
- W4280490439 workType "article" @default.