Matches in SemOpenAlex for { <https://semopenalex.org/work/W3131687345> ?p ?o ?g. }
- W3131687345 abstract "Double Q-learning is a classical method for reducing overestimation bias, which is caused by taking maximum estimated values in the Bellman operation. Its variants in the deep Q-learning paradigm have shown great promise in producing reliable value prediction and improving learning performance. However, as shown by prior work, double Q-learning is not fully unbiased and suffers from underestimation bias. In this paper, we show that such underestimation bias may lead to multiple non-optimal fixed points under an approximate Bellman operator. To address the concerns of converging to non-optimal stationary solutions, we propose a simple but effective approach as a partial fix for the underestimation bias in double Q-learning. This approach leverages approximate dynamic programming to bound the target value. We extensively evaluate our proposed method on the Atari benchmark tasks and demonstrate its significant improvement over baseline algorithms." @default.
- W3131687345 created "2021-03-01" @default.
- W3131687345 creator A5010176958 @default.
- W3131687345 creator A5012346744 @default.
- W3131687345 creator A5037494860 @default.
- W3131687345 creator A5067621896 @default.
- W3131687345 creator A5082075906 @default.
- W3131687345 creator A5087540449 @default.
- W3131687345 date "2021-09-29" @default.
- W3131687345 modified "2023-09-23" @default.
- W3131687345 title "On the Estimation Bias in Double Q-Learning" @default.
- W3131687345 cites W166862392 @default.
- W3131687345 cites W1972418517 @default.
- W3131687345 cites W2027621512 @default.
- W3131687345 cites W2048226872 @default.
- W3131687345 cites W2112081648 @default.
- W3131687345 cites W2145339207 @default.
- W3131687345 cites W2155968351 @default.
- W3131687345 cites W2159309155 @default.
- W3131687345 cites W2173564293 @default.
- W3131687345 cites W2341171179 @default.
- W3131687345 cites W2397240726 @default.
- W3131687345 cites W2436711315 @default.
- W3131687345 cites W2594466397 @default.
- W3131687345 cites W2596758708 @default.
- W3131687345 cites W2740912559 @default.
- W3131687345 cites W2809162153 @default.
- W3131687345 cites W2902098903 @default.
- W3131687345 cites W2905342215 @default.
- W3131687345 cites W2924131335 @default.
- W3131687345 cites W2927603314 @default.
- W3131687345 cites W2945159000 @default.
- W3131687345 cites W2962902376 @default.
- W3131687345 cites W2963092340 @default.
- W3131687345 cites W2963169817 @default.
- W3131687345 cites W2963267001 @default.
- W3131687345 cites W2963484919 @default.
- W3131687345 cites W2963704132 @default.
- W3131687345 cites W2963864421 @default.
- W3131687345 cites W2964082094 @default.
- W3131687345 cites W2964106499 @default.
- W3131687345 cites W2964291307 @default.
- W3131687345 cites W2964547635 @default.
- W3131687345 cites W2970961171 @default.
- W3131687345 cites W2978943496 @default.
- W3131687345 cites W2995509794 @default.
- W3131687345 cites W3011120880 @default.
- W3131687345 cites W3034440351 @default.
- W3131687345 cites W3035559482 @default.
- W3131687345 cites W3104956673 @default.
- W3131687345 cites W3121786643 @default.
- W3131687345 cites W3134294674 @default.
- W3131687345 cites W3178284873 @default.
- W3131687345 cites W51508254 @default.
- W3131687345 cites W241862747 @default.
- W3131687345 cites W2996624412 @default.
- W3131687345 cites W3089091950 @default.
- W3131687345 doi "https://doi.org/10.48550/arxiv.2109.14419" @default.
- W3131687345 hasPublicationYear "2021" @default.
- W3131687345 type Work @default.
- W3131687345 sameAs 3131687345 @default.
- W3131687345 citedByCount "0" @default.
- W3131687345 crossrefType "posted-content" @default.
- W3131687345 hasAuthorship W3131687345A5010176958 @default.
- W3131687345 hasAuthorship W3131687345A5012346744 @default.
- W3131687345 hasAuthorship W3131687345A5037494860 @default.
- W3131687345 hasAuthorship W3131687345A5067621896 @default.
- W3131687345 hasAuthorship W3131687345A5082075906 @default.
- W3131687345 hasAuthorship W3131687345A5087540449 @default.
- W3131687345 hasBestOaLocation W31316873451 @default.
- W3131687345 hasConcept C104317684 @default.
- W3131687345 hasConcept C111472728 @default.
- W3131687345 hasConcept C11413529 @default.
- W3131687345 hasConcept C119857082 @default.
- W3131687345 hasConcept C126255220 @default.
- W3131687345 hasConcept C13280743 @default.
- W3131687345 hasConcept C138885662 @default.
- W3131687345 hasConcept C154945302 @default.
- W3131687345 hasConcept C158448853 @default.
- W3131687345 hasConcept C17020691 @default.
- W3131687345 hasConcept C185592680 @default.
- W3131687345 hasConcept C185798385 @default.
- W3131687345 hasConcept C205649164 @default.
- W3131687345 hasConcept C2776291640 @default.
- W3131687345 hasConcept C2780586882 @default.
- W3131687345 hasConcept C33923547 @default.
- W3131687345 hasConcept C41008148 @default.
- W3131687345 hasConcept C55493867 @default.
- W3131687345 hasConcept C86339819 @default.
- W3131687345 hasConceptScore W3131687345C104317684 @default.
- W3131687345 hasConceptScore W3131687345C111472728 @default.
- W3131687345 hasConceptScore W3131687345C11413529 @default.
- W3131687345 hasConceptScore W3131687345C119857082 @default.
- W3131687345 hasConceptScore W3131687345C126255220 @default.
- W3131687345 hasConceptScore W3131687345C13280743 @default.
- W3131687345 hasConceptScore W3131687345C138885662 @default.
- W3131687345 hasConceptScore W3131687345C154945302 @default.
- W3131687345 hasConceptScore W3131687345C158448853 @default.
- W3131687345 hasConceptScore W3131687345C17020691 @default.
- W3131687345 hasConceptScore W3131687345C185592680 @default.