Matches in SemOpenAlex for { <https://semopenalex.org/work/W3167672220> ?p ?o ?g. }
- W3167672220 abstract "Recently many algorithms were devised for reinforcement learning (RL) with function approximation. While they have clear algorithmic distinctions, they also have many implementation differences that are algorithm-independent and sometimes under-emphasized. Such mixing of algorithmic novelty and implementation craftsmanship makes rigorous analyses of the sources of performance improvements across algorithms difficult. In this work, we focus on a series of off-policy inference-based actor-critic algorithms -- MPO, AWR, and SAC -- to decouple their algorithmic innovations and implementation decisions. We present unified derivations through a single control-as-inference objective, where we can categorize each algorithm as based on either Expectation-Maximization (EM) or direct Kullback-Leibler (KL) divergence minimization and treat the rest of the specifications as implementation details. We performed extensive ablation studies, and identified substantial performance drops whenever implementation details are mismatched for algorithmic choices. These results show which implementation or code details are co-adapted and co-evolved with algorithms, and which are transferable across algorithms: as examples, we identified that tanh Gaussian policy and network sizes are highly adapted to algorithmic types, while layer normalization and ELU are critical for MPO's performance but also transfer to noticeable gains in SAC. We hope our work can inspire future work to further demystify sources of performance improvements across multiple algorithms and allow researchers to build on one another's algorithmic and implementational innovations." @default.
- W3167672220 created "2021-06-22" @default.
- W3167672220 creator A5061613634 @default.
- W3167672220 creator A5064007269 @default.
- W3167672220 creator A5070075141 @default.
- W3167672220 creator A5074059447 @default.
- W3167672220 creator A5083001889 @default.
- W3167672220 date "2021-03-31" @default.
- W3167672220 modified "2023-10-02" @default.
- W3167672220 title "Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning" @default.
- W3167672220 cites W1499669280 @default.
- W3167672220 cites W1757796397 @default.
- W3167672220 cites W1771410628 @default.
- W3167672220 cites W2107464055 @default.
- W3167672220 cites W2109169869 @default.
- W3167672220 cites W2123157758 @default.
- W3167672220 cites W2145060720 @default.
- W3167672220 cites W2155772159 @default.
- W3167672220 cites W2167117957 @default.
- W3167672220 cites W2211399972 @default.
- W3167672220 cites W2290354866 @default.
- W3167672220 cites W2296360731 @default.
- W3167672220 cites W2554984891 @default.
- W3167672220 cites W2558251412 @default.
- W3167672220 cites W2591984255 @default.
- W3167672220 cites W2594103415 @default.
- W3167672220 cites W2620671107 @default.
- W3167672220 cites W2736601468 @default.
- W3167672220 cites W2747402019 @default.
- W3167672220 cites W2767002724 @default.
- W3167672220 cites W2786303200 @default.
- W3167672220 cites W2799151646 @default.
- W3167672220 cites W2902286283 @default.
- W3167672220 cites W2904246096 @default.
- W3167672220 cites W2914920107 @default.
- W3167672220 cites W2952191563 @default.
- W3167672220 cites W2953708620 @default.
- W3167672220 cites W2962902376 @default.
- W3167672220 cites W2963120839 @default.
- W3167672220 cites W2963184621 @default.
- W3167672220 cites W2963267001 @default.
- W3167672220 cites W2963285578 @default.
- W3167672220 cites W2963641140 @default.
- W3167672220 cites W2963674921 @default.
- W3167672220 cites W2963864421 @default.
- W3167672220 cites W2963884015 @default.
- W3167672220 cites W2963923407 @default.
- W3167672220 cites W2963956018 @default.
- W3167672220 cites W2963962369 @default.
- W3167672220 cites W2964062135 @default.
- W3167672220 cites W2964291307 @default.
- W3167672220 cites W2964319760 @default.
- W3167672220 cites W2970599228 @default.
- W3167672220 cites W2971262355 @default.
- W3167672220 cites W2978455699 @default.
- W3167672220 cites W2995181668 @default.
- W3167672220 cites W2995706821 @default.
- W3167672220 cites W2995894173 @default.
- W3167672220 cites W2996251520 @default.
- W3167672220 cites W3014137283 @default.
- W3167672220 cites W3022566517 @default.
- W3167672220 cites W3027501728 @default.
- W3167672220 cites W3028821797 @default.
- W3167672220 cites W3032727894 @default.
- W3167672220 cites W3032773894 @default.
- W3167672220 cites W3034440351 @default.
- W3167672220 cites W3034448784 @default.
- W3167672220 cites W3034786558 @default.
- W3167672220 cites W3037440645 @default.
- W3167672220 cites W3081674421 @default.
- W3167672220 cites W3101283005 @default.
- W3167672220 cites W3120947768 @default.
- W3167672220 cites W3157836284 @default.
- W3167672220 hasPublicationYear "2021" @default.
- W3167672220 type Work @default.
- W3167672220 sameAs 3167672220 @default.
- W3167672220 citedByCount "0" @default.
- W3167672220 crossrefType "posted-content" @default.
- W3167672220 hasAuthorship W3167672220A5061613634 @default.
- W3167672220 hasAuthorship W3167672220A5064007269 @default.
- W3167672220 hasAuthorship W3167672220A5070075141 @default.
- W3167672220 hasAuthorship W3167672220A5074059447 @default.
- W3167672220 hasAuthorship W3167672220A5083001889 @default.
- W3167672220 hasBestOaLocation W31676722201 @default.
- W3167672220 hasConcept C11413529 @default.
- W3167672220 hasConcept C119857082 @default.
- W3167672220 hasConcept C136886441 @default.
- W3167672220 hasConcept C138885662 @default.
- W3167672220 hasConcept C144024400 @default.
- W3167672220 hasConcept C154945302 @default.
- W3167672220 hasConcept C19165224 @default.
- W3167672220 hasConcept C27206212 @default.
- W3167672220 hasConcept C2776214188 @default.
- W3167672220 hasConcept C2778738651 @default.
- W3167672220 hasConcept C41008148 @default.
- W3167672220 hasConcept C80444323 @default.
- W3167672220 hasConcept C97541855 @default.
- W3167672220 hasConceptScore W3167672220C11413529 @default.
- W3167672220 hasConceptScore W3167672220C119857082 @default.
- W3167672220 hasConceptScore W3167672220C136886441 @default.