Matches in SemOpenAlex for { <https://semopenalex.org/work/W3104994094> ?p ?o ?g. }
- W3104994094 abstract "Policy gradients-based reinforcement learning has proven to be a promising approach for directly optimizing non-differentiable evaluation metrics for language generation tasks. However, optimizing for a specific metric reward leads to improvements in mostly that metric only, suggesting that the model is gaming the formulation of that metric in a particular way without often achieving real qualitative improvements. Hence, it is more beneficial to make the model optimize multiple diverse metric rewards jointly. While appealing, this is challenging because one needs to manually decide the importance and scaling weights of these metric rewards. Further, it is important to consider using a dynamic combination and curriculum of metric rewards that flexibly changes over time. Considering the above aspects, in our work, we automate the optimization of multiple metric rewards simultaneously via a multi-armed bandit approach (DORB), where at each round, the bandit chooses which metric reward to optimize next, based on expected arm gains. We use the Exp3 algorithm for bandits and formulate two approaches for bandit rewards: (1) Single Multi-reward Bandit (SM-Bandit); (2) Hierarchical Multi-reward Bandit (HM-Bandit). We empirically show the effectiveness of our approaches via various automatic metrics and human evaluation on two important NLG tasks: question generation and data-to-text generation, including on an unseen-test transfer setup. Finally, we present interpretable analyses of the learned bandit curriculum over the optimized rewards." @default.
- W3104994094 created "2020-11-23" @default.
- W3104994094 creator A5001987532 @default.
- W3104994094 creator A5041591669 @default.
- W3104994094 creator A5075564427 @default.
- W3104994094 date "2020-11-15" @default.
- W3104994094 modified "2023-09-27" @default.
- W3104994094 title "DORB: Dynamically Optimizing Multiple Rewards with Bandits" @default.
- W3104994094 cites W1514535095 @default.
- W3104994094 cites W1591706642 @default.
- W3104994094 cites W1636687449 @default.
- W3104994094 cites W164706946 @default.
- W3104994094 cites W1843891098 @default.
- W3104994094 cites W1895577753 @default.
- W3104994094 cites W1902237438 @default.
- W3104994094 cites W1948566616 @default.
- W3104994094 cites W1985610876 @default.
- W3104994094 cites W1995945562 @default.
- W3104994094 cites W1998498767 @default.
- W3104994094 cites W2016522586 @default.
- W3104994094 cites W2049934117 @default.
- W3104994094 cites W2054362711 @default.
- W3104994094 cites W2064675550 @default.
- W3104994094 cites W2077902449 @default.
- W3104994094 cites W2096389047 @default.
- W3104994094 cites W2101105183 @default.
- W3104994094 cites W2107726111 @default.
- W3104994094 cites W2108738385 @default.
- W3104994094 cites W2112420033 @default.
- W3104994094 cites W2116716943 @default.
- W3104994094 cites W2121863487 @default.
- W3104994094 cites W2130942839 @default.
- W3104994094 cites W2133459682 @default.
- W3104994094 cites W2139501017 @default.
- W3104994094 cites W2142971854 @default.
- W3104994094 cites W2149327368 @default.
- W3104994094 cites W2154652894 @default.
- W3104994094 cites W2154764394 @default.
- W3104994094 cites W2165713838 @default.
- W3104994094 cites W2168405694 @default.
- W3104994094 cites W2204302769 @default.
- W3104994094 cites W2257238407 @default.
- W3104994094 cites W2300537905 @default.
- W3104994094 cites W2410983263 @default.
- W3104994094 cites W2467173223 @default.
- W3104994094 cites W2507756961 @default.
- W3104994094 cites W2525778437 @default.
- W3104994094 cites W2578330760 @default.
- W3104994094 cites W2593765977 @default.
- W3104994094 cites W2605243085 @default.
- W3104994094 cites W2606333299 @default.
- W3104994094 cites W2606974598 @default.
- W3104994094 cites W2607151106 @default.
- W3104994094 cites W2610891036 @default.
- W3104994094 cites W2612675303 @default.
- W3104994094 cites W2739046565 @default.
- W3104994094 cites W2740747242 @default.
- W3104994094 cites W2753613501 @default.
- W3104994094 cites W2786660442 @default.
- W3104994094 cites W2797563284 @default.
- W3104994094 cites W2798552002 @default.
- W3104994094 cites W2804292122 @default.
- W3104994094 cites W2888812214 @default.
- W3104994094 cites W2889670144 @default.
- W3104994094 cites W2890166583 @default.
- W3104994094 cites W2891946694 @default.
- W3104994094 cites W2897164851 @default.
- W3104994094 cites W2907386760 @default.
- W3104994094 cites W2911857455 @default.
- W3104994094 cites W2927688238 @default.
- W3104994094 cites W2950178297 @default.
- W3104994094 cites W2950700230 @default.
- W3104994094 cites W2951309718 @default.
- W3104994094 cites W2962717047 @default.
- W3104994094 cites W2962832505 @default.
- W3104994094 cites W2962883855 @default.
- W3104994094 cites W2962937869 @default.
- W3104994094 cites W2962944953 @default.
- W3104994094 cites W2962950136 @default.
- W3104994094 cites W2962977247 @default.
- W3104994094 cites W2962985882 @default.
- W3104994094 cites W2963084599 @default.
- W3104994094 cites W2963091658 @default.
- W3104994094 cites W2963141266 @default.
- W3104994094 cites W2963167310 @default.
- W3104994094 cites W2963177403 @default.
- W3104994094 cites W2963248296 @default.
- W3104994094 cites W2963341956 @default.
- W3104994094 cites W2963351113 @default.
- W3104994094 cites W2963385935 @default.
- W3104994094 cites W2963695785 @default.
- W3104994094 cites W2963748441 @default.
- W3104994094 cites W2963815651 @default.
- W3104994094 cites W2963846996 @default.
- W3104994094 cites W2963929190 @default.
- W3104994094 cites W2964024811 @default.
- W3104994094 cites W2964121744 @default.
- W3104994094 cites W2964165364 @default.
- W3104994094 cites W2964308564 @default.
- W3104994094 cites W2964327384 @default.
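
The abstract above says that a bandit using the Exp3 algorithm chooses, at each round, which metric reward to optimize next. The following is a minimal sketch of textbook Exp3 only, not the authors' implementation. The metric names, the simulated gain probabilities, and the `run_demo` helper are hypothetical placeholders; in the paper's setting the bandit reward would come from the model's observed gains, which are not described in this record.

```python
import math
import random


class Exp3:
    """Textbook Exp3 for adversarial bandits with rewards in [0, 1]."""

    def __init__(self, n_arms, gamma=0.1, seed=0):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.n_arms = n_arms
        self.gamma = gamma
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        # Mix the normalized weights with uniform exploration.
        total = sum(self.weights)
        return [
            (1.0 - self.gamma) * w / total + self.gamma / self.n_arms
            for w in self.weights
        ]

    def choose(self):
        # Sample one arm index according to the current distribution.
        probs = self.probabilities()
        return self.rng.choices(range(self.n_arms), weights=probs, k=1)[0]

    def update(self, arm, reward):
        # Importance-weighted reward estimate for the pulled arm only.
        probs = self.probabilities()
        estimate = reward / probs[arm]
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)


def run_demo(rounds=200):
    # Hypothetical arms: each stands for one metric reward to optimize.
    metrics = ["metric_a", "metric_b", "metric_c"]
    # Made-up success probabilities standing in for observed gains.
    gains = [0.2, 0.8, 0.4]
    bandit = Exp3(len(metrics), gamma=0.1, seed=0)
    for _ in range(rounds):
        arm = bandit.choose()
        reward = 1.0 if bandit.rng.random() < gains[arm] else 0.0
        bandit.update(arm, reward)
    return dict(zip(metrics, bandit.probabilities()))


if __name__ == "__main__":
    for name, prob in run_demo().items():
        print(f"{name}: {prob:.3f}")
```

Because every arm keeps at least `gamma / n_arms` probability, the sketch never stops exploring any metric, which is one way a selection schedule over rewards can keep changing over time.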