Matches in SemOpenAlex for { <https://semopenalex.org/work/W3100103823> ?p ?o ?g. }
- W3100103823 abstract "Policy gradients-based reinforcement learning has proven to be a promising approach for directly optimizing non-differentiable evaluation metrics for language generation tasks. However, optimizing for a specific metric reward leads to improvements in mostly that metric only, suggesting that the model is gaming the formulation of that metric in a particular way without often achieving real qualitative improvements. Hence, it is more beneficial to make the model optimize multiple diverse metric rewards jointly. While appealing, this is challenging because one needs to manually decide the importance and scaling weights of these metric rewards. Further, it is important to consider using a dynamic combination and curriculum of metric rewards that flexibly changes over time. Considering the above aspects, in our work, we automate the optimization of multiple metric rewards simultaneously via a multi-armed bandit approach (DORB), where at each round, the bandit chooses which metric reward to optimize next, based on expected arm gains. We use the Exp3 algorithm for bandits and formulate two approaches for bandit rewards: (1) Single Multi-reward Bandit (SM-Bandit); (2) Hierarchical Multi-reward Bandit (HM-Bandit). We empirically show the effectiveness of our approaches via various automatic metrics and human evaluation on two important NLG tasks: question generation and data-to-text generation. Finally, we present interpretable analyses of the learned bandit curriculum over the optimized rewards." @default.
- W3100103823 created "2020-11-23" @default.
- W3100103823 creator A5001987532 @default.
- W3100103823 creator A5041591669 @default.
- W3100103823 creator A5075564427 @default.
- W3100103823 date "2020-01-01" @default.
- W3100103823 modified "2023-09-24" @default.
- W3100103823 title "DORB: Dynamically Optimizing Multiple Rewards with Bandits" @default.
- W3100103823 cites W1514535095 @default.
- W3100103823 cites W1591706642 @default.
- W3100103823 cites W164706946 @default.
- W3100103823 cites W1843891098 @default.
- W3100103823 cites W1895577753 @default.
- W3100103823 cites W1902237438 @default.
- W3100103823 cites W1948566616 @default.
- W3100103823 cites W1985610876 @default.
- W3100103823 cites W1995945562 @default.
- W3100103823 cites W1998498767 @default.
- W3100103823 cites W2016522586 @default.
- W3100103823 cites W2049934117 @default.
- W3100103823 cites W2054362711 @default.
- W3100103823 cites W2064675550 @default.
- W3100103823 cites W2077902449 @default.
- W3100103823 cites W2096389047 @default.
- W3100103823 cites W2101105183 @default.
- W3100103823 cites W2107726111 @default.
- W3100103823 cites W2108738385 @default.
- W3100103823 cites W2112420033 @default.
- W3100103823 cites W2116716943 @default.
- W3100103823 cites W2121863487 @default.
- W3100103823 cites W2130942839 @default.
- W3100103823 cites W2133459682 @default.
- W3100103823 cites W2139501017 @default.
- W3100103823 cites W2142971854 @default.
- W3100103823 cites W2149327368 @default.
- W3100103823 cites W2154652894 @default.
- W3100103823 cites W2154764394 @default.
- W3100103823 cites W2165713838 @default.
- W3100103823 cites W2168405694 @default.
- W3100103823 cites W2204302769 @default.
- W3100103823 cites W2257238407 @default.
- W3100103823 cites W2300537905 @default.
- W3100103823 cites W2467173223 @default.
- W3100103823 cites W2507756961 @default.
- W3100103823 cites W2578330760 @default.
- W3100103823 cites W2593765977 @default.
- W3100103823 cites W2605243085 @default.
- W3100103823 cites W2606333299 @default.
- W3100103823 cites W2606974598 @default.
- W3100103823 cites W2607151106 @default.
- W3100103823 cites W2610891036 @default.
- W3100103823 cites W2739046565 @default.
- W3100103823 cites W2740747242 @default.
- W3100103823 cites W2753613501 @default.
- W3100103823 cites W2786660442 @default.
- W3100103823 cites W2798552002 @default.
- W3100103823 cites W2804292122 @default.
- W3100103823 cites W2888812214 @default.
- W3100103823 cites W2889670144 @default.
- W3100103823 cites W2890166583 @default.
- W3100103823 cites W2891946694 @default.
- W3100103823 cites W2897164851 @default.
- W3100103823 cites W2907386760 @default.
- W3100103823 cites W2911857455 @default.
- W3100103823 cites W2914397182 @default.
- W3100103823 cites W2927688238 @default.
- W3100103823 cites W2949376505 @default.
- W3100103823 cites W2951309718 @default.
- W3100103823 cites W2962717047 @default.
- W3100103823 cites W2962832505 @default.
- W3100103823 cites W2962883855 @default.
- W3100103823 cites W2962937869 @default.
- W3100103823 cites W2962944953 @default.
- W3100103823 cites W2962950136 @default.
- W3100103823 cites W2962977247 @default.
- W3100103823 cites W2962985882 @default.
- W3100103823 cites W2963084599 @default.
- W3100103823 cites W2963091658 @default.
- W3100103823 cites W2963141266 @default.
- W3100103823 cites W2963167310 @default.
- W3100103823 cites W2963177403 @default.
- W3100103823 cites W2963248296 @default.
- W3100103823 cites W2963341956 @default.
- W3100103823 cites W2963351113 @default.
- W3100103823 cites W2963385935 @default.
- W3100103823 cites W2963695785 @default.
- W3100103823 cites W2963748441 @default.
- W3100103823 cites W2963815651 @default.
- W3100103823 cites W2963846996 @default.
- W3100103823 cites W2963929190 @default.
- W3100103823 cites W2964024811 @default.
- W3100103823 cites W2964121744 @default.
- W3100103823 cites W2964165364 @default.
- W3100103823 cites W2964308564 @default.
- W3100103823 cites W2964327384 @default.
- W3100103823 cites W2964330417 @default.
- W3100103823 cites W2965373594 @default.
- W3100103823 cites W2983655111 @default.
- W3100103823 cites W3034987089 @default.
- W3100103823 cites W35251828 @default.
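
The abstract of this record describes a bandit that, at each round, picks which metric reward to optimize next using the Exp3 algorithm. The following is a minimal sketch of generic Exp3 arm selection, not the authors' implementation. The class name `Exp3`, the exploration rate `gamma`, the toy gain values, and the assumption that each observed gain is already scaled to [0, 1] are all illustrative choices that do not come from this record.

```python
import math
import random


class Exp3:
    """Generic Exp3 bandit over n_arms; rewards are assumed to lie in [0, 1]."""

    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.n_arms = n_arms
        self.gamma = gamma                    # exploration rate (assumed value)
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        return [
            (1.0 - self.gamma) * w / total + self.gamma / self.n_arms
            for w in self.weights
        ]

    def choose(self):
        # Sample one arm index according to the current probabilities.
        probs = self.probabilities()
        return self.rng.choices(range(self.n_arms), weights=probs, k=1)[0]

    def update(self, arm, reward):
        # Importance-weighted reward estimate for the arm that was played.
        p = self.probabilities()[arm]
        estimate = reward / p
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale so the weights stay bounded; probabilities are unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


if __name__ == "__main__":
    # Hypothetical arms standing in for three metric rewards.
    arms = ["metric_a", "metric_b", "metric_c"]
    toy_gain = [0.2, 0.8, 0.4]                # made-up expected gains per arm
    bandit = Exp3(len(arms))
    env = random.Random(1)
    for _ in range(500):
        arm = bandit.choose()
        # Bernoulli stand-in for "did optimizing this metric help this round".
        reward = 1.0 if env.random() < toy_gain[arm] else 0.0
        bandit.update(arm, reward)
    print(dict(zip(arms, bandit.probabilities())))
```

In a training loop, `choose()` would select the metric reward for the next policy-gradient round and `update()` would feed back the observed gain. How that gain is measured and scaled is left open here, since this record does not specify it.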