Matches in SemOpenAlex for { <https://semopenalex.org/work/W4319346057> ?p ?o ?g. }
- W4319346057 abstract "Formulating dialogue policy as a reinforcement learning (RL) task enables a dialogue system to act optimally by interacting with humans. However, typical RL-based methods normally suffer from challenges such as sparse and delayed reward problems. Besides, with user goal unavailable in real scenarios, the reward estimator is unable to generate reward reflecting action validity and task completion. Those issues may slow down and degrade the policy learning significantly. In this paper, we present a novel scheduled knowledge distillation framework for dialogue policy learning, which trains a compact student reward estimator by distilling the prior knowledge of user goals from a large teacher model. To further improve the stability of dialogue policy learning, we propose to leverage self-paced learning to arrange meaningful training order for the student reward estimator. Comprehensive experiments on Microsoft Dialogue Challenge and MultiWOZ datasets indicate that our approach significantly accelerates the learning speed, and the task-completion success rate can be improved from 0.47%∼9.01% compared with several strong baselines." @default.
- W4319346057 created "2023-02-08" @default.
- W4319346057 creator A5013385005 @default.
- W4319346057 creator A5068072785 @default.
- W4319346057 creator A5080814061 @default.
- W4319346057 date "2023-02-07" @default.
- W4319346057 modified "2023-09-26" @default.
- W4319346057 title "Reward estimation with scheduled knowledge distillation for dialogue policy learning" @default.
- W4319346057 cites W1491843047 @default.
- W4319346057 cites W2120045257 @default.
- W4319346057 cites W2145339207 @default.
- W4319346057 cites W2296073425 @default.
- W4319346057 cites W2765111838 @default.
- W4319346057 cites W2765811365 @default.
- W4319346057 cites W2782554945 @default.
- W4319346057 cites W2792533596 @default.
- W4319346057 cites W2798494119 @default.
- W4319346057 cites W2887842788 @default.
- W4319346057 cites W2889186204 @default.
- W4319346057 cites W2904206140 @default.
- W4319346057 cites W2923622379 @default.
- W4319346057 cites W2949769095 @default.
- W4319346057 cites W2951805158 @default.
- W4319346057 cites W2963068985 @default.
- W4319346057 cites W2964006684 @default.
- W4319346057 cites W2964195121 @default.
- W4319346057 cites W2970473306 @default.
- W4319346057 cites W2998458199 @default.
- W4319346057 cites W3032061229 @default.
- W4319346057 cites W3034368386 @default.
- W4319346057 cites W3035755323 @default.
- W4319346057 cites W3037879762 @default.
- W4319346057 cites W3105781833 @default.
- W4319346057 cites W3110290422 @default.
- W4319346057 cites W3173417753 @default.
- W4319346057 cites W3174681272 @default.
- W4319346057 cites W3175095351 @default.
- W4319346057 cites W3175458729 @default.
- W4319346057 cites W3175835943 @default.
- W4319346057 cites W3184489105 @default.
- W4319346057 cites W3188982020 @default.
- W4319346057 cites W3190743547 @default.
- W4319346057 cites W3205071568 @default.
- W4319346057 cites W3207133764 @default.
- W4319346057 cites W3212099586 @default.
- W4319346057 cites W4210322717 @default.
- W4319346057 cites W4285336835 @default.
- W4319346057 cites W4287854308 @default.
- W4319346057 cites W4287854678 @default.
- W4319346057 cites W4290742115 @default.
- W4319346057 doi "https://doi.org/10.1080/09540091.2023.2174078" @default.
- W4319346057 hasPublicationYear "2023" @default.
- W4319346057 type Work @default.
- W4319346057 citedByCount "0" @default.
- W4319346057 crossrefType "journal-article" @default.
- W4319346057 hasAuthorship W4319346057A5013385005 @default.
- W4319346057 hasAuthorship W4319346057A5068072785 @default.
- W4319346057 hasAuthorship W4319346057A5080814061 @default.
- W4319346057 hasBestOaLocation W43193460571 @default.
- W4319346057 hasConcept C105795698 @default.
- W4319346057 hasConcept C112972136 @default.
- W4319346057 hasConcept C119857082 @default.
- W4319346057 hasConcept C153083717 @default.
- W4319346057 hasConcept C154945302 @default.
- W4319346057 hasConcept C162324750 @default.
- W4319346057 hasConcept C185429906 @default.
- W4319346057 hasConcept C187736073 @default.
- W4319346057 hasConcept C190839683 @default.
- W4319346057 hasConcept C205649164 @default.
- W4319346057 hasConcept C2779436431 @default.
- W4319346057 hasConcept C2780451532 @default.
- W4319346057 hasConcept C33923547 @default.
- W4319346057 hasConcept C41008148 @default.
- W4319346057 hasConcept C58640448 @default.
- W4319346057 hasConcept C97541855 @default.
- W4319346057 hasConceptScore W4319346057C105795698 @default.
- W4319346057 hasConceptScore W4319346057C112972136 @default.
- W4319346057 hasConceptScore W4319346057C119857082 @default.
- W4319346057 hasConceptScore W4319346057C153083717 @default.
- W4319346057 hasConceptScore W4319346057C154945302 @default.
- W4319346057 hasConceptScore W4319346057C162324750 @default.
- W4319346057 hasConceptScore W4319346057C185429906 @default.
- W4319346057 hasConceptScore W4319346057C187736073 @default.
- W4319346057 hasConceptScore W4319346057C190839683 @default.
- W4319346057 hasConceptScore W4319346057C205649164 @default.
- W4319346057 hasConceptScore W4319346057C2779436431 @default.
- W4319346057 hasConceptScore W4319346057C2780451532 @default.
- W4319346057 hasConceptScore W4319346057C33923547 @default.
- W4319346057 hasConceptScore W4319346057C41008148 @default.
- W4319346057 hasConceptScore W4319346057C58640448 @default.
- W4319346057 hasConceptScore W4319346057C97541855 @default.
- W4319346057 hasIssue "1" @default.
- W4319346057 hasLocation W43193460571 @default.
- W4319346057 hasOpenAccess W4319346057 @default.
- W4319346057 hasPrimaryLocation W43193460571 @default.
- W4319346057 hasRelatedWork W2591697182 @default.
- W4319346057 hasRelatedWork W2751744993 @default.
- W4319346057 hasRelatedWork W3038067716 @default.
- W4319346057 hasRelatedWork W3044461295 @default.
- W4319346057 hasRelatedWork W3105333898 @default.