Matches in SemOpenAlex for { <https://semopenalex.org/work/W3105184920> ?p ?o ?g. }
Showing items 1 to 93 of 93 with 100 items per page.
- W3105184920 abstract "Reinforcement learning methods have emerged as a popular choice for training an efficient and effective dialogue policy. However, these methods suffer from sparse and unstable reward signals returned by a user simulator only when a dialogue finishes. Besides, the reward signal is manually designed by human experts, which requires domain knowledge. Recently, a number of adversarial learning methods have been proposed to learn the reward function together with the dialogue policy. However, to alternatively update the dialogue policy and the reward model on the fly, we are limited to policy-gradient-based algorithms, such as REINFORCE and PPO. Moreover, the alternating training of a dialogue agent and the reward model can easily get stuck in local optima or result in mode collapse. To overcome the listed issues, we propose to decompose the adversarial training into two steps. First, we train the discriminator with an auxiliary dialogue generator and then incorporate a derived reward model into a common reinforcement learning method to guide the dialogue policy learning. This approach is applicable to both on-policy and off-policy reinforcement learning methods. Based on our extensive experimentation, we can conclude the proposed method: (1) achieves a remarkable task success rate using both on-policy and off-policy reinforcement learning methods; and (2) has potential to transfer knowledge from existing domains to a new domain." @default.
- W3105184920 created "2020-11-23" @default.
- W3105184920 creator A5014167035 @default.
- W3105184920 creator A5024631418 @default.
- W3105184920 creator A5026947055 @default.
- W3105184920 creator A5037204705 @default.
- W3105184920 creator A5047233371 @default.
- W3105184920 creator A5059051692 @default.
- W3105184920 creator A5061162179 @default.
- W3105184920 creator A5066404470 @default.
- W3105184920 date "2020-01-01" @default.
- W3105184920 modified "2023-10-17" @default.
- W3105184920 title "Guided Dialogue Policy Learning without Adversarial Learning in the Loop" @default.
- W3105184920 cites W10548402 @default.
- W3105184920 cites W2047335008 @default.
- W3105184920 cites W2062175565 @default.
- W3105184920 cites W2099471712 @default.
- W3105184920 cites W2145339207 @default.
- W3105184920 cites W2396229782 @default.
- W3105184920 cites W2438667436 @default.
- W3105184920 cites W2547875792 @default.
- W3105184920 cites W2571927164 @default.
- W3105184920 cites W2594726847 @default.
- W3105184920 cites W2736601468 @default.
- W3105184920 cites W2765111838 @default.
- W3105184920 cites W2798494119 @default.
- W3105184920 cites W2806936550 @default.
- W3105184920 cites W2889186204 @default.
- W3105184920 cites W2962996309 @default.
- W3105184920 cites W2963068985 @default.
- W3105184920 cites W2963277051 @default.
- W3105184920 cites W2963712524 @default.
- W3105184920 cites W2964044380 @default.
- W3105184920 cites W2964268978 @default.
- W3105184920 cites W2970828515 @default.
- W3105184920 cites W3104546989 @default.
- W3105184920 cites W3121541553 @default.
- W3105184920 cites W648786980 @default.
- W3105184920 doi "https://doi.org/10.18653/v1/2020.findings-emnlp.209" @default.
- W3105184920 hasPublicationYear "2020" @default.
- W3105184920 type Work @default.
- W3105184920 sameAs 3105184920 @default.
- W3105184920 citedByCount "8" @default.
- W3105184920 countsByYear W31051849202021 @default.
- W3105184920 countsByYear W31051849202023 @default.
- W3105184920 crossrefType "proceedings-article" @default.
- W3105184920 hasAuthorship W3105184920A5014167035 @default.
- W3105184920 hasAuthorship W3105184920A5024631418 @default.
- W3105184920 hasAuthorship W3105184920A5026947055 @default.
- W3105184920 hasAuthorship W3105184920A5037204705 @default.
- W3105184920 hasAuthorship W3105184920A5047233371 @default.
- W3105184920 hasAuthorship W3105184920A5059051692 @default.
- W3105184920 hasAuthorship W3105184920A5061162179 @default.
- W3105184920 hasAuthorship W3105184920A5066404470 @default.
- W3105184920 hasBestOaLocation W31051849201 @default.
- W3105184920 hasConcept C114614502 @default.
- W3105184920 hasConcept C119857082 @default.
- W3105184920 hasConcept C154945302 @default.
- W3105184920 hasConcept C184670325 @default.
- W3105184920 hasConcept C186886427 @default.
- W3105184920 hasConcept C2779436431 @default.
- W3105184920 hasConcept C33923547 @default.
- W3105184920 hasConcept C37736160 @default.
- W3105184920 hasConcept C38652104 @default.
- W3105184920 hasConcept C41008148 @default.
- W3105184920 hasConceptScore W3105184920C114614502 @default.
- W3105184920 hasConceptScore W3105184920C119857082 @default.
- W3105184920 hasConceptScore W3105184920C154945302 @default.
- W3105184920 hasConceptScore W3105184920C184670325 @default.
- W3105184920 hasConceptScore W3105184920C186886427 @default.
- W3105184920 hasConceptScore W3105184920C2779436431 @default.
- W3105184920 hasConceptScore W3105184920C33923547 @default.
- W3105184920 hasConceptScore W3105184920C37736160 @default.
- W3105184920 hasConceptScore W3105184920C38652104 @default.
- W3105184920 hasConceptScore W3105184920C41008148 @default.
- W3105184920 hasLocation W31051849201 @default.
- W3105184920 hasLocation W31051849202 @default.
- W3105184920 hasOpenAccess W3105184920 @default.
- W3105184920 hasPrimaryLocation W31051849201 @default.
- W3105184920 hasRelatedWork W2903917280 @default.
- W3105184920 hasRelatedWork W2961085424 @default.
- W3105184920 hasRelatedWork W2980092132 @default.
- W3105184920 hasRelatedWork W3024390022 @default.
- W3105184920 hasRelatedWork W4229335043 @default.
- W3105184920 hasRelatedWork W4286629047 @default.
- W3105184920 hasRelatedWork W4306674287 @default.
- W3105184920 hasRelatedWork W4308860828 @default.
- W3105184920 hasRelatedWork W4312306468 @default.
- W3105184920 hasRelatedWork W4224009465 @default.
- W3105184920 isParatext "false" @default.
- W3105184920 isRetracted "false" @default.
- W3105184920 magId "3105184920" @default.
- W3105184920 workType "article" @default.