Matches in SemOpenAlex for { <https://semopenalex.org/work/W2951222010> ?p ?o ?g. }
- W2951222010 abstract "In this work, we propose an adversarial learning method for reward estimation in reinforcement learning (RL) based task-oriented dialog models. Most current RL based task-oriented dialog systems require access to a reward signal from either user feedback or user ratings. Such user ratings, however, may not always be consistent or available in practice. Furthermore, online dialog policy learning with RL typically requires a large number of queries to users and thus suffers from poor sample efficiency. To address these challenges, we propose an adversarial learning method to learn dialog rewards directly from dialog samples. Such rewards are further used to optimize the dialog policy with policy gradient based RL. In an evaluation in a restaurant search domain, we show that the proposed adversarial dialog learning method achieves a higher dialog success rate compared to strong baseline methods. We further discuss the covariate shift problem in online adversarial dialog learning and show how we can address it with partial access to user feedback." @default.
- W2951222010 created "2019-06-27" @default.
- W2951222010 creator A5014528965 @default.
- W2951222010 creator A5028182466 @default.
- W2951222010 date "2018-05-29" @default.
- W2951222010 modified "2023-10-16" @default.
- W2951222010 title "Adversarial Learning of Task-Oriented Neural Dialog Models" @default.
- W2951222010 cites W1522301498 @default.
- W2951222010 cites W1932421248 @default.
- W2951222010 cites W1975244201 @default.
- W2951222010 cites W1996957559 @default.
- W2951222010 cites W2061562262 @default.
- W2951222010 cites W2099471712 @default.
- W2951222010 cites W2214131199 @default.
- W2951222010 cites W2251058040 @default.
- W2951222010 cites W2311783643 @default.
- W2951222010 cites W2412715517 @default.
- W2951222010 cites W2412899141 @default.
- W2951222010 cites W2434014514 @default.
- W2951222010 cites W2594726847 @default.
- W2951222010 cites W2601324753 @default.
- W2951222010 cites W2610395436 @default.
- W2951222010 cites W2765111838 @default.
- W2951222010 cites W2949252816 @default.
- W2951222010 cites W2949964922 @default.
- W2951222010 cites W2950483141 @default.
- W2951222010 cites W2951520714 @default.
- W2951222010 cites W2951523806 @default.
- W2951222010 cites W2962883855 @default.
- W2951222010 cites W2962957031 @default.
- W2951222010 cites W2963043030 @default.
- W2951222010 cites W2963068985 @default.
- W2951222010 cites W2963567240 @default.
- W2951222010 cites W2964044380 @default.
- W2951222010 cites W2964210218 @default.
- W2951222010 doi "https://doi.org/10.48550/arxiv.1805.11762" @default.
- W2951222010 hasPublicationYear "2018" @default.
- W2951222010 type Work @default.
- W2951222010 sameAs 2951222010 @default.
- W2951222010 citedByCount "2" @default.
- W2951222010 countsByYear W29512220102020 @default.
- W2951222010 countsByYear W29512220102021 @default.
- W2951222010 crossrefType "posted-content" @default.
- W2951222010 hasAuthorship W2951222010A5014528965 @default.
- W2951222010 hasAuthorship W2951222010A5028182466 @default.
- W2951222010 hasBestOaLocation W29512220101 @default.
- W2951222010 hasConcept C107457646 @default.
- W2951222010 hasConcept C111368507 @default.
- W2951222010 hasConcept C119857082 @default.
- W2951222010 hasConcept C12725497 @default.
- W2951222010 hasConcept C127313418 @default.
- W2951222010 hasConcept C134306372 @default.
- W2951222010 hasConcept C136764020 @default.
- W2951222010 hasConcept C154945302 @default.
- W2951222010 hasConcept C162324750 @default.
- W2951222010 hasConcept C173853756 @default.
- W2951222010 hasConcept C185592680 @default.
- W2951222010 hasConcept C187736073 @default.
- W2951222010 hasConcept C190954187 @default.
- W2951222010 hasConcept C198531522 @default.
- W2951222010 hasConcept C2779436431 @default.
- W2951222010 hasConcept C2780451532 @default.
- W2951222010 hasConcept C33923547 @default.
- W2951222010 hasConcept C36503486 @default.
- W2951222010 hasConcept C37736160 @default.
- W2951222010 hasConcept C41008148 @default.
- W2951222010 hasConcept C43617362 @default.
- W2951222010 hasConcept C97541855 @default.
- W2951222010 hasConceptScore W2951222010C107457646 @default.
- W2951222010 hasConceptScore W2951222010C111368507 @default.
- W2951222010 hasConceptScore W2951222010C119857082 @default.
- W2951222010 hasConceptScore W2951222010C12725497 @default.
- W2951222010 hasConceptScore W2951222010C127313418 @default.
- W2951222010 hasConceptScore W2951222010C134306372 @default.
- W2951222010 hasConceptScore W2951222010C136764020 @default.
- W2951222010 hasConceptScore W2951222010C154945302 @default.
- W2951222010 hasConceptScore W2951222010C162324750 @default.
- W2951222010 hasConceptScore W2951222010C173853756 @default.
- W2951222010 hasConceptScore W2951222010C185592680 @default.
- W2951222010 hasConceptScore W2951222010C187736073 @default.
- W2951222010 hasConceptScore W2951222010C190954187 @default.
- W2951222010 hasConceptScore W2951222010C198531522 @default.
- W2951222010 hasConceptScore W2951222010C2779436431 @default.
- W2951222010 hasConceptScore W2951222010C2780451532 @default.
- W2951222010 hasConceptScore W2951222010C33923547 @default.
- W2951222010 hasConceptScore W2951222010C36503486 @default.
- W2951222010 hasConceptScore W2951222010C37736160 @default.
- W2951222010 hasConceptScore W2951222010C41008148 @default.
- W2951222010 hasConceptScore W2951222010C43617362 @default.
- W2951222010 hasConceptScore W2951222010C97541855 @default.
- W2951222010 hasLocation W29512220101 @default.
- W2951222010 hasOpenAccess W2951222010 @default.
- W2951222010 hasPrimaryLocation W29512220101 @default.
- W2951222010 hasRelatedWork W1963944933 @default.
- W2951222010 hasRelatedWork W2001050921 @default.
- W2951222010 hasRelatedWork W203169905 @default.
- W2951222010 hasRelatedWork W2567345728 @default.
- W2951222010 hasRelatedWork W2755402024 @default.
- W2951222010 hasRelatedWork W2806936550 @default.
- W2951222010 hasRelatedWork W2951222010 @default.