Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning
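This paper is a follow-up from the team behind Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning. It addresses the original DDQ model's heavy dependence on the quality of the simulated dialogues generated by the world model. By introducing a discriminator that distinguishes real dialogues from simulated ones, it improves the robustness and effectiveness of DDQ.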