# Wizard of Wikipedia: Knowledge-Powered Conversational Agents

## Introduction

The Wizard of Wikipedia dialogue dataset targets open-domain dialogue systems: one speaker randomly chooses an initial topic, both speakers converse on that basis, and the topic may drift as the conversation proceeds. The two speakers play different roles, called the **wizard** and the **apprentice**:

• wizard: the wizard's goal is to inform the apprentice of background knowledge related to the dialogue topic. Before each reply, the wizard is shown relevant Wikipedia passages, which remain invisible to the apprentice. The wizard is not allowed to copy sentences from the wiki text verbatim as a reply; instead, they must compose their own knowledge-grounded response.
• apprentice: the apprentice's goal is to ask in-depth questions related to the dialogue topic, which distinguishes the conversation from ordinary chit-chat.

### Conversation Flow

The flow of the conversation takes place as follows.

1. Either the wizard or apprentice is picked to choose the topic and speak first. The other player receives the topic information, and the conversation begins.
2. When the apprentice sends the wizard a message, the wizard is shown relevant knowledge (described below), and chooses a relevant sentence in order to construct a response, or else chooses the *no sentence used* option.
3. The wizard responds to the apprentice, basing their response on their chosen sentence.
4. The conversation repeats until one of the conversation partners ends the chat (after a minimum of 4 or 5 turns each, randomly chosen beforehand).
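The turn structure above can be sketched as a minimal data model. The class and field names here are illustrative, not the dataset's actual schema; the marker for the *no sentence used* option is likewise a hypothetical label.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NO_SENTENCE_USED = "no_sentence_used"  # hypothetical marker name


@dataclass
class Turn:
    speaker: str                            # "wizard" or "apprentice"
    text: str
    checked_sentence: Optional[str] = None  # only wizard turns carry this


@dataclass
class Dialogue:
    topic: str
    turns: List[Turn] = field(default_factory=list)

    def apprentice_says(self, text: str) -> None:
        self.turns.append(Turn("apprentice", text))

    def wizard_says(self, text: str, sentence: str = NO_SENTENCE_USED) -> None:
        # The wizard grounds the reply in a chosen knowledge sentence,
        # or marks that no sentence was used.
        self.turns.append(Turn("wizard", text, sentence))
```

A dialogue then alternates `apprentice_says` and `wizard_says` calls until one partner ends the chat.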

## Models

### Retrieval Transformer Memory Network

The model is trained to minimize the cross-entropy loss, where the negative candidates for each example are the responses to the other examples in the batch (Henderson et al., 2017).
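The in-batch negatives scheme can be sketched as follows: score every context against every candidate response in the batch, and treat the matching (same-index) response as the positive class of a softmax cross-entropy. This is a minimal NumPy sketch of the loss, not the paper's actual implementation.

```python
import numpy as np


def in_batch_cross_entropy(context_vecs: np.ndarray,
                           response_vecs: np.ndarray) -> float:
    """Cross-entropy over in-batch negatives (Henderson et al., 2017).

    context_vecs, response_vecs: (B, d) encodings; row i of each is a
    matching context/response pair, so the positives lie on the diagonal
    of the (B, B) score matrix and the other B-1 responses in the batch
    act as negatives.
    """
    scores = context_vecs @ response_vecs.T           # (B, B) similarities
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

When each context scores its own response far above the rest, the loss approaches zero; when it scores a different response highest, the loss grows.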

### Generative Transformer Memory Network

• End-to-end: as in the retrieval model, the context attends over the knowledge memory; the knowledge sentence with the highest attention probability, $m_{best}$, is concatenated with the context encoding and fed to a Transformer decoder for generation. The authors additionally add an auxiliary cross-entropy loss to help the model select the right knowledge: $\mathcal{L}=(1-\lambda) \mathcal{L}_{\mathrm{NLL}}+\lambda \mathcal{L}_{\mathrm{knowledge}}$
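The end-to-end selection step and its auxiliary loss can be sketched as below. This is an illustrative NumPy sketch, not the paper's implementation; the $\lambda$ value shown is a placeholder, and the function names are made up for this note.

```python
import numpy as np


def select_knowledge(context_vec: np.ndarray,
                     knowledge_vecs: np.ndarray,
                     gold_idx: int):
    """Dot-product attention of the context over knowledge sentences.

    Returns the index of m_best (argmax of the attention distribution,
    later concatenated with the context encoding for decoding) and the
    auxiliary knowledge-selection cross-entropy against the gold sentence.
    """
    scores = knowledge_vecs @ context_vec        # (K,) attention logits
    scores = scores - scores.max()               # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    best = int(np.argmax(probs))
    l_knowledge = float(-np.log(probs[gold_idx] + 1e-12))
    return best, l_knowledge


def combined_loss(l_nll: float, l_knowledge: float, lam: float = 0.5) -> float:
    # L = (1 - lambda) * L_NLL + lambda * L_knowledge; lam=0.5 is illustrative
    return (1.0 - lam) * l_nll + lam * l_knowledge
```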
• Two-stage: in this mode, the model is split into two separate subtasks, knowledge selection and utterance prediction, which are trained independently. Knowledge selection is trained exactly as in the end-to-end setting; after the knowledge $m_{best}$ is selected, a second Transformer encodes the context together with the selected knowledge, and a Transformer decoder generates the response. The authors also propose a knowledge dropout mechanism that mitigates error propagation from knowledge selection.
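Knowledge dropout can be sketched as randomly withholding the selected knowledge during utterance-prediction training, so the decoder does not over-rely on a possibly wrong selection. The function name and the dropout rate here are illustrative assumptions, not values from the paper.

```python
import random


def apply_knowledge_dropout(knowledge_sentence: str,
                            rate: float = 0.3,
                            rng: random.Random = random) -> str:
    """With probability `rate`, drop the selected knowledge sentence
    (replace it with an empty string) during utterance-prediction
    training, making the decoder robust to knowledge-selection errors."""
    return "" if rng.random() < rate else knowledge_sentence
```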