Training Millions of Personalized Dialogue Agents

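This paper, published by FAIR at EMNLP 2018, introduces a large-scale open-domain dialogue dataset built from Reddit, accompanied by a large number of user personas. Experiments show that personas improve the performance of dialogue systems. Models pretrained on this dataset also help on a variety of other tasks: for example, another FAIR paper, Wizard of Wikipedia: Knowledge-Powered Conversational Agents, uses a Transformer encoder pretrained on this dataset.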

paper link



Building a dataset of millions of persona-based dialogues

An example of a persona-based dialogue:


We construct the persona of a user by gathering all the comments they wrote, splitting them into sentences, and selecting the sentences that satisfy the following rules:

  • each sentence must contain between 4 and 20 words or punctuation marks;
  • it must contain either the word _I_ or _my_;
  • it must contain at least one verb;
  • it must contain at least one noun, pronoun or adjective.
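The four filtering rules can be expressed as a single predicate. This is a minimal sketch, assuming each sentence has already been tokenized and POS-tagged with Universal Dependencies tags ("VERB", "NOUN", "PRON", "ADJ", ...); the tagger is not shown, and the function name `is_persona_sentence` is hypothetical.

```python
def is_persona_sentence(tagged):
    """Hypothetical rule filter for persona sentences.

    tagged: list of (token, upos_tag) pairs, punctuation included.
    Assumes Universal Dependencies POS tags from some external tagger.
    """
    # Rule 1: between 4 and 20 words or punctuation marks.
    if not 4 <= len(tagged) <= 20:
        return False
    tokens = [tok.lower() for tok, _ in tagged]
    tags = {tag for _, tag in tagged}
    # Rule 2: contains the word "I" or "my".
    if "i" not in tokens and "my" not in tokens:
        return False
    # Rule 3: at least one verb.
    if "VERB" not in tags:
        return False
    # Rule 4: at least one noun, pronoun or adjective.
    return bool(tags & {"NOUN", "PRON", "ADJ"})


example = [("I", "PRON"), ("love", "VERB"), ("hiking", "NOUN"), ("with", "ADP"),
           ("my", "PRON"), ("dog", "NOUN"), (".", "PUNCT")]
print(is_persona_sentence(example))  # True
```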


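The paper compares four ways of building a user's persona:

  • _rule_: among all sentences that satisfy the rules above, randomly select at most N sentences as the user's persona.
  • _rule+classifier_: first filter with the rules above, then score the remaining sentences with a classifier and keep the top N among those whose score exceeds a hand-tuned threshold. The classifier is trained to distinguish persona sentences from the PERSONA-CHAT dataset from randomly sampled Reddit comments.
  • _random from user_: randomly sample sentences from the same user's (the responder's) comments, requiring only the length rule and ignoring the others, and use them as that user's persona.
  • _random from dataset_: randomly sample sentences from the whole dataset, so they may come from different users; this serves as a control.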
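The selection strategies differ mainly in which pool of sentences is sampled and whether a classifier ranks them. The sketch below is illustrative only: `sample_persona` and `select_rule_classifier` are hypothetical helpers, and `toy_score` is a made-up stand-in for the trained classifier, not the paper's model.

```python
import random


def sample_persona(pool, n, rng):
    """Randomly pick at most n sentences from a pool.

    Covers "rule" (pool = sentences passing all rules), "random from user"
    (pool = the user's sentences passing only the length rule) and
    "random from dataset" (pool = sentences from any user).
    """
    return rng.sample(pool, min(n, len(pool)))


def select_rule_classifier(candidates, n, score, threshold):
    """"rule+classifier": keep rule-filtered sentences scoring at least
    `threshold`, then return the top n by score (score is a hypothetical
    classifier returning a float)."""
    kept = [(score(s), s) for s in candidates]
    kept = [(sc, s) for sc, s in kept if sc >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in kept[:n]]


def toy_score(sentence):
    # Hypothetical stand-in for the classifier: longer sentences score higher.
    return len(sentence) / 20


sents = ["I like cats.", "My job is boring.", "I am tall.", "I cook a lot."]
print(select_rule_classifier(sents, n=2, score=toy_score, threshold=0.62))
# ['My job is boring.', 'I cook a lot.']
print(sample_persona(sents, 2, random.Random(0)))
```

Reusing one sampling function for three strategies makes the comparison explicit: only the candidate pool changes, which is exactly what the control conditions are meant to isolate.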

We take each pair of successive comments in a thread to form the context and response of an example.
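One way to read "successive comments" is to pair every reply with the comment it answers. The sketch below assumes each comment carries an id, a parent id, an author and its text (a hypothetical schema, not the Reddit dump format). It also records the responder, whose persona would be attached to the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    id: str
    parent_id: Optional[str]  # None for a top-level comment (hypothetical schema)
    author: str
    text: str


def make_examples(comments):
    """Return (context, response, responder) triples, one per reply."""
    by_id = {c.id: c for c in comments}
    examples = []
    for c in comments:
        parent = by_id.get(c.parent_id)
        if parent is not None:
            # The parent comment is the context, the reply is the response.
            examples.append((parent.text, c.text, c.author))
    return examples


thread = [
    Comment("a", None, "u1", "Anyone here into climbing?"),
    Comment("b", "a", "u2", "I climb every weekend."),
    Comment("c", "b", "u1", "Nice, indoor or outdoor?"),
]
print(make_examples(thread))
```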

End-to-end dialogue models

Figure 1: Persona-based network architecture.

As in Zhang et al. (2018), we combine the encoded context and persona using a 1-hop memory network with a residual connection, using the context as query and the set of persona sentences as memory.
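A single memory hop with a residual connection can be sketched as attention of the context over the persona vectors, followed by adding the result back to the context. This assumes plain dot-product attention with a softmax and no learned projection matrices, which may differ from the actual model; `persona_memory_hop` is a hypothetical name.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def persona_memory_hop(context, persona):
    """One memory hop: the encoded context queries the encoded persona sentences.

    context: list[float], the query vector
    persona: list[list[float]], the memory (one vector per persona sentence)
    Returns context + attention-weighted sum of persona vectors (residual).
    """
    weights = softmax([dot(context, p) for p in persona])
    read = [sum(w * p[i] for w, p in zip(weights, persona)) for i in range(len(context))]
    return [c + r for c, r in zip(context, read)]


print(persona_memory_hop([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```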

We use mini-batches of training examples and, for each example therein, all the responses of the other examples of the same batch are used as negative responses.
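With in-batch negatives, each context in a batch of size B is scored against all B responses and trained to pick its own. A minimal sketch of the resulting loss, assuming already-encoded vectors and dot-product scoring (the scoring function is an assumption here):

```python
import math


def in_batch_loss(contexts, responses):
    """Mean cross-entropy where responses[i] is the positive for contexts[i]
    and every other response in the batch acts as a negative."""
    total = 0.0
    for i, c in enumerate(contexts):
        scores = [sum(a * b for a, b in zip(c, r)) for r in responses]
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]  # equals -log softmax(scores)[i]
    return total / len(contexts)


C = [[1.0, 0.0], [0.0, 1.0]]
print(in_batch_loss(C, C))  # matched pairs: low loss
print(in_batch_loss(C, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs: higher loss
```

The appeal of this scheme is that the B-1 negatives come for free: their encodings are already computed for the other examples in the batch.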


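Three sentence encoders are compared:

  • Bag-of-words: applies a fully connected layer to the word embeddings, sums the result over all positions and divides by the square root of the sentence length to obtain the encoding.
  • LSTM: applies a 2-layer bidirectional LSTM and uses the last hidden state as the encoded sentence.
  • Transformer: applies a Transformer encoder, then averages the resulting representation across all positions in the sentence, yielding a fixed-size representation.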
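The pooling steps of the bag-of-words and Transformer encoders can be sketched in plain Python. This assumes the bag-of-words pooling sums projected word vectors before dividing by √n, uses a single linear layer as the projection, and treats the Transformer's per-position outputs as given; all function names are hypothetical.

```python
import math


def linear(x, weight, bias):
    """y = W x + b for a single vector; weight is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def bow_encode(word_vectors, weight, bias):
    """Bag-of-words: project each word vector, sum over positions, divide by sqrt(n)."""
    projected = [linear(v, weight, bias) for v in word_vectors]
    n = len(projected)
    return [sum(col) / math.sqrt(n) for col in zip(*projected)]


def mean_pool(position_vectors):
    """Average per-position outputs (e.g. of a Transformer) into one fixed-size vector."""
    n = len(position_vectors)
    return [sum(col) / n for col in zip(*position_vectors)]


words = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(bow_encode(words, identity, [0.0, 0.0]))  # [8.0, 10.0]
print(mean_pool(words))  # [4.0, 5.0]
```

Dividing by √n rather than n keeps longer sentences from being shrunk as aggressively as a plain mean would.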


Table 1: Test results when classifying the correct answer among a total of 100 possible answers.

Table 2: Sample predictions from the best model. In all selected cases the persona consists of a single sentence. The answer is constrained to be at most 10 tokens and is retrieved among 1M candidates sampled randomly from the training set.

Table 3: Retrieval precision on the REDDIT test set using a Transformer and different persona selection systems. N: maximum number of sentences per persona.

Table 4: hits@1 results for the best found Transformer architecture on different test sets. FT-PC: REDDIT-trained model fine-tuned on the PERSONA-CHAT training set. To be comparable to the state of the art on each dataset, results on PERSONA-CHAT are computed using 20 candidates, while results on REDDIT use 100.
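The hits@1 numbers in these tables measure how often the correct response is ranked first among a fixed set of candidates (100 on REDDIT, 20 on PERSONA-CHAT). A minimal sketch of hits@k, assuming the model has already produced one score per candidate; ties are broken in favour of the lower candidate index, a choice made here for illustration.

```python
def hits_at_k(scores, correct_index, k=1):
    """1 if the correct candidate is among the top-k scored candidates, else 0."""
    # sorted() is stable even with reverse=True, so ties keep index order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return int(correct_index in ranked[:k])


def mean_hits_at_k(all_scores, all_correct, k=1):
    hits = [hits_at_k(s, c, k) for s, c in zip(all_scores, all_correct)]
    return sum(hits) / len(hits)


print(mean_hits_at_k([[0.1, 0.9, 0.3], [0.5, 0.2, 0.1]], [1, 1]))  # 0.5
```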