# Personalizing Dialogue Agents: I have a dog, do you have pets too?

## Introduction

The paper motivates the work with three common shortcomings of chit-chat dialogue agents:

1. Agents lack a consistent personality.
2. They have little understanding of long-term dialogue history.
3. They tend to produce generic responses.

PERSONA-CHAT is a crowdsourced dataset in which each worker chats with a partner while playing a given profile (persona). Collection proceeded in three steps:

1. Personas: we crowdsource a set of 1155 possible personas, each consisting of at least 5 profile sentences, setting aside 100 never seen before personas for validation, and 100 for test.
2. Revised personas: to avoid modeling that takes advantage of trivial word overlap, we crowdsource additional rewritten sets of the same 1155 personas, with related sentences that are rephrases, generalizations or specializations, rendering the task much more challenging.
3. Persona chat: we pair two Turkers and assign them each a random (original) persona from the pool, and ask them to chat. This resulted in a dataset of 162,064 utterances over 10,907 dialogs, 15,602 utterances (1000 dialogs) of which are set aside for validation, and 15,024 utterances (968 dialogs) for test.

We asked the workers to make each sentence short, with a maximum of 15 words per sentence.

In an early study we noticed the crowdworkers tending to talk about themselves (their own persona) too much, so we also added the instruction "both ask questions and answer questions of your chat partner," which seemed to help.

We consider this in four possible scenarios: conditioning on no persona, your own persona, their persona, or both. These scenarios can be tried using either the original personas, or the revised ones. We then evaluate the task using three metrics: (i) the log likelihood of the correct sequence, measured via perplexity, (ii) F1 score, and (iii) next utterance classification loss, following Lowe et al. (2015). The latter consists of choosing N random distractor responses from other dialogues (in our setting, N=19) and the model selecting the best response among them, resulting in a score of one if the model chooses the correct response, and zero otherwise (called hits@1 in the experiments).
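The next-utterance classification metric can be sketched as follows. This is a minimal illustration of hits@1 with N distractors, not the paper's model: `model_score` stands in for any (context, response) scoring function, and the word-overlap scorer is a toy assumption for demonstration only.

```python
import random

def hits_at_1(model_score, context, correct_response, distractor_pool, n_distractors=19):
    """Next-utterance classification: mix the correct response with N random
    distractors drawn from other dialogues; score 1 if the model ranks the
    correct response first, else 0."""
    distractors = random.sample(distractor_pool, n_distractors)
    candidates = [correct_response] + distractors
    best = max(candidates, key=lambda r: model_score(context, r))
    return 1 if best == correct_response else 0

# Toy scorer based on word overlap (illustration only, not a real model).
def overlap_score(context, response):
    return len(set(context.split()) & set(response.split()))

pool = ["the weather is nice", "i like trains", "my cat sleeps a lot",
        "what time is it", "see you later"] * 5
print(hits_at_1(overlap_score, "do you have pets",
                "i have a dog , do you have pets too", pool, n_distractors=5))
```

Averaging this score over the test set gives the hits@1 numbers reported in the experiments (with N=19 in the paper's setting).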

## Models

1. Baseline ranking models: an IR baseline and StarSpace.
2. Ranking Profile Memory Network: each profile sentence $p_{i}$ serves as a memory and the dialogue history as the query $q$; attention over the memories produces $q^{+}$, and the similarity between $q^{+}$ and each candidate response is then computed.
3. Key-Value Profile Memory Network: the first hop is the same as in the Profile Memory Network; after obtaining $q^{+}$, a second attention hop uses $q^{+}$ as the query, with dialogue histories as keys and the corresponding next utterances as values, yielding $q^{++}$, whose similarity to the candidate responses is computed.
4. Seq2Seq
5. Generative Profile Memory Network
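The two attention hops in the ranking models above can be sketched in a few lines. This is a schematic NumPy illustration under the simplifying assumption that utterances and profile sentences are already encoded as fixed vectors; the hop form $q^{+} = q + \sum_i w_i m_i$ follows the memory-network description, but all dimensions and encoders here are placeholders, not the paper's actual architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_hop(q, memories):
    """One memory-network hop: attend over memory vectors with query q
    and return q updated with the attention-weighted readout (q+)."""
    scores = memories @ q            # similarity of q to each memory
    weights = softmax(scores)
    return q + weights @ memories    # q+ = q + sum_i w_i * m_i

d = 8
rng = np.random.default_rng(0)
q = rng.standard_normal(d)               # encoded dialogue history (query)
profile = rng.standard_normal((5, d))    # encoded profile sentences (memories)

# Ranking Profile Memory Network: one hop over the profile sentences.
q_plus = attention_hop(q, profile)

# Key-Value Profile Memory Network: a second hop where keys are encoded
# dialogue histories and values the corresponding next utterances.
keys = rng.standard_normal((10, d))
values = rng.standard_normal((10, d))
w = softmax(keys @ q_plus)
q_plus_plus = q_plus + w @ values

# Rank candidate responses by dot-product similarity with q++.
candidates = rng.standard_normal((20, d))
best = int(np.argmax(candidates @ q_plus_plus))
```

The key design point is that the key-value variant lets the model consult past (history, next-utterance) pairs rather than only the static profile before scoring candidates.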

## Experiments

### Automated metrics

Most models improve significantly when conditioning prediction on their own persona, at least for the original (non-revised) versions, which are an easier task than the revised ones since those have no word overlap.

The results in Table 3 all condition on the speaker's own persona; the authors also ran comparison experiments to evaluate models conditioned on the partner's persona, or on both:

We can also condition a model on the other speaker's persona, or both personas at once; the results are given in Tables 5 and 6 in the Appendix. Using "Their persona" has less impact on this dataset. We believe this is because most speakers tend to focus on themselves when it comes to their interests. It would be interesting to see how often this is the case in other datasets. Certainly this is skewed by the particular instructions given to the crowdworkers: for example, if we gave the instruction "try not to talk about yourself, but about the other's interests," these metrics would likely change.

_Details in the original paper_

### Human Evaluation

Finding the balance between fluency, engagement, consistency, and a persistent persona remains a strong challenge for future research.

Two tasks could naturally be considered using PERSONA-CHAT: (1) next-utterance prediction during dialogue, and (2) profile prediction given dialogue history.