An Affect-Rich Neural Conversational Model with Biased Attention and Weighted Cross-Entropy Loss

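This paper studies affect-rich open-domain dialogue systems. Building on seq2seq, it adds VAD (Valence, Arousal and Dominance) embeddings, introduces an affective attention mechanism to model the effect of negators and intensifiers, and uses a weighted cross-entropy loss to encourage the model to generate affect-rich words. Published at AAAI 2019.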

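The paper targets two challenges:

  1. Negators and intensifiers change the polarity of affect, so affect recognition remains difficult.
  2. Generation has to balance grammatical fluency and affect at the same time.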


Our main contributions are summarized as follows:

  • For the first time, we propose a novel affective attention mechanism to incorporate the effect of negators and intensifiers in conversation modeling. Our mechanism introduces only a small number of additional parameters.
  • For the first time, we apply weighted cross-entropy loss in conversation modeling. Our affect-incorporated weights achieve a good balance between language fluency and emotion quality in model responses. Our empirical study does not show performance degradation in language fluency while producing affect-rich words.
  • Overall, we propose Affect-Rich Seq2Seq (AR-S2S), a novel end-to-end affect-rich open-domain neural conversational model incorporating external affect knowledge. Human preference test shows that our model is preferred over the state-of-the-art baseline model in terms of both content quality and emotion quality by a large margin.

Affect-Rich Seq2Seq Model

Figure 2: Overall architecture of our proposed AR-S2S. This diagram illustrates decoding “fine” and affect bias for “bad”.

Affective Embedding

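The model uses VAD affective embeddings. VAD stands for the three dimensions of affect (Valence, Arousal and Dominance), and each dimension is scored in the range [1, 9].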

For example, word “nice” is associated with the clipped VAD values: (V: 6.95, A: 3.53, D: 6.47).

The authors clip the raw VAD scores to [3, 7] so that the model does not keep generating words with extremely high or extremely low VAD values.
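The clipping step can be sketched in a few lines of Python. The function name `clip_vad` and the raw scores in the example are hypothetical and used only for illustration; only the [3, 7] range comes from the text above.

```python
def clip_vad(vad, low=3.0, high=7.0):
    """Clip each of the (V, A, D) scores to the range [low, high]."""
    return tuple(min(max(score, low), high) for score in vad)


# Hypothetical raw scores on the original [1, 9] scale.
raw = (8.2, 2.1, 6.47)
print(clip_vad(raw))  # (7.0, 3.0, 6.47)

# Scores already inside [3, 7] pass through unchanged, e.g. the values for "nice".
print(clip_vad((6.95, 3.53, 6.47)))  # (6.95, 3.53, 6.47)
```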

Table 1: Interpretations of clipped VAD embeddings.



Here, $\lambda \in R_{+}$ is a hyperparameter that controls the strength of the affective embedding.
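The embedding equation itself is not reproduced in these notes, so the following is only a sketch of one plausible reading: the clipped VAD vector is shifted by a center point, scaled by $\lambda$, and concatenated to the ordinary word embedding. The concatenation, the `center` value, the function name `affective_embedding`, and the default `lam` are all assumptions made for illustration and are not confirmed by the text.

```python
def affective_embedding(word_vec, vad, lam=0.1, center=(5.0, 5.0, 5.0)):
    """Concatenate a word embedding with a lambda-scaled, centered VAD vector.

    Assumption: centering at the midpoint of the clipped range [3, 7] and
    concatenation are illustrative choices, not taken from the paper text.
    """
    scaled_vad = [lam * (v - c) for v, c in zip(vad, center)]
    return list(word_vec) + scaled_vad


# Hypothetical 2-dimensional word vector for "nice" plus its clipped VAD values.
emb = affective_embedding([0.2, -0.1], (6.95, 3.53, 6.47), lam=0.1)
print(len(emb))  # 5
```

A larger `lam` makes the three affect dimensions weigh more heavily against the ordinary embedding dimensions, which is the role the notes ascribe to $\lambda$.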

Affective Attention

To incorporate affect into attention naturally, we make the intuitive assumption that humans pay extra attention on affect-rich words during conversations.

The core idea of affective attention is to add an affect bias term on top of seq2seq with attention.

In the bias term, $\otimes$ denotes element-wise multiplication, $\|\cdot\|_{k}$ denotes the $l_{k}$ norm, and $\beta \in R^{3}$ is a scaling factor whose entries lie in [-1, 1].

$\mu(x_{t}) \in [0, 1]$ measures the importance of a word. The authors consider three ways of computing it.

Here, $p(x_{t})$ is the frequency of the word $x_{t}$ in the training set, and $a$ and $\epsilon$ are smoothing factors.

We take the log function in $\mu_{li}(x_{t})$ to prevent rare words from dominating the importance.
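The three importance formulas are not reproduced in these notes. The sketch below shows one set of definitions that is consistent with the description (values in [0, 1], a smoothing factor $a$, a smoothing factor $\epsilon$, and a log in the local variant). The exact functional forms and the names `uniform_importance`, `global_importance` and `local_importance` are assumptions, not the paper's confirmed equations.

```python
import math


def uniform_importance(freqs):
    """Every word is equally important."""
    return [1.0 for _ in freqs]


def global_importance(freqs, a=1e-3):
    """Assumed form a / (a + p): frequent words are down-weighted."""
    return [a / (a + p) for p in freqs]


def local_importance(freqs, eps=1e-8):
    """Assumed form: log(1 / (p + eps)), normalized over the sentence.

    The log compresses the gap between rare and very rare words.
    """
    logs = [math.log(1.0 / (p + eps)) for p in freqs]
    total = sum(logs)
    return [x / total for x in logs]


# Hypothetical relative frequencies for a three-word input, e.g. "the", "movie", "dreadful".
freqs = [0.05, 0.001, 0.00001]
print(global_importance(freqs))
print(local_importance(freqs))
```

Under these assumed forms, the rarest word gets the largest weight in both variants, but the log in `local_importance` keeps it from swamping the other words.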

$\beta$ models the effect of negators and intensifiers on affect polarity.

Note that our affective attention only considers unigram negators and intensifiers.
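To make the pieces above concrete, here is a rough sketch of how an affect bias could be added to attention scores before the softmax. The bias equation is not reproduced in these notes, so the form `gamma * || mu * (1 + beta) ⊗ VAD ||_k`, the centered VAD input, the `gamma` coefficient, and all numeric values are assumptions chosen for illustration. In the paper, $\beta$ models negators and intensifiers; here it is simply passed in by hand.

```python
import math


def affect_bias(vad_centered, mu, beta=(0.0, 0.0, 0.0), gamma=1.0, k=2):
    """Assumed bias: gamma * l_k norm of mu * (1 + beta) ⊗ VAD (element-wise)."""
    scaled = [mu * (1.0 + b) * v for b, v in zip(beta, vad_centered)]
    return gamma * sum(abs(s) ** k for s in scaled) ** (1.0 / k)


def biased_attention(scores, biases):
    """Softmax over attention scores after adding a per-token affect bias."""
    logits = [s + b for s, b in zip(scores, biases)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


# Two source tokens with equal raw scores: a neutral word and an affect-rich word.
# The centered VAD values are hypothetical.
biases = [
    affect_bias((0.0, 0.0, 0.0), mu=1.0),
    affect_bias((-2.0, 1.0, -1.0), mu=1.0),
]
weights = biased_attention([0.5, 0.5], biases)
print(weights)  # the affect-rich token receives the larger weight
```

With a $\beta$ entry of -1 on valence, the valence component is suppressed entirely in this sketch, which shrinks the bias; entries closer to 1 amplify it.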

Affective Objective Function


Our proposed affective loss is essentially a weighted cross-entropy loss. The weights are constant and positively correlated with VAD strengths in $l_{2}$ norm. Intuitively, our affective loss encourages affect-rich words to obtain higher output probability, which effectively introduces a probability bias into the decoder language model towards affect-rich words.
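A minimal sketch of such a weighted cross-entropy follows. The loss equation is not reproduced in these notes, so the weight form `1 + delta * ||VAD||_2`, the `delta` coefficient, the use of a centered VAD vector (so that a neutral word has zero strength), and the function names are assumptions. Only the idea of constant weights that grow with the $l_{2}$ VAD strength comes from the text.

```python
import math


def affect_weight(vad_centered, delta=0.15):
    """Constant per-word weight that grows with the l2 norm of the VAD vector."""
    strength = math.sqrt(sum(v * v for v in vad_centered))
    return 1.0 + delta * strength


def weighted_cross_entropy(target_probs, weights):
    """Negative log-likelihood of the target words, scaled by per-word weights."""
    return -sum(w * math.log(p) for p, w in zip(target_probs, weights))


# Hypothetical decoder probabilities for two target words and their centered VAD values.
probs = [0.5, 0.25]
weights = [affect_weight((0.0, 0.0, 0.0)), affect_weight((2.0, 0.0, 0.0))]
print(weighted_cross_entropy(probs, weights))
```

Because the affect-rich target word carries a weight above 1, an error on it costs more than the same error on a neutral word, which pushes the decoder toward assigning it higher probability.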


Table 2: Model test perplexity. Symbol † indicates in-domain perplexity obtained on 10K test pairs from the OpenSubtitles dataset. Symbol ‡ indicates out-domain perplexity obtained on 10K test pairs from the DailyDialog dataset.