# An Affect-Rich Neural Conversational Model with Biased Attention and Weighted Cross-Entropy Loss

## Introduction

1. Emotion recognition remains difficult because negators and intensifiers can flip or shift the polarity of affect.
2. How can generation balance grammatical fluency and emotion at the same time?

Our main contributions are summarized as follows:

• For the first time, we propose a novel affective attention mechanism to incorporate the effect of negators and intensifiers in conversation modeling. Our mechanism introduces only a small number of additional parameters.
• For the first time, we apply weighted cross-entropy loss in conversation modeling. Our affect-incorporated weights achieve a good balance between language fluency and emotion quality in model responses. Our empirical study does not show performance degradation in language fluency while producing affect-rich words.
• Overall, we propose Affect-Rich Seq2Seq (AR-S2S), a novel end-to-end affect-rich open-domain neural conversational model incorporating external affect knowledge. Human preference test shows that our model is preferred over the state-of-the-art baseline model in terms of both content quality and emotion quality by a large margin.

## Affect-Rich Seq2Seq Model

### Affective Embedding

For example, the word "nice" is associated with the clipped VAD values (V: 6.95, A: 3.53, D: 6.47).
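A minimal sketch of such an affective embedding: look up a word's VAD triple in a lexicon and concatenate it with the word's base embedding. The lexicon entry for "nice" comes from the text above; the neutral fallback vector and the 4-d base embedding are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

# Tiny illustrative VAD lexicon; only the "nice" entry is from the text.
VAD_LEXICON = {"nice": (6.95, 3.53, 6.47)}
# Assumed neutral fallback for out-of-lexicon words.
NEUTRAL_VAD = (5.0, 1.0, 5.0)

def affective_embedding(word, base_embedding):
    """Concatenate a word's base embedding with its 3-d VAD vector."""
    vad = np.array(VAD_LEXICON.get(word, NEUTRAL_VAD))
    return np.concatenate([base_embedding, vad])

emb = affective_embedding("nice", np.zeros(4))  # 4-d base + 3-d VAD = 7-d
```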

### Affective Attention

To incorporate affect into attention naturally, we make the intuitive assumption that humans pay extra attention to affect-rich words during conversations.

The core of affective attention is to add an affective bias term on top of standard Seq2Seq with attention:

$\mu(x_{t}) \in [0, 1]$ measures the importance of a word; the authors use three ways to compute it:

We take the log function in $\mu_{li}(x_{t})$ to prevent rare words from dominating the importance.

$\beta$ models the effect of negators and intensifiers on affective polarity:

Note that our affective attention only considers unigram negators and intensifiers.
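The mechanism above can be sketched as adding $\beta \cdot \mu$ to the raw attention scores before the softmax. This is only an illustration under assumptions: the negator/intensifier word lists, the sign flip for negators, the 2x scaling for intensifiers, and the scalar `gamma` are all placeholders for the paper's learned parameterization, and `mu` is simplified here to a precomputed per-word affect strength.

```python
import numpy as np

# Assumed unigram negator/intensifier lists (illustrative, not the paper's).
NEGATORS = {"not", "never"}
INTENSIFIERS = {"very", "extremely"}

def biased_attention(scores, words, mu, gamma=1.0):
    """Add an affective bias beta * mu(x_i) to raw attention scores,
    then softmax. beta flips sign after a negator and is amplified
    after an intensifier (assumed factor of 2)."""
    biased = np.array(scores, dtype=float)
    for i, w in enumerate(words):
        beta = gamma
        prev = words[i - 1] if i > 0 else None
        if prev in NEGATORS:
            beta = -gamma        # negator flips affective polarity
        elif prev in INTENSIFIERS:
            beta = 2.0 * gamma   # intensifier amplifies affect
        biased[i] += beta * mu[i]
    e = np.exp(biased - biased.max())
    return e / e.sum()           # attention weights, sum to 1

attn = biased_attention([0, 0, 0, 0], ["this", "is", "not", "nice"],
                        mu=[0.0, 0.0, 0.0, 2.0])
```
The negated "nice" receives less attention than it would under a plain positive bias, because the preceding negator flips the sign of its affective contribution.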

### Affective Objective Function

Our proposed affective loss is essentially a weighted cross-entropy loss. The weights are constant and positively correlated with the $\ell_2$ norm of the VAD strengths. Intuitively, our affective loss encourages affect-rich words to obtain higher output probability, which effectively introduces a probability bias into the decoder language model towards affect-rich words.
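A small sketch of such a weighted cross-entropy, assuming a simple weight of the form $1 + \lambda \lVert \mathrm{VAD} \rVert_2$ per target token (the exact weighting function and the hyperparameter `lam` are assumptions, not the paper's formula):

```python
import numpy as np

def affective_ce_loss(logits, target_ids, vad_table, lam=0.5):
    """Weighted cross-entropy: each target token's negative log-likelihood
    is scaled by a constant weight positively correlated with the l2 norm
    of that token's VAD vector (sketch; exact form differs from the paper).

    logits:     (T, V) unnormalized scores per decoding step
    target_ids: (T,)   gold token ids
    vad_table:  (V, 3) per-vocabulary-word VAD vectors
    """
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    nll = -np.log(probs[np.arange(len(target_ids)), target_ids])
    weights = 1.0 + lam * np.linalg.norm(vad_table[target_ids], axis=-1)
    return (weights * nll).mean()

# With all-zero VAD vectors the weights are 1 and this reduces to plain CE.
logits = np.zeros((2, 3))
targets = np.array([0, 1])
plain = affective_ce_loss(logits, targets, np.zeros((3, 3)))
```
Because the weights are constants (no gradient flows through them), this biases the decoder toward affect-rich words without changing the loss's differentiability.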