# DRr-Net: Dynamic Re-read Network for Sentence Semantic Matching

## Introduction

For example, when judging the relation between “a person with a purple shirt is painting an image of a woman on a white wall” and “a woman paints a portrait of her best friend”, the important words will change from “person, purple, shirt, painting, image, woman” to “person, image, woman” in the first sentence, and from “woman, paints, portrait, best friend” to “woman, portrait, best friend” in the second sentence. As the Chinese proverb says: “The gist of an article will come to you after reading it over 100 times”.

### Input Embedding

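Word Embedding: For each word in a sentence, the model concatenates pre-trained word vectors, character features, and syntactical features to obtain the sentence sequence representations $\{a_{i}|i=1,2,\dots,l_{a}\}$ and $\{b_{j}|j=1,2,\dots,l_{b}\}$.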

The character features are obtained by applying a convolutional neural network with a max pooling layer to the learned character embeddings, which can represent words at a finer granularity and help to avoid the Out-Of-Vocabulary (OOV) problem that pre-trained word vectors suffer from. The syntactical features consist of the embedding of the part-of-speech tagging feature, a binary exact match feature, and a binary antonym feature, which have been proven useful for sentence semantic understanding (Chen et al. 2017a; Gururangan et al. 2018).
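The concatenation described above can be sketched in plain Python. This is only an illustration under stated assumptions: the dimensions, the randomly initialised lookup tables (standing in for pre-trained word vectors and learned character and part-of-speech embeddings), the filter width, and the names `char_feature` and `embed_sentence` are all hypothetical, and the antonym flags are supplied by hand rather than derived from a lexical resource.

```python
import random

random.seed(0)
# Hypothetical toy sizes; the source does not give the real dimensions.
WORD_DIM, CHAR_DIM, POS_DIM, N_FILTERS, WIDTH = 4, 3, 2, 5, 2

def rand_vec(n):
    return [random.uniform(-1, 1) for _ in range(n)]

# Random tables standing in for pre-trained / learned parameters.
word_table, char_table = {}, {}
pos_table = {tag: rand_vec(POS_DIM) for tag in ("DET", "NOUN", "VERB")}
filters = [[rand_vec(CHAR_DIM) for _ in range(WIDTH)] for _ in range(N_FILTERS)]

def lookup(table, key, dim):
    """Return the vector for `key`, creating a random one on first use."""
    if key not in table:
        table[key] = rand_vec(dim)
    return table[key]

def char_feature(word):
    """1-D convolution over character embeddings, then max pooling."""
    chars = [lookup(char_table, c, CHAR_DIM) for c in word]
    while len(chars) < WIDTH:  # zero-pad words shorter than the filter
        chars.append([0.0] * CHAR_DIM)
    pooled = []
    for f in filters:
        responses = [
            sum(w * x
                for w_vec, x_vec in zip(f, chars[s:s + WIDTH])
                for w, x in zip(w_vec, x_vec))
            for s in range(len(chars) - WIDTH + 1)
        ]
        pooled.append(max(responses))  # max over window positions
    return pooled

def embed_sentence(tokens, pos_tags, other_tokens, antonym_flags):
    """Concatenate word, character, POS, exact-match, and antonym features."""
    other = set(other_tokens)
    rows = []
    for tok, tag, anto in zip(tokens, pos_tags, antonym_flags):
        rows.append(
            lookup(word_table, tok, WORD_DIM)
            + char_feature(tok)
            + pos_table[tag]
            + [1.0 if tok in other else 0.0]  # binary exact match
            + [float(anto)]                   # binary antonym flag
        )
    return rows

a = embed_sentence(["a", "woman", "paints"], ["DET", "NOUN", "VERB"],
                   ["woman", "painting"], [0, 0, 0])
```

Each row of `a` has `WORD_DIM + N_FILTERS + POS_DIM + 2` entries. Because the character feature is built from characters rather than a fixed vocabulary, a word missing from the word table still receives an informative representation, which is the OOV benefit mentioned above.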

Attention Stack-GRU (ASG): Once the sentence sequence representations are obtained, they are passed through a stack GRU.

$H_{l}$ denotes the $l$-th GRU layer. Concatenating the outputs of all layers gives the final hidden-state outputs $\{h_{i}^{a}|i=1,2,\dots,l_{a}\}$ and $\{h_{j}^{b}|j=1,2,\dots,l_{b}\}$.
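A minimal sketch of a stacked GRU whose per-position output is the concatenation of every layer's hidden state. Several choices here are assumptions rather than the paper's specification: each layer simply consumes the previous layer's hidden states, biases are omitted, weights are random, and the names `GRUCell` and `stack_gru` are hypothetical.

```python
import math
import random

random.seed(0)

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class GRUCell:
    """Bias-free GRU cell with update (z), reset (r), and candidate (n) gates."""

    def __init__(self, in_dim, hid_dim):
        def mat(rows, cols):
            return [[random.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]
        self.W = {g: mat(hid_dim, in_dim) for g in "zrn"}   # input weights
        self.U = {g: mat(hid_dim, hid_dim) for g in "zrn"}  # recurrent weights
        self.hid_dim = hid_dim

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in
             zip(matvec(self.W["n"], x), matvec(self.U["n"], rh))]
        # Interpolate between the candidate state and the previous state.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

def stack_gru(inputs, cells):
    """Run the layers in order; return per-position concatenation of all layers."""
    layer_in, per_layer = inputs, []
    for cell in cells:
        h, outs = [0.0] * cell.hid_dim, []
        for x in layer_in:
            h = cell.step(x, h)
            outs.append(h)
        per_layer.append(outs)
        layer_in = outs  # assumption: next layer reads this layer's states
    return [sum((layer[t] for layer in per_layer), [])
            for t in range(len(inputs))]

cells = [GRUCell(3, 4), GRUCell(4, 4)]
xs = [[0.1 * t, -0.2, 0.3] for t in range(5)]
hidden = stack_gru(xs, cells)
```

With two layers of hidden size 4, every entry of `hidden` has 8 components, so lower-layer and higher-layer features stay available side by side for the later attention and re-read steps.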

Moreover, as the understanding of the sentence deepens, the important words that deserve attention change dynamically, and may include words that received no attention before.

$T$ is the number of dynamic re-reads. $F$ is computed with an attention mechanism.

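$\beta$ is an arbitrarily large value, chosen so that the weight of the most important word approaches 1 and the weights of all other words approach 0.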
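The effect of a large $\beta$ can be illustrated with a scaled softmax. The sketch below assumes the attention scores are already given and that $\beta$ multiplies them before normalisation; the exact scoring function used for $F$ is not reproduced here, and the names `sharpened_softmax` and `select` are hypothetical.

```python
import math

def sharpened_softmax(scores, beta):
    """Softmax over beta * scores, shifted by the max for numerical stability."""
    m = max(scores)
    exps = [math.exp(beta * (s - m)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select(vectors, scores, beta):
    """Weighted sum of word vectors; close to one-word selection for large beta."""
    weights = sharpened_softmax(scores, beta)
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

scores = [0.2, 1.5, 0.9]                       # toy attention scores
soft = sharpened_softmax(scores, beta=1.0)     # ordinary softmax
hard = sharpened_softmax(scores, beta=100.0)   # nearly one-hot on index 1

vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
chosen = select(vectors, scores, beta=100.0)   # approximately vectors[1]
```

With `beta=1.0` the weight is spread over all three words, while with `beta=100.0` almost all of it lands on the highest-scoring word. The operation remains a differentiable weighted sum, yet behaves like picking a single word at each re-read step.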

### Label Prediction

where $p^{h}$ and $p^{v}$ denote the probability distributions over the classes computed from the original sentence representations and the dynamic sentence representations, respectively.

## Experiment

SNLI: The SNLI dataset (Bowman et al. 2015) contains 570,152 human-annotated sentence pairs. Each sentence pair is labeled with one of the following relations: Entailment, Contradiction, or Neutral.

SICK: The SICK dataset (Marelli et al. 2014) contains 10,000 sentence pairs. The labels are the same as in the SNLI dataset.

Quora: The Quora Question Pair dataset (Iyer, Dandekar, and Csernai 2017) consists of over 400,000 potential question duplicate pairs. Each pair has a binary value that indicates whether the line truly contains a duplicate pair.

When the re-read length is between 5 and 7, DRr-Net achieves the best performance. This phenomenon is consistent with the psychological finding that human attention focuses on nearly 7 words (Tononi 2008).

## Conclusion and Future Work

In this paper, we proposed a Dynamic Re-read Network (DRr-Net) approach for sentence semantic matching, a novel architecture that pays close attention to a small region of the sentences at each step and re-reads the important information for better sentence semantic matching.

In the future, we will focus on providing more information to the attention mechanism so that it selects the important parts more precisely and reads the same word repeatedly less often.