# Universal Sentence Encoder

## Introduction

In this paper, we present two models for producing sentence embeddings that demonstrate good transfer to a number of other NLP tasks.

_This module is about 1 GB. Depending on your network speed, it might take a while to load the first time you instantiate it. After that, loading the model should be faster because modules are cached by default. Further, once a module is loaded into memory, inference should be relatively fast._

Our two encoders have different design goals. One, based on the transformer architecture, targets high accuracy at the cost of greater model complexity and resource consumption. The other targets efficient inference with slightly reduced accuracy.

## Encoders

### Deep Averaging Network (DAN)

The second encoding model makes use of a deep averaging network (DAN) (Iyyer et al., 2015), whereby input embeddings for words and bi-grams are first averaged together and then passed through a feedforward deep neural network (DNN) to produce sentence embeddings.
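A minimal NumPy sketch of the DAN idea follows. The vocabulary, dimensions, and random weights are all hypothetical toy values for illustration; the paper's actual architecture and parameters are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary of words and one bi-gram, with randomly initialized parameters.
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "was_great": 4}
emb_dim, hidden_dim, out_dim = 8, 16, 4
E = rng.normal(size=(len(vocab), emb_dim))   # word/bi-gram embedding table
W1 = rng.normal(size=(emb_dim, hidden_dim))  # first feedforward layer
W2 = rng.normal(size=(hidden_dim, out_dim))  # output layer

def dan_encode(tokens):
    """Average the input embeddings, then pass the average through a DNN."""
    avg = E[[vocab[t] for t in tokens]].mean(axis=0)
    hidden = np.tanh(avg @ W1)
    return np.tanh(hidden @ W2)  # sentence embedding

vec = dan_encode(["the", "movie", "was", "great", "was_great"])
print(vec.shape)  # (4,)
```

Because the averaging step is a single pass over the tokens, the DAN's cost grows only linearly with sentence length, which is what makes it the efficient-inference option.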

## Transfer Learning Models

• For text classification tasks, the output of either sentence encoder is fed as input to a task-specific classification model;
• For semantic similarity tasks, similarity is computed directly from the sentence encoder's output vectors:

As shown in Eq. 1, we first compute the cosine similarity of the two sentence embeddings and then use arccos to convert the cosine similarity into an angular distance. We find that a similarity based on angular distance performs better on average than raw cosine similarity.
$$\mathrm{sim}(u, v) = 1 - \arccos\!\left(\frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}\right) / \pi \: \: \: \:\: \: \: \:\: \: \: \: (1)$$
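Eq. 1 can be implemented directly, for example in NumPy (the cosine is clipped to guard against floating-point values that drift just outside [-1, 1]):

```python
import numpy as np

def angular_similarity(u, v):
    """Eq. 1: similarity based on the angular distance between u and v."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    cos = np.clip(cos, -1.0, 1.0)  # guard against floating-point drift
    return 1.0 - np.arccos(cos) / np.pi

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
print(angular_similarity(u, v))  # 0.5 (orthogonal vectors)
print(angular_similarity(u, u))  # 1.0 (identical vectors)
```

Unlike raw cosine similarity, which flattens out near cos = ±1, the arccos mapping keeps the score sensitive to small angular differences between nearly parallel embeddings.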

### Baselines

• A baseline using pretrained word2vec embeddings
• A baseline that does not use any pretrained model

## Experiments

• MR : Movie review snippet sentiment on a five-star scale (Pang and Lee, 2005).

• CR : Sentiment of sentences mined from customer reviews (Hu and Liu, 2004).

• SUBJ : Subjectivity of sentences from movie reviews and plot summaries (Pang and Lee, 2004).

• MPQA : Phrase-level opinion polarity from news data (Wiebe et al., 2005).

• TREC : Fine-grained question classification sourced from TREC (Li and Roth, 2002).

• SST : Binary phrase-level sentiment classification (Socher et al., 2013).

• STS Benchmark : Semantic textual similarity (STS) between sentence pairs, scored by Pearson correlation with human judgments (Cer et al., 2017).

1. The Transformer-based USE generally outperforms the DAN-based USE.
2. USE outperforms models that use only a word-level encoder.
3. The best results usually come from combining sentence-level and word-level transfer.

Table 3 illustrates transfer task performance for varying amounts of training data. We observe that, for smaller quantities of data, sentence level transfer learning can achieve surprisingly good task performance. As the training set size increases, models that do not make use of transfer learning approach the performance of the other models.