# Adversarial Active Learning for Sequence Labeling and Generation

## Introduction

Active learning (from Wikipedia): Active learning is a special case of machine learning in which a learning algorithm can interactively query the user (or some other information source) to obtain the desired outputs at new data points. In the statistics literature it is sometimes also called optimal experimental design.

Consider a label sequence of p tokens where each token can belong to one of k possible classes; there are then $k^{p}$ possible label sequences, so the size of the output space grows exponentially with the sequence length.
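As a quick illustration of this growth (k = 10 token classes is an arbitrary example, not a value from the paper), the label-space size $k^{p}$ can be tabulated for a few lengths:

```python
# Size of the label space k**p for a few sequence lengths p,
# using an arbitrary k = 10 token classes as an example.
k = 10
for p in (5, 10, 20):
    print(f"p={p}: {k ** p} possible label sequences")
```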

The proposed adversarial active learning framework incorporates a neural network to explicitly assess each sample's informativeness with respect to the labeled data.

## Background: Active Learning for Sequences

1. Least confidence (LC) score: $y^{*}$ is the most probable prediction for the unlabeled sample $x^{U}$ (in fact a whole label sequence), typically obtained with the Viterbi algorithm as the maximum-probability label sequence; the LC score is then $1 - P(y^{*} \mid x^{U})$.

2. Margin term: $y^{*}_{1}$ and $y^{*}_{2}$ are the label sequences with the highest and second-highest probabilities, respectively; a small margin $P(y^{*}_{1} \mid x^{U}) - P(y^{*}_{2} \mid x^{U})$ indicates high uncertainty.

3. Sequence entropy (the cross-entropy here is between the label-sequence distribution and itself, which reduces to its own entropy, since $H(p,q)=H(p)+KL(p\,\|\,q)$ and $KL(p\,\|\,p)=0$): the sum ranges over all possible label sequences $y^{p}$.

In practice, to reduce the computational cost, only the top-N most probable label sequences are considered (obtainable via beam search), giving the N-best sequence entropy (NSE).
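Given the N-best sequence probabilities from beam search, the three measures above can be sketched as follows (a minimal illustration; `uncertainty_scores` is a hypothetical helper, and the entropy is computed over the renormalized N-best list as in NSE):

```python
import math

def uncertainty_scores(seq_probs):
    """Compute LC, margin, and N-best sequence entropy from the
    probabilities of the N-best label sequences of one sample."""
    probs = sorted(seq_probs, reverse=True)
    lc = 1.0 - probs[0]                # least confidence: low P(y*) -> high LC
    margin = probs[0] - probs[1]       # small margin -> high uncertainty
    z = sum(probs)
    renorm = [p / z for p in probs]    # renormalize over the N-best list
    nse = -sum(p * math.log(p) for p in renorm)
    return lc, margin, nse

# Example: three beam-search hypotheses with probabilities 0.5, 0.3, 0.2
lc, margin, nse = uncertainty_scores([0.5, 0.3, 0.2])
```

All three agree on the direction of uncertainty: a flatter distribution over hypotheses yields a higher LC, a smaller margin, and a higher entropy.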

Labeling priority should be given to samples with high entropy (corresponding to low confidence).

When there are many candidate samples, computing such complex uncertainty measures for every individual sample in the data pool can take a long time.

## Adversarial Active Learning for Sequences

A small similarity score implies that the unlabeled sample is unrelated to any labeled sample in the training set, and a large score implies the opposite. Labeling priority is therefore given to samples with low similarity scores.
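A minimal sketch of this selection rule (the helper name and the budget value are illustrative assumptions):

```python
def select_for_labeling(scores, budget):
    """Return indices of the `budget` unlabeled samples with the lowest
    similarity scores, i.e. those least related to the labeled set."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return ranked[:budget]

# Example: four pool samples with similarity scores in [0, 1]
picked = select_for_labeling([0.9, 0.1, 0.5, 0.2], budget=2)  # -> [1, 3]
```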

The encoder M (the two encoders in the figure are the same network with shared parameters) produces the latent representation; the discriminator D judges whether M's latent representation comes from a labeled sample (1 = labeled, 0 = unlabeled).

1. Encoder & decoder: mathematically, the encoder is trained to encourage the discriminator D to output a score of 1 for both $z^{L}$ and $z^{U}$.

2. Discriminator:

Therefore, the score from this discriminator already serves as an informativeness similarity score that can be used directly in Eq. 7.

Apparently, the samples with the lowest scores should be sent out for labeling, because they carry the information most complementary to the current labeled data.

ALISE does not generate any fake samples; it only borrows the adversarial learning objective for sample scoring.
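The adversarial scoring scheme described above can be sketched in PyTorch (dimensions, architectures, and optimizer settings are illustrative assumptions, not the paper's exact configuration): D is trained to output 1 for labeled latents and 0 for unlabeled ones, while M is trained to push D toward 1 for both; D's score on an unlabeled latent then serves as its similarity score.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Shared encoder M (a toy GRU) and discriminator D (a small MLP);
# all sizes here are arbitrary illustrative choices.
enc = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
disc = nn.Sequential(nn.Linear(16, 16), nn.ReLU(),
                     nn.Linear(16, 1), nn.Sigmoid())

opt_m = torch.optim.Adam(enc.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCELoss()

x_lab = torch.randn(4, 5, 8)   # toy labeled batch   (batch, time, features)
x_unl = torch.randn(4, 5, 8)   # toy unlabeled batch

for _ in range(5):
    # --- discriminator step: 1 for labeled latents, 0 for unlabeled ---
    with torch.no_grad():
        z_l = enc(x_lab)[1].squeeze(0)   # final hidden state as latent z^L
        z_u = enc(x_unl)[1].squeeze(0)   # latent z^U
    d_loss = bce(disc(z_l), torch.ones(4, 1)) + bce(disc(z_u), torch.zeros(4, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- encoder step: make D output 1 for BOTH labeled and unlabeled ---
    z_l = enc(x_lab)[1].squeeze(0)
    z_u = enc(x_unl)[1].squeeze(0)
    m_loss = bce(disc(z_l), torch.ones(4, 1)) + bce(disc(z_u), torch.ones(4, 1))
    opt_m.zero_grad(); m_loss.backward(); opt_m.step()

# D's score on an unlabeled latent is its similarity score;
# the lowest-scoring samples are queried for labeling.
scores = disc(enc(x_unl)[1].squeeze(0)).squeeze(1)
```

The reconstruction (decoder) loss of the full model is omitted here; this sketch only shows the adversarial part that yields the scores.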

## Experiments

### Slot Filling

The encoder and decoder are both basic RNNs, and the discriminator is a fully connected network. There are 3000 samples in total, and 300 of them are selected for labeling at each iteration ("Random" denotes random selection); the model is trained on all data labeled so far. Once all 3000 samples have been labeled, all methods should in theory give the same result.
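The experiment's query loop can be sketched as follows (the scoring function is a hypothetical stand-in; the pool size of 3000 and the 300 queries per round are taken from the notes, and a real run would retrain the model after every round):

```python
def run_rounds(pool_size=3000, per_round=300, score=lambda i: (i * 37) % 101):
    """Repeatedly move the `per_round` lowest-scoring pool samples into
    the labeled set until the pool is exhausted; `score` stands in for
    the informativeness/similarity scoring of the current model."""
    unlabeled = set(range(pool_size))
    labeled = []
    rounds = 0
    while unlabeled:
        picked = sorted(unlabeled, key=score)[:per_round]
        unlabeled.difference_update(picked)
        labeled.extend(picked)
        rounds += 1   # here the model would be retrained on `labeled`
    return rounds, len(labeled)

rounds, total = run_rounds()  # 3000 / 300 = 10 rounds
```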

### Image Captioning

Computational complexity: