Adversarial Active Learning for Sequence Labeling and Generation

This paper was published at IJCAI 2018 and concerns the application of active learning to sequence problems. Most existing active learning methods rely on probability-based classifiers, which are ill-suited to sequence problems (the space of label sequences is too large). The authors propose an adversarial-learning-based framework to address this.



Active Learning (from Wikipedia): Active learning is a special case of machine learning in which a learning algorithm is able to interactively query the user (or some other information source) to obtain the desired outputs at new data points. In statistics literature it is sometimes also called optimal experimental design.

In short, active learning addresses the scarcity of labeled samples in supervised learning. Most existing active learning methods are built on probability-based classifiers: the uncertainty of an unlabeled sample is measured via the classifier's predicted probability distribution. If an unlabeled sample's uncertainty is high, it is deemed to contain information useful to the current classifier and is selected for annotation. This process is called query sample selection.


Consider a label sequence with p tokens and each token can belong to k possible classes, then there are $k^{p}$ possible combinations of the label sequence. This complexity can grow exponentially with the length of the output.
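To make the combinatorial explosion concrete, here is a quick back-of-the-envelope check (the values of k and p are chosen arbitrarily for illustration):

```python
# Size of the label-sequence space: k classes per token, p tokens.
k, p = 10, 20
print(k ** p)  # → 100000000000000000000 (10^20 candidate sequences)
```

Even for a modest tag set and sentence length, enumerating all label sequences to compute an exact probability distribution over them is infeasible.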

The adversarial active learning model for sequences (ALISE) proposed in this paper replaces that process with adversarial learning:

The proposed adversarial active learning framework incorporates a neural network to explicitly assert each sample's informativeness with regard to labeled data.

Background: Active Learning for Sequences

Existing active learning methods for sequence problems measure uncertainty in the following ways:

  1. least confidence (LC) score: $y^{*}$ is the most probable prediction for the unlabeled sample $x^{U}$ (actually a whole label sequence), typically computed with the Viterbi algorithm as the maximum-probability label sequence.

  2. margin term: $y^{*}_{1}, y^{*}_{2}$ are the label sequences with the highest and second-highest probability, respectively.

  3. sequence entropy (here, the cross-entropy between the label-sequence distribution and itself, which equals its own entropy, since $H(p,q)=H(p)+KL(p\|q)$ and $KL(p\|p)=0$): $y^{p}$ ranges over all possible label sequences.

    In practice, to reduce the computational cost, only the top-N most probable label sequences are kept (obtainable via beam search), yielding the N-best sequence entropy (NSE).
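The three measures above can be sketched from the probabilities of the top-N candidate label sequences for one sample (a minimal illustration with a hypothetical helper; the sign convention makes higher scores mean higher uncertainty for all three):

```python
import math

def uncertainty_scores(candidate_probs):
    """Compute LC, margin, and NSE uncertainty scores for one unlabeled
    sample, given the probabilities of its top-N candidate label sequences
    (e.g. obtained via beam search). Higher score = more uncertain."""
    probs = sorted(candidate_probs, reverse=True)
    lc = 1.0 - probs[0]                     # least confidence
    margin = -(probs[0] - probs[1])         # negated margin: small gap => uncertain
    # N-best sequence entropy over the renormalized top-N candidates
    total = sum(probs)
    nse = -sum((p / total) * math.log(p / total) for p in probs)
    return lc, margin, nse

# A sample whose top candidates have nearly equal probability is uncertain:
print(uncertainty_scores([0.30, 0.28, 0.22]))
```

Note that all three scores require the full (or top-N) predictive distribution per sample, which is exactly the cost ALISE tries to avoid.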


The labeling priority should be given to samples with high entropy (corresponding to low confidence).


When the pool of candidate samples is large, computing such complex uncertainty measures for every individual sample in the data pool can be very time-consuming.

Adversarial Active Learning for Sequences



A small similarity score implies that the unlabeled sample is not related to any labeled sample in the training set, and vice versa. Labeling priority is given to samples with low similarity scores.

Figure 1: An overview of Adversarial Active Learning for sequences (ALISE). The black and blue arrows respectively indicate flows for labeled and unlabeled samples.

The encoder M (the two copies in the figure are the same network with shared parameters) produces the latent representations; the discriminator D distinguishes whether M's latent representation comes from a labeled sample (1 = labeled, 0 = unlabeled).


  1. Encoder & Decoder: Mathematically, it encourages the discriminator D to output a score 1 for both $z^{L}$ and $z^{U}$.

  2. Discriminator:

Therefore, the score from this discriminator already serves as an informativeness similarity score that could be directly used for Eq.7.
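The two adversarial objectives can be sketched as toy per-sample losses (a minimal illustration with hypothetical function names; the full ALISE objective also includes the encoder-decoder's supervised labeling loss, which is omitted here):

```python
import math

def discriminator_loss(d_zL, d_zU):
    """D is trained to output 1 for latent codes z^L of labeled samples
    and 0 for latent codes z^U of unlabeled samples."""
    return -(math.log(d_zL) + math.log(1.0 - d_zU))

def encoder_adversarial_loss(d_zL, d_zU):
    """M is trained to fool D into outputting 1 for both z^L and z^U,
    pulling the two latent distributions together."""
    return -(math.log(d_zL) + math.log(d_zU))

# A discriminator that separates the two pools well (D(z^L)=0.9,
# D(z^U)=0.1) has a low D-loss but a large encoder adversarial loss:
print(discriminator_loss(0.9, 0.1), encoder_adversarial_loss(0.9, 0.1))
```

Once trained, D's output on an unlabeled sample's latent code is itself the similarity score, so no per-sample probabilistic decoding is needed at query time.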


Apparently, those samples with the lowest scores should be sent out for labeling, because they carry the most valuable information complementary to the current labeled data.


ALISE does not generate any fake sample and just borrows the adversarial learning objective for sample scoring.


Slot Filling

Figure 3: Image captioning results by active learning


Image Captioning

Figure 4: Image captioning results in the active learning setting by ALISE, ALISE+NSE and NSE-based approaches. The novel plausible descriptions are annotated with blue color while wrong descriptions are colored in red.

Computational Complexity:
Table 1: The active selection costs for different algorithms


This paper proposes an adversarial-learning-based framework for sequence active learning that avoids the traditional reliance on predicted probabilities, effectively improves the efficiency of sample selection, and can be applied to many sequence models.