# Data Augmentation for Spoken Language Understanding via Joint Variational Generation

## Introduction

• This paper defines a general GDA (generative data augmentation) framework for SLU tasks and proposes a Monte Carlo-based sampling method.
• This paper proposes a generative model that jointly generates utterances and labels; experiments show that it produces natural utterances with correct annotations while also improving the accuracy of SLU models.
• Extensive experiments demonstrate that the proposed GDA method generalizes across a variety of SLU datasets and models.

## Model

### GDA Framework

Notation
$w=(w_{1},\dots,w_{T})$ is an utterance, where T is the length of the sequence. In a labeled SLU dataset, $s=(s_{1},\dots,s_{T})$ is the slot annotation aligned with w, and the intent annotation of the sequence is denoted y. D is a fully labeled SLU dataset $\left\{\left(\mathbf{w}_{1}, \mathbf{s}_{1}, y_{1}\right), \dots,\left(\mathbf{w}_{n}, \mathbf{s}_{n}, y_{n}\right)\right\}$, where n is the dataset size; a sample drawn from D is $x=(w,s,y)$, and $D_{w}, D_{s}, D_{y}$ denote all utterances, slot labels, and intent labels in D, respectively.
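As a concrete illustration of this notation, a labeled sample $x=(w,s,y)$ can be represented as follows; this is a minimal Python sketch, and the field names and the ATIS-style example are our own, not from the paper:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SLUSample:
    """One labeled sample x = (w, s, y) drawn from the dataset D."""
    words: List[str]   # utterance w = (w_1, ..., w_T)
    slots: List[str]   # slot labels s = (s_1, ..., s_T), aligned with words
    intent: str        # intent label y

# Hypothetical ATIS-style example (labels invented for illustration)
x = SLUSample(
    words=["flights", "from", "boston", "to", "denver"],
    slots=["O", "O", "B-fromloc", "O", "B-toloc"],
    intent="atis_flight",
)
assert len(x.words) == len(x.slots)  # s is aligned token-by-token with w
```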

Spoken Language Understanding

$$\mathcal{L}_{L U}(\psi ; \mathbf{w}, \mathbf{s}, y)=-\log p_{\psi}(\mathbf{s}, y | \mathbf{w})$$
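A minimal sketch of how this loss can be computed, assuming the model factorizes $p_{\psi}(\mathbf{s}, y \mid \mathbf{w})$ into an intent term and per-token slot terms — the factorization and the `lu_loss` helper are our assumptions for illustration, not the paper's implementation:

```python
import math

def lu_loss(slot_probs, intent_prob):
    """L_LU = -log p(s, y | w), assuming the factorization
    p(s, y | w) = p(y | w) * prod_t p(s_t | w).
    The loss is then a sum of negative log-probabilities."""
    return -math.log(intent_prob) - sum(math.log(p) for p in slot_probs)

# Probabilities the model assigned to each gold slot label and the gold intent
loss = lu_loss(slot_probs=[0.9, 0.8, 0.95], intent_prob=0.7)
assert loss > 0  # a perfect model (all probabilities 1.0) would give loss 0
```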

Generative Data Augmentation

### Joint Generative Model

#### Standard VAE

The Sampling Problem

$$\hat{\mathbf{w}} \sim p_{\theta_{\mathcal{D}}, \phi_{\mathcal{D}}}(\mathbf{w})=\int p_{\theta_{\mathcal{D}}}(\mathbf{w} | \mathbf{z}) p_{\theta_{\mathcal{D}}, \phi_{\mathcal{D}}}(\mathbf{z}) d \mathbf{z}$$
$$p_{\theta_{\mathcal{D}}, \phi_{\mathcal{D}}}(\mathbf{z})=\mathbb{E}_{\mathbf{w} \sim p(\mathbf{w})}\left[q_{\phi_{\mathcal{D}}}(\mathbf{z} | \mathbf{w})\right]$$

• The most basic approach in a VAE is to approximate the aggregate posterior with the prior over z (a standard Gaussian) and sample z directly from it. Because this assumption is overly simplistic, it generates many homogeneous and meaningless samples.

In real-world scenarios, the KLD term of the ELBO remains large even after convergence, so the prior is a poor approximation of the aggregate posterior.

• The alternative is a Monte Carlo-based approach.

By the law of large numbers, the empirical mean converges to the marginal likelihood $p_{\theta_{\mathcal{D}}, \phi_{\mathcal{D}}}(\mathbf{w})$, thereby providing an unbiased distribution from which to sample w.
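The Monte Carlo approach can be sketched as follows. The `encode` stub stands in for the real encoder $q_{\phi_{\mathcal{D}}}(\mathbf{z} \mid \mathbf{w})$, and all names here are illustrative, not from the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(w):
    """Stand-in for the VAE encoder q_phi(z | w): returns (mu, sigma).
    A real model would run the utterance through the encoder network;
    this deterministic stub exists only to make the sketch runnable."""
    h = (sum(map(ord, w)) % 100) / 100.0
    return np.full(2, h), np.full(2, 0.1)

def mc_posterior_sample(utterances, n_samples):
    """Sketch of Monte-Carlo posterior sampling: approximate the aggregate
    posterior E_w[q_phi(z | w)] by repeatedly drawing a random training
    utterance, inferring its posterior, and sampling z from that Gaussian."""
    zs = []
    for _ in range(n_samples):
        w = utterances[rng.integers(len(utterances))]
        mu, sigma = encode(w)
        zs.append(rng.normal(mu, sigma))
    return np.stack(zs)

zs = mc_posterior_sample(["book a flight", "play some jazz"], n_samples=4)
```

Each sampled z would then be passed to the decoder $p_{\theta_{\mathcal{D}}}(\mathbf{w} \mid \mathbf{z})$ to generate a new utterance.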

#### Joint Language Understanding VAE

$$\mathcal{L}_{L U}(\phi, \psi ; \mathbf{w}, \mathbf{s}, y)=-\mathbb{E}_{\mathbf{z} \sim q_{\phi}}\left[\log p_{\psi}(\mathbf{s}, y | \hat{\mathbf{w}}, \mathbf{z})\right]$$
The joint loss of JLUVA sums the VAE objective and the LU loss, i.e. $\mathcal{L}=\mathcal{L}_{\mathrm{VAE}}+\mathcal{L}_{LU}$.

We obtain the optimal parameters $\theta^{*}, \phi^{*}, \psi^{*}$ by minimizing Equation 6 (i.e. $\arg \min_{\theta, \phi, \psi} \mathcal{L}$) with respect to a real dataset D.
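A toy illustration of this joint optimization, using our own quadratic surrogates rather than the paper's actual losses: the point is that the shared parameter $\phi$ receives gradient from both loss terms, while $\theta$ and $\psi$ are each updated by only one term.

```python
# Toy sketch (quadratic surrogates, NOT the paper's losses): jointly
# minimize L(theta, phi, psi) = L_VAE(theta, phi) + L_LU(phi, psi)
# by gradient descent.
theta, phi, psi = 2.0, -1.5, 0.5
lr = 0.1
for _ in range(200):
    g_theta = 2 * theta        # gradient from the VAE term only
    g_phi = 2 * phi + 2 * phi  # phi appears in both terms, so gradients add
    g_psi = 2 * psi            # gradient from the LU term only
    theta -= lr * g_theta
    phi -= lr * g_phi
    psi -= lr * g_psi
# All three parameters converge to the shared minimum at 0.
```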

## Experiments

### Datasets

• ATIS: The Airline Travel Information System (ATIS) dataset (Hemphill, Godfrey, and Doddington 1990) is a representative SLU dataset, providing a well-founded comparative environment for our experiments.
• Snips: The Snips dataset is an open-source virtual-assistant corpus. It contains user queries from various domains, such as manipulating playlists or booking restaurants.
• MIT Restaurant (MR): This single-domain dataset specializes in spoken queries related to booking restaurants.
• MIT Movie: The MIT movie corpus consists of two single-domain datasets: the movie eng (ME) and movie trivia (MT) datasets. While both datasets contain queries about film information, the trivia queries are more complex and specific.

### Experimental Settings

Since we observe high variance in performance gains across different runs of the same generative model, the experiments follow a more conservative design:

• Train $N_{G}$ identical generative models on the same training set, using a different random seed for each.
• Sample m utterances from each of the $N_{G}$ models, yielding $N_{G}$ augmented datasets $\mathcal{D}_{1}^{\prime}, \ldots, \mathcal{D}_{N_{G}}^{\prime}$.
• Train $N_{L}$ identical SLU models on each augmented dataset; all models are evaluated on the same validation and test sets.
• This yields $N_{G} \times N_{L}$ results in total.
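The protocol above can be sketched as follows; every callable here is a hypothetical placeholder for the real training and evaluation code:

```python
import statistics

def run_protocol(train_gen_model, augment, train_slu, evaluate,
                 n_g=5, n_l=3, seeds=range(100)):
    """Sketch of the conservative evaluation protocol: N_G generative
    models, one augmented dataset each, N_L SLU models per dataset,
    aggregating all N_G * N_L scores."""
    seed_iter = iter(seeds)
    scores = []
    for _ in range(n_g):                        # N_G generative models
        gen = train_gen_model(seed=next(seed_iter))
        d_aug = augment(gen)                    # D'_i = original data + m samples
        for _ in range(n_l):                    # N_L SLU models per dataset
            slu = train_slu(d_aug, seed=next(seed_iter))
            scores.append(evaluate(slu))        # same dev/test split throughout
    return statistics.mean(scores), statistics.stdev(scores)

# Dummy callables just to show the call shape (N_G * N_L = 4 scores)
mean, sd = run_protocol(
    train_gen_model=lambda seed: seed,
    augment=lambda gen: ["augmented dataset placeholder"],
    train_slu=lambda d, seed: (d, seed),
    evaluate=lambda slu: 0.9,
    n_g=2, n_l=2,
)
```

Reporting the mean and standard deviation over all runs guards against cherry-picking a lucky seed.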

### Generative Data Augmentation Results

GDA on Other SLU Models and Datasets

The magnitude of the gains depends on two factors:

• the intrinsic difficulty of the dataset
• the expressive capacity of the model

Comparison to Other State-of-the-art Results

### Ablation Studies

#### Sampling Methods

1. Exploratory Monte-Carlo Posterior Sampling (Ours): z is sampled from the empirical expectation of the model, which is estimated by inferring posteriors from random utterance samples. (Algorithm 1)

2. Standard Gaussian: z is sampled from the assumed prior, the standard multivariate Gaussian.

3. Additive Sampling: First, the latent representation $z_{w}$ of a random utterance w is sampled. Then $z_{w}$ is perturbed by a noise vector $\alpha \sim U(-0.2, 0.2)$. This method was proposed for the deterministic model of (Kurata, Xiang, and Zhou 2016).
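Additive sampling (method 3) is simple to sketch; the zero vector standing in for $z_{w}$ and the function name are our own, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)

def additive_sample(z_w, low=-0.2, high=0.2):
    """Additive sampling (Kurata, Xiang, and Zhou 2016): perturb the
    latent code of a random utterance with noise alpha ~ U(-0.2, 0.2)."""
    alpha = rng.uniform(low, high, size=z_w.shape)
    return z_w + alpha

z_w = np.zeros(4)  # stand-in latent code of a random utterance
z = additive_sample(z_w)
assert np.all(np.abs(z - z_w) <= 0.2)  # perturbation stays within the bound
```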

## Conclusion

The paper proposes a general GDA framework for SLU together with a Joint Language Understanding Variational Autoencoder (JLUVA) model, and analyzes various VAE sampling methods on top of it. The authors note in closing that this class of methods could also be applied to other NLP tasks, but that such extensions still require more theoretical justification.