# Sequence-to-Nuggets Nested Entity Mention Detection via Anchor-Region Networks

## Introduction

• Although entity mentions may nest, they never share the same head word (anchor word): different entities have different head words, and the head word has a strong semantic association with the entity type. For example, in Figure 1 the entities "The minister of the department of education" and "the department of education" have head words *minister* and *department*, corresponding to entity types PER and ORG respectively.
• Most entities have a regular expression structure. For example, the two entities in Figure 1 share the structure DET NN of NP, where NN is the head word.

• An anchor detector network locates each anchor word and simultaneously predicts its entity type.
• A region recognizer network determines the boundaries of the entity mention centered on each anchor word.

## Anchor-Region Networks for Nested Entity Mention Detection

### Anchor Detector

The Anchor Detector is a BiLSTM-based softmax classifier. Given a sequence $x_{1},\dots, x_{n}$, each token is first mapped to a vector representation (the concatenation of its word embedding, POS embedding, and character-level embedding), which is then fed through a BiLSTM layer; finally, each hidden state is classified:

$$\overrightarrow{h_{i}^{A}} =\operatorname{LSTM}\left(x_{i}, \overrightarrow{h_{i-1}^{A}}\right)$$

$$\overleftarrow{h_{i}^{A}} =\operatorname{LSTM}\left(x_{i}, \overleftarrow{h_{i+1}^{A}}\right)$$

$$h_{i}^{A} =\left[\overrightarrow{h_{i}^{A}} ; \overleftarrow{h_{i}^{A}}\right]$$

$$O_{i}^{A}=\operatorname{MLP}\left(h_{i}^{A}\right)$$
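The four equations above can be sketched end to end in NumPy. This is a minimal illustration, not the paper's implementation: the weight layout, the single-layer MLP, and all toy dimensions are assumptions for demonstration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates stacked as [input, forget, output, candidate]."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i, f, o = sigmoid(z[:H]), sigmoid(z[H:2*H]), sigmoid(z[2*H:3*H])
    g = np.tanh(z[3*H:])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def anchor_detector(X, params):
    """BiLSTM over token representations X (n, d), then an MLP per position."""
    Wf, Uf, bf, Wb, Ub, bb, W1, b1, W2, b2 = params
    n, _ = X.shape
    H = Uf.shape[1]
    fwd, bwd = np.zeros((n, H)), np.zeros((n, H))
    h, c = np.zeros(H), np.zeros(H)
    for i in range(n):                         # forward LSTM
        h, c = lstm_step(X[i], h, c, Wf, Uf, bf)
        fwd[i] = h
    h, c = np.zeros(H), np.zeros(H)
    for i in reversed(range(n)):               # backward LSTM
        h, c = lstm_step(X[i], h, c, Wb, Ub, bb)
        bwd[i] = h
    hA = np.concatenate([fwd, bwd], axis=1)    # h_i^A = [fwd_i ; bwd_i]
    logits = np.tanh(hA @ W1 + b1) @ W2 + b2   # O_i^A = MLP(h_i^A)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True), hA  # softmax over types + NIL

# toy shapes: 5 tokens, 8-dim embeddings, hidden size 6, 4 classes (incl. NIL)
rng = np.random.default_rng(0)
d, H, T, n = 8, 6, 4, 5
params = (rng.normal(size=(4*H, d)), rng.normal(size=(4*H, H)), np.zeros(4*H),
          rng.normal(size=(4*H, d)), rng.normal(size=(4*H, H)), np.zeros(4*H),
          rng.normal(size=(2*H, H)), np.zeros(H),
          rng.normal(size=(H, T)), np.zeros(T))
X = rng.normal(size=(n, d))
probs, hA = anchor_detector(X, params)
```

Each row of `probs` is a distribution over entity types plus NIL, so anchor detection and type prediction happen in one classification step, as the notes describe.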

### Region Recognizer

$$\boldsymbol{r}_{\boldsymbol{i}}=\tanh \left(\boldsymbol{W} \boldsymbol{h}_{\boldsymbol{i}-\boldsymbol{k} : \boldsymbol{i}+\boldsymbol{k}}^{\boldsymbol{R}}+\boldsymbol{b}\right)$$
$\boldsymbol{h}_{\boldsymbol{i}-\boldsymbol{k} : \boldsymbol{i}+\boldsymbol{k}}^{R}$ is the concatenation of $h_{i-k}^{R}$ through $h_{i+k}^{R}$, $W$ is the convolution kernel, and $k$ is the window size. Finally, the scores that word $w_{j}$ is the left or right boundary of the entity anchored at word $w_{i}$ are computed as:
$$L_{i j} =\tanh \left(r_{j}^{T} \Lambda_{1} h_{i}^{R}+U_{1} r_{j}+b_{1}\right)$$
$$R_{i j} =\tanh \left(r_{j}^{T} \Lambda_{2} h_{i}^{R}+U_{2} r_{j}+b_{2}\right)$$
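The convolutional representation $r_j$ and the two boundary-scoring equations can be sketched as follows. This is a shape-level NumPy sketch under assumptions of my own (zero padding at the sequence edges, vector-valued $U_1, U_2$, random toy weights), not the paper's exact parameterization.

```python
import numpy as np

def region_scores(hR, k, W, b, L1, U1, b1, L2, U2, b2):
    """Left/right boundary scores for every (anchor i, boundary j) pair.

    hR: (n, H) hidden states from the region BiLSTM.
    r_j = tanh(W h_{j-k:j+k} + b) represents the local context of word j;
    the sequence is zero-padded at the edges (an assumption).
    """
    n, H = hR.shape
    padded = np.vstack([np.zeros((k, H)), hR, np.zeros((k, H))])
    windows = np.stack([padded[j:j + 2*k + 1].ravel() for j in range(n)])
    r = np.tanh(windows @ W.T + b)                       # (n, Hr)
    # biaffine-style scores: score[i, j] = r_j^T Lambda h_i^R + U r_j + bias
    L = np.tanh(np.einsum('jd,de,ie->ij', r, L1, hR) + r @ U1 + b1)
    R = np.tanh(np.einsum('jd,de,ie->ij', r, L2, hR) + r @ U2 + b2)
    return L, R

# toy example: 6 tokens, hidden size 5, conv output 7, window k = 1
rng = np.random.default_rng(1)
n, H, Hr, k = 6, 5, 7, 1
hR = rng.normal(size=(n, H))
W, b = rng.normal(size=(Hr, (2*k + 1) * H)), np.zeros(Hr)
L1, U1 = rng.normal(size=(Hr, H)), rng.normal(size=Hr)
L2, U2 = rng.normal(size=(Hr, H)), rng.normal(size=Hr)
L, R = region_scores(hR, k, W, b, L1, U1, 0.0, L2, U2, 0.0)

# decode the mention anchored at word 2: left boundary must lie at or before
# the anchor, right boundary at or after it
left = int(L[2, :3].argmax())
right = 2 + int(R[2, 2:].argmax())
```

Because each anchor word gets its own boundary pair, two anchors inside the same sentence can yield overlapping spans, which is exactly how nested mentions are recovered.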

### Model Learning with Bag Loss

• If $x_{i}$ is an anchor word, the loss is the sum of the anchor detector loss and the region recognizer loss.

• If $x_{i}$ is not an anchor word, $x_{i}$ should be classified as NIL, and the loss contains only the anchor detector term.

$$\omega_{i}=\left[\frac{P\left(c_{i} | x_{i}\right)}{\max _{x_{t} \in B_{i}} P\left(c_{i} | x_{t}\right)}\right]^{\alpha}$$

Compared with other words in the same bag, a word $x_{i}$ should receive a larger weight $\omega_{i}$ if it has a tighter association with the bag type.
$\alpha = 0$ means that all words are annotated with the bag type, while $\alpha \rightarrow +\infty$ means that Bag Loss treats only the word with the highest $P(c_{i} \mid x_{i})$ as the anchor word and regards all other words in the same bag as NIL.
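The weight formula and the two limiting cases of $\alpha$ can be checked with a few lines of NumPy. The probabilities below are made-up illustrative numbers:

```python
import numpy as np

def bag_weights(p_bag_type, alpha):
    """omega_i = [ P(c | x_i) / max_t P(c | x_t) ]^alpha for words in one bag.

    p_bag_type: each word's predicted probability of the bag's entity type.
    """
    p = np.asarray(p_bag_type, dtype=float)
    return (p / p.max()) ** alpha

# probabilities of the bag type for 3 words in the same bag (toy values)
p = [0.1, 0.7, 0.2]
print(bag_weights(p, 0.0))   # alpha = 0: every word gets weight 1
print(bag_weights(p, 50.0))  # large alpha: only the argmax word keeps weight 1
```

With `alpha = 0` every word is weighted equally (all annotated with the bag type); as `alpha` grows, the weights of all but the most confident word decay toward 0, so only that word is effectively treated as the anchor.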

## Experiments

1) The comparison between the LSTM-CRF and Multi-CRF models shows that nested entity mentions have a substantial impact on entity recognition and deserve serious attention.

2) The proposed Anchor-Region Networks recognize nested entity mentions effectively, achieving state-of-the-art results on the ACE2005, GENIA, and ARNS datasets.