# Imposing Label-Relational Inductive Bias for Extremely Fine-Grained Entity Typing

## Methodology

### Mention-Context Interaction

$$m_{proj}=\tanh \left(W_{1}^{T} \mathcal{M}\right)$$

$$\mathcal{A}=m_{proj} \times W_{a} \times \mathcal{C}_{h}$$

$\rho(\cdot)$ is the Gaussian Error Linear Unit (GELU; Hendrycks and Gimpel, "Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units"), and $\sigma(\cdot)$ is the sigmoid function.
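A minimal NumPy sketch of the bilinear mention-context interaction above; the dimension names (`d_m`, `d_p`, `d_c`) and the function name are illustrative assumptions, not from the paper:

```python
import numpy as np

def mention_context_attention(M, C_h, W1, W_a):
    """Score each context token against the projected mention representation.

    M   : (d_m,)     pooled mention representation
    C_h : (L, d_c)   contextual token representations
    W1  : (d_m, d_p) mention projection, m_proj = tanh(W1^T M)
    W_a : (d_p, d_c) bilinear attention matrix
    Returns attention scores A of shape (L,), one per context token.
    """
    m_proj = np.tanh(W1.T @ M)        # (d_p,)
    A = m_proj @ W_a @ C_h.T          # (L,) bilinear scores
    return A

rng = np.random.default_rng(0)
d_m, d_p, d_c, L = 8, 6, 8, 5
A = mention_context_attention(rng.normal(size=d_m),
                              rng.normal(size=(L, d_c)),
                              rng.normal(size=(d_m, d_p)),
                              rng.normal(size=(d_p, d_c)))
print(A.shape)  # (5,)
```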

### Imposing Label-Relational Inductive Bias

$$p=\sigma\left(W_{o} f\right), W_{o} \in \mathbb{R}^{N \times d_{f}}$$
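The output layer is a set of independent sigmoids, one per type, as in standard multi-label classification; a quick sketch (the helper name, the toy values, and the 0.5 threshold are assumptions):

```python
import numpy as np

def predict_types(f, W_o, threshold=0.5):
    """p = sigmoid(W_o f): one independent probability per type (multi-label).

    f   : (d_f,)   final mention-in-context feature vector
    W_o : (N, d_f) output matrix; row i acts as the embedding of type i
    """
    p = 1.0 / (1.0 + np.exp(-(W_o @ f)))     # (N,)
    return p, np.flatnonzero(p > threshold)  # probabilities, predicted ids

# toy check: only type 0 scores above the threshold
f = np.array([2.0, -2.0])
W_o = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
p, predicted = predict_types(f, W_o)
print(predicted)  # [0]
```

Because each row of $W_o$ doubles as a type embedding, refining $W_o$ with the label graph below injects label correlations directly into the classifier.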

Label Graph Construction. The authors consider entity types in the open domain and use a graph to represent co-occurrence relations between labels: each node is a type, and two type nodes are connected by an edge if the two types appear on the same mention span.
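The construction above can be sketched as follows; whether edges carry raw counts or are binarized is not specified here, so this sketch keeps counts (the function name and input format are assumptions):

```python
from itertools import combinations
import numpy as np

def build_label_graph(mention_types, num_types):
    """Symmetric co-occurrence matrix A over entity types.

    mention_types: iterable of sets of type ids annotated on one mention span;
    two types get an edge whenever they appear on the same mention.
    """
    A = np.zeros((num_types, num_types))
    for types in mention_types:
        for i, j in combinations(sorted(types), 2):
            A[i, j] += 1.0  # count co-occurrences
            A[j, i] += 1.0  # keep A symmetric
    return A

# mention 0 tagged {0: person, 1: politician}, mention 1 tagged {0, 2: engineer}
A = build_label_graph([{0, 1}, {0, 2}], num_types=3)
print(A[0, 1], A[1, 2])  # 1.0 0.0
```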

Correlation Encoding via Graph Convolution. Based on the type co-occurrence matrix $A$ and a randomly initialized $W_{o}$, the paper uses a GCN to obtain node representations:

$$\tilde{D}_{i i}=\sum_{j} \tilde{A}_{i j}$$

$$W_{o}^{\prime}=\tilde{D}^{-1} \tilde{A} W_{o} T$$

where $\tilde{A}=A+I$ is the co-occurrence matrix with self-loops and $T \in \mathbb{R}^{d_{f} \times d_{f}}$ is a learnable linear transformation. Written row by row:

$$W_{o}^{\prime}[i, :]=\frac{1}{\sum_{j} \tilde{A}_{i j}}\left(\sum_{j} \tilde{A}_{i j} W_{o}[j, :] T\right)$$
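The one-hop update and its row-wise form can be checked numerically; the sketch below assumes $\tilde{A}=A+I$ (self-loops, as in standard GCNs) and a square transformation $T$:

```python
import numpy as np

def one_hop_propagate(A, W_o, T):
    """W_o' = D~^{-1} A~ W_o T: average each type embedding with its
    neighbors' embeddings, then apply the linear transformation T."""
    A_tilde = A + np.eye(A.shape[0])                  # assume self-loops
    D_inv = 1.0 / A_tilde.sum(axis=1, keepdims=True)  # D~_ii = sum_j A~_ij
    return D_inv * (A_tilde @ W_o @ T)

rng = np.random.default_rng(0)
N, d_f = 4, 3
A = (rng.random((N, N)) > 0.5).astype(float)
A = np.triu(A, 1)
A = A + A.T                                           # symmetric, zero diagonal
W_o = rng.normal(size=(N, d_f))
T = rng.normal(size=(d_f, d_f))
W_new = one_hop_propagate(A, W_o, T)
# row i equals (1 / sum_j A~_ij) * sum_j A~_ij (W_o[j] @ T)
```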

Compared to the original GCN, which often uses multi-hop propagation (i.e., multiple graph layers connected by nonlinear functions) to capture higher-order neighbor structure, we apply only one-hop propagation, arguing that higher-order label dependencies are not necessarily beneficial in this scenario and may introduce false bias. A simple illustration is shown in Figure 2: propagating 2-hop information introduces undesired inductive bias, since types that are more than one hop apart (e.g., "Engineer" and "Politician") usually do not have any dependency.
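The false-bias argument can be reproduced on a toy chain graph: with the row-normalized $\tilde{A}=A+I$, the one-hop operator gives zero weight to nodes two edges apart, while the two-hop operator does not (a constructed example, not from the paper):

```python
import numpy as np

# toy chain: Engineer (0) -- Person (1) -- Politician (2)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
A_tilde = A + np.eye(3)
P = A_tilde / A_tilde.sum(axis=1, keepdims=True)  # one-hop averaging operator

print(P[0, 2])        # 0.0  -> one hop: Engineer ignores Politician
print((P @ P)[0, 2])  # 1/6, nonzero -> two hops mix them via Person
```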