# Advanced Architectures and Memory Networks

Model overview and combinations, Dynamic memory networks. CS224n lecture 16.

## Model overview and combinations

Model comparison:

• Bag of Vectors: surprisingly good baseline for simple text classification problems, especially if followed by a few ReLU layers!
• Window Model: good for single-word classification on problems that do not need wide context, e.g. POS tagging
• CNNs: good for classification; unclear how to incorporate phrase-level annotation (can only take a single label); need zero padding for shorter phrases; hard to interpret; easy to parallelize on GPUs; can be very efficient and versatile
• Recurrent Neural Networks: cognitively plausible (reading from left to right, keeping a state); not the best for classification; slower than CNNs; can do sequence tagging and classification; very active research; amazing with attention mechanisms
• TreeRNNs: linguistically plausible; hard to parallelize; tree structures are discrete and harder to optimize; need a parser
• Combinations and extensions!

Rarely do we use the vanilla models as is.

### TreeLSTMs

• LSTMs are great
• TreeRNNs can benefit from gates too → TreeRNNs + LSTMs = TreeLSTMs
• Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks by Kai Sheng Tai, Richard Socher, Christopher D. Manning
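The Child-Sum variant from the Tai et al. paper sums the children's hidden states before computing the input, output, and update gates, but gives each child its own forget gate. A minimal NumPy sketch (class and parameter names are my own; real implementations add biases per gate, batching, and learned initialization):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ChildSumTreeLSTMCell:
    """Child-Sum TreeLSTM cell: children's hidden states are summed for the
    input/output/update gates; each child's memory cell gets its own forget gate."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # one (W, U, b) triple per gate: input, forget, output, update
        self.W = {g: rng.normal(0, 0.1, (hidden_dim, input_dim)) for g in "ifou"}
        self.U = {g: rng.normal(0, 0.1, (hidden_dim, hidden_dim)) for g in "ifou"}
        self.b = {g: np.zeros(hidden_dim) for g in "ifou"}

    def __call__(self, x, children):
        """x: input word vector; children: list of (h_k, c_k) pairs from child nodes."""
        h_sum = sum((h for h, _ in children), np.zeros_like(self.b["i"]))
        i = sigmoid(self.W["i"] @ x + self.U["i"] @ h_sum + self.b["i"])
        o = sigmoid(self.W["o"] @ x + self.U["o"] @ h_sum + self.b["o"])
        u = np.tanh(self.W["u"] @ x + self.U["u"] @ h_sum + self.b["u"])
        c = i * u
        # a separate forget gate f_k for each child's memory cell c_k
        for h_k, c_k in children:
            f_k = sigmoid(self.W["f"] @ x + self.U["f"] @ h_k + self.b["f"])
            c = c + f_k * c_k
        h = o * np.tanh(c)
        return h, c
```

A leaf node is just the cell applied with an empty child list; an internal node passes its children's (h, c) pairs up the tree.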

### Neural Architecture Search

• The manual process of finding the best units requires a lot of expertise
• What if we could use AI to find the right architecture for any problem?
• Neural Architecture Search with Reinforcement Learning by Zoph and Le, 2016

## Dynamic Memory Network

### Architecture of DMN

The Question Module computes a question vector q. Attention is then applied conditioned on q, looking back over the input at different time steps: depending on the attention strengths, some inputs are ignored while others are attended to. The attended inputs enter the Episodic Memory Module; for example, if the question asks where the football is, all inputs related to the football and to locations are routed into this module. Each hidden state of this module is fed to the Answer Module, where a softmax produces the answer sequence.

The Episodic Memory Module contains two lines, representing the memory from a first pass over the input with question q in mind, and the memory from a second pass with the same question.

### The Modules: Input

Further Improvement: BiGRU
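The improvement is to run a GRU over the sentence representations in both directions so that each sentence encoding also sees future context. A minimal sketch of this fusion idea, assuming the recurrent cells are passed in as callables and the two directions are combined by summing (function name and the sum-fusion choice are mine):

```python
import numpy as np

def bigru_fuse(sentence_vecs, fwd_cell, bwd_cell, hidden_dim):
    """Run a GRU over the sentence encodings forward and backward,
    then sum the two hidden states at each position so every sentence
    representation carries both past and future context."""
    n = len(sentence_vecs)
    h_f, h_b = np.zeros(hidden_dim), np.zeros(hidden_dim)
    fwd, bwd = [], [None] * n
    for v in sentence_vecs:                 # forward pass
        h_f = fwd_cell(v, h_f)
        fwd.append(h_f)
    for i in range(n - 1, -1, -1):          # backward pass
        h_b = bwd_cell(sentence_vecs[i], h_b)
        bwd[i] = h_b
    return [f + b for f, b in zip(fwd, bwd)]
```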

### The Modules: Question

$$q_{t} = GRU(v_{t}, q_{t-1})$$
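Here $v_t$ is the word vector of the t-th question word, and the final hidden state is taken as the question vector q. A minimal NumPy sketch of this recurrence (class and function names are mine; biases are omitted for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell implementing q_t = GRU(v_t, q_{t-1})."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.Wz, self.Wr, self.Wh = (rng.normal(0, 0.1, (hidden_dim, input_dim))
                                     for _ in range(3))
        self.Uz, self.Ur, self.Uh = (rng.normal(0, 0.1, (hidden_dim, hidden_dim))
                                     for _ in range(3))

    def __call__(self, v, q_prev):
        z = sigmoid(self.Wz @ v + self.Uz @ q_prev)              # update gate
        r = sigmoid(self.Wr @ v + self.Ur @ q_prev)              # reset gate
        q_tilde = np.tanh(self.Wh @ v + self.Uh @ (r * q_prev))  # candidate state
        return (1 - z) * q_prev + z * q_tilde

def encode_question(word_vectors, cell, hidden_dim):
    """Feed the question's word vectors through the GRU;
    the final hidden state is the question vector q."""
    q = np.zeros(hidden_dim)
    for v in word_vectors:
        q = cell(v, q)
    return q
```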

### The Modules: Episodic Memory

Gates are activated if a sentence is relevant to the question or to the current memory:

If the summary is insufficient to answer the question, the module makes another pass over the input sequence.
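A sketch of one episode, assuming the DMN-style formulation: a small two-layer network scores each sentence encoding against the question and the current memory using elementwise similarity features, and the resulting gate decides whether the recurrent state is updated or carried through unchanged. Function names, the reduced feature set, and passing the GRU as a callable are my simplifications:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate(s, m, q, W1, b1, w2, b2):
    """Attention gate: score how relevant sentence encoding s is to
    question q and current memory m via similarity features."""
    z = np.concatenate([s * q, s * m, np.abs(s - q), np.abs(s - m)])
    return sigmoid(w2 @ np.tanh(W1 @ z + b1) + b2)

def episode(sentences, m, q, gru_cell, gate_params):
    """One pass over the input: the gate g decides whether the state is
    updated by sentence s or kept, so irrelevant sentences are skipped."""
    h = np.zeros_like(m)
    for s in sentences:
        g = gate(s, m, q, *gate_params)
        h = g * gru_cell(s, h) + (1 - g) * h
    return h  # episode summary, used to update the memory for the next pass
```

Repeated passes re-run `episode` with the updated memory, letting the gates attend to different sentences once earlier facts have been stored.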

• $a_{t}$: hidden state of the answer GRU at step t (playing the role of $h_{t}$)
• $y_{t-1}$: output from the previous time step
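In the DMN paper's answer module, the hidden state is updated as $a_t = GRU([y_{t-1}; q], a_{t-1})$ with $a_0$ initialized to the final memory, and each output is $y_t = \mathrm{softmax}(W^{(a)} a_t)$. A sketch of that recurrence, assuming the GRU is passed in as a callable and decoding runs for a fixed number of steps (function and parameter names are mine):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def answer_module(q, m, W_a, gru_cell, n_steps):
    """Answer decoder: a_t is updated from the previous output y_{t-1}
    concatenated with the question vector q; each a_t is decoded with a
    softmax over the vocabulary. a_0 is the final episodic memory m."""
    a = m
    y = np.zeros(W_a.shape[0])  # y_0: no previous output yet
    outputs = []
    for _ in range(n_steps):
        a = gru_cell(np.concatenate([y, q]), a)
        y = softmax(W_a @ a)
        outputs.append(y)
    return outputs
```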