RNNLG is an open source benchmark toolkit for Natural Language Generation (NLG) in spoken dialogue system application domains. It is released by Tsung-Hsien (Shawn) Wen from Cambridge Dialogue Systems Group under Apache License 2.0.
Data
1. The raw data comes from four domains: hotel, laptop, restaurant, tv
2. Each dataset is split into three parts: train, valid, test
3. Each entry is a triple: [MR/Dialogue Act, Human Authored Response, HDC (handcrafted) baseline]
# example
[
"inform(name='trattoria contadina';pricerange=moderate)",
"trattoria contadina is a nice restaurant in the moderate price range",
"trattoria contadina is a nice place it is in the moderate price range"
]
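The dialogue act string in the first slot of the triple has a regular shape, `act(slot=value;slot=value)`. The sketch below shows one way to split it into an act type and slot-value pairs. The function name `parse_dialogue_act` is hypothetical and is not part of RNNLG; the sketch also assumes that values never contain `;` or `=`.

```python
import re


def parse_dialogue_act(da):
    """Split 'act(slot=value;...)' into (act, [(slot, value), ...])."""
    match = re.match(r"^\s*([^()\s]+)\((.*)\)\s*$", da)
    if match is None:
        raise ValueError("not a dialogue act: %r" % da)
    act, body = match.group(1), match.group(2)
    pairs = []
    for item in filter(None, body.split(";")):
        slot, sep, value = item.partition("=")
        # A slot without '=' (as in '?request(near)') carries no value.
        pairs.append((slot.strip(), value.strip().strip("'\"") if sep else None))
    return act, pairs


act, pairs = parse_dialogue_act(
    "inform(name='trattoria contadina';pricerange=moderate)"
)
print(act, pairs)
```

Quotes around values are stripped so that `'trattoria contadina'` and `moderate` are handled the same way.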
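Input of scLSTM
1. Data preprocessing: each triple in the raw data is expanded into a 4-tuple by prepending the parsed feature list; slot values in the two sentences are replaced with SLOT_* placeholders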
# example
[
[('a', u'inform'), (u'area', '_1'), (u'name', '_1'), (u'pricerange', '_1')],
u"inform(name='alamo square seafood grill';area='friendship village';pricerange=moderate)",
u'SLOT_NAME is a nice restaurant in the area of SLOT_AREA and it is in the SLOT_PRICERANGE price range',
u'SLOT_NAME is a nice place , it is in the area of SLOT_AREA and it is in the SLOT_PRICERANGE price range'
]
[
[('a', u'?request'), (u'near', '?')],
u'?request(near)',
u'please confirm your area of interest',
u'where would you like it to be near to'
]
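The two examples above suggest how the feature list and the delexicalised sentences are derived. What follows is a minimal sketch of that step, not the toolkit's own code. The helper names, the `SPECIAL_VALUES` set (guessed from the `v` entries such as `dontcare`, `yes`, `no`, `none`), and the lexicalised input sentence are assumptions made for illustration.

```python
# Values kept as-is instead of being numbered; an assumption based on the v list.
SPECIAL_VALUES = {"dontcare", "yes", "no", "none"}


def build_features(act, pairs):
    """Turn (act, [(slot, value), ...]) into [('a', act), (slot, token), ...]."""
    feats, seen = [], {}
    for slot, value in sorted(pairs, key=lambda p: p[0]):
        if value is None:
            token = "?"                      # requested slot, e.g. ?request(near)
        elif value in SPECIAL_VALUES:
            token = value
        else:
            seen[slot] = seen.get(slot, 0) + 1
            token = "_%d" % seen[slot]       # first concrete value -> '_1', second -> '_2'
        feats.append((slot, token))
    return [("a", act)] + feats


def delexicalise(sentence, pairs):
    """Replace concrete slot values in a sentence with SLOT_<NAME> placeholders."""
    concrete = [(s, v) for s, v in pairs if v is not None and v not in SPECIAL_VALUES]
    # Replace longer values first so a short value nested in a longer one is not clobbered.
    for slot, value in sorted(concrete, key=lambda p: len(p[1]), reverse=True):
        sentence = sentence.replace(value, "SLOT_" + slot.upper())
    return sentence


pairs = [("name", "alamo square seafood grill"),
         ("area", "friendship village"),
         ("pricerange", "moderate")]
print(build_features("inform", pairs))
print(delexicalise("alamo square seafood grill is a nice restaurant in the area of "
                   "friendship village and it is in the moderate price range", pairs))
```

Plain substring replacement is a simplification; a value that also occurs as an ordinary word in the sentence would be replaced as well.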
2. feat_template: a, sv, s, v
# a : the intent (dialogue act type) of the sentence
a = [
a.?compare,a.?confirm,a.?reqmore,a.?request,a.?select, a.bye,
a.goodbye,a.inform,a.inform_all,a.inform_count,a.inform_no_info,
a.inform_no_match,a.inform_only_match,a.recommend,a.suggest
]
# sv : sv.slot.slot_value
sv = [
sv.acceptscreditcards.dontcare,sv.acceptscreditcards.no,
sv.acceptscreditcards.yes,sv.accessories._1,sv.accessories._2,
sv.accessories.none,sv.address._1,sv.area.?,sv.area._1,
sv.area.dontcare,sv.audio._1,sv.audio._2,sv.audio.none,
sv.battery._1,sv.battery._2,
etc...
]
# s : s.slot
s = [
s.acceptscreditcards,s.accessories,s.address,s.area,
s.audio,s.battery,s.batteryrating,s.color,s.count,
s.design,s.dimension
etc...
]
# v : slot_value
v = [
v.?,v._1,v._2,v._3,v.dontcare,v.no,v.none,v.yes,v.NONE
]
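A feature template like the one above can be turned into lookup tables that map each feature name to an integer. The sketch below assumes zero-based indices counted separately within each of the four groups. The short `FEAT_TEMPLATE` list is a hypothetical excerpt, so the resulting numbers will not match those produced from the real feat_template.txt.

```python
# Hypothetical excerpt of a feature template; the real file lists far more entries.
FEAT_TEMPLATE = [
    "a.inform", "a.inform_count",
    "sv.count._1", "sv.goodformeal._1", "sv.type._1",
    "s.count", "s.goodformeal", "s.type",
    "v._1", "v.?",
]


def build_index(template):
    """Map each feature name to its position inside its own group (a, sv, s, v)."""
    index = {"a": {}, "sv": {}, "s": {}, "v": {}}
    for name in template:
        kind = name.split(".", 1)[0]
        index[kind][name] = len(index[kind])
    return index


def encode(features, index):
    """Encode [('a', act), (slot, token), ...] as four lists of indices."""
    act, slots = features[0][1], features[1:]
    a = [index["a"]["a." + act]]
    sv = [index["sv"]["sv.%s.%s" % (slot, tok)] for slot, tok in slots]
    s = [index["s"]["s." + slot] for slot, _ in slots]
    v = [index["v"]["v." + tok] for _, tok in slots]
    return a, sv, s, v


features = [("a", "inform_count"), ("count", "_1"), ("goodformeal", "_1"), ("type", "_1")]
print(encode(features, build_index(FEAT_TEMPLATE)))
```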
3. Input of scLSTM for training
- a: the index of the sentence intent (inform, request, recommend, etc.) within the a set in feat_template
- sv: the index of sv.slot_name.slot_value among all the sv.slot_name.slot_value entries in feat_template.txt
- s: same as sv
- v: same as sv
- words: the word indices of the third item of the 4-tuple, the Human Authored Response sentence
- b_size: batch size
- lengs: the lengths of (a, sv, s, v, sent) as a 2-D numpy array, e.g. [[1], [3], [3], [3], [20]]
# example
# [('a', u'inform_count'), (u'count', '_1'), (u'goodformeal', '_1'), (u'type', '_1')],
# u"inform_count(type=restaurant;count='2';goodformeal=brunch)",
# u'there are SLOT_COUNT SLOT_TYPE -s good for SLOT_GOODFORMEAL',
# u'there are SLOT_COUNT SLOT_TYPE -s good for SLOT_GOODFORMEAL'
array([[7]], dtype=int32),
array([[ 53, 78, 82, 101]], dtype=int32),
array([[16, 22, 24, 31]], dtype=int32),
array([[1, 7, 1, 1]], dtype=int32),
array([[ 1],
[ 5],
[ 2],
[ 4],
[ 22],
[140],
[117],
[113],
[105],
[ 41],
[136],
[147],
[ 10],
[ 1]], dtype=int32),
1,
array([[ 1],
[ 4],
[ 4],
[ 4],
[14]], dtype=int32)
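The shapes in the example above follow a simple pattern: the feature vectors are rows with one row per batch item, the words are a column with one row per time step, and lengs records the length of each of the five inputs. The sketch below reproduces that layout for a batch of size 1. Plain nested lists stand in for the int32 numpy arrays, and `pack_example` is a hypothetical helper, not an RNNLG function.

```python
def pack_example(a_idx, sv_idx, s_idx, v_idx, word_idx):
    """Lay out one training example (batch size 1) in the shapes shown above."""
    a = [[a_idx]]
    sv = [list(sv_idx)]
    s = [list(s_idx)]
    v = [list(v_idx)]
    words = [[w] for w in word_idx]      # one row per time step, one column per batch item
    b_size = 1
    lengs = [[len(a[0])], [len(sv[0])], [len(s[0])], [len(v[0])], [len(words)]]
    return a, sv, s, v, words, b_size, lengs


packed = pack_example(
    7,
    [53, 78, 82, 101],
    [16, 22, 24, 31],
    [1, 7, 1, 1],
    [1, 5, 2, 4, 22, 140, 117, 113, 105, 41, 136, 147, 10, 1],
)
print(packed[-1])   # the lengs entry
```

With the index values copied from the example, the computed lengs is `[[1], [4], [4], [4], [14]]`, the same as the last array in the listing.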
Output of scLSTM
1. Input for test: a, sv, s, v
2. Output of scLSTM: 20 * (penalty + sentence)
# ('gens : ', [
#  (0.13399766372142932, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 2, 77, 15, 132, 1]),
#  (0.14933728562324292, [1, 5, 138, 113, 126, 8, 2, 77, 15, 132, 1]),
#  (0.1549744254930803, [1, 5, 2, 4, 113, 105, 41, 2, 77, 15, 132, 1]),
#  (0.15852280774253719, [1, 5, 138, 113, 126, 15, 132, 1]),
#  (0.19026726564154084, [1, 5, 2, 4, 33, 105, 41, 138, 113, 8, 2, 77, 15, 132, 1]),
#  (0.20233138058456338, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 15, 132, 1]),
#  (0.20940732168664519, [1, 5, 2, 4, 33, 105, 41, 138, 113, 132, 8, 2, 77, 15, 132, 1]),
#  (0.21145074659354887, [1, 5, 2, 4, 113, 105, 41, 138, 132, 1]),
#  (0.21476023212302103, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 14, 2, 77, 15, 132, 1]),
#  (0.22121748802971725, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 138, 132, 1]),
#  (0.22331384956149691, [1, 5, 2, 4, 33, 105, 41, 138, 113, 8, 14, 2, 77, 15, 132, 1]),
#  (0.22577840448106021, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 2, 4, 77, 202, 15, 132, 1]),
#  (0.23494928307737384, [1, 5, 2, 4, 33, 105, 41, 138, 113, 15, 132, 1]),
#  (0.2414091617709512, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 2, 77, 15, 132, 8, 2, 77, 15, 132, 1]),
#  (0.25039630774077898, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 2, 111, 15, 132, 1]),
#  (0.26162732976045144, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 14, 2, 4, 77, 202, 15, 132, 1]),
#  (0.27529049174629477, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 2, 4, 77, 132, 1]),
#  (0.30046496217825086, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 14, 2, 4, 77, 132, 1]),
#  (0.3006930035044566, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 1]),
#  (0.31477015858324936, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 2, 77, 15, 1])
# ])
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and is good for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME serves SLOT_FOOD food and is good for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a SLOT_FOOD restaurant that is good for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME serves SLOT_FOOD food for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD and is good for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD SLOT_GOODFORMEAL and is good for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a SLOT_FOOD restaurant that serves SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and it is good for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and serves SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD and it is good for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and is a good place for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and is good for SLOT_GOODFORMEAL and is good for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and is great for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and it is a good place for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and is a good SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and it is a good SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and is good for')
3. A score is computed from each penalty, and the top-k sentences are selected
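The selection step can be sketched as follows. The vocabulary here is not the toolkit's real word list: it was inferred by aligning the index lists in the `gens` output with the printed sentences, where index 1 appears to mark the sentence boundary. How RNNLG turns a penalty into a score is not stated above, so the sketch assumes, purely for illustration, that the score is the negated penalty, meaning a lower penalty ranks higher.

```python
import heapq

BOUNDARY = 1  # appears at both ends of every generated index list
# Partial vocabulary inferred from the output above; an assumption, not the real file.
VOCAB = {2: "is", 4: "a", 5: "SLOT_NAME", 8: "and", 15: "for", 33: "nice",
         41: "that", 77: "good", 105: "restaurant", 113: "SLOT_FOOD",
         126: "food", 132: "SLOT_GOODFORMEAL", 138: "serves"}


def decode(indices):
    """Map word indices back to a sentence, dropping the boundary tokens."""
    return " ".join(VOCAB[i] for i in indices if i != BOUNDARY)


def top_k(gens, k):
    """Keep the k candidates with the lowest penalty as (score, sentence) pairs."""
    best = heapq.nsmallest(k, gens, key=lambda g: g[0])
    return [(-penalty, decode(indices)) for penalty, indices in best]


# Three candidates copied from the output above, deliberately out of order.
gens = [
    (0.1549744254930803, [1, 5, 2, 4, 113, 105, 41, 2, 77, 15, 132, 1]),
    (0.13399766372142932, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 2, 77, 15, 132, 1]),
    (0.14933728562324292, [1, 5, 138, 113, 126, 8, 2, 77, 15, 132, 1]),
]
for score, sentence in top_k(gens, 2):
    print(score, sentence)
```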
Evaluation for scLSTM
_explanation for bleu:_