RNNLG

2017-09-04

RNNLG is an open-source benchmark toolkit for Natural Language Generation (NLG) in spoken dialogue system application domains. It was released by Tsung-Hsien (Shawn) Wen of the Cambridge Dialogue Systems Group under the Apache License 2.0.

Data

1. The raw data comes from four different domains: hotel, laptop, restaurant, and tv.
2. Each dataset is split into three sets: train, valid, and test.
3. Each example is a triple: [MR/Dialogue Act, Human Authored Response, HDC baseline]

# example
[
    "inform(name='trattoria contadina';pricerange=moderate)",
    "trattoria contadina is a nice restaurant in the moderate price range",
    "trattoria contadina is a nice place it is in the moderate price range"
]
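
To make the triple format concrete, here is a minimal sketch of how the first element (the MR / dialogue act string) could be split into an act name and slot-value pairs. `parse_dialogue_act` is a hypothetical helper written for this post, not a function from RNNLG, and it assumes that slot values never contain a semicolon.

```python
import re


def parse_dialogue_act(mr):
    """Split an MR string into (act, [(slot, value), ...]).

    Hypothetical helper, not part of RNNLG. A slot given without
    "=value" (as in "?request(near)") gets the value None.
    Assumes slot values never contain ';'.
    """
    match = re.match(r"^\s*([?\w]+)\((.*)\)\s*$", mr)
    if match is None:
        raise ValueError("not a dialogue act: %r" % mr)
    act, body = match.group(1), match.group(2)
    pairs = []
    for item in body.split(";"):
        item = item.strip()
        if not item:
            continue
        if "=" in item:
            slot, value = item.split("=", 1)
            # Drop surrounding whitespace and quotes from the value.
            pairs.append((slot.strip(), value.strip().strip("'\"")))
        else:
            pairs.append((item, None))
    return act, pairs


act, pairs = parse_dialogue_act(
    "inform(name='trattoria contadina';pricerange=moderate)"
)
print(act, pairs)
```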

Input to scLSTM

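1. Data preprocessing: each triple in the raw data is expanded into a 4-tuple. A feature list extracted from the dialogue act is prepended, and the slot values in both sentences are replaced by SLOT_* placeholders.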

# example
[
    [('a', u'inform'), (u'area', '_1'), (u'name', '_1'), (u'pricerange', '_1')],
    u"inform(name='alamo square seafood grill';area='friendship village';pricerange=moderate)",
    u'SLOT_NAME is a nice restaurant in the area of SLOT_AREA and it is in the SLOT_PRICERANGE price range',
    u'SLOT_NAME is a nice place , it is in the area of SLOT_AREA and it is in the SLOT_PRICERANGE price range'
]
[
    [('a', u'?request'), (u'near', '?')],
    u'?request(near)',
    u'please confirm your area of interest',
    u'where would you like it to be near to'
]
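
The expansion from triple to 4-tuple can be sketched roughly as follows. This is an illustration reverse-engineered from the two examples above, not RNNLG's actual preprocessing code: `build_features`, `delexicalise`, and `SPECIAL_VALUES` are hypothetical names, and the plain substring replacement is a simplifying assumption that can misfire on short values.

```python
# Values assumed to be kept verbatim instead of being replaced by _1, _2, ...
# (an assumption, based on entries such as sv.acceptscreditcards.dontcare).
SPECIAL_VALUES = {"dontcare", "yes", "no", "none"}


def build_features(act, pairs):
    """Build the first element of the 4-tuple from a parsed dialogue act."""
    feats = [("a", act)]
    counts = {}
    for slot, value in sorted(pairs, key=lambda pair: pair[0]):
        if value is None:
            feats.append((slot, "?"))      # requested slot, no value given
        elif value in SPECIAL_VALUES:
            feats.append((slot, value))
        else:
            counts[slot] = counts.get(slot, 0) + 1
            feats.append((slot, "_%d" % counts[slot]))
    return feats


def delexicalise(sentence, pairs):
    """Replace each concrete slot value in the sentence by SLOT_<NAME>."""
    for slot, value in pairs:
        if value is not None and value not in SPECIAL_VALUES:
            sentence = sentence.replace(value, "SLOT_" + slot.upper())
    return sentence


pairs = [
    ("name", "alamo square seafood grill"),
    ("area", "friendship village"),
    ("pricerange", "moderate"),
]
print(build_features("inform", pairs))
print(delexicalise(
    "alamo square seafood grill is a nice restaurant in the area of "
    "friendship village and it is in the moderate price range",
    pairs,
))
```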

2. feat_template: the features fall into four groups: a, sv, s, and v

# a: the dialogue act (intent) of the sentence
a = [
         a.?compare,a.?confirm,a.?reqmore,a.?request,a.?select, a.bye,
         a.goodbye,a.inform,a.inform_all,a.inform_count,a.inform_no_info,
         a.inform_no_match,a.inform_only_match,a.recommend,a.suggest
]

# sv : sv.slot.slot_value
sv = [
        sv.acceptscreditcards.dontcare,sv.acceptscreditcards.no,
        sv.acceptscreditcards.yes,sv.accessories._1,sv.accessories._2,
        sv.accessories.none,sv.address._1,sv.area.?,sv.area._1,
        sv.area.dontcare,sv.audio._1,sv.audio._2,sv.audio.none,
        sv.battery._1,sv.battery._2,
        etc...
]

# s : s.slot
s = [
        s.acceptscreditcards,s.accessories,s.address,s.area,
        s.audio,s.battery,s.batteryrating,s.color,s.count,
        s.design,s.dimension
        etc...
]

# v : slot_value
v = [
        v.?,v._1,v._2,v._3,v.dontcare,v.no,v.none,v.yes,v.NONE
]
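
As a rough illustration of how such a template turns a feature list into indices, the sketch below assigns every feature its position within its own group. The template here is a tiny made-up subset, and per-group 0-based indexing is an assumption; the real feat_template.txt and RNNLG's own lookup may differ. `build_index` and `encode` are hypothetical names.

```python
def build_index(template_lines):
    """Map each feature name to its position within its group (a, sv, s, v)."""
    groups = {"a": [], "sv": [], "s": [], "v": []}
    for line in template_lines:
        name = line.strip()
        if name:
            # The text before the first dot names the group.
            groups[name.split(".", 1)[0]].append(name)
    return {
        group: {name: i for i, name in enumerate(names)}
        for group, names in groups.items()
    }


def encode(feats, index):
    """Turn [('a', act), (slot, value), ...] into four index lists."""
    act = feats[0][1]
    slot_values = feats[1:]
    a = [index["a"]["a." + act]]
    sv = [index["sv"]["sv.%s.%s" % (slot, value)] for slot, value in slot_values]
    s = [index["s"]["s." + slot] for slot, _ in slot_values]
    v = [index["v"]["v." + value] for _, value in slot_values]
    return a, sv, s, v


# A made-up miniature template, only for demonstration.
template = [
    "a.?request", "a.inform",
    "sv.area._1", "sv.name._1", "sv.near.?",
    "s.area", "s.name", "s.near",
    "v.?", "v._1",
]
index = build_index(template)
print(encode([("a", "inform"), ("area", "_1"), ("name", "_1")], index))
```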

3. Input to scLSTM for training

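a: index of the sentence's dialogue act (inform, request, recommend, etc.) within the a group of feat_template
sv: indices of the sentence's sv.slot_name.slot_value features within the sv group of feat_template.txt
s: same as sv, but for the s.slot features
v: same as sv, but for the v.slot_value features
words: word indices of the third item of the 4-tuple, the delexicalised Human Authored Response
b_size: batch size

lengs: the lengths of a, sv, s, v, and the sentence, as a 2-D numpy array such as [[1], [3], [3], [3], [20]]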

# example
# [('a', u'inform_count'), (u'count', '_1'), (u'goodformeal', '_1'), (u'type', '_1')],
# u"inform_count(type=restaurant;count='2';goodformeal=brunch)",
# u'there are SLOT_COUNT SLOT_TYPE -s good for SLOT_GOODFORMEAL',
# u'there are SLOT_COUNT SLOT_TYPE -s good for SLOT_GOODFORMEAL'
array([[7]], dtype=int32),
array([[ 53,  78,  82, 101]], dtype=int32),
array([[16, 22, 24, 31]], dtype=int32),
array([[1, 7, 1, 1]], dtype=int32),
array([[  1],
       [  5],
       [  2],
       [  4],
       [ 22],
       [140],
       [117],
       [113],
       [105],
       [ 41],
       [136],
       [147],
       [ 10],
       [  1]], dtype=int32),
1,
array([[ 1],
       [ 4],
       [ 4],
       [ 4],
       [14]], dtype=int32)
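
The layout of this input can be reproduced with a short sketch. It uses plain Python lists instead of numpy int32 arrays to stay dependency-free, and `make_batch`, `pad`, and the padding value 0 are assumptions made for illustration; how RNNLG actually pads batches larger than one is not covered here. Note that the feature arrays are laid out as (batch, length), while words and lengs have the batch on the second axis.

```python
def pad(seqs, pad_value=0):
    """Right-pad every sequence to the length of the longest one."""
    width = max(len(seq) for seq in seqs)
    return [list(seq) + [pad_value] * (width - len(seq)) for seq in seqs]


def make_batch(examples, pad_value=0):
    """examples: list of (a, sv, s, v, words) tuples of index lists.

    Returns (a, sv, s, v, words, b_size, lengs) in the layout shown above.
    """
    a, sv, s, v, words = (pad(col, pad_value) for col in zip(*examples))
    # Transpose words from (batch, time) to (time, batch).
    words_time_major = [list(step) for step in zip(*words)]
    # One row per input, one column per example: unpadded lengths.
    lengs = [[len(ex[i]) for ex in examples] for i in range(5)]
    return a, sv, s, v, words_time_major, len(examples), lengs


example = (
    [7],
    [53, 78, 82, 101],
    [16, 22, 24, 31],
    [1, 7, 1, 1],
    [1, 5, 2, 4, 22, 140, 117, 113, 105, 41, 136, 147, 10, 1],
)
a, sv, s, v, words, b_size, lengs = make_batch([example])
print(lengs)  # [[1], [4], [4], [4], [14]]
```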

Output of scLSTM

1. Input at test time: a, sv, s, v
2. Output of scLSTM: 20 candidates, each a (penalty, sentence) pair, where the sentence is a list of word indices

# ('gens : ', [(0.13399766372142932, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 2, 77, 15, 132, 1]), (0.14933728562324292, [1, 5, 138, 113, 126, 8, 2, 77, 15, 132, 1]), (0.1549744254930803, [1, 5, 2, 4, 113, 105, 41, 2, 77, 15, 132, 1]), (0.15852280774253719, [1, 5, 138, 113, 126, 15, 132, 1]), (0.19026726564154084, [1, 5, 2, 4, 33, 105, 41, 138, 113, 8, 2, 77, 15, 132, 1]), (0.20233138058456338, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 15, 132, 1]), (0.20940732168664519, [1, 5, 2, 4, 33, 105, 41, 138, 113, 132, 8, 2, 77, 15, 132, 1]), (0.21145074659354887, [1, 5, 2, 4, 113, 105, 41, 138, 132, 1]), (0.21476023212302103, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 14, 2, 77, 15, 132, 1]), (0.22121748802971725, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 138, 132, 1]), (0.22331384956149691, [1, 5, 2, 4, 33, 105, 41, 138, 113, 8, 14, 2, 77, 15, 132, 1]), (0.22577840448106021, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 2, 4, 77, 202, 15, 132, 1]), (0.23494928307737384, [1, 5, 2, 4, 33, 105, 41, 138, 113, 15, 132, 1]), (0.2414091617709512, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 2, 77, 15, 132, 8, 2, 77, 15, 132, 1]), (0.25039630774077898, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 2, 111, 15, 132, 1]), (0.26162732976045144, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 14, 2, 4, 77, 202, 15, 132, 1]), (0.27529049174629477, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 2, 4, 77, 132, 1]), (0.30046496217825086, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 14, 2, 4, 77, 132, 1]), (0.3006930035044566, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 1]), (0.31477015858324936, [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 2, 77, 15, 1])])
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and is good for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME serves SLOT_FOOD food and is good for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a SLOT_FOOD restaurant that is good for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME serves SLOT_FOOD food for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD and is good for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD SLOT_GOODFORMEAL and is good for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a SLOT_FOOD restaurant that serves SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and it is good for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and serves SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD and it is good for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and is a good place for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and is good for SLOT_GOODFORMEAL and is good for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and is great for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and it is a good place for SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and is a good SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and it is a good SLOT_GOODFORMEAL')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food')
('gen : ', 'SLOT_NAME is a nice restaurant that serves SLOT_FOOD food and is good for')

3. A score is derived from each penalty, and the top-k candidates are selected.
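
A minimal sketch of this selection step is shown below. It assumes that a lower penalty means a better candidate, which matches the ascending order of the gens list above but is not confirmed by the toolkit's documentation. `top_k` and `decode` are hypothetical helpers, the vocabulary is a small mapping inferred by aligning the first two candidates with their printed sentences, and index 1 is assumed to be a sentence-boundary marker.

```python
def top_k(gens, k):
    """Return the k candidates with the lowest penalty, best first."""
    return sorted(gens, key=lambda pair: pair[0])[:k]


def decode(indices, vocab, boundary=1):
    """Map word indices back to tokens, skipping the boundary marker."""
    return " ".join(vocab[i] for i in indices if i != boundary)


# Word indices inferred from the output above (an assumption, not the real vocab file).
vocab = {
    2: "is", 4: "a", 5: "SLOT_NAME", 8: "and", 15: "for", 33: "nice",
    41: "that", 77: "good", 105: "restaurant", 113: "SLOT_FOOD",
    126: "food", 132: "SLOT_GOODFORMEAL", 138: "serves",
}

gens = [
    (0.1549744254930803, [1, 5, 2, 4, 113, 105, 41, 2, 77, 15, 132, 1]),
    (0.13399766372142932,
     [1, 5, 2, 4, 33, 105, 41, 138, 113, 126, 8, 2, 77, 15, 132, 1]),
    (0.14933728562324292, [1, 5, 138, 113, 126, 8, 2, 77, 15, 132, 1]),
]

for penalty, sentence in top_k(gens, 2):
    print(penalty, decode(sentence, vocab))
```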

Evaluation for scLSTM

Evaluation

_explanation for bleu:_

BLEU

Helic He
