News
- 2022.12: We have four papers accepted by EMNLP2022:
  - UniNL: Aligning Representation Learning with Scoring Function for OOD Detection via Unified Neighborhood Learning
  - Watch the Neighbors: A Unified K-Nearest Neighbor Contrastive Learning Framework for OOD Intent Discovery
  - Semi-Supervised Knowledge-Grounded Pre-training for Task-Oriented Dialog Systems
  - Disentangling Confidence Score Distribution for Out-of-Domain Intent Detection with Energy-Based Learning
- 2022.8: We have three papers accepted by COLING2022:
  - Generalized Intent Discovery: Learning from Open World Dialogue System
  - Distribution Calibration for Out-of-Domain Detection with Bayesian Approximation
  - PSSAT: A Perturbed Semantic Structure Awareness Transferring Method for Perturbation-Robust Slot Filling
- 2022.8: We have one paper accepted by CIKM2022:
  - Unified Knowledge Prompt Pretraining for Customer Service Dialogues
- 2022.4: We have two papers accepted by NAACL2022:
  - Revisit Overconfidence for OOD Detection: Reassigned Contrastive Learning with Adaptive Class-dependent Threshold
  - Domain-Oriented Prefix-Tuning: Towards Efficient and Generalizable Fine-tuning for Zero-Shot Dialogue Summarization
- 2022.3: We have one paper accepted by SIGIR2022:
  - ADPL: Adversarial Prompt-based Domain Adaptation for Dialogue Summarization with Knowledge Disentanglement
- 2022.2: We have one paper accepted by ACL2022:
  - Disentangled Knowledge Transfer for OOD Intent Discovery with Unified Contrastive Learning
- 2021.9: We have two papers accepted by EMNLP2021:
  - Bridge to Target Domain by Prototypical Contrastive Learning and Label Confusion: Re-explore Zero-Shot Learning for Slot Filling
  - A Finer-grain Universal Dialogue Semantic Structures based Model For Abstractive Dialogue Summarization
- 2021.5: We have three papers accepted by ACL2021:
  - Novel Slot Detection: A Benchmark for Discovering Unknown Slot Types in the Task-Oriented Dialogue System
  - Modeling Discriminative Representations for Out-of-Domain Detection with Supervised Contrastive Learning
  - Scheduled Dialog Policy Learning: An Automatic Curriculum Learning Framework for Task-oriented Dialog System
- 2021.3: We have two papers accepted by NAACL2021:
  - Adversarial Self-Supervised Learning for Out-of-Domain Detection
  - Dynamically Disentangling Social Bias from Task-Oriented Representations with Adversarial Attack
Research Area
Currently, I am working on neural conversational AI:
Natural Language Understanding
: Natural language understanding parses (speech) input into its semantic meaning, including intent classification and slot tagging. The tough challenges are the diversity of natural language and poor supervision resources. I'm focusing on transferring external knowledge to enable few-shot and even zero-shot learning. Here the external knowledge is three-fold: cross-lingual resources, output latent structure, and background knowledge. Please refer to the Publication section below for details.
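To make the task concrete, here is a minimal, hedged sketch of joint intent classification and slot tagging (a toy BiLSTM with two heads; all names and sizes are illustrative, not the architecture from my papers):

```python
import torch
import torch.nn as nn

class JointNLU(nn.Module):
    """Toy joint NLU model: one shared encoder, two task heads."""
    def __init__(self, vocab_size, n_intents, n_slots, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.intent_head = nn.Linear(2 * dim, n_intents)  # utterance-level label
        self.slot_head = nn.Linear(2 * dim, n_slots)      # token-level BIO labels

    def forward(self, token_ids):
        hidden, _ = self.encoder(self.emb(token_ids))         # (B, T, 2*dim)
        intent_logits = self.intent_head(hidden.mean(dim=1))  # pool over tokens
        slot_logits = self.slot_head(hidden)                  # per-token logits
        return intent_logits, slot_logits

# toy usage: a batch of 2 utterances, 5 tokens each
model = JointNLU(vocab_size=1000, n_intents=10, n_slots=20)
intent_logits, slot_logits = model(torch.randint(0, 1000, (2, 5)))
```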
Dialog Policy Learning
: We treat task-oriented dialogue as an optimal decision-making process of finding the optimal policy $\pi$, which can be modeled as a typical reinforcement learning (RL) problem: by maximizing the average long-term reward, we learn the optimal action $a$ for each state $s$. I'm focusing on improving the user simulator and on sampling better user goals.
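As an illustration of this formulation only, a toy REINFORCE-style update that raises the probability of actions taken in high-reward dialogues could look like the following (the state features, action set, and reward are placeholders, not my actual simulator):

```python
import torch
import torch.nn as nn

n_states, n_actions, gamma = 8, 4, 0.99
policy = nn.Sequential(nn.Linear(n_states, 64), nn.Tanh(), nn.Linear(64, n_actions))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def update(episode):
    """episode: list of (state_vec, action, reward) from one simulated dialogue."""
    returns, g = [], 0.0
    for _, _, r in reversed(episode):           # discounted return G_t
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    loss = 0.0
    for (s, a, _), g in zip(episode, returns):  # REINFORCE: -log pi(a|s) * G_t
        logp = torch.log_softmax(policy(s), dim=-1)[a]
        loss = loss - logp * g
    opt.zero_grad(); loss.backward(); opt.step()

# toy usage: one fake 3-turn dialogue with a success reward at the end
episode = [(torch.randn(n_states), 1, 0.0), (torch.randn(n_states), 0, 0.0),
           (torch.randn(n_states), 2, 1.0)]
update(episode)
```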
Graph Convolutional Network
: GCNs have demonstrated their effectiveness in capturing graph structure. We propose a novel joint model that applies a graph convolutional network over dependency trees to integrate the syntactic structure for learning slot filling and intent detection jointly.
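A minimal sketch of a single GCN layer over a dependency-tree adjacency matrix, just to show the mechanism (the joint model in the COLING2020 paper below is more involved):

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """H' = ReLU(D^-1 (A + I) H W), with A built from dependency arcs."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):
        adj = adj + torch.eye(adj.size(-1))   # add self-loops
        deg = adj.sum(-1, keepdim=True)       # node degrees for normalization
        return torch.relu((adj / deg) @ self.linear(h))

# toy sentence of 4 tokens; adj[i, j] = 1 if there is a dependency arc i -> j
adj = torch.zeros(4, 4)
adj[0, 1], adj[2, 1], adj[3, 2] = 1.0, 1.0, 1.0   # arcs from a parser (toy)
h = torch.randn(4, 64)                            # token features
out = GCNLayer(64)(h, adj)                        # syntax-aware token features
```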
Out-of-Domain Detection
: Detecting unknown or OOD (Out-of-Domain) intents from user queries is an essential component of a dialogue system: it should know when a query falls outside its range of predefined supported intents. We focus on the unsupervised OOD detection scenario, where there are no labeled OOD samples, only labeled in-domain data, and propose a strong generative distance-based classifier to detect OOD samples.
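The distance-based idea can be sketched as follows: fit class-conditional Gaussians with a shared covariance on in-domain features and score queries by their minimum Mahalanobis distance. This is a simplified illustration of the classifier described in the COLING2020 paper below, not the exact published procedure:

```python
import numpy as np

def fit_gda(feats, labels):
    """Class means and a shared covariance estimated from IND features."""
    classes = np.unique(labels)
    mus = {c: feats[labels == c].mean(0) for c in classes}
    centered = np.vstack([feats[labels == c] - mus[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mus, np.linalg.inv(cov)

def ood_score(x, mus, prec):
    """Min Mahalanobis distance to any IND class; large => likely OOD."""
    return min(float((x - mu) @ prec @ (x - mu)) for mu in mus.values())

# toy usage with random 16-d "intent features"
feats = np.random.randn(100, 16); labels = np.random.randint(0, 5, 100)
mus, prec = fit_gda(feats, labels)
print(ood_score(np.random.randn(16), mus, prec))
```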
Dialogue Summarization
: Traditional document summarization models cannot handle dialogue summarization well because of the multiple speakers and complex personal-pronoun referential relationships in a conversation. We propose a hierarchical transformer-based model for dialogue summarization. It encodes the dialogue from words to utterances and clearly distinguishes the relationships between speakers and their corresponding personal pronouns. Experiments show that our model generates summaries more accurately and relieves the confusion of personal pronouns.
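A hedged sketch of the hierarchical encoding idea only: pool token states into utterance vectors, then model utterance-level interactions (the layer counts and mean-pooling are my simplifications, not the published model):

```python
import torch
import torch.nn as nn

class HierarchicalDialogueEncoder(nn.Module):
    """Encode words within each utterance, then utterances within the dialogue."""
    def __init__(self, vocab_size, dim=128, heads=4):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        word_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.word_enc = nn.TransformerEncoder(word_layer, num_layers=2)
        utt_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.utt_enc = nn.TransformerEncoder(utt_layer, num_layers=2)

    def forward(self, dialogue):                 # (n_utts, n_tokens) token ids
        tok = self.word_enc(self.emb(dialogue))  # word-level encoding
        utt = tok.mean(dim=1).unsqueeze(0)       # pool tokens -> utterance vecs
        return self.utt_enc(utt)                 # dialogue-level encoding

enc = HierarchicalDialogueEncoder(vocab_size=1000)
out = enc(torch.randint(0, 1000, (6, 12)))  # 6 utterances, 12 tokens each
```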
Education
- 2021-Now, Working at Meituan Group, Beijing
- 2018-2021, Master's in Artificial Intelligence, Beijing University of Posts and Telecommunications
- 2014-2018, Bachelor's in Communication Engineering, Beijing University of Posts and Telecommunications
Experience
Research Intern at Alibaba DAMO, Jun 2020 - Oct 2020:
- Research area: recommendation systems.
- I mainly focused on content-based recommendation and cold-start problems.
Research Intern at Tencent WeChat AI Lab, Mar 2020 - Jun 2020:
- Research area: zero-shot learning and slot filling.
- I mainly focused on zero-shot slot filling and proposed a Contrastive Zero-Shot Learning with Adversarial Attack (CZSL-Adv) method.
Research Intern at Meituan NLP Group, Oct 2019 - Mar 2020:
- Research area: GCNs and dialogue systems.
- I mainly focused on leveraging GCNs to enhance dialogue systems and proposed a GCN over dependency trees to integrate syntax for SLU.
Research and Engineering Intern at GBSAA, IBM, Sep 2017 - Feb 2018:
- Research area: object detection and tracking.
- Participated in the sports video analysis system of the Ministry of Culture and the General Administration of Sport.
Research Intern at PRIS Lab, Mar 2017 - Sep 2017:
- Research area: task-oriented dialogue systems and deep reinforcement learning.
- Maintained and organized the Automatic Task-Oriented Dialogue System.
Publication
UniNL: Aligning Representation Learning with Scoring Function for OOD Detection via Unified Neighborhood Learning, EMNLP2022
- Yutao Mou*, Pei Wang*, Keqing He*, Yanan Wu, Jingang Wang, Wei Wu, Weiran Xu
- Abstract: Detecting out-of-domain (OOD) intents from user queries is essential for avoiding wrong operations in task-oriented dialogue systems. The key challenge is how to distinguish in-domain (IND) and OOD intents. Previous methods ignore the alignment between representation learning and the scoring function, limiting OOD detection performance. In this paper, we propose a unified neighborhood learning framework (UniNL) to detect OOD intents. Specifically, we design a K-nearest neighbor contrastive learning (KNCL) objective for representation learning and introduce a KNN-based scoring function for OOD detection, aiming to align representation learning with the scoring function. Experiments and analysis on two benchmark datasets show the effectiveness of our method.
- paper, code
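Below is a minimal sketch of a KNN-based OOD score in the spirit of this paper: the distance to the k-th nearest labeled in-domain feature serves as the score (the KNCL training objective is omitted, and `k` and the normalization are illustrative choices):

```python
import numpy as np

def knn_ood_score(query, ind_feats, k=10):
    """Distance to the k-th nearest labeled IND feature; larger => more OOD."""
    q = query / np.linalg.norm(query)                 # L2-normalize features
    f = ind_feats / np.linalg.norm(ind_feats, axis=1, keepdims=True)
    dists = np.linalg.norm(f - q, axis=1)             # distances on unit sphere
    return np.sort(dists)[k - 1]

ind_feats = np.random.randn(500, 32)                  # toy encoder outputs
print(knn_ood_score(np.random.randn(32), ind_feats))
```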
Watch the Neighbors: A Unified K-Nearest Neighbor Contrastive Learning Framework for OOD Intent Discovery, EMNLP2022 oral
- Yutao Mou*, Keqing He*, Pei Wang, Yanan Wu, Jingang Wang, Wei Wu, Weiran Xu
- Abstract: Discovering out-of-domain (OOD) intents is important for developing new skills in task-oriented dialogue systems. The key challenges lie in how to transfer prior in-domain (IND) knowledge to OOD clustering, and in how to jointly learn OOD representations and cluster assignments. Previous methods suffer from the in-domain overfitting problem, and there is a natural gap between the representation learning and clustering objectives. In this paper, we propose a unified K-nearest neighbor contrastive learning framework to discover OOD intents. Specifically, in the IND pre-training stage, we propose a KCL objective to learn inter-class discriminative features while maintaining intra-class diversity, which alleviates the in-domain overfitting problem. In the OOD clustering stage, we propose a KCC method to form compact clusters by mining true hard negative samples, which bridges the gap between clustering and representation learning. Extensive experiments on three benchmark datasets show that our method achieves substantial improvements over state-of-the-art methods.
- paper, code
Semi-Supervised Knowledge-Grounded Pre-training for Task-Oriented Dialog Systems, EMNLP2022 SereTOD Workshop (champion of Track II)
- Weihao Zeng*, Keqing He*, Zechen Wang*, Dayuan Fu, Guanting Dong, Ruotong Geng, Pei Wang, Jingang Wang, Chaobo Sun, Wei Wu, Weiran Xu
- Abstract: Traditional intent classification models are based on a pre-defined intent set and only recognize limited in-domain (IND) intent classes. But users may input out-of-domain (OOD) queries in a practical dialogue system, and such OOD queries can provide directions for future improvement. In this paper, we define a new task, Generalized Intent Discovery (GID), which aims to extend an IND intent classifier to an open-world intent set including IND and OOD intents. We hope to simultaneously classify a set of labeled IND intent classes while discovering and recognizing new unlabeled OOD types incrementally. We construct three public datasets for different application scenarios and propose two kinds of frameworks, pipeline-based and end-to-end, for future work. Further, we conduct exhaustive experiments and qualitative analysis to comprehend key challenges and provide new guidance for future GID research.
- paper, code
Disentangling Confidence Score Distribution for Out-of-Domain Intent Detection with Energy-Based Learning, EMNLP2022 SereTOD Workshop
- Yanan Wu*, Zhiyuan Zeng*, Keqing He*, Yutao Mou, Pei Wang, Yuanmeng Yan, Weiran Xu
- Abstract: Out-of-Domain (OOD) detection is a key component in a task-oriented dialog system, which aims to identify whether a query falls outside the predefined supported intent set. Previous softmax-based detection algorithms have proved to be overconfident for OOD samples. In this paper, we analyze how overconfidence on OOD arises from distribution uncertainty due to the mismatch between the training and test distributions, which prevents the model from confidently making predictions and thus probably causes abnormal softmax scores. We propose a Bayesian OOD detection framework to calibrate distribution uncertainty using Monte-Carlo Dropout. Our method is flexible and easily pluggable into existing softmax-based baselines, and gains a 33.33% OOD F1 improvement while increasing inference time by only 0.41% compared to MSP. Further analyses show the effectiveness of Bayesian learning for OOD detection.
- paper, code
Distribution Calibration for Out-of-Domain Detection with Bayesian Approximation, COLING2022
- Yanan Wu*, Zhiyuan Zeng*, Keqing He*, Yutao Mou, Pei Wang, Weiran Xu
- Abstract: Out-of-Domain (OOD) detection is a key component in a task-oriented dialog system, which aims to identify whether a query falls outside the predefined supported intent set. Previous softmax-based detection algorithms have proved to be overconfident for OOD samples. In this paper, we analyze how overconfidence on OOD arises from distribution uncertainty due to the mismatch between the training and test distributions, which prevents the model from confidently making predictions and thus probably causes abnormal softmax scores. We propose a Bayesian OOD detection framework to calibrate distribution uncertainty using Monte-Carlo Dropout. Our method is flexible and easily pluggable into existing softmax-based baselines, and gains a 33.33% OOD F1 improvement while increasing inference time by only 0.41% compared to MSP. Further analyses show the effectiveness of Bayesian learning for OOD detection.
- paper, code
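A hedged sketch of the Monte-Carlo Dropout idea used here: average the softmax over several stochastic forward passes and threshold the resulting maximum probability (the toy classifier and threshold below are placeholders, not the published setup):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Dropout(0.3), nn.Linear(64, 5))

def mc_dropout_msp(x, n_passes=20):
    """Mean softmax over stochastic passes; a low max-prob suggests OOD."""
    model.train()                       # keep dropout active at inference time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), -1) for _ in range(n_passes)])
    return probs.mean(0).max(-1).values  # calibrated MSP-style confidence

conf = mc_dropout_msp(torch.randn(4, 32))   # one score per query
is_ood = conf < 0.5                         # threshold chosen on dev data
```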
PSSAT: A Perturbed Semantic Structure Awareness Transferring Method for Perturbation-Robust Slot Filling, COLING2022
- Guanting Dong*, Daichi Guo*, LiWen Wang*, Xuefeng Li*, Zechen Wang, Chen Zeng, Keqing He, Jinzheng Zhao, Hao Lei, Xinyue Cui, Yi Huang, Junlan Feng, Weiran Xu
- Abstract: Most existing slot filling models tend to memorize inherent patterns of entities and corresponding contexts from training data. However, these models can lead to system failures or undesirable outputs when exposed to spoken language perturbations or variations in practice. We propose a perturbed semantic structure awareness transferring method for training perturbation-robust slot filling models. Specifically, we introduce two MLM-based training strategies to respectively learn the contextual semantic structure and word distribution from an unsupervised language perturbation corpus. Then, we transfer the semantic knowledge learned in the upstream training procedure into the original samples and filter the generated data by consistency processing. These procedures aim to enhance the robustness of slot filling models. Experimental results show that our method consistently outperforms the previous basic methods and achieves strong generalization while preventing the model from memorizing inherent patterns of entities and contexts.
- paper
Unified Knowledge Prompt Pretraining for Customer Service Dialogues, CIKM2022
- Keqing He, Jingang Wang, Chaobo Sun, Wei Wu
- Abstract: Dialogue bots have been widely applied in customer service scenarios to provide a timely and user-friendly experience. These bots must classify the appropriate domain of a dialogue, understand the intent of users, and generate proper responses. Existing dialogue pre-training models are designed only for several dialogue tasks and ignore weakly-supervised expert knowledge in customer service dialogues. In this paper, we propose a novel unified knowledge prompt pre-training framework, UFA (Unified Model For All Tasks), for customer service dialogues. We formulate all the tasks of customer service dialogues as a unified text-to-text generation task and introduce a knowledge-driven prompt strategy to jointly learn from a mixture of distinct dialogue tasks. We pre-train UFA on a large-scale Chinese customer service corpus collected from practical scenarios and achieve significant improvements on both natural language understanding (NLU) and natural language generation (NLG) benchmarks.
- paper
Domain-Oriented Prefix-Tuning: Towards Efficient and Generalizable Fine-tuning for Zero-Shot Dialogue Summarization, NAACL2022 oral
- Lulu Zhao*, Fujia Zheng*, Weihao Zeng, Keqing He, Weiran Xu, Huixing Jiang, Wei Wu, Yanan Wu
- Abstract: The most advanced abstractive dialogue summarizers lack generalization ability on new domains, and existing research on domain adaptation in summarization generally relies on large-scale pre-training. To explore lightweight fine-tuning methods for domain adaptation of dialogue summarization, in this paper we propose an efficient and generalizable Domain-Oriented Prefix-tuning model, which utilizes a domain-word-initialized prefix module to alleviate domain entanglement and adopts discrete prompts to guide the model to focus on the key contents of dialogues and enhance model generalization. We conduct zero-shot experiments and build domain adaptation benchmarks on two multi-domain dialogue summarization datasets, TODSum and QMSum. Adequate experiments and qualitative analysis prove the effectiveness of our methods.
- paper, code
Revisit Overconfidence for OOD Detection: Reassigned Contrastive Learning with Adaptive Class-dependent Threshold, NAACL2022
- Yanan Wu*, Keqing He*, Yuanmeng Yan, Qixiang Gao, Zhiyuan Zeng, Fujia Zheng, Lulu Zhao, Huixing Jiang, Wei Wu, Weiran Xu
- Abstract: Detecting Out-of-Domain (OOD) or unknown intents from user queries is essential in a task-oriented dialog system. A key challenge of OOD detection is the overconfidence of neural models. In this paper, we comprehensively analyze overconfidence and classify it into two perspectives: over-confident OOD and over-confident in-domain (IND). Then, according to the intrinsic causes, we respectively propose a novel reassigned contrastive learning (RCL) method to discriminate IND intents for over-confident OOD, and an adaptive class-dependent local threshold mechanism to separate similar IND and OOD intents for over-confident IND. Experiments and analyses show the effectiveness of our proposed method for both aspects of the overconfidence issue.
- paper, code
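As a rough illustration of a class-dependent local threshold, one could derive a per-class threshold from in-domain validation confidences; the percentile rule below is my assumption for the sketch, not the paper's exact mechanism:

```python
import numpy as np

def class_thresholds(conf, pred, n_classes, pct=5):
    """One threshold per IND class: the pct-th percentile of that class's
    validation confidences, so harder classes get looser thresholds (toy rule)."""
    return {c: np.percentile(conf[pred == c], pct) for c in range(n_classes)}

def detect(conf, pred, thr):
    """Flag a query as OOD when its confidence falls below its class threshold."""
    return conf < np.array([thr[c] for c in pred])

conf = np.random.rand(200); pred = np.random.randint(0, 4, 200)   # toy dev set
thr = class_thresholds(conf, pred, n_classes=4)
print(detect(np.array([0.9, 0.1]), np.array([0, 2]), thr))
```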
ADPL: Adversarial Prompt-based Domain Adaptation for Dialogue Summarization with Knowledge Disentanglement, SIGIR2022
- Lulu Zhao*, Fujia Zheng*, Weihao Zeng, Keqing He, Ruotong Geng, Huixing Jiang, Wei Wu, Weiran Xu
- Abstract: Traditional dialogue summarization models rely on large-scale manually labeled corpora and lack generalization ability to new domains, so domain adaptation from a labeled source domain to an unlabeled target domain is important in practical summarization scenarios. However, existing domain adaptation works in dialogue summarization generally require large-scale pre-training using extensive external data. To explore lightweight fine-tuning methods, in this paper we propose an efficient Adversarial Disentangled Prompt Learning (ADPL) model for domain adaptation in dialogue summarization. We introduce three kinds of prompts: a domain-invariant prompt (DIP), a domain-specific prompt (DSP), and a task-oriented prompt (TOP). DIP aims to disentangle and transfer the knowledge shared between the source and target domains in an adversarial way, which improves the accuracy of predictions about domain-invariant information and enhances generalization to new domains. DSP is designed to guide the model to focus on domain-specific knowledge using domain-related features. TOP captures task-oriented knowledge to generate high-quality summaries. Instead of fine-tuning the whole pre-trained language model (PLM), we only update the prompt networks and keep the PLM fixed. We conduct zero-shot experiments and build domain adaptation benchmarks on two multi-domain dialogue summarization datasets, TODSum and QMSum. Adequate experiments and analysis prove that our method significantly outperforms full-parameter fine-tuning with even fewer parameters.
- paper
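A minimal sketch of the parameter-efficiency recipe described here, freezing the PLM and training only a small prompt module (the modules below are generic stand-ins, not the actual DIP/DSP/TOP prompt networks):

```python
import torch
import torch.nn as nn

plm = nn.TransformerEncoderLayer(d_model=128, nhead=4)   # stand-in for a PLM
prefix = nn.Embedding(20, 128)                           # 20 trainable prefix tokens

for p in plm.parameters():
    p.requires_grad = False                              # the PLM stays frozen

opt = torch.optim.Adam(prefix.parameters(), lr=5e-4)     # update prompts only
n_train = sum(p.numel() for p in prefix.parameters())
n_total = n_train + sum(p.numel() for p in plm.parameters())
print(f"training {n_train}/{n_total} parameters")
```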
Disentangled Knowledge Transfer for OOD Intent Discovery with Unified Contrastive Learning, ACL2022
- Yutao Mou*, Keqing He*, Yanan Wu*, Zhiyuan Zeng, Hong Xu, Huixing Jiang, Wei Wu, Weiran Xu
- Abstract: Discovering Out-of-Domain (OOD) intents is essential for developing new skills in a task-oriented dialogue system. The key challenge is how to transfer prior IND knowledge to OOD clustering. Different from existing work based on shared intent representations, we propose a novel disentangled knowledge transfer method via a unified multi-head contrastive learning framework. We aim to bridge the gap between IND pre-training and OOD clustering. Experiments and analysis on two benchmark datasets show the effectiveness of our method.
- paper, code
Bridge to Target Domain by Prototypical Contrastive Learning and Label Confusion: Re-explore Zero-Shot Learning for Slot Filling, EMNLP2021 oral
- Liwen Wang*, Xuefeng Li*, Jiachi Liu, Keqing He, Yuanmeng Yan, Weiran Xu
- Abstract: Zero-shot cross-domain slot filling alleviates the data dependence in the case of data scarcity in the target domain, and has attracted extensive research. However, as most of the existing methods do not achieve effective knowledge transfer to the target domain, they merely fit the distribution of the seen slots and show poor performance on unseen slots in the target domain. To solve this, we propose a novel approach based on prototypical contrastive learning with a dynamic label confusion strategy for zero-shot slot filling. The prototypical contrastive learning aims to reconstruct the semantic constraints of labels, and we introduce the label confusion strategy to establish the label dependence between the source domains and the target domain on the fly. Experimental results show that our model achieves significant improvement on unseen slots, while also setting a new state of the art on the slot filling task.
- paper, code
A Finer-grain Universal Dialogue Semantic Structures based Model For Abstractive Dialogue Summarization, EMNLP2021 Findings
- Yuejie Lei*, Fujia Zheng*, Yuanmeng Yan, Keqing He, Weiran Xu
- Abstract: Although abstractive summarization models have achieved impressive results on document summarization tasks, their performance on dialogue modeling is much less satisfactory due to crude and direct methods for dialogue encoding. To address this problem, we propose a novel end-to-end Transformer-based model, FinDS, for abstractive dialogue summarization that leverages Finer-grain universal Dialogue semantic Structures to model dialogue and generate better summaries. Experiments on the SAMSum dataset show that FinDS outperforms various dialogue summarization approaches and achieves new state-of-the-art (SOTA) ROUGE results. Finally, we apply FinDS to a more complex scenario, showing the robustness of our model.
- paper, code
Novel Slot Detection: A Benchmark for Discovering Unknown Slot Types in the Task-Oriented Dialogue System, ACL2021 oral
- Yanan Wu*, Zhiyuan Zeng*, Keqing He*, Hong Xu, Yuanmeng Yan, Huixing Jiang and Weiran Xu
- Abstract: Existing slot filling models can only recognize pre-defined in-domain slot types from a limited slot set. In practical applications, a reliable dialogue system should know what it does not know. In this paper, we introduce a new task, Novel Slot Detection (NSD), in the task-oriented dialogue system. NSD aims to discover unknown or out-of-domain slot types to strengthen the capability of a dialogue system based on in-domain training data. Besides, we construct two public NSD datasets, propose several strong NSD baselines, and establish a benchmark for future work. Finally, we conduct exhaustive experiments and qualitative analysis to comprehend key challenges and provide new guidance for future directions.
- paper, code
Modeling Discriminative Representations for Out-of-Domain Detection with Supervised Contrastive Learning, ACL2021
- Zhiyuan Zeng*, Keqing He*, Yuanmeng Yan, Zijun Liu, Yanan Wu, Hong Xu, Huixing Jiang and Weiran Xu
- Abstract: Detecting Out-of-Domain (OOD) or unknown intents from user queries is essential in a task-oriented dialog system. A key challenge of OOD detection is to learn discriminative semantic features. Traditional cross-entropy loss only focuses on whether a sample is correctly classified and does not explicitly distinguish the margins between categories. In this paper, we propose a supervised contrastive learning objective to minimize intra-class variance by pulling together in-domain intents belonging to the same class, and to maximize inter-class variance by pushing apart samples from different classes. Besides, we employ an adversarial augmentation mechanism to obtain pseudo-diverse views of a sample in the latent space. Experiments on two public datasets prove the effectiveness of our method in capturing discriminative representations for OOD detection.
- paper, code
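A compact sketch of a supervised contrastive objective of this flavor: same-class features are pulled together and other classes pushed apart within a batch (a simplified version of the standard SupCon loss, not necessarily the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def supcon_loss(feats, labels, tau=0.1):
    """Supervised contrastive loss over one batch of features."""
    z = F.normalize(feats, dim=1)
    sim = z @ z.t() / tau                                  # pairwise similarities
    mask_pos = (labels[:, None] == labels[None, :]).float()
    mask_pos.fill_diagonal_(0)                             # exclude self-pairs
    logits = sim - 1e9 * torch.eye(len(z))                 # mask self in denominator
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    n_pos = mask_pos.sum(1).clamp(min=1)                   # positives per anchor
    return -((mask_pos * log_prob).sum(1) / n_pos).mean()

loss = supcon_loss(torch.randn(8, 64), torch.randint(0, 3, (8,)))
```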
Scheduled Dialog Policy Learning: An Automatic Curriculum Learning Framework for Task-oriented Dialog System, ACL2021 Findings
- Sihong Liu, Jinchao Zhang, Keqing He, Weiran Xu and Jie Zhou
- Abstract: In reinforcement learning (RL) based task-oriented dialogue systems, users act as the environment and the agent learns the policy by interacting with users. However, due to the subjectivity of different users, the complexity of user-generated training conversations varies greatly, which leads to different learning difficulties for the agent. It is therefore necessary to model dialogue complexity and make a reasonable learning schedule for efficiently training the agent. Towards that end, we propose Scheduled Dialog Policy Learning, an automatic curriculum learning framework that combines curriculum learning and policy optimization in the task-oriented dialog system. To the best of our knowledge, it is the first RL framework that improves dialogue policy learning by scheduling its learning process. Specifically, we introduce an automatic measurement to evaluate dialogue complexity and, based on this measurement, train the dialog agent from easy dialogues to complex ones. Experiments demonstrate that our approach can be applied to task-oriented dialogue policy learning and outperforms the previous state-of-the-art model, improving the dialog success rate by 9.6% and 10.0% on the MultiWoz and Movie-Ticket Booking datasets, respectively.
- paper
Adversarial Self-Supervised Learning for Out-of-Domain Detection, NAACL2021 oral
- Zhiyuan Zeng, Keqing He, Yuanmeng Yan, Hong Xu, Weiran Xu
- Abstract: Detecting out-of-domain (OOD) intents is crucial for a deployed task-oriented dialogue system. Previous unsupervised OOD detection methods only extract discriminative features of different in-domain intents, while supervised counterparts can directly distinguish OOD and in-domain intents but require extensive labeled OOD data. To combine the benefits of both types, we propose a self-supervised contrastive learning framework to model discriminative semantic features of both in-domain intents and OOD intents from unlabeled data. Besides, we introduce an adversarial augmentation neural module to improve the efficiency and robustness of contrastive learning. Experiments on two public benchmark datasets show that our method can consistently outperform the baselines with a statistically significant margin.
- paper, code
Dynamically Disentangling Social Bias from Task-Oriented Representations with Adversarial Attack, NAACL2021
- Liwen Wang*, Yuanmeng Yan*, Keqing He, Yanan Wu, Weiran Xu
- Abstract: Representation learning is widely used in NLP for a vast range of tasks. However, representations derived from text corpora often reflect social biases. This phenomenon is pervasive and consistent across different neural models, causing serious concern. Previous methods mostly rely on a pre-specified, user-provided direction or suffer from unstable training. In this paper, we propose an adversarial disentangled debiasing model to dynamically decouple social bias attributes from the intermediate representations trained on the main task. We aim to denoise bias information while training on the downstream task, rather than completely remove social bias and pursue static unbiased representations. Experiments show the effectiveness of our method, both on the effect of debiasing and the main task performance.
- paper, code
Hierarchical Speaker-Aware Sequence-to-Sequence Model for Dialogue Summarization, ICASSP2021
- Yuejie Lei, Yuanmeng Yan, Zhiyuan Zeng, Keqing He, Ximing Zhang, Weiran Xu
- Abstract: Traditional document summarization models cannot handle dialogue summarization tasks well in situations with multiple speakers and complex personal-pronoun referential relationships in the conversation: the predicted summaries of these models are often full of personal-pronoun confusion. In this paper, we propose a hierarchical transformer-based model for dialogue summarization. It encodes dialogues from words to utterances and clearly distinguishes the relationships between speakers and their corresponding personal pronouns. In such a coarse-to-fine procedure, our model can generate summaries more accurately and relieve the confusion of personal pronouns. Experiments on the dialogue summarization dataset SAMSum show that the proposed model achieves comparable results against other strong baselines and relieves the confusion of personal pronouns in predicted summaries.
- paper
Adversarial Generative Distance-Based Classifier for Robust Out-of-Domain Detection, ICASSP2021
- Zhiyuan Zeng*, Hong Xu*, Keqing He, Yuanmeng Yan, Sihong Liu, Zijun Liu, Weiran Xu
- Abstract: Detecting out-of-domain (OOD) intents is critical in a task-oriented dialog system. Existing methods rely heavily on extensive manually labeled OOD samples and lack robustness. In this paper, we propose an efficient adversarial attack mechanism to augment hard OOD samples and design a novel generative distance-based classifier to detect OOD samples instead of a traditional threshold-based discriminator classifier. Experiments on two public benchmark datasets show that our method can consistently outperform the baselines with a statistically significant margin.
- paper
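As a rough sketch of the adversarial-augmentation ingredient here, an FGSM-style perturbation along the loss gradient can manufacture harder samples (the toy classifier, `eps`, and the feature-level attack are my assumptions; the published mechanism and the generative classifier differ):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 5))
loss_fn = nn.CrossEntropyLoss()

def fgsm_augment(x, y, eps=0.05):
    """Perturb features along the gradient sign to create harder samples."""
    x = x.clone().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).detach()   # adversarially shifted batch

x_adv = fgsm_augment(torch.randn(4, 32), torch.randint(0, 5, (4,)))
```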
Contrastive Zero-Shot Learning for Cross-Domain Slot Filling with Adversarial Attack, COLING2020 oral
- Keqing He, Jinchao Zhang, Yuanmeng Yan, Weiran Xu, Cheng Niu, Jie Zhou
- Abstract: Zero-shot slot filling has arisen widely as a way to cope with data scarcity in target domains. However, previous approaches often ignore the constraints between slot value representations and related slot description representations in the latent space and lack sufficient model robustness. In this paper, we propose a Contrastive Zero-Shot Learning with Adversarial Attack (CZSL-Adv) method for cross-domain slot filling. The contrastive loss aims to map slot value contextual representations to the corresponding slot description representations, and we introduce an adversarial attack training strategy to improve model robustness. Experimental results show that our model significantly outperforms state-of-the-art baselines under both zero-shot and few-shot settings.
- paper
Syntactic Graph Convolution Network for Spoken Language Understanding, COLING2020
- Keqing He*, Shuyu Lei*, Jiangnan Xia, Yushu Yang, Huixing Jiang, Zhongyuan Wang
- Abstract: Slot filling and intent detection are two major tasks for spoken language understanding. In most existing work, these two tasks are built as joint models via multi-task learning, with no consideration of prior linguistic knowledge. In this paper, we propose a novel joint model that applies a graph convolutional network over dependency trees to integrate the syntactic structure for learning slot filling and intent detection jointly. Experimental results show that our proposed model achieves state-of-the-art performance on two public benchmark datasets and outperforms existing work. Finally, we apply the BERT model to further improve the performance on both slot filling and intent detection.
- paper
A Deep Generative Distance-Based Classifier for Out-of-Domain Detection with Mahalanobis Space, COLING2020 oral
- Hong Xu, Keqing He, Yuanmeng Yan, Sihong Liu, Zijun Liu, Weiran Xu
- Abstract: Detecting out-of-domain (OOD) input intents is critical in a task-oriented dialog system. Different from most existing methods that rely heavily on manually labeled OOD samples, we focus on the unsupervised OOD detection scenario where there are no labeled OOD samples except for labeled in-domain data. In this paper, we propose a simple but strong generative distance-based classifier to detect OOD samples. We estimate the class-conditional distribution on the feature spaces of DNNs via Gaussian discriminant analysis (GDA) to avoid over-confidence problems, and use two distance functions, Euclidean and Mahalanobis distances, to measure the confidence score of whether a test sample belongs to OOD. Experiments on four benchmark datasets show that our method can consistently outperform the baselines.
- paper, code
Adversarial Semantic Decoupling for Recognizing Open-Vocabulary Slots, EMNLP2020 oral
- Yuanmeng Yan*, Keqing He*, Hong Xu, Sihong Liu, Fanyu Meng, Min Hu, Weiran Xu
- Abstract: Open-vocabulary slots, such as file name, album name, or schedule title, significantly degrade the performance of neural-based slot filling models, since these slots can take values from a virtually unlimited set and have no semantic restriction or length limit. In this paper, we propose a robust adversarial model-agnostic slot filling method that explicitly decouples the local semantics inherent in open-vocabulary slot words from the global context. We aim to separate entangled contextual semantics and focus more on the holistic context at the level of the whole sentence. Experiments on two public datasets show that our method consistently outperforms other methods with a statistically significant margin on all the open-vocabulary slots, without deteriorating the performance of normal slots.
- paper, code
Learning to Tag OOV Tokens by Integrating Contextual Representation and Background Knowledge, ACL2020
- Keqing He, Yuanmeng Yan, Hong Xu, Sihong Liu, Weiran Xu
- Abstract: Neural-based context-aware models for slot tagging have achieved state-of-the-art performance. However, the presence of OOV (out-of-vocabulary) words significantly degrades the performance of neural-based models, especially in a few-shot scenario. In this paper, we propose a novel knowledge-enhanced slot tagging model to integrate the contextual representation of input text with large-scale lexical background knowledge. Besides, we use multi-level graph attention to explicitly model lexical relations. The experiments show that our proposed knowledge integration mechanism achieves consistent improvements across settings with different sizes of training data on two public benchmark datasets.
- paper
Learning Label-Relational Output Structure for Adaptive Sequence Labeling, IJCNN2020
- Keqing He, Yuanmeng Yan, Hong Xu, Weiran Xu
- Abstract: Sequence labeling is a fundamental task of natural language understanding. Recent neural models for sequence labeling achieve significant success given sufficient training data. However, in practical scenarios, the entity types to be annotated, even in the same domain, are continuously evolving. To transfer knowledge from the source model pre-trained on previously annotated data, we propose an approach that learns a label-relational output structure to explicitly capture label correlations in the latent space. Additionally, we construct a target-to-source interaction between the source model M_S and the target model M_T and apply a gate mechanism to control how much information from M_S and M_T should be passed down. Experiments show that our method consistently outperforms state-of-the-art methods with a statistically significant margin and is especially effective at recognizing rare new entities in the target data.
- paper
Contact
- Address: Beijing, China
- Email: kqin@bupt.cn
- Blog: https://helicqin.github.io