- 2021.9: We have two papers accepted by EMNLP2021, including
Bridge to Target Domain by Prototypical Contrastive Learning and Label Confusion: Re-explore Zero-Shot Learning for Slot Filling;
A Finer-grain Universal Dialogue Semantic Structures based Model For Abstractive Dialogue Summarization.
- 2021.5: We have three papers accepted by ACL2021, including
Novel Slot Detection: A Benchmark for Discovering Unknown Slot Types in the Task-Oriented Dialogue System;
Modeling Discriminative Representations for Out-of-Domain Detection with Supervised Contrastive Learning;
Scheduled Dialog Policy Learning: An Automatic Curriculum Learning Framework for Task-oriented Dialog System.
- 2021.3: We have two papers accepted by NAACL2021, including
Adversarial Self-Supervised Learning for Out-of-Domain Detection;
Dynamically Disentangling Social Bias from Task-Oriented Representations with Adversarial Attack.
Currently, I am working on neural conversational AI:
Natural Language Understanding: Natural language understanding parses (speech) input into its semantic meaning, including intent classification and slot tagging. The main challenges are the diversity of natural language and the scarcity of supervised resources. I'm focusing on transferring external knowledge to enable few-shot and even zero-shot learning. Here the external knowledge is three-fold: cross-lingual resources, output latent structure, and background knowledge. Please refer to the Publications section below for details.
Dialog Policy Learning: We treat task-oriented dialogue as a sequential decision-making process whose goal is to find the optimal policy $\pi$, which can be modeled as a typical reinforcement learning (RL) problem. By maximizing the average long-term reward, we can learn the optimal action $a$ for each state $s$. I'm focusing on improving the user simulator and sampling better user goals.
Graph Convolutional Network: GCNs have demonstrated their effectiveness in capturing graph structure. We propose a novel joint model that applies a graph convolutional network over dependency trees to integrate syntactic structure for learning slot filling and intent detection jointly.
Out-of-Domain Detection: Detecting unknown or out-of-domain (OOD) intents in user queries is essential for a dialogue system to know when a query falls outside its range of predefined supported intents. We focus on the unsupervised OOD detection scenario, where only labeled in-domain data is available and there are no labeled OOD samples, and propose a strong generative distance-based classifier to detect OOD samples.
Dialogue Summarization: Traditional document summarization models cannot handle dialogue summarization tasks well because of the multiple speakers and the complex referential relationships of personal pronouns in a conversation. We propose a hierarchical transformer-based model for dialogue summarization. It encodes the dialogue from words to utterances and clearly distinguishes the relationships between speakers and their corresponding personal pronouns. Experiments show that our model generates summaries more accurately and reduces the confusion of personal pronouns.
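To make the Dialog Policy Learning item above more concrete, here is a minimal sketch of recovering an optimal action for each state on a toy dialogue MDP. Everything in it is an assumption for illustration: the states, actions, rewards, and discount factor are hypothetical, the dynamics are deterministic, and plain value iteration with a discounted return stands in for the RL algorithms used in the actual work.

```python
# Toy task-oriented dialogue MDP. All states, actions and rewards are hypothetical.
# TRANSITIONS[state][action] = (next_state, reward); dynamics are deterministic.
# "success" and "fail" are terminal states and therefore have no outgoing actions.
TRANSITIONS = {
    "no_slots": {"ask_slot": ("has_slots", -1.0), "book": ("fail", -10.0)},
    "has_slots": {"ask_slot": ("has_slots", -1.0), "book": ("success", 20.0)},
}
GAMMA = 0.9  # assumed discount factor


def value_iteration(transitions, gamma=GAMMA, sweeps=100):
    """Estimate the optimal state values V(s) by repeated Bellman backups."""
    values = {state: 0.0 for state in transitions}
    for _ in range(sweeps):
        for state, actions in transitions.items():
            # Terminal states are absent from `values`, so they count as 0.
            values[state] = max(
                reward + gamma * values.get(next_state, 0.0)
                for next_state, reward in actions.values()
            )
    return values


def greedy_policy(transitions, values, gamma=GAMMA):
    """Pick, for each state s, the action a with the highest backed-up value."""
    policy = {}
    for state, actions in transitions.items():
        policy[state] = max(
            actions,
            key=lambda a: actions[a][1] + gamma * values.get(actions[a][0], 0.0),
        )
    return policy


values = value_iteration(TRANSITIONS)
policy = greedy_policy(TRANSITIONS, values)
print(policy)
```

In this toy setting the greedy policy asks for the missing slot first and only then books, because booking too early is penalized.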
- 2018-now, Master's in Artificial Intelligence, Beijing University of Posts and Telecommunications
- 2014-2018, Bachelor's in Communication Engineering, Beijing University of Posts and Telecommunications
Research Intern in Alibaba DAMO, Jun 2020 - Oct 2020:
- Research area: recommendation systems.
- I mainly focused on content-based recommendation systems and cold-start problems.
Research Intern in Tencent WeChat AI Lab, Mar 2020 - Jun 2020:
- Research area: zero-shot learning and slot filling.
- I mainly focused on zero-shot slot filling and proposed a Contrastive Zero-Shot Learning with Adversarial Attack (CZSL-Adv) method.
Research Intern in Meituan NLP Group, Oct 2019 - Mar 2020:
- Research area: GCNs and dialogue systems.
- I mainly focused on leveraging GCNs to enhance dialogue systems and proposed a GCN over dependency trees to integrate syntax for SLU.
Research and Engineering Intern in GBSAA, IBM, Sep 2017 - Feb 2018:
- Research area: object detection and tracking.
- Participated in the sports video analysis system project of the Ministry of Culture and the General Administration of Sport.
Research Intern in PRIS Lab, Mar 2017 - Sep 2017:
- Research area: task-oriented dialogue systems and deep reinforcement learning.
- Maintained and organized the Automatic Task-Oriented Dialogue System.
Bridge to Target Domain by Prototypical Contrastive Learning and Label Confusion: Re-explore Zero-Shot Learning for Slot Filling, EMNLP2021 oral
- Liwen Wang*, Xuefeng Li*, Jiachi Liu, Keqing He, Yuanmeng Yan, Weiran Xu
- Abstract: Zero-shot cross-domain slot filling alleviates the data dependence in the case of data scarcity in the target domain, which has aroused extensive research. However, as most of the existing methods do not achieve effective knowledge transfer to the target domain, they just fit the distribution of the seen slots and show poor performance on unseen slots in the target domain. To solve this, we propose a novel approach based on prototypical contrastive learning with a dynamic label confusion strategy for zero-shot slot filling. The prototypical contrastive learning aims to reconstruct the semantic constraints of labels, and we introduce the label confusion strategy to establish the label dependence between the source domains and the target domain on-the-fly. Experimental results show that our model achieves significant improvement on the unseen slots, while also setting a new state of the art on the slot filling task.
A Finer-grain Universal Dialogue Semantic Structures based Model For Abstractive Dialogue Summarization, EMNLP2021 Findings
- Yuejie Lei*, Fujia Zheng*, Yuanmeng Yan, Keqing He, Weiran Xu
- Abstract: Although abstractive summarization models have achieved impressive results on document summarization tasks, their performance on dialogue modeling is much less satisfactory due to crude and simplistic methods for dialogue encoding. To address this issue, we propose a novel end-to-end Transformer-based model FinDS for abstractive dialogue summarization that leverages Finer-grain universal Dialogue semantic Structures to model dialogue and generate better summaries. Experiments on the SAMSum dataset show that FinDS outperforms various dialogue summarization approaches and achieves new state-of-the-art (SOTA) ROUGE results. Finally, we apply FinDS to a more complex scenario, showing the robustness of our model.
Novel Slot Detection: A Benchmark for Discovering Unknown Slot Types in the Task-Oriented Dialogue System, ACL2021 oral
- Yanan Wu*, Zhiyuan Zeng*, Keqing He*, Hong Xu, Yuanmeng Yan, Huixing Jiang and Weiran Xu
- Abstract: Existing slot filling models can only recognize pre-defined in-domain slot types from a limited slot set. In the practical application, a reliable dialogue system should know what it does not know. In this paper, we introduce a new task, Novel Slot Detection (NSD), in the task-oriented dialogue system. NSD aims to discover unknown or out-of-domain slot types to strengthen the capability of a dialogue system based on in-domain training data. Besides, we construct two public NSD datasets, propose several strong NSD baselines, and establish a benchmark for future work. Finally, we conduct exhaustive experiments and qualitative analysis to comprehend key challenges and provide new guidance for future directions.
- paper, code
Modeling Discriminative Representations for Out-of-Domain Detection with Supervised Contrastive Learning, ACL2021
- Zhiyuan Zeng*, Keqing He*, Yuanmeng Yan, Zijun Liu, Yanan Wu, Hong Xu, Huixing Jiang and Weiran Xu
- Abstract: Detecting Out-of-Domain (OOD) or unknown intents from user queries is essential in a task-oriented dialog system. A key challenge of OOD detection is to learn discriminative semantic features. Traditional cross-entropy loss only focuses on whether a sample is correctly classified, and does not explicitly distinguish the margins between categories. In this paper, we propose a supervised contrastive learning objective to minimize intra-class variance by pulling together in-domain intents belonging to the same class and maximize inter-class variance by pushing apart samples from different classes. Besides, we employ an adversarial augmentation mechanism to obtain pseudo diverse views of a sample in the latent space. Experiments on two public datasets demonstrate the effectiveness of our method in capturing discriminative representations for OOD detection.
- paper, code
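As a rough illustration of the "pull together same-class intents, push apart different classes" idea in the abstract above, the sketch below implements one common formulation of a supervised contrastive loss in plain Python. It is not the code released with the paper: the function names, the temperature value, and the 2-D toy features are assumptions, and the adversarial augmentation step is omitted.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Average, over anchors with at least one positive, of
    -mean_p log( exp(z_i.z_p / t) / sum_{a != i} exp(z_i.z_a / t) )."""
    z = [l2_normalize(f) for f in features]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without same-class partners contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Hypothetical 2-D intent features forming two tight clusters.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supervised_contrastive_loss(toy_features, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_features, [0, 1, 0, 1])
print(loss_matching, loss_mismatched)
```

When the labels agree with the clusters, the loss is small; when the labels cut across the clusters, it is much larger, which is the signal that drives same-class representations together.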
Scheduled Dialog Policy Learning: An Automatic Curriculum Learning Framework for Task-oriented Dialog System, ACL2021 Findings
- Sihong Liu, Jinchao Zhang, Keqing He, Weiran Xu and Jie Zhou
- Abstract: In reinforcement learning (RL) based task-oriented dialogue systems, users act as the environment and the agent learns the policy by interacting with users. However, due to the subjectivity of different users, the complexity of user-generated training conversations varies greatly, which leads to different difficulties for the agent to learn. Therefore, it is necessary to model dialogue complexity and make a reasonable learning schedule for efficiently training the agent. To this end, we propose Scheduled Dialog Policy Learning, an automatic curriculum learning framework that jointly performs curriculum learning and policy optimization in the task-oriented dialog system. To the best of our knowledge, it is the first RL framework that improves dialogue policy learning by scheduling its learning process. Specifically, we introduce an automatic measurement to evaluate the dialogue complexity, and based on this automatic measurement, we train the dialog agent from easy dialogues to complex ones. Experiments demonstrate that our approach can be applied to task-oriented dialogue policy learning and outperforms the previous state-of-the-art model, improving the dialog success rate by 9.6% and 10.0% on the MultiWOZ and Movie-Ticket Booking datasets, respectively.
Adversarial Self-Supervised Learning for Out-of-Domain Detection, NAACL2021 oral
- Zhiyuan Zeng, Keqing He, Yuanmeng Yan, Hong Xu, Weiran Xu
- Abstract: Detecting out-of-domain (OOD) intents is crucial for the deployed task-oriented dialogue system. Previous unsupervised OOD detection methods only extract discriminative features of different in-domain intents while supervised counterparts can directly distinguish OOD and in-domain intents but require extensive labeled OOD data. To combine the benefits of both types, we propose a self-supervised contrastive learning framework to model discriminative semantic features of both in-domain intents and OOD intents from unlabeled data. Besides, we introduce an adversarial augmentation neural module to improve the efficiency and robustness of contrastive learning. Experiments on two public benchmark datasets show that our method can consistently outperform the baselines with a statistically significant margin.
- paper, code
Dynamically Disentangling Social Bias from Task-Oriented Representations with Adversarial Attack, NAACL2021
- Liwen Wang*, Yuanmeng Yan*, Keqing He, Yanan Wu, Weiran Xu
- Abstract: Representation learning is widely used in NLP for a vast range of tasks. However, representations derived from text corpora often reflect social biases. This phenomenon is pervasive and consistent across different neural models, causing serious concern. Previous methods mostly rely on a pre-specified, user-provided direction or suffer from unstable training. In this paper, we propose an adversarial disentangled debiasing model to dynamically decouple social bias attributes from the intermediate representations trained on the main task. We aim to denoise bias information while training on the downstream task, rather than completely remove social bias and pursue static unbiased representations. Experiments show the effectiveness of our method, both on the effect of debiasing and the main task performance.
- paper, code
Hierarchical Speaker-Aware Sequence-to-Sequence Model for Dialogue Summarization, ICASSP2021
- Yuejie Lei, Yuanmeng Yan, Zhiyuan Zeng, Keqing He, Ximing Zhang, Weiran Xu
- Abstract: Traditional document summarization models cannot handle dialogue summarization tasks perfectly. In situations with multiple speakers and complex referential relationships of personal pronouns in the conversation, the predicted summaries of these models are always full of personal pronoun confusion. In this paper, we propose a hierarchical transformer-based model for dialogue summarization. It encodes dialogues from words to utterances and clearly distinguishes the relationships between speakers and their corresponding personal pronouns. In such a from-coarse-to-fine procedure, our model can generate summaries more accurately and relieve the confusion of personal pronouns. Experiments are based on the dialogue summarization dataset SAMSum, and the results show that the proposed model achieves results comparable to other strong baselines. Empirical experiments have shown that our method can relieve the confusion of personal pronouns in predicted summaries.
Adversarial Generative Distance-Based Classifier for Robust Out-of-Domain Detection, ICASSP2021
- Zhiyuan Zeng*, Hong Xu*, Keqing He, Yuanmeng Yan, Sihong Liu, Zijun Liu, Weiran Xu
- Abstract: Detecting out-of-domain (OOD) intents is critical in a task-oriented dialog system. Existing methods rely heavily on extensive manually labeled OOD samples and lack robustness. In this paper, we propose an efficient adversarial attack mechanism to augment hard OOD samples and design a novel generative distance-based classifier to detect OOD samples instead of a traditional threshold-based discriminator classifier. Experiments on two public benchmark datasets show that our method can consistently outperform the baselines with a statistically significant margin.
Contrastive Zero-Shot Learning for Cross-Domain Slot Filling with Adversarial Attack, COLING2020 oral
- Keqing He, Jinchao Zhang, Yuanmeng Yan, Weiran Xu, Cheng Niu, Jie Zhou
- Abstract: Zero-shot slot filling has widely arisen to cope with data scarcity in target domains. However, previous approaches often ignore constraints between slot value representation and related slot description representation in the latent space and lack enough model robustness. In this paper, we propose a Contrastive Zero-Shot Learning with Adversarial Attack (CZSL-Adv) method for cross-domain slot filling. The contrastive loss aims to map slot value contextual representations to the corresponding slot description representations. We also introduce an adversarial attack training strategy to improve model robustness. Experimental results show that our model significantly outperforms state-of-the-art baselines under both zero-shot and few-shot settings.
Syntactic Graph Convolution Network for Spoken Language Understanding, COLING2020
- Keqing He*, Shuyu Lei*, Jiangnan Xia, Yushu Yang, Huixing Jiang, Zhongyuan Wang
- Abstract: Slot filling and intent detection are two major tasks for spoken language understanding. In most existing work, these two tasks are built as joint models with multi-task learning with no consideration of prior linguistic knowledge. In this paper, we propose a novel joint model that applies a graph convolutional network over dependency trees to integrate the syntactic structure for learning slot filling and intent detection jointly. Experimental results show that our proposed model achieves state-of-the-art performance on two public benchmark datasets and outperforms existing work. Finally, we apply the BERT model to further improve the performance on both slot filling and intent detection.
A Deep Generative Distance-Based Classifier for Out-of-Domain Detection with Mahalanobis Space, COLING2020 oral
- Hong Xu, Keqing He, Yuanmeng Yan, Sihong Liu, Zijun Liu, Weiran Xu
- Abstract: Detecting out-of-domain (OOD) input intents is critical in the task-oriented dialog system. Different from most existing methods that rely heavily on manually labeled OOD samples, we focus on the unsupervised OOD detection scenario where there are no labeled OOD samples except for labeled in-domain data. In this paper, we propose a simple but strong generative distance-based classifier to detect OOD samples. We estimate the class-conditional distribution on feature spaces of DNNs via Gaussian discriminant analysis (GDA) to avoid over-confidence problems. We use two distance functions, Euclidean and Mahalanobis distances, to compute a confidence score for whether a test sample is OOD. Experiments on four benchmark datasets show that our method can consistently outperform the baselines.
- paper, code
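The GDA-plus-distance idea in the abstract above can be sketched as follows. This is a simplified illustration, not the released implementation: it assumes 2-D toy features (so the covariance inverse has a closed form) instead of DNN features, a single covariance shared by all classes, and a hypothetical threshold; the intent names and numbers are made up.

```python
def fit_gda(features, labels):
    """Estimate per-class means and the inverse of a shared 2x2 covariance."""
    means = {}
    for label in sorted(set(labels)):
        points = [f for f, y in zip(features, labels) if y == label]
        means[label] = [sum(p[d] for p in points) / len(points) for d in range(2)]
    # Shared (tied) covariance: average outer product of class-centered features.
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for f, y in zip(features, labels):
        diff = [f[d] - means[y][d] for d in range(2)]
        for row in range(2):
            for col in range(2):
                cov[row][col] += diff[row] * diff[col] / len(features)
    # Closed-form inverse of a 2x2 matrix (assumes the covariance is non-singular).
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    precision = [
        [cov[1][1] / det, -cov[0][1] / det],
        [-cov[1][0] / det, cov[0][0] / det],
    ]
    return means, precision


def mahalanobis_sq(x, mean, precision):
    """Squared Mahalanobis distance (x - mean)^T * precision * (x - mean)."""
    diff = [x[0] - mean[0], x[1] - mean[1]]
    return sum(
        diff[row] * precision[row][col] * diff[col]
        for row in range(2)
        for col in range(2)
    )


def ood_score(x, means, precision):
    """Distance to the closest in-domain class; larger means more OOD-like."""
    return min(mahalanobis_sq(x, mean, precision) for mean in means.values())


# Hypothetical in-domain features for two intents.
train_x = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [6, 5], [5, 6], [6, 6]]
train_y = ["book_flight"] * 4 + ["play_music"] * 4
means, precision = fit_gda(train_x, train_y)

THRESHOLD = 9.0  # hypothetical; in practice it would be tuned on held-out data
in_domain_score = ood_score([0.5, 0.6], means, precision)
far_away_score = ood_score([3.0, -3.0], means, precision)
print(in_domain_score < THRESHOLD, far_away_score > THRESHOLD)
```

Replacing `precision` with the identity matrix turns the same score into the squared Euclidean distance, the other distance function mentioned in the abstract.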
Adversarial Semantic Decoupling for Recognizing Open-Vocabulary Slots, EMNLP2020 oral
- Yuanmeng Yan*, Keqing He*, Hong Xu, Sihong Liu, Fanyu Meng, Min Hu, Weiran Xu
- Abstract: Open-vocabulary slots, such as file name, album name, or schedule title, significantly degrade the performance of neural-based slot filling models since these slots can take on values from a virtually unlimited set and have no semantic restriction nor a length limit. In this paper, we propose a robust adversarial model-agnostic slot filling method that explicitly decouples local semantics inherent in open-vocabulary slot words from the global context. We aim to depart entangled contextual semantics and focus more on the holistic context at the level of the whole sentence. Experiments on two public datasets show that our method consistently outperforms other methods with a statistically significant margin on all the open-vocabulary slots without deteriorating the performance of normal slots.
- paper, code
Learning to Tag OOV Tokens by Integrating Contextual Representation and Background Knowledge, ACL2020
- Keqing He, Yuanmeng Yan, Hong Xu, Sihong Liu, Weiran Xu
- Abstract: Neural-based context-aware models for slot tagging have achieved state-of-the-art performance. However, the presence of out-of-vocabulary (OOV) words significantly degrades the performance of neural-based models, especially in a few-shot scenario. In this paper, we propose a novel knowledge-enhanced slot tagging model to integrate contextual representation of input text and the large-scale lexical background knowledge. Besides, we use multi-level graph attention to explicitly model lexical relations. The experiments show that our proposed knowledge integration mechanism achieves consistent improvements across settings with different sizes of training data on two public benchmark datasets.
Learning Label-Relational Output Structure for Adaptive Sequence Labeling, IJCNN2020
- Keqing He, Yuanmeng Yan, Hong Xu, Weiran Xu
- Abstract: Sequence labeling is a fundamental task of natural language understanding. Recent neural models for sequence labeling tasks achieve significant success with the availability of sufficient training data. However, in practical scenarios, entity types to be annotated even in the same domain are continuously evolving. To transfer knowledge from the source model pre-trained on previously annotated data, we propose an approach which learns label-relational output structure to explicitly capture label correlations in the latent space. Additionally, we construct the target-to-source interaction between the source model $M_S$ and the target model $M_T$ and apply a gate mechanism to control how much information in $M_S$ and $M_T$ should be passed down. Experiments show that our method consistently outperforms the state-of-the-art methods with a statistically significant margin and is especially effective at recognizing rare new entities in the target data.