Keqing | LLM Scaling & Foundation Models
I am Keqing, currently focusing on LLM scaling. I previously worked at the Meituan Foundation Model Team (2021-2025), Alibaba DAMO Academy (2020, intern), and WeChat AI Lab (2020, intern). I have published 20+ papers at top venues, with over 2000 Google Scholar citations.
My work spans the full LLM scaling pipeline:
- Model Scaling: Large-scale MoE pretraining (1T+ parameters), including training stability, training efficiency, and scaling law modeling.
- Data Scaling: Building 40T-token pretraining data systems with quality stratification, synthetic expansion, and multi-stage mixing; data-efficient SFT selection that reaches full-data performance with a small set of high-value samples.
- Test-time Scaling: RLVR (reinforcement learning with verifiable rewards) training pipeline, covering initialization, online difficulty filtering, and entropy control for stable policy evolution.
Technical Highlights
Model Scaling
- Trained trillion-scale MoE models end-to-end with systematic stability monitoring (grad-norm, hidden-norm, qk-max-logits, maxvio)
- Achieved a 50% training efficiency improvement via the Muon optimizer plus a split architecture (qkv-per-head-split, split MLP)
- Established MoE scaling laws: optimal hyperparameters follow a power law, enabling accurate prediction from 1B to 7B [EMNLP]
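To make the scaling-law bullet above concrete: a power law y = a * N^b becomes a straight line in log-log space, so it can be fit on small models and extrapolated to a larger one. The following is a minimal sketch under stated assumptions, not the method from the EMNLP paper. The function names, the model sizes, and the "optimal learning rate" values are all hypothetical and generated synthetically for illustration.

```python
import math

def fit_power_law(sizes, values):
    """Fit values ~= a * sizes**b by ordinary least squares in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope = power-law exponent
    a = math.exp(mean_y - b * mean_x)   # intercept = log of the coefficient
    return a, b

def predict(a, b, size):
    """Evaluate the fitted power law at a new model size."""
    return a * size ** b

# Synthetic data (assumption): values generated from 0.5 * N**-0.25,
# standing in for per-size optimal hyperparameters found by sweeps.
small_sizes = [1e8, 2e8, 5e8, 1e9]
small_values = [0.5 * n ** -0.25 for n in small_sizes]

a, b = fit_power_law(small_sizes, small_values)
value_7b = predict(a, b, 7e9)  # extrapolate to a 7B-parameter model
```

With real sweep results the points would be noisy, so the quality of the extrapolation depends on how well the log-log relationship actually holds over the fitted range.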
Data Scaling
- Built a large-scale pretraining data system (Chinese/English/Code/STEM) with multi-stage mixing, supporting 40T-token-scale training
- Synthetic data accounts for 50%+ of total training data, covering document rewriting, QA synthesis, repo-level code synthesis, and agent trajectory synthesis
- Proposed data-efficient SFT selection combining quality, complexity, and diversity metrics [ICLR]
Test-time Scaling
- Validated thinking pretraining as a prerequisite for RLVR convergence [COLM]
- Progressive online difficulty filtering: fewer samples, better performance
- Training-inference consistency constraints to prevent entropy collapse
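As an illustration of the online difficulty filtering bullet above, one common generic approach is to sample several rollouts per prompt, score them with a verifiable 0/1 reward, and keep only prompts whose pass rate falls inside a target band. The sketch below assumes that setup. The thresholds, prompt ids, and function names are hypothetical, and this is not necessarily the pipeline described here.

```python
def pass_rate(rewards):
    """Fraction of rollouts that received reward 1."""
    return sum(rewards) / len(rewards)

def filter_by_difficulty(rollout_rewards, low=0.0, high=1.0):
    """Keep prompt ids whose pass rate lies strictly between low and high.

    With the defaults, this drops prompts that are always solved or never
    solved; in group-relative RL methods such groups yield no advantage signal.
    """
    return [
        prompt_id
        for prompt_id, rewards in rollout_rewards.items()
        if low < pass_rate(rewards) < high
    ]

# Hypothetical rewards for four prompts, four rollouts each.
rollout_rewards = {
    "p1": [1, 1, 1, 1],  # too easy: pass rate 1.0
    "p2": [0, 0, 0, 0],  # too hard: pass rate 0.0
    "p3": [1, 0, 0, 1],  # pass rate 0.5
    "p4": [1, 1, 1, 0],  # pass rate 0.75
}

kept = filter_by_difficulty(rollout_rewards)
# A narrower band, as a progressive schedule might use later in training.
kept_strict = filter_by_difficulty(rollout_rewards, low=0.2, high=0.7)
```

A progressive variant would recompute pass rates with the current policy at each stage and tighten the band over time, which shrinks the training set while keeping the most informative prompts.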
Experience
- Full-time at Startup, Aug 2025 - Present:
  - Data Scaling (pretraining data), Model Scaling (large-scale pretraining)
- Full-time at Meituan Foundation Model Team, Sep 2024 - Aug 2025:
  - Test-time Scaling (RLVR)
- Full-time at Meituan Foundation Model Team, Mar 2023 - Aug 2024:
  - Data Scaling (SFT & reasoning data), Model Scaling (MoE pretraining)
- Full-time at Meituan NLP Team, Jun 2021 - Feb 2023:
  - Dialogue pretraining, end-to-end dialogue systems
- Research Intern at Alibaba DAMO Academy, Jun 2020 - Oct 2020:
  - Recommender cold-start, multimodal distillation
- Research Intern at Tencent WeChat AI Lab, Mar 2020 - Jun 2020:
  - Zero-shot learning, contrastive learning
- Research Intern at Meituan NLP Team, Oct 2019 - Mar 2020:
  - Natural language understanding
Selected Publications
Full paper list on Google Scholar
Scaling
LongCat-Flash Technical Report, 2025
LongCat-Flash-Thinking Technical Report, 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild, COLM2025
Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models, EMNLP2024
Coding & Agent
AgentRefine: Enhancing Agent Generalization through Refinement Tuning, ICLR2025
Knowledge Editing on Black-box Large Language Models, WWW2025
DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning, ACL2024
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with Really Good Data, EMNLP2024
Data & SFT
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning, ICLR2024
Dialog
Unified Knowledge Prompt Pretraining for Customer Service Dialogues, CIKM2022
FutureTOD: Teaching Future Knowledge to Pre-trained Language Model for Task-Oriented Dialogue, ACL2023
DivTOD: Unleashing the Power of LLMs for Diversifying Task-Oriented Dialogue Representations, NAACL2024
Semi-Supervised Knowledge-Grounded Pre-training for Task-Oriented Dialog Systems, EMNLP2022 SereTOD Workshop (Championship of Track II)
Earlier Work (OOD Detection, Slot Filling, Dialogue)
Large Language Models Meet Open-World Intent Discovery and Recognition: An Evaluation of ChatGPT, EMNLP2023
DemoNSF: A Multi-task Demonstration-based Generative Framework for Noisy Slot Filling Task, EMNLP2023 Findings
Continual Generalized Intent Discovery: Marching Towards Dynamic and Open-world Intent Recognition, EMNLP2023 Findings
APP: Adaptive Prototypical Pseudo-Labeling for Few-shot OOD Detection, EMNLP2023 Findings
Decoupling Pseudo Label Disambiguation and Representation Learning for Generalized Intent Discovery, ACL2023
Seen to Unseen: Exploring Compositional Generalization of Multi-Attribute Controllable Dialogue Generation, ACL2023
Generative Zero-Shot Prompt Learning for Cross-Domain Slot Filling with Inverse Prompting, ACL2023 Findings
UniNL: Aligning Representation Learning with Scoring Function for OOD Detection via Unified Neighborhood Learning, EMNLP2022
Watch the Neighbors: A Unified K-Nearest Neighbor Contrastive Learning Framework for OOD Intent Discovery, EMNLP2022 oral
Disentangling Confidence Score Distribution for Out-of-Domain Intent Detection with Energy-Based Learning, EMNLP2022
Distribution Calibration for Out-of-Domain Detection with Bayesian Approximation, COLING2022
PSSAT: A Perturbed Semantic Structure Awareness Transferring Method for Perturbation-Robust Slot Filling, COLING2022
Domain-Oriented Prefix-Tuning: Towards Efficient and Generalizable Fine-tuning for Zero-Shot Dialogue Summarization, NAACL2022 oral
Revisit Overconfidence for OOD Detection: Reassigned Contrastive Learning with Adaptive Class-dependent Threshold, NAACL2022
ADPL: Adversarial Prompt-based Domain Adaptation for Dialogue Summarization with Knowledge Disentanglement, SIGIR2022
Disentangled Knowledge Transfer for OOD Intent Discovery with Unified Contrastive Learning, ACL2022
Bridge to Target Domain by Prototypical Contrastive Learning and Label Confusion: Re-explore Zero-Shot Learning for Slot Filling, EMNLP2021 oral
Novel Slot Detection: A Benchmark for Discovering Unknown Slot Types in the Task-Oriented Dialogue System, ACL2021 oral
Modeling Discriminative Representations for Out-of-Domain Detection with Supervised Contrastive Learning, ACL2021
Adversarial Self-Supervised Learning for Out-of-Domain Detection, NAACL2021 oral
Contrastive Zero-Shot Learning for Cross-Domain Slot Filling with Adversarial Attack, COLING2020 oral
Syntactic Graph Convolution Network for Spoken Language Understanding, COLING2020
A Deep Generative Distance-Based Classifier for Out-of-Domain Detection with Mahalanobis Space, COLING2020 oral
Adversarial Semantic Decoupling for Recognizing Open-Vocabulary Slots, EMNLP2020 oral
Learning to Tag OOV Tokens by Integrating Contextual Representation and Background Knowledge, ACL2020
Contact
- Address: Beijing, China
- Email: helicbupt@gmail.com
