Keqing
LLM Scaling & Foundation Models

I am Keqing, currently focusing on LLM scaling. I previously worked at the Meituan Foundation Model Team (2021-2025), Alibaba DAMO Academy (2020, intern), and WeChat AI Lab (2020, intern). I have published 20+ papers at top venues, with over 2000 Google Scholar citations.

My work spans the full LLM scaling pipeline:

  • Model Scaling: Large-scale MoE pretraining (1T+ parameters), including training stability, training efficiency, and scaling law modeling.
  • Data Scaling: Building 40T-token pretraining data systems with quality stratification, synthetic expansion, and multi-stage mixing; data-efficient SFT selection that matches full-data performance with a small subset of high-value samples.
  • Test-time Scaling: RLVR training pipeline covering initialization, online difficulty filtering, and entropy control for stable policy evolution.

Technical Highlights

Model Scaling

  • Trained trillion-scale MoE models end-to-end with systematic stability monitoring (grad-norm, hidden-norm, qk-max-logits, maxvio)
  • Achieved a 50% training efficiency improvement via the Muon optimizer and a split architecture (qkv-per-head-split, split MLP)
  • Established MoE scaling laws: optimal hyperparameters follow a power law, enabling accurate prediction from 1B to 7B [EMNLP]
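
As a rough illustration of the power-law idea in the last bullet, the sketch below fits y = a * x^b by least squares in log-log space and extrapolates to a larger model size. It is a minimal sketch, not the method from the paper: the choice of learning rate as the hyperparameter, the function names, and the data points (generated from a known power law) are all assumptions made for illustration.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b in log-log space; returns (a, b)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log-log space = log(a)
    return a, b


def predict(a, b, x):
    """Evaluate the fitted power law at x."""
    return a * x ** b


# Hypothetical (model size, best learning rate) pairs, synthesized from a
# known power law purely for illustration; these are not real measurements.
sizes = [1e8, 3e8, 1e9]
lrs = [0.5 * s ** -0.3 for s in sizes]

a, b = fit_power_law(sizes, lrs)
lr_7b = predict(a, b, 7e9)  # extrapolate to a 7B-parameter model
```

Fitting in log-log space turns the power law into a straight line, so the exponent can be read off as the slope and small-scale runs can be used to extrapolate to larger scales.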

Data Scaling

  • Built a large-scale pretraining data system (Chinese/English/Code/STEM) with multi-stage mixing, supporting 40T-scale training
  • Synthetic data accounts for 50%+ of total training data, covering document rewriting, QA synthesis, repo-level code synthesis, and agent trajectory synthesis
  • Proposed data-efficient SFT selection combining quality, complexity, and diversity metrics [ICLR]

Test-time Scaling

  • Validated thinking pretraining as a prerequisite for RLVR convergence [COLM]
  • Progressive online difficulty filtering: better performance with fewer training samples
  • Training-inference consistency constraints to prevent entropy collapse
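
One way to picture online difficulty filtering is to keep only prompts whose rollout pass rate falls inside a target band, and to move that band as training progresses. The sketch below is a minimal illustration under assumptions that are not taken from the source: difficulty is measured as the pass rate over sampled rollouts with binary rewards, the band is interpolated linearly, and all names and threshold values are hypothetical.

```python
def pass_rate(rewards):
    """Fraction of sampled rollouts with a verified-correct answer (reward 1)."""
    return sum(rewards) / len(rewards)


def band_for_step(step, total_steps, start=(0.1, 0.9), end=(0.1, 0.4)):
    """Linearly interpolate the accepted pass-rate band (hypothetical schedule)."""
    t = min(max(step / total_steps, 0.0), 1.0)
    low = start[0] + (end[0] - start[0]) * t
    high = start[1] + (end[1] - start[1]) * t
    return low, high


def filter_prompts(rollout_rewards, step, total_steps):
    """Keep prompt ids whose pass rate lies inside the band for this step."""
    low, high = band_for_step(step, total_steps)
    return [
        prompt_id
        for prompt_id, rewards in rollout_rewards.items()
        if low <= pass_rate(rewards) <= high
    ]


# Hypothetical binary rewards for four rollouts per prompt.
rewards = {
    "easy": [1, 1, 1, 1],        # pass rate 1.0: always solved
    "medium": [1, 0, 1, 0],      # pass rate 0.5
    "hard": [0, 0, 0, 1],        # pass rate 0.25
    "impossible": [0, 0, 0, 0],  # pass rate 0.0: never solved
}

kept_early = filter_prompts(rewards, step=0, total_steps=100)
kept_late = filter_prompts(rewards, step=100, total_steps=100)
```

Prompts that are always or never solved are dropped because every rollout receives the same reward; in this sketch the upper bound is lowered over time so that later batches concentrate on harder prompts.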

Experience

  1. Full-time at Startup, Aug 2025 - Present:
    • Data Scaling (pretraining data), Model Scaling (large-scale pretraining)
  2. Full-time at Meituan Foundation Model Team, Sep 2024 - Aug 2025:
    • Test-time Scaling (RLVR)
  3. Full-time at Meituan Foundation Model Team, Mar 2023 - Aug 2024:
    • Data Scaling (SFT & reasoning data), Model Scaling (MoE pretraining)
  4. Full-time at Meituan NLP Team, Jun 2021 - Feb 2023:
    • Dialogue pretraining, end-to-end dialogue systems
  5. Research Intern at Alibaba DAMO Academy, Jun 2020 - Oct 2020:
    • Recommender cold-start, multimodal distillation
  6. Research Intern at Tencent WeChat AI Lab, Mar 2020 - Jun 2020:
    • Zero-shot learning, contrastive learning
  7. Research Intern at Meituan NLP Team, Oct 2019 - Mar 2020:
    • Natural language understanding

Selected Publications

Full paper list on Google Scholar

Scaling

  1. LongCat-Flash Technical Report, 2025

  2. LongCat-Flash-Thinking Technical Report, 2025

  3. SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild, COLM2025

  4. Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models, EMNLP2024

Coding & Agent

  1. AgentRefine: Enhancing Agent Generalization through Refinement Tuning, ICLR2025

  2. Knowledge Editing on Black-box Large Language Models, WWW2025

  3. DolphCoder: Echo-Locating Code Large Language Models with Diverse and Multi-Objective Instruction Tuning, ACL2024

  4. How Do Your Code LLMs perform? Empowering Code Instruction Tuning with Really Good Data, EMNLP2024

Data & SFT

  1. What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning, ICLR2024

Dialog

  1. Unified Knowledge Prompt Pretraining for Customer Service Dialogues, CIKM2022

  2. FutureTOD: Teaching Future Knowledge to Pre-trained Language Model for Task-Oriented Dialogue, ACL2023

  3. DivTOD: Unleashing the Power of LLMs for Diversifying Task-Oriented Dialogue Representations, NAACL2024

  4. Semi-Supervised Knowledge-Grounded Pre-training for Task-Oriented Dialog Systems, EMNLP2022 SereTOD Workshop (Championship of Track II)

Earlier Work (OOD Detection, Slot Filling, Dialogue)

  1. Large Language Models Meet Open-World Intent Discovery and Recognition: An Evaluation of ChatGPT, EMNLP2023

  2. DemoNSF: A Multi-task Demonstration-based Generative Framework for Noisy Slot Filling Task, EMNLP2023 Findings

  3. Continual Generalized Intent Discovery: Marching Towards Dynamic and Open-world Intent Recognition, EMNLP2023 Findings

  4. APP: Adaptive Prototypical Pseudo-Labeling for Few-shot OOD Detection, EMNLP2023 Findings

  5. Decoupling Pseudo Label Disambiguation and Representation Learning for Generalized Intent Discovery, ACL2023

  6. Seen to Unseen: Exploring Compositional Generalization of Multi-Attribute Controllable Dialogue Generation, ACL2023

  7. Generative Zero-Shot Prompt Learning for Cross-Domain Slot Filling with Inverse Prompting, ACL2023 Findings

  8. UniNL: Aligning Representation Learning with Scoring Function for OOD Detection via Unified Neighborhood Learning, EMNLP2022

  9. Watch the Neighbors: A Unified K-Nearest Neighbor Contrastive Learning Framework for OOD Intent Discovery, EMNLP2022 oral

  10. Disentangling Confidence Score Distribution for Out-of-Domain Intent Detection with Energy-Based Learning, EMNLP2022

  11. Distribution Calibration for Out-of-Domain Detection with Bayesian Approximation, COLING2022

  12. PSSAT: A Perturbed Semantic Structure Awareness Transferring Method for Perturbation-Robust Slot Filling, COLING2022

  13. Domain-Oriented Prefix-Tuning: Towards Efficient and Generalizable Fine-tuning for Zero-Shot Dialogue Summarization, NAACL2022 oral

  14. Revisit Overconfidence for OOD Detection: Reassigned Contrastive Learning with Adaptive Class-dependent Threshold, NAACL2022

  15. ADPL: Adversarial Prompt-based Domain Adaptation for Dialogue Summarization with Knowledge Disentanglement, SIGIR2022

  16. Disentangled Knowledge Transfer for OOD Intent Discovery with Unified Contrastive Learning, ACL2022

  17. Bridge to Target Domain by Prototypical Contrastive Learning and Label Confusion: Re-explore Zero-Shot Learning for Slot Filling, EMNLP2021 oral

  18. Novel Slot Detection: A Benchmark for Discovering Unknown Slot Types in the Task-Oriented Dialogue System, ACL2021 oral

  19. Modeling Discriminative Representations for Out-of-Domain Detection with Supervised Contrastive Learning, ACL2021

  20. Adversarial Self-Supervised Learning for Out-of-Domain Detection, NAACL2021 oral

  21. Contrastive Zero-Shot Learning for Cross-Domain Slot Filling with Adversarial Attack, COLING2020 oral

  22. Syntactic Graph Convolution Network for Spoken Language Understanding, COLING2020

  23. A Deep Generative Distance-Based Classifier for Out-of-Domain Detection with Mahalanobis Space, COLING2020 oral

  24. Adversarial Semantic Decoupling for Recognizing Open-Vocabulary Slots, EMNLP2020 oral

  25. Learning to Tag OOV Tokens by Integrating Contextual Representation and Background Knowledge, ACL2020

Contact

  • Address: Beijing, China
  • Email: helicbupt@gmail.com