The Homepage of Banghua Zhu

Principal Research Scientist at Nvidia. Incoming Assistant Professor at the University of Washington.


I am a principal research scientist at Nvidia, where I work on Nemotron post-training with a focus on reinforcement learning, agentic systems, and the science of model evaluation.

I’m also an incoming assistant professor at the University of Washington, where I lead the Foundation Model and Reinforcement Learning Research Lab (FMRL2).

Before joining Nvidia, I co-founded Nexusflow AI in 2023, which provides reliable AI agent solutions for enterprise use cases.

I received my PhD from the Department of EECS, UC Berkeley. I am very fortunate to have been advised by Prof. Jiantao Jiao and Prof. Michael I. Jordan. I am a recipient of the 2023 David J. Sakrison Memorial Prize from Berkeley EECS for truly outstanding PhD research.

News: Check out our new short course on Post-training of LLMs, co-taught with Andrew Ng on Deeplearning.ai!

Research Interests

I’m currently interested in the theoretical foundations, training, serving, evaluation, and applications of foundation models. In the past, I have also worked on statistics, information theory, and machine learning, with applications in game theory, robust statistics, reinforcement learning, and human-AI interaction.


Training

  • Starling-7B:
    Check out our open 7B model, Starling-7B, which ranks first among all existing Mistral-based 7B models according to human evaluation in Chatbot Arena!
    • Starling-7B is trained with our open-source high-quality preference dataset, Nectar, using our new reward-training and policy-finetuning algorithms.
  • Athene Series:
    • Athene-70B: Our first chat model, fine-tuned from Llama-3-70B, which gained 30+ Elo on Chatbot Arena and greatly improved its multilingual capability.
    • Athene-V2-72B-Chat: Fine-tuned from Qwen-2.5-72B. It ranks behind only DeepSeek V3 & R1 (671B) among all non-reasoning open models on Chatbot Arena and is competitive with GPT-4o on benchmarks such as MMLU-Pro, GPQA, AIME, IFEval, BigCodeBench, and LiveBench.
    • Athene-V2-72B-Agent: An agent model specializing in function calling and agentic use cases, surpassing GPT-4o in complex function-calling tasks, especially in parallel and nested calls.

Evaluation

  • Hugging Face Function Calling Leaderboard: Used in the Llama-3.1 technical report for evaluating function-calling capabilities.
  • Chatbot Arena: One of the most reliable platforms for evaluating models with human preferences.
  • Arena-Hard-Auto: An automatic benchmark creation pipeline that uses LLM-as-a-judge to quickly evaluate model performance.
  • Preference Proxy Evaluations: A high-quality evaluation pipeline for reward models in RLHF that correlates very well with downstream RL performance.
  • MMMG: A comprehensive and reliable evaluation suite for Multitask Multimodal Generation.

Theoretical Foundations

  • Fundamental Limits of RLHF:
    We identify the fundamental limits of RLHF and develop near-optimal algorithms with improved sample complexity for reward training [ZJJ23]. We also propose an alternative to Proximal Policy Optimization (PPO) for policy optimization that is more stable and sample-efficient [ZSFDZJJ23].

  • LLM Watermarking:
    We recently proposed a statistically near-optimal algorithm for LLM watermarking.
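To illustrate the reward-training step that this line of work analyzes, here is a minimal sketch of the standard Bradley-Terry pairwise log-likelihood objective used to fit reward models from preference data. This is an illustrative textbook formulation in plain Python, not the specific algorithms from [ZJJ23]; the function name is mine.

```python
import math

def bradley_terry_loss(chosen_scores, rejected_scores):
    """Average negative log-likelihood of the Bradley-Terry model,
    where P(chosen preferred over rejected) = sigmoid(r_c - r_r)."""
    total = 0.0
    for r_c, r_r in zip(chosen_scores, rejected_scores):
        total += math.log(1.0 + math.exp(-(r_c - r_r)))  # -log sigmoid(r_c - r_r)
    return total / len(chosen_scores)

# A reward model that separates the pairs well has low loss:
print(bradley_terry_loss([2.0, 1.5], [-1.0, 0.0]))  # small
# An uninformative model sits at log(2) ≈ 0.693:
print(bradley_terry_loss([0.0, 0.0], [0.0, 0.0]))
```

Minimizing this loss over the reward model's parameters (here the scores stand in for model outputs) is the reward-training stage whose sample complexity the work above studies.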


Serving

  • Model Routing and Caching: We analyze and propose near-optimal algorithms for caching and model multiplexing for serving large models, significantly enhancing the efficiency of inference in LLMs [ZSZBJJ23].
  • S-LoRA: We also proposed S-LoRA, an algorithm and serving system for thousands of LoRA adapters.
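As a toy illustration of the caching side of this work, an exact-match LRU response cache for repeated prompts can be sketched as below. This is a standard LRU policy for intuition only, not the near-optimal caching algorithm of [ZSZBJJ23]; class and method names are mine.

```python
from collections import OrderedDict

class ResponseCache:
    """Minimal LRU cache: serve a stored response when the same prompt
    repeats, evicting the least recently used entry at capacity."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._store = OrderedDict()

    def get(self, prompt):
        if prompt not in self._store:
            return None
        self._store.move_to_end(prompt)  # mark as most recently used
        return self._store[prompt]

    def put(self, prompt, response):
        self._store[prompt] = response
        self._store.move_to_end(prompt)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the LRU entry

cache = ResponseCache(capacity=2)
cache.put("q1", "a1")
cache.put("q2", "a2")
cache.get("q1")        # touch q1, so q2 becomes least recently used
cache.put("q3", "a3")  # evicts q2
print(cache.get("q2"))  # None (evicted)
print(cache.get("q1"))  # a1 (still cached)
```

In LLM serving, each cache hit avoids a full model invocation; the analysis in the work above studies how to choose caching and model-multiplexing policies near-optimally rather than heuristically.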

Additional Research Areas

  1. Bandit and Reinforcement Learning
    • We study online and offline learning, off-policy evaluation, and inverse RL [RZMJR21, MZJW22].
  2. Information-theoretic Lower Bounds
    • We investigate achieving fundamental limits in noisy searching, sorting, and computing tasks using information-theoretic tools [WGZW22, ZWGJW23].
  3. Statistics & Robustness
    • We explore techniques to enhance the resilience of AI models against malicious attacks, extending the theory in high-dimensional robust statistics [ZJS22].
    • We propose efficient algorithms for outlier detection, robust mean estimation, robust covariance estimation, and robust linear regression [ZJS21], as well as Byzantine-robust distributed learning and distributed systems [ZPWWJSJ23].
    • We design doubly-robust estimators that outperform traditional self-training pipelines in computer vision and autonomous driving [ZDJWZJJ23].
    • We conduct theoretical analyses of Generative Adversarial Networks (GANs), providing insights for practical implementations [ZJT19].
    • We explore the interaction between ML systems and self-interested, strategic humans, a crucial topic in economics. By modeling and analyzing online learning in contract theory and the creator economy, we provide near-optimal regret bounds for both problems, addressing the longstanding challenge of sample complexity in online contract design [ZBYWJJ23, ZKJJ23].