The Homepage of Banghua Zhu
Assistant Professor at University of Washington.
Principal Research Scientist at Nvidia.

I’m an assistant professor at UW ECE, with an adjunct appointment in CSE. I lead the Foundation Model and Reinforcement Learning Research Lab (FMRL2) at UW.
I am also a principal research scientist at Nvidia, where I work on Nemotron post-training, focusing on reinforcement learning, agentic systems, and the science of model evaluation.
Prior to that, I co-founded Nexusflow AI in 2023, which provides reliable AI agent solutions for enterprise use cases.
I received my PhD from the Department of EECS, UC Berkeley. I am very fortunate to have been advised by Prof. Jiantao Jiao and Prof. Michael I. Jordan. I am a recipient of the 2023 David J. Sakrison Memorial Prize from Berkeley EECS for truly outstanding PhD research.
News: Check out our new short course on Post-training of LLMs, co-taught with Andrew Ng on DeepLearning.AI!
I am actively recruiting visiting students, PhD students, and postdoctoral researchers at UW to work with me on LLMs, with a focus on post-training, agents, machine learning systems, and LLM evaluation. If you are interested, please send me an email.
Research Interests
I’m currently interested in the theoretical foundations, training, serving, evaluation, and applications of foundation models. In the past, I worked on statistics, information theory, and machine learning, with applications in game theory, robust statistics, reinforcement learning, and human-AI interaction.
Training
- Starling-7B:
  - Check out our open 7B model, Starling-7B, which ranks first among all Mistral-based 7B models by human evaluation on Chatbot Arena!
  - Starling-7B is trained on our open-source, high-quality preference dataset, Nectar, using our new reward-training and policy-finetuning algorithms (a minimal reward-model sketch follows this list).
- Athene Series:
  - Athene-70B: Our first chat model, fine-tuned from Llama-3-70B; it gained 30+ Elo on Chatbot Arena and greatly improved multilingual capability.
  - Athene-V2-72B-Chat: Fine-tuned from Qwen-2.5-72B. It ranks behind only DeepSeek V3 & R1 (671B) among non-reasoning open models on Chatbot Arena and is competitive with GPT-4o on benchmarks such as MMLU-Pro, GPQA, AIME, IFEval, BigCodeBench, and LiveBench.
  - Athene-V2-72B-Agent: An agent model specializing in function calling and agentic use cases, surpassing GPT-4o on complex function-calling tasks, especially parallel and nested calls.
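For readers curious what the reward-training stage of such a pipeline looks like, here is a minimal sketch of fitting a Bradley-Terry reward model on preference pairs, in the spirit of RLHF recipes like the one behind Starling-7B. The pooled-embedding input, hidden size, and `RewardModel` head are illustrative assumptions, not the actual training code:

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Illustrative reward head: maps a pooled response embedding to a scalar
    score. In a real pipeline this sits on top of a pretrained LM backbone."""
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.score(pooled).squeeze(-1)  # shape: (batch,)

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of the Bradley-Terry preference model:
    P(chosen > rejected) = sigmoid(r_chosen - r_rejected)."""
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy training step on random "embeddings" standing in for LM features.
model = RewardModel()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
chosen, rejected = torch.randn(8, 768), torch.randn(8, 768)
loss = bradley_terry_loss(model(chosen), model(rejected))
loss.backward()
opt.step()
```

The trained reward model then scores rollouts during the policy-finetuning stage.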
Evaluation
- Hugging Face Function Calling Leaderboard: Used in the Llama 3.1 technical report for evaluating function-calling capabilities.
- Chatbot Arena: One of the most reliable platforms for evaluating models with human preferences.
- Arena-Hard-Auto: An automatic benchmark-creation pipeline that uses LLM-as-a-judge to evaluate model performance quickly (a minimal judge sketch follows this list).
- Preference Proxy Evaluations: A high-quality evaluation pipeline for reward models in RLHF that correlates very well with downstream RL performance.
- MMMG: A comprehensive and reliable evaluation suite for Multitask Multimodal Generation.
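As a rough illustration of the LLM-as-a-judge idea behind pipelines like Arena-Hard-Auto (the prompt wording and the `gpt-4o` judge below are placeholders, not the benchmark's actual implementation), this sketch asks a judge model for a pairwise verdict between two answers:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are an impartial judge. Compare the two answers to the question.
Reply with exactly one verdict: A, B, or TIE.

Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}"""

def judge(question: str, answer_a: str, answer_b: str, model: str = "gpt-4o") -> str:
    """Ask a judge model for a pairwise verdict. In practice both answer
    orderings should be judged and averaged to reduce position bias."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, answer_a=answer_a, answer_b=answer_b)}],
        temperature=0.0,
    )
    return resp.choices[0].message.content.strip()

print(judge("What is 2 + 2?", "4", "5"))  # expected verdict: A
```

Pairwise verdicts over a fixed prompt set are then aggregated into win rates or Elo-style scores.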
Theoretical Foundations
- Fundamental Limits of RLHF: We identify the fundamental limits of RLHF and develop near-optimal algorithms with improved sample complexity for reward training [ZJJ23]. We also propose an alternative to Proximal Policy Optimization (PPO) for policy optimization that is more stable and sample-efficient [ZSFDZJJ23] (the standard objective is written out below).
- LLM Watermarking: We recently proposed a statistically near-optimal algorithm for LLM watermarking (a toy watermarking sketch follows).
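For context, the standard RLHF formulation these results concern can be written in two lines, a Bradley-Terry model for reward learning and a KL-regularized objective for policy optimization; this is the common textbook notation, not necessarily the papers' exact setup:

```latex
% Reward learning: Bradley-Terry model on preference pairs
P\left(y_1 \succ y_2 \mid x\right) = \sigma\left(r_\theta(x, y_1) - r_\theta(x, y_2)\right)

% Policy optimization: maximize learned reward under a KL penalty to a reference policy
\max_{\pi}\; \mathbb{E}_{x,\, y \sim \pi(\cdot \mid x)}\left[ r_\theta(x, y) \right]
  \;-\; \beta\, \mathbb{E}_{x}\left[ \mathrm{KL}\left( \pi(\cdot \mid x) \,\Vert\, \pi_{\mathrm{ref}}(\cdot \mid x) \right) \right]
```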
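And as a toy illustration of LLM watermarking in general, the sampler below implements the well-known Gumbel-max (exponential-minimum) scheme, not the algorithm from our paper: a secret key plus the recent context seeds pseudorandom draws that bias token selection in a detectable but distribution-preserving way.

```python
import hashlib
import numpy as np

def keyed_uniforms(key: int, context: tuple[int, ...], vocab_size: int) -> np.ndarray:
    """Pseudorandom U(0,1) draws seeded by a secret key and the recent context."""
    seed_bytes = hashlib.sha256(f"{key}:{context}".encode()).digest()
    rng = np.random.default_rng(int.from_bytes(seed_bytes[:8], "big"))
    return rng.random(vocab_size)

def watermarked_sample(probs: np.ndarray, key: int, context: tuple[int, ...]) -> int:
    """Gumbel-max style sampling: argmax_t u_t^(1/p_t). Marginally this is an
    exact sample from `probs`, but a detector holding `key` can recompute the
    draws and test whether chosen tokens have suspiciously large u values."""
    u = keyed_uniforms(key, context, len(probs))
    return int(np.argmax(u ** (1.0 / np.maximum(probs, 1e-12))))

# Toy usage with a 5-token vocabulary.
probs = np.array([0.1, 0.4, 0.2, 0.2, 0.1])
print(watermarked_sample(probs, key=1234, context=(7, 42)))
```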
Serving
- Model Routing and Caching: We analyze and propose near-optimal algorithms for caching and model multiplexing when serving large models, significantly improving LLM inference efficiency [ZSZBJJ23] (a toy cache-plus-router sketch follows this list).
- S-LoRA: We also proposed S-LoRA, an algorithm and system for serving thousands of LoRA adapters (see the LoRA arithmetic sketch below).
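As a cartoon of what caching plus model multiplexing means in serving (the routing rule, difficulty score, and model stubs below are invented for illustration; the paper's algorithms are considerably more refined), this sketch answers from an LRU cache when possible and otherwise routes easy queries to a cheap model:

```python
from collections import OrderedDict

class CachedRouter:
    """Toy LRU cache in front of a two-model multiplexer."""
    def __init__(self, capacity: int = 1024, threshold: float = 0.5):
        self.cache: OrderedDict[str, str] = OrderedDict()
        self.capacity, self.threshold = capacity, threshold

    def serve(self, prompt: str, difficulty: float) -> str:
        if prompt in self.cache:                 # cache hit: essentially free
            self.cache.move_to_end(prompt)
            return self.cache[prompt]
        # Route easy queries to the cheap model, hard ones to the big one.
        answer = small_model(prompt) if difficulty < self.threshold else large_model(prompt)
        self.cache[prompt] = answer
        if len(self.cache) > self.capacity:      # evict least-recently used
            self.cache.popitem(last=False)
        return answer

def small_model(prompt: str) -> str:  # stand-in for a cheap model call
    return f"small: {prompt}"

def large_model(prompt: str) -> str:  # stand-in for an expensive model call
    return f"large: {prompt}"
```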
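And a minimal sketch of the arithmetic that makes serving many LoRA adapters cheap: each adapter only adds a low-rank update B(Ax) on top of a shared base weight, so one copy of the base model can serve every adapter. Shapes and rank here are illustrative; this is not the S-LoRA implementation:

```python
import numpy as np

d, r = 1024, 16                              # hidden size and LoRA rank (illustrative)
W = np.random.randn(d, d) / np.sqrt(d)       # shared base weight, one copy for all adapters

def lora_forward(x: np.ndarray, A: np.ndarray, B: np.ndarray, alpha: float = 16.0) -> np.ndarray:
    """y = W x + (alpha / r) * B (A x): the expensive base matmul is shared;
    only the cheap rank-r part is adapter-specific."""
    return W @ x + (alpha / r) * (B @ (A @ x))

# Two "adapters" applied to the same input, sharing the same base weight.
x = np.random.randn(d)
A1, B1 = np.random.randn(r, d) * 0.01, np.zeros((d, r))
A2, B2 = np.random.randn(r, d) * 0.01, np.random.randn(d, r) * 0.01
y1, y2 = lora_forward(x, A1, B1), lora_forward(x, A2, B2)
print(np.allclose(y1, W @ x))  # True: a zero-initialized B adds no update
```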
Additional Research Areas
- Bandit and Reinforcement Learning
- Information-theoretic Lower Bounds
- Statistics & Robustness
- We explore techniques to enhance the resilience of AI models against malicious attacks, extending the theory in high-dimensional robust statistics [ZJS22].
- We propose efficient algorithms for outlier detection, robust mean estimation, robust covariance estimation, and robust linear regression [ZJS21], as well as Byzantine-robust distributed learning [ZPWWJSJ23] (a tiny trimmed-mean baseline is sketched after this list).
- We design doubly robust estimators that outperform traditional self-training pipelines in computer vision and autonomous driving [ZDJWZJJ23].
- We conduct theoretical analyses of Generative Adversarial Networks (GANs), providing insights for practical implementations [ZJT19].
- We explore the interaction between ML systems and self-interested, strategic humans, a crucial topic in economics. By modeling and analyzing online learning in contract theory and the creator economy, we provide near-optimal regret bounds for both problems, addressing the longstanding challenge of sample complexity in online contract design [ZBYWJJ23, ZKJJ23].
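For a flavor of the robust-statistics line of work above, here is a coordinate-wise trimmed mean, one of the simplest baselines for mean estimation under corruption; the estimators in the papers handle high-dimensional contamination with far stronger guarantees, so this sketch only illustrates the problem setup:

```python
import numpy as np

def trimmed_mean(X: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """Coordinate-wise trimmed mean: drop the eps-fraction of largest and
    smallest values in each coordinate, then average the rest. A simple
    baseline when an eps-fraction of the rows may be adversarially corrupted."""
    n = X.shape[0]
    k = int(np.floor(eps * n))
    Xs = np.sort(X, axis=0)            # sort each coordinate independently
    return Xs[k:n - k].mean(axis=0)

# 10% of samples are gross outliers; the trimmed mean stays near the truth.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(1000, 5))
X[:100] += 100.0                       # adversarial corruption
print(np.linalg.norm(X.mean(axis=0)))      # badly biased by the outliers
print(np.linalg.norm(trimmed_mean(X)))     # close to the true mean, 0
```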