The Homepage of Banghua Zhu
Principal Research Scientist at Nvidia. Incoming Assistant Professor at the University of Washington.
 
I am a principal research scientist at Nvidia. I work on Nemotron post-training, with a focus on reinforcement learning, agentic systems, and the science of model evaluation.
I’m also an incoming assistant professor at the University of Washington, where I lead the Foundation Model and Reinforcement Learning Research Lab (FMRL2).
Prior to that, I co-founded Nexusflow AI in 2023, which provides reliable AI agent solutions for enterprise use cases.
I received my PhD from the Department of EECS, UC Berkeley. I am very fortunate to have been advised by Prof. Jiantao Jiao and Prof. Michael I. Jordan. I am a recipient of the 2023 David J. Sakrison Memorial Prize from Berkeley EECS for truly outstanding PhD research.
News: Check out our new short course on Post-training of LLMs, co-taught with Andrew Ng on DeepLearning.AI!
Research Interests
I’m currently interested in the theoretical foundations, training, serving, evaluation, and applications of foundation models. In the past, I also worked on statistics, information theory, and machine learning, with applications in game theory, robust statistics, reinforcement learning, and human-AI interaction.
Training
- Starling-7B: Check out our open 7B model, Starling-7B, which ranks first among all existing Mistral-based 7B models according to human evaluation on Chatbot Arena! Starling-7B is trained on our open-source, high-quality preference dataset, Nectar, using our new reward-training and policy-finetuning algorithms.
 
- Athene Series:
  - Athene-70B: Our first chat model, fine-tuned from Llama-3-70B, which gained 30+ Elo on Chatbot Arena and greatly improved multilingual capability.
  - Athene-V2-72B-Chat: Fine-tuned from Qwen-2.5-72B. It ranks behind only DeepSeek V3 & R1 (671B) among all non-reasoning open models on Chatbot Arena and is competitive with GPT-4o on benchmarks such as MMLU-Pro, GPQA, AIME, IFEval, BigCodeBench, and LiveBench.
  - Athene-V2-72B-Agent: An agent model specializing in function calling and agentic use cases, surpassing GPT-4o in complex function-calling tasks, especially parallel and nested calls (illustrated below).
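To make the two call patterns concrete, here is a rough illustration of parallel and nested function calls. The tool names and JSON-style schema below are hypothetical, for illustration only, and are not Athene's actual function-calling format.

```python
# Hypothetical illustration of parallel vs. nested function calls.
# Tool names and schema are made up; this is not Athene's actual format.

# Parallel calls: the model emits several independent calls in one turn,
# which the runtime can execute concurrently.
parallel_calls = [
    {"name": "get_weather", "arguments": {"city": "Seattle"}},
    {"name": "get_weather", "arguments": {"city": "Berkeley"}},
]

# Nested call: one call's result feeds an argument of another, so the
# model must plan a dependency chain rather than a flat list of calls.
nested_call = {
    "name": "book_flight",
    "arguments": {
        "destination": "SEA",
        # The inner call is resolved first; its return value becomes the date.
        "date": {"name": "get_next_holiday", "arguments": {"country": "US"}},
    },
}
```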
 
Evaluation
- Huggingface Function Calling Leaderboard: Used in the Llama-3.1 technical report for evaluating function-calling capabilities.
- Chatbot Arena: One of the most reliable platforms for evaluating models with human preferences.
- Arena-Hard-Auto: An automatic benchmark creation pipeline that uses LLM-as-a-judge to quickly evaluate model performance (a minimal sketch of the judging step follows this list).
- Preference Proxy Evaluations: A high-quality evaluation pipeline for reward models in RLHF that correlates very well with downstream RL performance.
- MMMG: A comprehensive and reliable evaluation suite for Multitask Multimodal Generation.
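As a rough sketch of the LLM-as-a-judge step mentioned above, the snippet below compares two answers with a judge model, querying both answer orderings to reduce position bias. It illustrates the general technique, not Arena-Hard-Auto's actual implementation; `call_judge` is a hypothetical stand-in for any chat-completion client that returns the judge's text response.

```python
# Minimal sketch of pairwise LLM-as-a-judge evaluation (illustrative only).

def build_judge_prompt(question: str, answer_a: str, answer_b: str) -> str:
    """Ask the judge to compare two answers and emit a verdict tag."""
    return (
        "You are an impartial judge. Compare the two answers to the question "
        "and reply with exactly one of: [[A]], [[B]], or [[TIE]].\n\n"
        f"Question: {question}\n\nAnswer A: {answer_a}\n\nAnswer B: {answer_b}"
    )

def judge_pair(call_judge, question: str, answer_a: str, answer_b: str) -> str:
    """Judge both orderings to reduce position bias; return 'A', 'B', or 'tie'."""
    votes = []
    for a, b, flipped in [(answer_a, answer_b, False), (answer_b, answer_a, True)]:
        verdict = call_judge(build_judge_prompt(question, a, b))
        if "[[A]]" in verdict:
            votes.append("B" if flipped else "A")  # undo the swap
        elif "[[B]]" in verdict:
            votes.append("A" if flipped else "B")
        else:
            votes.append("tie")
    # Only count a win if both orderings agree; otherwise call it a tie.
    return votes[0] if votes[0] == votes[1] else "tie"
```

Running each comparison in both orders is a common guard against position bias, since LLM judges tend to favor whichever answer appears first.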
Theoretical Foundations
- Fundamental Limits of RLHF: We identify the fundamental limits of RLHF and develop near-optimal algorithms with improved sample complexity for reward training [ZJJ23]. We also propose an alternative to Proximal Policy Optimization (PPO) for policy optimization that is more stable and sample-efficient [ZSFDZJJ23]. (A generic sketch of reward-model training follows this list.)
- LLM Watermarking: We recently proposed a statistically near-optimal algorithm for LLM watermarking.
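As a generic sketch of the reward-training step these works study: RLHF reward models are commonly fit with the Bradley-Terry preference objective below. This is a minimal illustration under common assumptions, not the specific algorithms of [ZJJ23]; `reward_model` is assumed to map a batch of prompt-plus-response token ids to one scalar reward per example.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen_ids: torch.Tensor,
                    rejected_ids: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood that the chosen response beats the rejected one.

    Under the Bradley-Terry model, P(chosen > rejected) =
    sigmoid(r_chosen - r_rejected), so we minimize -log sigmoid(diff).
    """
    r_chosen = reward_model(chosen_ids)      # shape: (batch,)
    r_rejected = reward_model(rejected_ids)  # shape: (batch,)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

Minimizing this loss is maximum-likelihood estimation under the Bradley-Terry model of pairwise preferences, which is the standard starting point that sample-complexity analyses of reward training build on.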
Serving
- Model Routing and Caching: We analyze and propose near-optimal algorithms for caching and model multiplexing when serving large models, significantly improving the efficiency of LLM inference [ZSZBJJ23] (a toy sketch follows this list).
- S-LoRA: We also proposed S-LoRA, an algorithm and framework for serving thousands of LoRA adapters.
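Below is a toy sketch of how caching and model multiplexing compose in a serving stack, in the spirit of (but not identical to) the algorithms analyzed in [ZSZBJJ23]: cache hits skip inference entirely, and remaining prompts are routed to a cheap or expensive model by an estimated difficulty score. `small_model`, `large_model`, and `score_difficulty` are hypothetical callables supplied by the user.

```python
from collections import OrderedDict

class CachedRouter:
    """Toy LLM serving layer: exact-match LRU cache + two-model multiplexing."""

    def __init__(self, small_model, large_model, score_difficulty,
                 capacity: int = 1024, threshold: float = 0.5):
        self.small, self.large = small_model, large_model
        self.score = score_difficulty   # maps prompt -> difficulty in [0, 1]
        self.cache = OrderedDict()      # LRU cache of prompt -> response
        self.capacity, self.threshold = capacity, threshold

    def generate(self, prompt: str) -> str:
        if prompt in self.cache:        # cache hit: skip inference entirely
            self.cache.move_to_end(prompt)
            return self.cache[prompt]
        # Multiplex: send easy prompts to the cheap model, hard ones to the big one.
        model = self.small if self.score(prompt) < self.threshold else self.large
        response = model(prompt)
        self.cache[prompt] = response   # insert, then evict least-recently-used
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)
        return response
```

An exact-match LRU cache is the simplest possible policy; the interesting questions, which the paper's setting formalizes, are what to cache and when to trust the cheaper model.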
Additional Research Areas
- Bandits and Reinforcement Learning
- Information-theoretic Lower Bounds
- Statistics & Robustness
  - We explore techniques to enhance the resilience of AI models against malicious attacks, extending the theory of high-dimensional robust statistics [ZJS22].
  - We propose efficient algorithms for outlier detection, robust mean estimation, robust covariance estimation, and robust linear regression [ZJS21], as well as Byzantine-robust distributed learning and distributed systems [ZPWWJSJ23].
  - We design doubly-robust estimators that outperform traditional self-training pipelines in computer vision and autonomous driving [ZDJWZJJ23].
  - We conduct theoretical analyses of Generative Adversarial Networks (GANs), providing insights for practical implementations [ZJT19].
- We explore the interaction between ML systems and self-interested, strategic humans, a crucial topic in economics. By modeling and analyzing online learning in contract theory and the creator economy, we provide near-optimal regret bounds for both problems, addressing the longstanding challenge of sample complexity in online contract design [ZBYWJJ23, ZKJJ23].