The Homepage of Banghua Zhu
Assistant Professor at University of Washington, ECE Department.
Adjunct Professor at University of Washington, CSE Department.
Co-founder of Nexusflow AI.

I’m an incoming assistant professor at UW ECE, with an adjunct appointment in CSE.
I co-founded Nexusflow AI in 2023, which provides reliable AI agent solutions for enterprise use cases.
I received my PhD from the Department of EECS, UC Berkeley. I am very fortunate to have been advised by Prof. Jiantao Jiao and Prof. Michael I. Jordan. I am a recipient of the 2023 David J. Sakrison Memorial Prize from Berkeley EECS for truly outstanding PhD research.
Research Interests
I’m currently interested in the theoretical foundations, training, serving, evaluation, and applications of foundation models. I have also been working on statistics, information theory, and machine learning, with applications in game theory, robust statistics, reinforcement learning, and human-AI interactions.
Training
- Starling-7B:
- Check out our open 7B model, Starling-7B, which ranks first among all existing Mistral-based 7B models according to human evaluation in Chatbot Arena!
- Starling-7B is trained with our open-source, high-quality preference dataset, Nectar, using our new reward-training and policy-finetuning algorithms (see the reward-model sketch after this list).
- Athene Series:
- Athene-70B: Our first chat model, fine-tuned from Llama-3-70B. It gained 30+ Elo on Chatbot Arena over its base model and greatly improved its multilingual capability.
- Athene-V2-72B-Chat: Fine-tuned from Qwen-2.5-72B. On Chatbot Arena it ranks behind only DeepSeek V3 & R1 (671B) among open models, and it is competitive with GPT-4o on benchmarks such as MMLU-Pro, GPQA, AIME, IFEval, BigCodeBench, and LiveBench.
- Athene-V2-72B-Agent: An agent model specializing in function calling and agentic use cases, surpassing GPT-4o in complex function-calling tasks, especially in parallel and nested calls.
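Much of the training above relies on learning a reward model from pairwise human preferences. Below is a minimal sketch of the standard Bradley-Terry reward-model objective used in RLHF-style reward training; the model and batch layout are placeholders, not the exact Starling or Athene recipe.

```python
# Minimal Bradley-Terry reward-model loss on preference pairs
# (chosen vs. rejected responses). Illustrative placeholders only,
# not the exact Starling/Athene training recipe.
import torch
import torch.nn.functional as F

def reward_loss(reward_model, chosen_ids: torch.Tensor, rejected_ids: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood under the Bradley-Terry preference model."""
    r_chosen = reward_model(chosen_ids)      # scalar reward per chosen response
    r_rejected = reward_model(rejected_ids)  # scalar reward per rejected response
    # P(chosen preferred over rejected) = sigmoid(r_chosen - r_rejected)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```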
Evaluation
- Hugging Face Function Calling Leaderboard: Used in the Llama-3.1 technical report to evaluate function-calling capabilities.
- Chatbot Arena: One of the most reliable platforms for evaluating models with human preferences.
- Arena-Hard-Auto: An automatic benchmark creation pipeline that uses LLM-as-a-judge to quickly evaluate model performance (see the judging sketch after this list).
- Preference Proxy Evaluations: A high-quality evaluation pipeline for reward models in RLHF that correlates very well with downstream RL performance.
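Several of these pipelines rest on the same primitive: asking a strong LLM to compare two answers. The sketch below shows that pairwise LLM-as-a-judge pattern in its simplest form; `ask_judge`, the prompt, and the verdict parsing are placeholders, not the actual Arena-Hard-Auto implementation.

```python
# Pairwise LLM-as-a-judge sketch. `ask_judge` stands in for any
# chat-completion call; the prompt and parsing are illustrative only.

JUDGE_PROMPT = """You are an impartial judge. Compare the two assistant
answers to the user question and output exactly one verdict token:
[[A]] if answer A is better, [[B]] if answer B is better, [[C]] for a tie.

Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}
Verdict:"""

def judge_pair(ask_judge, question: str, answer_a: str, answer_b: str) -> str:
    """Return 'A', 'B', or 'tie' according to the judge model's verdict."""
    verdict = ask_judge(JUDGE_PROMPT.format(
        question=question, answer_a=answer_a, answer_b=answer_b))
    if "[[A]]" in verdict:
        return "A"
    if "[[B]]" in verdict:
        return "B"
    return "tie"
```

In practice, each pair is typically judged in both answer orders to mitigate position bias, and the pairwise verdicts are aggregated into model scores.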
Theoretical Foundations
- Fundamental Limits of RLHF: We identify the fundamental limits of RLHF and develop near-optimal algorithms with improved sample complexity for reward training [ZJJ23]. We also propose an alternative to Proximal Policy Optimization (PPO) for policy optimization that is more stable and sample-efficient [ZSFDZJJ23].
- LLM Watermarking: We recently proposed a statistically near-optimal algorithm for LLM watermarking (a generic detection sketch follows below).
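For intuition, here is a generic hash-based "green-list" watermark detector in the style of Kirchenbauer et al., shown only to illustrate how watermark detection reduces to a statistical test. It is not the near-optimal scheme referenced above, and all names and parameters are illustrative.

```python
# Generic "green-list" watermark detection sketch. Illustrative only:
# NOT the near-optimal watermarking algorithm referenced above.
import hashlib
import math

def is_green(prev_token: int, token: int, key: str = "secret", frac: float = 0.5) -> bool:
    """Pseudorandomly assign `token` to the green list, seeded by the previous token."""
    h = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return h[0] / 256.0 < frac

def z_score(tokens: list[int], frac: float = 0.5) -> float:
    """z-score of the green-token count; large values suggest watermarked text."""
    n = len(tokens) - 1                      # number of (prev, current) pairs
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    return (hits - frac * n) / math.sqrt(frac * (1 - frac) * n)
```

The generator side would bias sampling toward green tokens; detection then thresholds the z-score, trading detection power against text quality.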
Serving
- Model Routing and Caching: We analyze and propose near-optimal algorithms for caching and model multiplexing when serving large models, significantly improving the efficiency of LLM inference [ZSZBJJ23] (see the routing sketch after this list).
- S-LoRA: We also propose S-LoRA, an algorithm and framework for serving thousands of LoRA adapters.
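As a toy illustration of how caching and model multiplexing compose, the sketch below answers repeated queries from a cache and otherwise routes each query to a small or large model based on a predicted difficulty score. The scorer, threshold, and model callables are placeholders, not the near-optimal policies of [ZSZBJJ23].

```python
# Toy cache-then-route policy combining caching with model multiplexing.
# Placeholders throughout; not the near-optimal algorithms of [ZSZBJJ23].
from typing import Callable

def make_router(small: Callable[[str], str],
                large: Callable[[str], str],
                difficulty: Callable[[str], float],
                threshold: float = 0.5) -> Callable[[str], str]:
    cache: dict[str, str] = {}

    def route(query: str) -> str:
        if query in cache:                    # cache hit: skip model inference
            return cache[query]
        model = small if difficulty(query) < threshold else large
        cache[query] = model(query)           # serve and memoize the answer
        return cache[query]

    return route
```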
Additional Research Areas
- Bandit and Reinforcement Learning
- Information-theoretic Lower Bounds
- Statistics & Robustness
- We explore techniques to enhance the resilience of AI models against malicious attacks, extending the theory in high-dimensional robust statistics [ZJS22].
- We propose efficient algorithms for outlier detection, robust mean estimation, robust covariance estimation, and robust linear regression [ZJS21], as well as Byzantine-robust distributed learning and distributed systems [ZPWWJSJ23] (a baseline robust-mean sketch follows this list).
- We design doubly-robust estimators that outperform traditional self-training pipelines in computer vision and autonomous driving [ZDJWZJJ23].
- We conduct theoretical analyses of Generative Adversarial Networks (GANs), providing insights for practical implementations [ZJT19].
- We explore the interaction between ML systems and self-interested, strategic humans, a crucial topic in economics. By modeling and analyzing online learning in contract theory and the creator economy, we provide near-optimal regret bounds for both problems, addressing the longstanding challenge of sample complexity in online contract design [ZBYWJJ23, ZKJJ23].
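For a flavor of the robust-statistics toolkit, here is the classical coordinate-wise trimmed mean, a standard baseline for robust mean estimation. It is shown for illustration only and is not the algorithms of [ZJS21].

```python
# Coordinate-wise trimmed mean: a classical robust-statistics baseline.
# Illustration only; not the algorithms of [ZJS21].
import numpy as np

def trimmed_mean(x: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """Robust mean of the rows of x: drop an eps fraction of extremes per coordinate."""
    n = x.shape[0]
    k = int(eps * n)                 # points trimmed from each tail
    s = np.sort(x, axis=0)           # sort each coordinate independently
    return s[k:n - k].mean(axis=0)   # average the central (1 - 2*eps) mass
```

Against an eps fraction of arbitrary outliers, each coordinate's estimate degrades gracefully, whereas the plain mean can be moved arbitrarily far.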