publications | The Homepage of Banghua Zhu

2025

Towards Principled Training and Serving of Large Language Models

Banghua Zhu

2025

2024

Guided online distillation: Promoting safe reinforcement learning by offline demonstration

Jinning Li, Xinyi Liu, Banghua Zhu, and 4 more authors

In 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024
A Theoretical Explanation of Deep RL Performance in Stochastic Environments

Cassidy Laidlaw, Banghua Zhu, Stuart Russell, and 1 more author

In NeurIPS 2023 Workshop on Generalization in Planning, 2024
Fairness in serving large language models

Ying Sheng, Shiyi Cao, Dacheng Li, and 5 more authors

In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), 2024
Iterative data smoothing: Mitigating reward overfitting and overoptimization in RLHF

Banghua Zhu, Michael I Jordan, and Jiantao Jiao

arXiv preprint arXiv:2401.16335, 2024
Efficient prompt caching via embedding similarity

Hanlin Zhu, Banghua Zhu, and Jiantao Jiao

arXiv preprint arXiv:2402.01173, 2024
Generative AI security: challenges and countermeasures

Banghua Zhu, Norman Mu, Jiantao Jiao, and 1 more author

arXiv preprint arXiv:2402.12617, 2024
Chatbot arena: An open platform for evaluating LLMs by human preference

Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, and 8 more authors

In Forty-first International Conference on Machine Learning, 2024
Noisy computing of the threshold function

Ziao Wang, Nadim Ghaddar, Banghua Zhu, and 1 more author

arXiv preprint arXiv:2403.07227, 2024
Noisy Computing of the OR and MAX Functions

Banghua Zhu, Ziao Wang, Nadim Ghaddar, and 2 more authors

IEEE Journal on Selected Areas in Information Theory, 2024
S-LoRA: Scalable serving of thousands of LoRA adapters

Ying Sheng, Shiyi Cao, Dacheng Li, and 8 more authors

Proceedings of Machine Learning and Systems, 2024
From crowdsourced data to high-quality benchmarks: Arena-Hard and BenchBuilder pipeline

Tianle Li, Wei-Lin Chiang, Evan Frick, and 5 more authors

arXiv preprint arXiv:2406.11939, 2024
From live data to high-quality benchmarks: The arena-hard pipeline, April 2024

Tianle Li, Wei-Lin Chiang, Evan Frick, and 4 more authors

URL https://lmsys.org/blog/2024-04-19-arena-hard, 2024
From live data to high-quality benchmarks: The arena-hard pipeline

Tianle Li, Wei-Lin Chiang, Evan Frick, and 4 more authors

lmsys Blog (Apr. 19, 2024). [Online]. Available: https://lmsys.org/blog/2024-04-19-arena-hard/ (visited on 08/04/2024), 2024
Pairwise Proximal Policy Optimization: Large Language Models Alignment via Comparative RL

Tianhao Wu, Banghua Zhu, Ruoyu Zhang, and 3 more authors

2024
Athene-70B: Redefining the Boundaries of Post-Training for Open Models

Evan Frick, Peter Jin, Tianle Li, and 4 more authors

2024
Pairwise proximal policy optimization: Language model alignment with comparative RL

Tianhao Wu, Banghua Zhu, Ruoyu Zhang, and 3 more authors

In First Conference on Language Modeling, 2024
Starling-7B: Improving helpfulness and harmlessness with RLAIF

Banghua Zhu, Evan Frick, Tianhao Wu, and 5 more authors

In First Conference on Language Modeling, 2024
Taming overconfidence in LLMs: Reward calibration in RLHF

Jixuan Leng, Chengsong Huang, Banghua Zhu, and 1 more author

arXiv preprint arXiv:2410.09724, 2024
How to Evaluate Reward Models for RLHF

Evan Frick, Tianle Li, Connor Chen, and 6 more authors

arXiv preprint arXiv:2410.14872, 2024
Chatbot arena: An open platform for evaluating LLMs by human preference, 2024

Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, and 8 more authors

URL https://arxiv.org/abs/2403.04132, 2024
Watermarking using Semantic-aware Speculative Sampling: from Theory to Practice

Baihe Huang, Hanlin Zhu, Julien Piet, and 5 more authors

2024

2023

Jump-start reinforcement learning

Ikechukwu Uchendu, Ted Xiao, Yao Lu, and 8 more authors

In International Conference on Machine Learning, 2023
Online learning in Stackelberg games with an omniscient follower

Geng Zhao, Banghua Zhu, Jiantao Jiao, and 1 more author

In International Conference on Machine Learning, 2023
Byzantine-robust federated learning with optimal statistical rates

Banghua Zhu, Lun Wang, Qi Pang, and 4 more authors

In International Conference on Artificial Intelligence and Statistics, 2023
Online learning in a creator economy

Banghua Zhu, Sai Praneeth Karimireddy, Jiantao Jiao, and 1 more author

arXiv preprint arXiv:2305.11381, 2023
Doubly-robust self-training

Banghua Zhu, Mingyu Ding, Philip Jacobson, and 4 more authors

Advances in Neural Information Processing Systems, 2023
On optimal caching and model multiplexing for large model inference

Banghua Zhu, Ying Sheng, Lianmin Zheng, and 3 more authors

arXiv preprint arXiv:2306.02003, 2023
Fine-tuning language models with advantage-induced policy alignment

Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, and 4 more authors

arXiv preprint arXiv:2306.02231, 2023
On the Optimal Bounds for Noisy Computing

Banghua Zhu, Ziao Wang, Nadim Ghaddar, and 2 more authors

In 2023 IEEE International Symposium on Information Theory (ISIT), 2023
Noisy Sorting Capacity

Ziao Wang, Nadim Ghaddar, Banghua Zhu, and 1 more author

arXiv preprint arXiv:2202.01446, 2023
Variable-length insertion-based noisy sorting

Ziao Wang, Nadim Ghaddar, Banghua Zhu, and 1 more author

In 2023 IEEE International Symposium on Information Theory (ISIT), 2023
Noisy Computing of the OR and MAX Functions

Banghua Zhu, Ziao Wang, Nadim Ghaddar, and 2 more authors

arXiv preprint arXiv:2309.03986, 2023
Pairwise proximal policy optimization: Harnessing relative feedback for LLM alignment

Tianhao Wu, Banghua Zhu, Ruoyu Zhang, and 3 more authors

arXiv preprint arXiv:2310.00212, 2023
QFT: Quantized full-parameter tuning of LLMs with affordable resources

Zhikai Li, Xiaoxuan Liu, Banghua Zhu, and 3 more authors

arXiv preprint arXiv:2310.07147, 2023
Towards the fundamental limits of knowledge transfer over finite domains

Qingyue Zhao and Banghua Zhu

arXiv preprint arXiv:2310.07838, 2023
Principled reinforcement learning with human feedback from pairwise or k-wise comparisons

Banghua Zhu, Michael Jordan, and Jiantao Jiao

In International Conference on Machine Learning, 2023
Towards Optimal Caching and Model Selection for Large Model Inference

Banghua Zhu, Ying Sheng, Lianmin Zheng, and 3 more authors

Advances in Neural Information Processing Systems, 2023
S-LoRA: Serving thousands of concurrent LoRA adapters

Ying Sheng, Shiyi Cao, Dacheng Li, and 8 more authors

arXiv preprint arXiv:2311.03285, 2023
NexusRaven: a commercially-permissive language model for function calling

Venkat Krishna Srinivasan, Zhen Dong, Banghua Zhu, and 5 more authors

In NeurIPS 2023 Foundation Models for Decision Making Workshop, 2023
Starling-7B: Improving LLM Helpfulness & Harmlessness with RLAIF

Banghua Zhu, Evan Frick, Tianhao Wu, and 2 more authors

2023
The Effective Horizon Explains Deep RL Performance in Stochastic Environments

Cassidy Laidlaw, Banghua Zhu, Stuart Russell, and 1 more author

arXiv preprint arXiv:2312.08369, 2023
Towards optimal statistical watermarking

Baihe Huang, Hanlin Zhu, Banghua Zhu, and 4 more authors

arXiv preprint arXiv:2312.07930, 2023
Efficient Prompt Caching for Large Language Model Inference via Embedding Similarity

Hanlin Zhu, Banghua Zhu, and Jiantao Jiao

2023
S-LoRA: Serving thousands of concurrent LoRA adapters. arXiv preprint (2023)

Ying Sheng, Shiyi Cao, Dacheng Li, and 8 more authors

2023

2022

Generalized resilience and robust statistics

Banghua Zhu, Jiantao Jiao, and Jacob Steinhardt

The Annals of Statistics, 2022
Robust estimation via generalized quasi-gradients

Banghua Zhu, Jiantao Jiao, and Jacob Steinhardt

Information and Inference: A Journal of the IMA, 2022
Minimax off-policy evaluation for multi-armed bandits

Cong Ma, Banghua Zhu, Jiantao Jiao, and 1 more author

IEEE Transactions on Information Theory, 2022
Robust estimation for non-parametric families via generative adversarial networks

Banghua Zhu, Jiantao Jiao, and Michael I Jordan

In 2022 IEEE International Symposium on Information Theory (ISIT), 2022
The sample complexity of online contract design

Banghua Zhu, Stephen Bates, Zhuoran Yang, and 3 more authors

arXiv preprint arXiv:2211.05732, 2022

2021

Linear representation meta-reinforcement learning for instant adaptation

Matt Peng, Banghua Zhu, and Jiantao Jiao

arXiv preprint arXiv:2101.04750, 2021
Bridging offline reinforcement learning and imitation learning: A tale of pessimism

Paria Rashidinejad, Banghua Zhu, Cong Ma, and 2 more authors

Advances in Neural Information Processing Systems, 2021

2020

When does the Tukey median work?

Banghua Zhu, Jiantao Jiao, and Jacob Steinhardt

In 2020 IEEE International Symposium on Information Theory (ISIT), 2020

2019

Deconstructing Generative Adversarial Networks

Banghua Zhu, Jiantao Jiao, and David Tse

arXiv preprint arXiv:1901.09465, 2019
Joint transceiver optimization for wireless communication PHY using neural network

Banghua Zhu, Jintao Wang, Longzhuang He, and 1 more author

IEEE Journal on Selected Areas in Communications, 2019
Joint Transceiver Optimization for Wireless Communication PHY Using Neural Network

Banghua Zhu, Jintao Wang, Longzhuang He, and 1 more author

IEEE Journal on Selected Areas in Communications, IEEE Service Center, Piscataway, US, 2019

2018

Sparse tensor decomposition for haplotype assembly of diploids and polyploids

Abolfazl Hashemi, Banghua Zhu, and Haris Vikalo

BMC Genomics, 2018

2017

Improving Decision Tree Learning by Optimal Split Scoring Function Estimation

Banghua Zhu, Jiantao Jiao, Yanjun Han, and 1 more author

2017