publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2025

  1. Towards Principled Training and Serving of Large Language Models
    Banghua Zhu
    2025

2024

  1. Guided online distillation: Promoting safe reinforcement learning by offline demonstration
    Jinning Li, Xinyi Liu, Banghua Zhu, and 4 more authors
    In 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024
  2. A Theoretical Explanation of Deep RL Performance in Stochastic Environments
    Cassidy Laidlaw, Banghua Zhu, Stuart Russell, and 1 more author
    In NeurIPS 2023 Workshop on Generalization in Planning, 2024
  3. Fairness in serving large language models
    Ying Sheng, Shiyi Cao, Dacheng Li, and 5 more authors
    In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), 2024
  4. Iterative data smoothing: Mitigating reward overfitting and overoptimization in RLHF
    Banghua Zhu, Michael I Jordan, and Jiantao Jiao
    arXiv preprint arXiv:2401.16335, 2024
  5. Efficient prompt caching via embedding similarity
    Hanlin Zhu, Banghua Zhu, and Jiantao Jiao
    arXiv preprint arXiv:2402.01173, 2024
  6. Generative AI security: challenges and countermeasures
    Banghua Zhu, Norman Mu, Jiantao Jiao, and 1 more author
    arXiv preprint arXiv:2402.12617, 2024
  7. Chatbot arena: An open platform for evaluating LLMs by human preference
    Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, and 8 more authors
    In Forty-first International Conference on Machine Learning, 2024
  8. Noisy computing of the threshold function
    Ziao Wang, Nadim Ghaddar, Banghua Zhu, and 1 more author
    arXiv preprint arXiv:2403.07227, 2024
  9. Noisy Computing of the OR and MAX Functions
    Banghua Zhu, Ziao Wang, Nadim Ghaddar, and 2 more authors
    IEEE Journal on Selected Areas in Information Theory, 2024
  10. Slora: Scalable serving of thousands of lora adapters
    Ying Sheng, Shiyi Cao, Dacheng Li, and 8 more authors
    Proceedings of Machine Learning and Systems, 2024
  11. From crowdsourced data to high-quality benchmarks: Arena-hard and benchbuilder pipeline
    Tianle Li, Wei-Lin Chiang, Evan Frick, and 5 more authors
    arXiv preprint arXiv:2406.11939, 2024
  12. From live data to high-quality benchmarks: The arena-hard pipeline, April 2024
    Tianle Li, Wei-Lin Chiang, Evan Frick, and 4 more authors
    URL https://lmsys.org/blog/2024-04-19-arena-hard, 2024
  13. From live data to high-quality benchmarks: The arena-hard pipeline
    Tianle Li, Wei-Lin Chiang, Evan Frick, and 4 more authors
    lmsys Blog (Apr. 19, 2024). [Online]. Available: https://lmsys.org/blog/2024-04-19-arena-hard/ (visited on 08/04/2024), 2024
  14. Pairwise Proximal Policy Optimization: Large Language Models Alignment via Comparative RL
    Tianhao Wu, Banghua Zhu, Ruoyu Zhang, and 3 more authors
    2024
  15. Athene-70B: Redefining the Boundaries of Post-Training for Open Models
    Evan Frick, Peter Jin, Tianle Li, and 4 more authors
    2024
  16. Pairwise proximal policy optimization: Language model alignment with comparative RL
    Tianhao Wu, Banghua Zhu, Ruoyu Zhang, and 3 more authors
    In First Conference on Language Modeling, 2024
  17. Starling-7b: Improving helpfulness and harmlessness with RLAIF
    Banghua Zhu, Evan Frick, Tianhao Wu, and 5 more authors
    In First Conference on Language Modeling, 2024
  18. Taming overconfidence in LLMs: Reward calibration in RLHF
    Jixuan Leng, Chengsong Huang, Banghua Zhu, and 1 more author
    arXiv preprint arXiv:2410.09724, 2024
  19. How to Evaluate Reward Models for RLHF
    Evan Frick, Tianle Li, Connor Chen, and 6 more authors
    arXiv preprint arXiv:2410.14872, 2024
  20. Chatbot arena: An open platform for evaluating LLMs by human preference, 2024
    Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, and 8 more authors
    URL https://arxiv.org/abs/2403.04132, 2024
  21. Watermarking using Semantic-aware Speculative Sampling: from Theory to Practice
    Baihe Huang, Hanlin Zhu, Julien Piet, and 5 more authors
    2024

2023

  1. Jump-start reinforcement learning
    Ikechukwu Uchendu, Ted Xiao, Yao Lu, and 8 more authors
    In International Conference on Machine Learning, 2023
  2. Online learning in Stackelberg games with an omniscient follower
    Geng Zhao, Banghua Zhu, Jiantao Jiao, and 1 more author
    In International Conference on Machine Learning, 2023
  3. Byzantine-robust federated learning with optimal statistical rates
    Banghua Zhu, Lun Wang, Qi Pang, and 4 more authors
    In International Conference on Artificial Intelligence and Statistics, 2023
  4. Online learning in a creator economy
    Banghua Zhu, Sai Praneeth Karimireddy, Jiantao Jiao, and 1 more author
    arXiv preprint arXiv:2305.11381, 2023
  5. Doubly-robust self-training
    Banghua Zhu, Mingyu Ding, Philip Jacobson, and 4 more authors
    Advances in Neural Information Processing Systems, 2023
  6. On optimal caching and model multiplexing for large model inference
    Banghua Zhu, Ying Sheng, Lianmin Zheng, and 3 more authors
    arXiv preprint arXiv:2306.02003, 2023
  7. Fine-tuning language models with advantage-induced policy alignment
    Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, and 4 more authors
    arXiv preprint arXiv:2306.02231, 2023
  8. On the Optimal Bounds for Noisy Computing
    Banghua Zhu, Ziao Wang, Nadim Ghaddar, and 2 more authors
    In 2023 IEEE International Symposium on Information Theory (ISIT), 2023
  9. Noisy Sorting Capacity
    Ziao Wang, Nadim Ghaddar, Banghua Zhu, and 1 more author
    arXiv preprint arXiv:2202.01446, 2023
  10. Variable-length insertion-based noisy sorting
    Ziao Wang, Nadim Ghaddar, Banghua Zhu, and 1 more author
    In 2023 IEEE International Symposium on Information Theory (ISIT), 2023
  11. Noisy Computing of the OR and MAX Functions
    Banghua Zhu, Ziao Wang, Nadim Ghaddar, and 2 more authors
    arXiv preprint arXiv:2309.03986, 2023
  12. Pairwise proximal policy optimization: Harnessing relative feedback for LLM alignment
    Tianhao Wu, Banghua Zhu, Ruoyu Zhang, and 3 more authors
    arXiv preprint arXiv:2310.00212, 2023
  13. QFT: Quantized full-parameter tuning of LLMs with affordable resources
    Zhikai Li, Xiaoxuan Liu, Banghua Zhu, and 3 more authors
    arXiv preprint arXiv:2310.07147, 2023
  14. Towards the fundamental limits of knowledge transfer over finite domains
    Qingyue Zhao and Banghua Zhu
    arXiv preprint arXiv:2310.07838, 2023
  15. Principled reinforcement learning with human feedback from pairwise or k-wise comparisons
    Banghua Zhu, Michael Jordan, and Jiantao Jiao
    In International Conference on Machine Learning, 2023
  16. Towards Optimal Caching and Model Selection for Large Model Inference
    Banghua Zhu, Ying Sheng, Lianmin Zheng, and 3 more authors
    Advances in Neural Information Processing Systems, 2023
  17. S-LoRA: Serving thousands of concurrent LoRA adapters
    Ying Sheng, Shiyi Cao, Dacheng Li, and 8 more authors
    arXiv preprint arXiv:2311.03285, 2023
  18. NexusRaven: a commercially-permissive language model for function calling
    Venkat Krishna Srinivasan, Zhen Dong, Banghua Zhu, and 5 more authors
    In NeurIPS 2023 Foundation Models for Decision Making Workshop, 2023
  19. Starling-7B: Improving LLM Helpfulness & Harmlessness with RLAIF
    Banghua Zhu, Evan Frick, Tianhao Wu, and 2 more authors
    2023
  20. The Effective Horizon Explains Deep RL Performance in Stochastic Environments
    Cassidy Laidlaw, Banghua Zhu, Stuart Russell, and 1 more author
    arXiv preprint arXiv:2312.08369, 2023
  21. Towards optimal statistical watermarking
    Baihe Huang, Hanlin Zhu, Banghua Zhu, and 4 more authors
    arXiv preprint arXiv:2312.07930, 2023
  22. Efficient Prompt Caching for Large Language Model Inference via Embedding Similarity
    Hanlin Zhu, Banghua Zhu, and Jiantao Jiao
    2023
  23. S-LoRA: Serving thousands of concurrent LoRA adapters
    Ying Sheng, Shiyi Cao, Dacheng Li, and 8 more authors
    arXiv preprint, 2023

2022

  1. Generalized resilience and robust statistics
    Banghua Zhu, Jiantao Jiao, and Jacob Steinhardt
    The Annals of Statistics, 2022
  2. Robust estimation via generalized quasi-gradients
    Banghua Zhu, Jiantao Jiao, and Jacob Steinhardt
    Information and Inference: A Journal of the IMA, 2022
  3. Minimax off-policy evaluation for multi-armed bandits
    Cong Ma, Banghua Zhu, Jiantao Jiao, and 1 more author
    IEEE Transactions on Information Theory, 2022
  4. Robust estimation for non-parametric families via generative adversarial networks
    Banghua Zhu, Jiantao Jiao, and Michael I Jordan
    In 2022 IEEE International Symposium on Information Theory (ISIT), 2022
  5. The sample complexity of online contract design
    Banghua Zhu, Stephen Bates, Zhuoran Yang, and 3 more authors
    arXiv preprint arXiv:2211.05732, 2022

2021

  1. Linear representation meta-reinforcement learning for instant adaptation
    Matt Peng, Banghua Zhu, and Jiantao Jiao
    arXiv preprint arXiv:2101.04750, 2021
  2. Bridging offline reinforcement learning and imitation learning: A tale of pessimism
    Paria Rashidinejad, Banghua Zhu, Cong Ma, and 2 more authors
    Advances in Neural Information Processing Systems, 2021

2020

  1. When does the Tukey median work?
    Banghua Zhu, Jiantao Jiao, and Jacob Steinhardt
    In 2020 IEEE International Symposium on Information Theory (ISIT), 2020

2019

  1. Deconstructing Generative Adversarial Networks
    Banghua Zhu, Jiantao Jiao, and David Tse
    arXiv preprint arXiv:1901.09465, 2019
  2. Joint transceiver optimization for wireless communication PHY using neural network
    Banghua Zhu, Jintao Wang, Longzhuang He, and 1 more author
    IEEE Journal on Selected Areas in Communications, 2019
  3. Joint Transceiver Optimization for Wireless Communication PHY Using Neural Network
    Banghua Zhu, Jintao Wang, Longzhuang He, and 1 more author
    IEEE Journal on Selected Areas in Communications, IEEE Service Center, Piscataway, US, 2019

2018

  1. Sparse tensor decomposition for haplotype assembly of diploids and polyploids
    Abolfazl Hashemi, Banghua Zhu, and Haris Vikalo
    BMC Genomics, 2018

2017

  1. Improving Decision Tree Learning by Optimal Split Scoring Function Estimation
    Banghua Zhu, Jiantao Jiao, Yanjun Han, and 1 more author
    2017