publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2025
- Towards Principled Training and Serving of Large Language Models. 2025
2024
- Guided online distillation: Promoting safe reinforcement learning by offline demonstration. In 2024 IEEE International Conference on Robotics and Automation (ICRA), 2024
- A Theoretical Explanation of Deep RL Performance in Stochastic Environments. In NeurIPS 2023 Workshop on Generalization in Planning, 2024
- Fairness in serving large language models. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), 2024
- Iterative data smoothing: Mitigating reward overfitting and overoptimization in rlhf. arXiv preprint arXiv:2401.16335, 2024
- Efficient prompt caching via embedding similarity. arXiv preprint arXiv:2402.01173, 2024
- Generative AI security: challenges and countermeasures. arXiv preprint arXiv:2402.12617, 2024
- Chatbot arena: An open platform for evaluating llms by human preference. In Forty-first International Conference on Machine Learning, 2024
- Noisy computing of the threshold function. arXiv preprint arXiv:2403.07227, 2024
- Noisy Computing of the OR and MAX Functions. IEEE Journal on Selected Areas in Information Theory, 2024
- Slora: Scalable serving of thousands of lora adapters. Proceedings of Machine Learning and Systems, 2024
- From crowdsourced data to high-quality benchmarks: Arena-hard and benchbuilder pipeline. arXiv preprint arXiv:2406.11939, 2024
- From live data to high-quality benchmarks: The arena-hard pipeline, April 2024. URL https://lmsys.org/blog/2024-04-19-arena-hard, 2024
- From live data to high-quality benchmarks: The arena-hard pipeline. lmsys Blog (Apr. 19, 2024). [Online]. Available: https://lmsys.org/blog/2024-04-19-arena-hard/ (visited on 08/04/2024), 2024
- Pairwise Proximal Policy Optimization: Large Language Models Alignment via Comparative RL. 2024
- Athene-70B: Redefining the Boundaries of Post-Training for Open Models. 2024
- Pairwise proximal policy optimization: Language model alignment with comparative RL. In First Conference on Language Modeling, 2024
- Starling-7b: Improving helpfulness and harmlessness with rlaif. In First Conference on Language Modeling, 2024
- Taming overconfidence in llms: Reward calibration in rlhf. arXiv preprint arXiv:2410.09724, 2024
- How to Evaluate Reward Models for RLHF. arXiv preprint arXiv:2410.14872, 2024
- Chatbot arena: An open platform for evaluating llms by human preference, 2024. URL https://arxiv.org/abs/2403.04132, 2024
- Watermarking using Semantic-aware Speculative Sampling: from Theory to Practice. 2024
2023
- Jump-start reinforcement learning. In International Conference on Machine Learning, 2023
- Online learning in stackelberg games with an omniscient follower. In International Conference on Machine Learning, 2023
- Byzantine-robust federated learning with optimal statistical rates. In International Conference on Artificial Intelligence and Statistics, 2023
- Online learning in a creator economy. arXiv preprint arXiv:2305.11381, 2023
- Doubly-robust self-training. Advances in Neural Information Processing Systems, 2023
- On optimal caching and model multiplexing for large model inference. arXiv preprint arXiv:2306.02003, 2023
- Fine-tuning language models with advantage-induced policy alignment. arXiv preprint arXiv:2306.02231, 2023
- On the Optimal Bounds for Noisy Computing. In 2023 IEEE International Symposium on Information Theory (ISIT), 2023
- Noisy Sorting Capacity. arXiv preprint arXiv:2202.01446, 2023
- Variable-length insertion-based noisy sorting. In 2023 IEEE International Symposium on Information Theory (ISIT), 2023
- Noisy Computing of the OR and MAX Functions. arXiv preprint arXiv:2309.03986, 2023
- Pairwise proximal policy optimization: Harnessing relative feedback for llm alignment. arXiv preprint arXiv:2310.00212, 2023
- Qft: Quantized full-parameter tuning of llms with affordable resources. arXiv preprint arXiv:2310.07147, 2023
- Towards the fundamental limits of knowledge transfer over finite domains. arXiv preprint arXiv:2310.07838, 2023
- Principled reinforcement learning with human feedback from pairwise or k-wise comparisons. In International Conference on Machine Learning, 2023
- Towards Optimal Caching and Model Selection for Large Model Inference. Advances in Neural Information Processing Systems, 2023
- S-lora: Serving thousands of concurrent lora adapters. arXiv preprint arXiv:2311.03285, 2023
- Nexusraven: a commercially-permissive language model for function calling. In NeurIPS 2023 Foundation Models for Decision Making Workshop, 2023
- Starling-7B: Improving LLM Helpfulness & Harmlessness with RLAIF. 2023
- The Effective Horizon Explains Deep RL Performance in Stochastic Environments. arXiv preprint arXiv:2312.08369, 2023
- Towards optimal statistical watermarking. arXiv preprint arXiv:2312.07930, 2023
- Efficient Prompt Caching for Large Language Model Inference via Embedding Similarity. 2023
- S-LoRA: Serving thousands of concurrent LoRA adapters. arXiv preprint (2023), 2023
2022
- Generalized resilience and robust statistics. The Annals of Statistics, 2022
- Robust estimation via generalized quasi-gradients. Information and Inference: A Journal of the IMA, 2022
- Minimax off-policy evaluation for multi-armed bandits. IEEE Transactions on Information Theory, 2022
- Robust estimation for non-parametric families via generative adversarial networks. In 2022 IEEE International Symposium on Information Theory (ISIT), 2022
- The sample complexity of online contract design. arXiv preprint arXiv:2211.05732, 2022
2021
- Linear representation meta-reinforcement learning for instant adaptation. arXiv preprint arXiv:2101.04750, 2021
- Bridging offline reinforcement learning and imitation learning: A tale of pessimism. Advances in Neural Information Processing Systems, 2021
2020
- When does the tukey median work? In 2020 IEEE International Symposium on Information Theory (ISIT), 2020
2019
- Deconstructing Generative Adversarial Networks. arXiv preprint arXiv:1901.09465, 2019
- Joint transceiver optimization for wireless communication PHY using neural network. IEEE Journal on Selected Areas in Communications, 2019
- Joint Transceiver Optimization for Wireless Communication PHY Using Neural Network. IEEE Journal on Selected Areas in Communications, IEEE Service Center, Piscataway, US, 2019
2018
- Sparse tensor decomposition for haplotype assembly of diploids and polyploids. BMC genomics, 2018
2017
- Improving Decision Tree Learning by Optimal Split Scoring Function Estimation. 2017