publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2025

  1. One-Size-Fits-None: Understanding and Enhancing Slow-Fault Tolerance in Modern Distributed Systems
    Ruiming Lu, Yunchi Lu, Yuxuan Jiang, and 2 more authors
    In Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, Philadelphia, PA, USA, Apr 2025
  2. Training with Confidence: Catching Silent DL Training Bugs with Automated Proactive Checks
    Yuxuan Jiang, Ziming Zhou, Boyu Xu, and 3 more authors
    In Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, Boston, MA, USA, Jul 2025

2024

  1. Xpert: Empowering Incident Management with Query Recommendations via Large Language Models
    Yuxuan Jiang, Chaoyun Zhang, Shilin He, and 8 more authors
    In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Lisbon, Portugal, Apr 2024

2023

  1. Acto: Automatic End-to-End Testing for Operation Correctness of Cloud System Management
    Jiawei Tyler Gu, Xudong Sun, Wentao Zhang, and 5 more authors
    In Proceedings of the 29th Symposium on Operating Systems Principles, Koblenz, Germany, Oct 2023