publications

publications by category in reverse chronological order.

2025

  1. Training with Confidence: Catching Silent Errors in Deep Learning Training with Automated Proactive Checks
    Yuxuan Jiang, Ziming Zhou, Boyu Xu, and 3 more authors
    In Proceedings of the 19th USENIX Symposium on Operating Systems Design and Implementation, Boston, MA, USA, Jul 2025
  2. One-Size-Fits-None: Understanding and Enhancing Slow-Fault Tolerance in Modern Distributed Systems
    Ruiming Lu, Yunchi Lu, Yuxuan Jiang, and 2 more authors
    In Proceedings of the 22nd USENIX Symposium on Networked Systems Design and Implementation, Philadelphia, PA, USA, Apr 2025

2024

  1. Xpert: Empowering Incident Management with Query Recommendations via Large Language Models
    Yuxuan Jiang, Chaoyun Zhang, Shilin He, and 8 more authors
    In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Lisbon, Portugal, Apr 2024

2023

  1. Acto: Automatic End-to-End Testing for Operation Correctness of Cloud System Management
    Jiawei Tyler Gu, Xudong Sun, Wentao Zhang, and 5 more authors
    In Proceedings of the 29th Symposium on Operating Systems Principles, Koblenz, Germany, Oct 2023