Acto

Automatic end-to-end testing for operation correctness of cloud system operators

Cloud management platforms like Kubernetes rely on operators — programs that automate system operations using a state-reconciliation model. When an operator has a bug, it can silently leave managed systems in incorrect states, causing subtle and hard-to-diagnose failures in production.

Acto is the first automatic end-to-end testing framework for cloud system operators. It uses a state-centric approach: Acto continuously drives an operator to reconcile the managed system across a large state space, checking at each step whether the system correctly reaches the declared desired state.
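
The state-centric idea can be pictured with a toy example. The sketch below is not Acto's implementation: ToySystem, toy_reconcile, and run_campaign are hypothetical names invented for illustration, and the "operator" is a plain Python function with a planted bug. It only shows the shape of the loop: declare a desired state, let the operator reconcile, then check whether the system reached that state.

```python
from dataclasses import dataclass, field


@dataclass
class ToySystem:
    """Hypothetical managed system: just a bag of observed settings."""
    state: dict = field(default_factory=dict)


def toy_reconcile(desired: dict, system: ToySystem) -> None:
    """Hypothetical operator: copies the desired state into the system.

    Deliberately buggy: once 'replicas' has been set, later updates to it
    are silently dropped, mimicking an operator that ignores a change.
    """
    for key, value in desired.items():
        if key == "replicas" and "replicas" in system.state:
            continue  # planted bug: update is never applied
        system.state[key] = value


def run_campaign(desired_states: list, reconcile, system: ToySystem) -> list:
    """Drive the operator through each desired state and record mismatches."""
    failures = []
    for step, desired in enumerate(desired_states):
        reconcile(desired, system)
        # Generic check: every declared field must appear in the system
        # state with the declared value. No per-test assertions needed.
        mismatched = {k: (v, system.state.get(k))
                      for k, v in desired.items() if system.state.get(k) != v}
        if mismatched:
            failures.append((step, mismatched))
    return failures


failures = run_campaign(
    [{"replicas": 3, "version": "1.0"},
     {"replicas": 5, "version": "1.0"},
     {"replicas": 5, "version": "1.1"}],
    toy_reconcile,
    ToySystem(),
)
```

Here the second and third steps are flagged because the system still reports 3 replicas after 5 were declared. A real operator reconciles asynchronously, so a real harness would also have to wait for the system to converge before checking.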

Key contributions:

  • State-transition modeling of operator behavior for systematic state-space exploration
  • Automatic oracles that check correctness without requiring user-defined assertions
  • Discovery of 56 serious new bugs (42 confirmed, 30 fixed) across 11 Kubernetes operators

(Gu et al., 2023)
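
As a rough illustration of the first contribution, state-space exploration can be thought of as enumerating desired states and the transitions between them. The sketch below rests on assumptions made for illustration only: SCHEMA, enumerate_states, and single_field_transitions are hypothetical, and the exploration strategy Acto actually uses is not described on this page.

```python
from itertools import product

# Hypothetical schema: each field of the desired-state declaration maps to
# the candidate values a test generator might try for it.
SCHEMA = {
    "replicas": [1, 3],
    "tls": [False, True],
}


def enumerate_states(schema: dict) -> list:
    """Enumerate every combination of candidate field values."""
    keys = sorted(schema)
    return [dict(zip(keys, combo))
            for combo in product(*(schema[k] for k in keys))]


def single_field_transitions(states: list) -> list:
    """Pair up states that differ in exactly one field.

    Each ordered pair (a, b) is one operation to test:
    start with the system at a, then declare b.
    """
    pairs = []
    for a in states:
        for b in states:
            changed = [k for k in a if a[k] != b[k]]
            if len(changed) == 1:
                pairs.append((a, b))
    return pairs


states = enumerate_states(SCHEMA)
transitions = single_field_transitions(states)
```

With two fields of two values each, this yields 4 states and 8 single-field transitions. Exhaustive enumeration like this grows exponentially with the number of fields, which is why a practical tool needs a more selective strategy.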

Published at SOSP '23. Code available on GitHub.

References

2023

  1. Acto: Automatic End-to-End Testing for Operation Correctness of Cloud System Management
    Jiawei Tyler Gu, Xudong Sun, Wentao Zhang, and 5 more authors
    In Proceedings of the 29th Symposium on Operating Systems Principles, Koblenz, Germany, 2023