Techniques for detecting scalability bugs

Bowen Zhou, Purdue University

Abstract

Developing correct and efficient software for large-scale systems is a challenging task. Developers may overlook pathological cases that arise in large-scale runs, employ inefficient algorithms that do not scale, or apply premature performance optimizations that work only for small-scale runs. Such program errors and inefficiencies can give rise to an especially subtle class of scale-dependent bugs. While small-scale test cases may not exhibit these bugs, large-scale production runs may suffer failures or performance problems caused by them. Without an effective method for finding such bugs, developers are forced to search through enormous volumes of logs generated by production systems to fix a scaling problem. We developed a series of statistical debugging techniques that detect and localize these bugs, based on the key observation that most programs developed for large-scale systems exhibit behavioral features that are predictable from the scale. Our techniques extrapolate these features to large-scale runs based solely on training data collected from small-scale runs. The predicted behaviors are then compared with the actual behaviors to pinpoint the individual program features affected by a bug. We applied these techniques to detect and localize real-world bugs in a popular MPI library and a P2P file-sharing program, as well as synthetic faults injected into an HPC application. We also built Lancet, a symbolic execution tool, to generate large-scale test inputs for distributed applications. Lancet infers the constraints that a large-scale input should satisfy from models built on small-scale inputs, allowing programmers to generate large-scale inputs without performing symbolic execution at large scale. With built-in support for multithreading, the socket API, and event-driven asynchronous programming, Lancet can readily be applied to real-world distributed applications.
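The statistical debugging idea summarized above (fit each behavioral feature as a function of scale on small runs, extrapolate to a large run, and flag the features whose observed values deviate from the prediction) can be illustrated with a minimal sketch. It assumes that every feature (for example, a hypothetical per-call-site message count) follows a power law in the run's scale, fitted by least squares in log-log space, and that all scales and values are strictly positive. The model family, the 20% deviation threshold, and the names `fit_power_law` and `rank_suspicious_features` are illustrative assumptions, not the models used in the dissertation.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical model: value ~= a * scale**b, fitted in log-log space.
Model = Tuple[float, float]


def fit_power_law(scales: List[float], values: List[float]) -> Model:
    """Least-squares fit of log(value) = log(a) + b * log(scale).

    Assumes strictly positive scales and values (log is undefined otherwise)
    and at least two distinct scales.
    """
    xs = [math.log(s) for s in scales]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    log_a = mean_y - b * mean_x
    return math.exp(log_a), b


def predict(model: Model, scale: float) -> float:
    """Extrapolate a fitted power law to an arbitrary (e.g. much larger) scale."""
    a, b = model
    return a * scale ** b


def rank_suspicious_features(
    train: Dict[str, List[Tuple[float, float]]],
    target_scale: float,
    observed: Dict[str, float],
    threshold: float = 0.2,
) -> List[Tuple[str, float]]:
    """Flag features whose large-scale value deviates from the extrapolation.

    train:    feature name -> (scale, value) samples from small-scale runs
    observed: feature name -> value measured in the large-scale run
    Returns (feature, relative error) pairs above `threshold`, worst first.
    """
    suspects = []
    for feature, samples in train.items():
        model = fit_power_law([s for s, _ in samples], [v for _, v in samples])
        expected = predict(model, target_scale)
        rel_err = abs(observed[feature] - expected) / expected
        if rel_err > threshold:
            suspects.append((feature, rel_err))
    suspects.sort(key=lambda item: item[1], reverse=True)
    return suspects
```

For instance, if small runs at scales 2 to 16 show one feature growing linearly and another quadratically, a run at scale 128 in which the quadratic feature reaches 50000 instead of the extrapolated 16384 would be ranked first, while the linear feature that matches its prediction is not reported. Ranking by relative error is one simple way to point a developer at the features most likely tied to the bug.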
We demonstrated the effectiveness of Lancet by using it to generate large-scale, targeted inputs for various SPEC benchmarks and Memcached.
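The abstract does not spell out how Lancet's models are built, so the following is only a sketch of the general idea of inferring a large-scale input constraint from small-scale observations. It assumes a hypothetical parser whose target loop runs once per 16-byte record after an 8-byte header, so that symbolic execution at small trip counts yields a minimum input length that is linear in the trip count. The linear model, the exactness check, and the names `fit_linear` and `extrapolate_bound` are illustrative assumptions, not Lancet's implementation.

```python
from fractions import Fraction
from typing import List, Tuple


def fit_linear(points: List[Tuple[int, int]]) -> Tuple[Fraction, Fraction]:
    """Exact least-squares fit bound(n) = slope * n + intercept.

    Rational arithmetic avoids floating-point error, so an exactly linear
    relationship is recovered exactly.
    """
    n = len(points)
    mean_x = Fraction(sum(x for x, _ in points), n)
    mean_y = Fraction(sum(y for _, y in points), n)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate_bound(points: List[Tuple[int, int]], target_trip_count: int) -> Fraction:
    """Predict the constraint parameter for a trip count far beyond the samples.

    points: (trip count, bound) pairs observed at small scale, e.g. the
            minimum input length for which the loop runs that many times.
    Raises ValueError if the samples are not exactly linear, since then the
    assumed model does not hold and extrapolation would be unreliable.
    """
    slope, intercept = fit_linear(points)
    for x, y in points:
        if slope * x + intercept != y:
            raise ValueError("small-scale bounds are not linear in the trip count")
    return slope * target_trip_count + intercept
```

With small-scale bounds of 24, 40, 56, and 72 bytes for 1 to 4 iterations, the sketch predicts that an input must be 16,000,008 bytes long for the loop to run one million times; a constraint of that form could then be handed to a solver to produce a concrete input, without symbolically executing the loop a million times.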

Degree

Ph.D.

Advisors

Zhang, Purdue University.

Subject Area

Computer science
