Failure characterization and error detection in distributed web applications
Enterprise-class distributed applications, such as web services that provide anything from critical infrastructure services to electronic commerce, have grown steadily in scale and complexity. With this growth, it has become increasingly difficult to understand how these applications perform, when they fail, and what can be done to make them more resilient to both hardware and software failures. Application developers tend to focus on bringing their applications to market quickly without testing the complex failure scenarios that can disrupt or degrade a given web service. Operators configure these web services without complete knowledge of how the configurations interact with the various layers. Matters are not helped by the ad hoc and often poor-quality failure logs generated by even mature and widely used software systems. Worse still, both end users and servers sometimes suffer from "silent problems," where something goes wrong without any immediately obvious end-user manifestation. To address these reliability issues, it is crucial to characterize and detect software problems and to provide some post-detection diagnostic context. This dissertation first presents a fault-injection and bug-repository-based evaluation to characterize silent and non-silent software failures and configuration problems in three-tier web applications and Java EE application servers. Second, for detection of software failures, we develop simple, low-cost application-generic and application-specific consistency checks, while for duplicate web requests (a class of performance problems), we develop a generic autocorrelation-based algorithm at the server end. Third, to provide diagnostic context as a post-detection step for performance problems, we develop an algorithm based on pair-wise correlation of system metrics to diagnose the root cause of the detected problem.
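To make the idea of pair-wise correlation of system metrics concrete, the following is a minimal illustrative sketch, not the algorithm developed in the dissertation. It assumes that each metric is sampled as an equal-length time series during a known-good baseline window and during the window in which a problem was detected, and it ranks metric pairs by how much their Pearson correlation changed. The function names (`pearson`, `correlation_shifts`) and the metric names (`cpu`, `requests`, `db_latency`) are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_shifts(baseline, observed):
    """Rank metric pairs by the absolute change in their correlation.

    Both arguments map a metric name to its list of samples. Pairs whose
    correlation changed the most come first; they are candidates for
    closer inspection, not a confirmed root cause.
    """
    shifts = []
    for a, b in combinations(sorted(baseline), 2):
        before = pearson(baseline[a], baseline[b])
        after = pearson(observed[a], observed[b])
        shifts.append((abs(before - after), a, b))
    return sorted(shifts, reverse=True)


# Hypothetical samples: CPU load normally tracks request rate, but not
# in the window where the problem was detected.
baseline = {
    "cpu": [10, 20, 30, 40, 50],
    "requests": [100, 200, 300, 400, 500],
    "db_latency": [5, 6, 5, 6, 5],
}
observed = {
    "cpu": [10, 20, 30, 40, 50],
    "requests": [500, 400, 300, 200, 100],
    "db_latency": [5, 6, 5, 6, 5],
}

for shift, a, b in correlation_shifts(baseline, observed):
    print(f"{a} / {b}: correlation changed by {shift:.2f}")
```

In this sketch, a pair whose correlation breaks down (here the hypothetical `cpu` and `requests` metrics) rises to the top of the ranking, which is one simple way a change in metric relationships could point an operator toward the affected component.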
Bagchi, Purdue University.
Computer Engineering|Computer science