Department of Electrical and Computer Engineering Technical Reports

Distributed Diagnosis of Failures in a Three Tier E-Commerce System

Fahad Arshad, Purdue University
Gunjan Khanna, Purdue University
Ignacio Laguna, Purdue University
Saurabh Bagchi, Purdue University

Abstract

For dependability outages in distributed internet infrastructures, it is often not enough to detect a failure; it is also necessary to diagnose it, i.e., to identify its source. Complex applications deployed in multi-tier environments, such as the classic three-tier e-commerce system, make diagnosis challenging because of fast error propagation, black-box applications, constraints on the diagnosis delay and on the amount of state that can be maintained, and imperfect diagnostic tests. Here, we propose a probabilistic diagnosis model for arbitrary failures in components of a distributed application. The monitoring system (the Monitor) passively observes the message exchanges between the components and, at runtime, performs a probabilistic diagnosis of the component that was the root cause of a detected failure. The diagnosis model takes into account the possibility of a service failure, link failure, test imperfection, and lack of perfect observability at the monitoring station. We demonstrate the approach by applying it to a J2EE-based e-commerce application called Pet Store, exercised with a workload of browse-and-buy user transactions. We compare our approach with Pinpoint by quantifying the latency and accuracy of the two systems. The Monitor outperforms Pinpoint, achieving comparably accurate diagnosis with higher precision in shorter time.

Keywords

Distributed system diagnosis, runtime monitoring, probabilistic diagnosis, fault injection based

Date of this Version

May 2007
