Failure Logs Based Event Prediction in Large Scale Systems

Meetali Vaidya, Purdue University

Abstract

Research on failure log analysis shows that spatial and temporal patterns exist among the events recorded in the system logs of the nodes that make up large-scale systems. Existing work in this field applies clustering to log events to represent these patterns and recommends proactive methods to prevent failures in the immediate future. Recent work uses discrete-time semi-Markov models to model such events closely and to calculate node reliability. In this research, we use a hidden semi-Markov model to predict subsystem failure events that lead to a degraded or failed state of a node. As a proactive measure, this method can allow a job scheduler to intelligently assign time- and resource-consuming jobs to appropriate nodes based on this assessment of node reliability. It can also alert system administrators to specific subsystem events that are likely to occur in the future.

Degree

M.S.

Advisors

Hacker, Purdue University.

Subject Area

Information Technology | Computer Science
