Failure Logs Based Event Prediction in Large Scale Systems
Abstract
Research on failure log analysis shows that spatial and temporal patterns exist among the events recorded in the system logs of the nodes that make up large-scale systems. Existing work in this field applies clustering to log events to represent these patterns and recommends proactive methods to prevent failures in the immediate future. Recent work uses discrete-time semi-Markov models to model such events closely and to calculate node reliability. In this research, we use a hidden semi-Markov model to predict subsystem failure events that lead to a degraded or failed state of a node. As a proactive measure, this method can allow a job scheduler to assign time- and resource-consuming jobs to appropriate nodes based on this assessment of node reliability. It can also inform system administrators of specific subsystem events that are likely to occur in the future.
Degree
M.S.
Advisors
Hacker, Purdue University.
Subject Area
Information Technology|Computer science