Selective logging for accurate, space efficient forensic analysis and reducible execution replay
Abstract
Logging is a well-established technique for recording dynamic information during system execution. It serves two important purposes: (1) investigating cyber attacks, both to identify the root cause of an attack and to determine its ramifications so that the system can recover; and (2) reproducing software failures in order to understand and fix them. However, many long-running processes, such as server programs and user-interactive (UI) programs, receive a large volume of inputs and produce many outputs, and each output may be causally related to all preceding inputs, which makes attack investigation almost infeasible. Another key challenge in applying logging to attack investigation is log size. Audit logs generated by traditional logging techniques can grow at a rate of gigabytes per day, incurring excessive storage and processing overhead. Logging and replaying long-running programs is similarly problematic: such programs may produce large replay logs that take a long time to replay, so a developer may have to wait hours or days before a failure is reproduced. In this dissertation, we present selective logging techniques for (1) highly accurate forensic analysis, (2) space-efficient audit logging, and (3) reducible execution replay. We make three contributions. Our first contribution is BEEP, a highly accurate attack provenance tracing technique enabled by selective fine-grained logging. It automatically partitions a long-running process into multiple autonomous units, each of which handles independent input data. We show that BEEP effectively captures the minimal causal graph for every attack case we have studied. Our second contribution is LogGC, an audit logging system with garbage collection. It automatically removes unreachable objects from audit logs that record history over a long period of time. With LogGC, the space consumption of audit logs can be reduced by an order of magnitude without affecting the accuracy of forensic analysis. Our third contribution is a compiler-based technique that generates a reducible replay log. The technique divides an execution into units and instruments the program to record a minimal amount of additional information in the replay log, so that reduction can then be achieved by analyzing the log alone.
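The idea of removing unreachable objects from an audit log can be illustrated with a minimal sketch. This is not the LogGC implementation: the event format (a pair of source and destination names, listed in time order), the notion of a "live" set, and the function name collect_garbage are all assumptions made for illustration. The sketch walks the log backward and keeps only events whose destination can still influence a live object.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical event format: information flowed from `src` to `dst`.
# Events are assumed to be listed in chronological order.
Event = Tuple[str, str]


def collect_garbage(events: List[Event], live: Iterable[str]) -> List[Event]:
    """Return the events that can still affect a live object.

    Walking the log backward, an event is kept only if its destination
    is already known to influence some live object; the source of a kept
    event then becomes relevant for all earlier events.
    """
    reachable: Set[str] = set(live)
    kept_reversed: List[Event] = []
    for src, dst in reversed(events):
        if dst in reachable:
            kept_reversed.append((src, dst))
            reachable.add(src)
    kept_reversed.reverse()  # restore chronological order
    return kept_reversed


if __name__ == "__main__":
    log = [
        ("httpd", "tmp1"),        # httpd writes a temporary file
        ("tmp1", "httpd"),        # httpd reads it back
        ("httpd", "index.html"),  # httpd writes a file that still exists
        ("sort", "tmp2"),         # sort writes a temporary file
        ("tmp2", "sort"),         # sort reads it back, then both disappear
    ]
    for event in collect_garbage(log, live={"index.html"}):
        print(event)
```

In this toy log, the two events involving sort and tmp2 are dropped because neither object ever influences anything that still exists, while the history of index.html is preserved in full. A real audit log would need a richer event model (timestamps, process and file identities, deletion events), which this sketch deliberately leaves out.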
Degree
Ph.D.
Advisors
Xu, Purdue University.
Subject Area
Computer science