Reliable data processing enabled by program analysis
Abstract
Errors pose a serious threat to the validity of the output of modern data processing, which is typically performed by computer programs. In scientific computation, data are collected through instruments or sensors that may be exposed to harsh environmental conditions, introducing errors. Furthermore, during computation, data may not be represented exactly because of the limited precision of the underlying machine, introducing representation errors. Processing such data may therefore produce unreliable results or even faulty conclusions. We call these reliability problems.

We consider the reliability problems caused by two kinds of errors. The first kind comprises input and parameter errors, which originate in the external physical environment; we call these external errors. The second kind stems from the limited precision of floating-point representation: these errors arise when values cannot be represented exactly by the machine. We call them internal representation errors, or internal errors. They are usually at a much smaller scale than external errors; nonetheless, such tiny errors may still lead to unreliable results and serious problems.

In this dissertation, we develop program analysis techniques that enable reliable data processing. For external errors, we propose two techniques, execution coalescing and white-box sampling, that improve the sampling efficiency of Monte Carlo methods. For internal errors, we develop efficient monitoring techniques that detect instability in floating-point program executions at runtime.
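The abstract names execution coalescing and white-box sampling without describing how they work, so no code for those techniques is attempted here. As a point of reference, below is a minimal sketch of the baseline they improve on, plain Monte Carlo propagation of external input errors; the function `f` and the Gaussian error model are hypothetical stand-ins for a real computation and its measured input uncertainties.

```python
import random
import statistics

def f(x, y):
    # Hypothetical data-processing computation whose inputs carry error.
    return x * x - y

def monte_carlo_propagate(f, x, y, x_err, y_err, n=10_000):
    # Baseline Monte Carlo error propagation: repeatedly perturb the
    # inputs with their assumed (Gaussian) measurement errors, rerun the
    # computation, and measure the spread of the outputs.
    samples = [
        f(random.gauss(x, x_err), random.gauss(y, y_err))
        for _ in range(n)
    ]
    return statistics.mean(samples), statistics.stdev(samples)

mean, spread = monte_carlo_propagate(f, x=3.0, y=1.0, x_err=0.1, y_err=0.05)
print(f"output = {mean:.4f} +/- {spread:.4f}")
```

Every sample re-executes the whole computation, which is what makes naive Monte Carlo expensive and makes sampling efficiency worth improving.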
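For internal errors, a classic illustration is catastrophic cancellation, where subtracting nearly equal doubles destroys almost all significant digits of the result. One common way to detect such instability at runtime (the abstract does not specify the dissertation's mechanism) is to shadow the computation at higher precision and compare; in this sketch the 50-digit shadow precision and the reporting threshold are arbitrary illustrative choices.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # high-precision "shadow" arithmetic

def unstable(x):
    # Catastrophic cancellation: for tiny x, the subtraction wipes out
    # almost all significant digits of the double-precision result.
    return ((1.0 + x) - 1.0) / x   # mathematically equal to 1.0

def shadow(x):
    # The same computation carried out in 50-digit decimal arithmetic.
    d = Decimal(x)
    return ((Decimal(1) + d) - Decimal(1)) / d

x = 1e-15
lo, hi = unstable(x), shadow(x)
rel_err = abs((Decimal(lo) - hi) / hi)
print(f"double: {lo}  shadow: {hi}  relative error: {rel_err:.2e}")
if rel_err > Decimal("1e-6"):          # illustrative threshold
    print("instability detected: the result has lost most of its precision")
```

Shadowing every floating-point operation makes execution substantially slower, which is why the efficiency of such runtime monitoring matters.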
Degree
Ph.D.
Advisors
Zhang, Purdue University.
Subject Area
Computer science