Static Analysis of Android Apps with Text Analysis and Bi-directional Propagation
Abstract
While smartphones and mobile apps have become an integral part of our lives, personal security issues on smartphones have become a serious concern. Privacy leakage, namely the disclosure of sensitive data, happens frequently in mobile apps: the user's sensitive information is disclosed to untrusted, even malicious, third-party service providers, leading to serious problems. In addition, stealthy behaviors performed without the user's acknowledgment may cause unexpected phone charges or leakage of sensitive information. To address these problems, many approaches have been proposed. However, previous research on mobile privacy has largely focused on predefined, known sources managed by smartphones; more specifically, on API functions that directly return sensitive values. Other information sources, such as user inputs entered through the user interface and data obtained from the network or files, have been mostly neglected, even though such sources may contain a lot of sensitive information. In addition, research on detecting stealthy behaviors also depends on identifying suspicious behaviors with known actions, e.g., known premium phone numbers or URLs of malicious websites. In this dissertation, we present two automated techniques for comprehensive detection of sensitive data disclosures. Moreover, we propose a novel technique to detect stealthy behaviors in Android apps. Firstly, we examine the possibility of scalably detecting sensitive user inputs in mobile apps. We design and implement SUPOR, a novel static analysis tool that automatically examines the user interface to identify sensitive user inputs containing critical user data, such as user credentials and financial and medical data. SUPOR mimics the user's perspective to associate input fields in user interfaces with their most correlated text labels and uses text analysis to determine the sensitiveness of the user inputs.
With knowledge of the sensitive user inputs, we are then able to detect their disclosure with the help of taint analysis. Secondly, we develop BidText to detect sensitive data disclosures where the data is generated by generic API functions whose return values cannot be easily recognized as sensitive or insensitive. BidText leverages the context of the data, associates the correlated text labels with the corresponding variables, and then applies text analysis to determine the sensitiveness of the data held by the variables. The intuition is that the program context contains useful information indicating what the variables may hold. BidText also features a novel bi-directional propagation technique, through forward and backward data flow, to enhance static detection of sensitive data disclosures. Thirdly, we develop AsDroid to detect stealthy behaviors in Android apps by checking for contradictions between user expectation, which is represented by the user interface, and program behavior, which can be abstracted by API invocations. We model API invocations with different types of intents and propagate the intents backward to top-level functions, e.g., a user interaction function. We then analyze the text extracted from the user interface component associated with the top-level function. A semantic mismatch between the two indicates stealthy behavior. To sum up, in this dissertation, we present SUPOR to detect sensitive user inputs, and BidText to determine the sensitiveness of the data generated by generic API functions. We also propose bi-directional propagation to enhance sensitive data disclosure detection. In addition, we inspect the contradiction between program behaviors and user expectations to detect stealthy behaviors in Android apps.
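As a rough illustration of the stealthy-behavior check described above, the following minimal sketch propagates intents backward over a toy call graph and compares them with the text of the associated user interface component. It is not the AsDroid implementation: the intent names, the keyword sets, the call graph, and the helper functions `propagate_intents` and `stealthy_intents` are all hypothetical, and simple keyword overlap stands in for the text analysis.

```python
from collections import defaultdict

# Hypothetical mapping from API calls to the intent they represent.
API_INTENTS = {
    "SmsManager.sendTextMessage": "SEND_SMS",
    "HttpURLConnection.connect": "HTTP_ACCESS",
}

# Hypothetical keywords under which a user would expect each intent.
INTENT_KEYWORDS = {
    "SEND_SMS": {"sms", "send", "message", "text"},
    "HTTP_ACCESS": {"download", "login", "upload", "online"},
}


def propagate_intents(call_graph):
    """Propagate intents from callees back to callers until a fixed point."""
    intents = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for caller, callees in call_graph.items():
            for callee in callees:
                # Intents already attached to the callee, plus the callee's
                # own intent if it is a modeled API call.
                new = set(intents[callee])
                if callee in API_INTENTS:
                    new.add(API_INTENTS[callee])
                if not new <= intents[caller]:
                    intents[caller] |= new
                    changed = True
    return intents


def stealthy_intents(handler, ui_text, call_graph):
    """Return intents reaching `handler` that the UI text does not suggest."""
    words = set(ui_text.lower().split())
    reached = propagate_intents(call_graph)[handler]
    return {i for i in reached if not (INTENT_KEYWORDS[i] & words)}


# Toy call graph (assumed): caller -> list of callees.
call_graph = {
    "onClickNext": ["loadNextPage", "logEvent"],
    "loadNextPage": ["SmsManager.sendTextMessage"],
    "onClickSend": ["SmsManager.sendTextMessage"],
}

# A "Next page" button whose handler reaches an SMS API is flagged;
# a "Send message" button reaching the same API is not.
flagged = stealthy_intents("onClickNext", "Next page", call_graph)
expected = stealthy_intents("onClickSend", "Send message", call_graph)
```

In this sketch the same API call is treated as stealthy or expected depending only on the text of the component that triggers it, which is the contradiction the abstract describes.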
Degree
Ph.D.
Advisors
Zhang, Purdue University.
Subject Area
Computer science