Incremental update framework for efficient retrieval from software libraries for bug localization

Shivani Rao, Purdue University

Abstract

Information Retrieval (IR) based bug localization techniques use a bug report to query a software repository and retrieve the relevant source files. The search uses a query constructed from the bug report, run against an index built from the source files in the repository. Much of the current research focuses on improving the retrieval effectiveness of these methods. However, little consideration has been given to their efficiency for software repositories that are constantly evolving. As a repository evolves, index creation and model learning have to be repeated to ensure accurate retrieval for each new bug. Because the vast majority of source files remain unchanged in any single commit, this amounts to redundant computation, which increases the retrieval time for a given query. To address these issues, we propose an incremental update framework that continuously updates the index and the model using only the changes made at each commit. We demonstrate the versatility of our framework using four popular text models: the Vector Space Model (VSM), the Smoothed Unigram Model (SUM), Latent Semantic Analysis (LSA), and Latent Dirichlet Allocation (LDA). We show that the same retrieval accuracy can be achieved in a fraction of the time needed by current approaches. We also propose strategies to identify commits at which the index and the model may need to be recomputed. The dataset used in our validation experiments was created by tracking the commit history of the AspectJ and JodaTime software libraries over a span of 10 years.
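The abstract does not spell out the update procedure, so the following is only a minimal sketch of the general idea for the simplest of the four models, a TF-IDF Vector Space Model. It assumes a naive lowercase alphanumeric tokenizer and assumes each commit is delivered as the contents of added or modified files plus a list of deleted paths. The names `IncrementalVSMIndex`, `apply_commit`, and `tokenize` are hypothetical, and this is not the dissertation's implementation. The index keeps per-file term counts and global document frequencies, so a commit only touches the files it changes.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer (assumption): lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


class IncrementalVSMIndex:
    """Hypothetical TF-IDF index that is updated per commit, not rebuilt."""

    def __init__(self):
        self.tf = {}          # file path -> Counter of term frequencies
        self.df = Counter()   # term -> number of files containing the term

    def _remove(self, path):
        # Undo this file's contribution to the document frequencies.
        for term in self.tf.pop(path, Counter()):
            self.df[term] -= 1
            if self.df[term] == 0:
                del self.df[term]

    def _add(self, path, text):
        counts = Counter(tokenize(text))
        self.tf[path] = counts
        for term in counts:
            self.df[term] += 1

    def apply_commit(self, added_or_modified, deleted):
        # Only files touched by the commit are re-tokenized and re-counted.
        for path in deleted:
            self._remove(path)
        for path, text in added_or_modified.items():
            self._remove(path)  # no-op for newly added files
            self._add(path, text)

    def _weights(self, counts):
        n = len(self.tf)
        # IDF is derived on demand from df and n, so it never needs a rebuild.
        return {t: c * math.log(n / self.df[t])
                for t, c in counts.items() if t in self.df}

    def search(self, query, k=5):
        # Rank files by cosine similarity between TF-IDF vectors.
        q = self._weights(Counter(tokenize(query)))
        q_norm = math.sqrt(sum(w * w for w in q.values()))
        scores = []
        for path, counts in self.tf.items():
            # Document weights are recomputed here because IDF shifts
            # slightly with every commit.
            d = self._weights(counts)
            d_norm = math.sqrt(sum(w * w for w in d.values()))
            if q_norm and d_norm:
                dot = sum(w * d.get(t, 0.0) for t, w in q.items())
                scores.append((dot / (q_norm * d_norm), path))
        scores.sort(reverse=True)
        return [path for _, path in scores[:k]]


index = IncrementalVSMIndex()
index.apply_commit({"a.java": "parser token error",
                    "b.java": "network socket timeout",
                    "c.java": "ui button render"}, deleted=[])
print(index.search("socket timeout", k=1))  # ['b.java']
```

In this sketch the cost of a commit is proportional to the size of the changed files rather than the whole repository. Models such as LSA or LDA need more than count bookkeeping to update incrementally, which this example does not attempt to show.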

Degree

Ph.D.

Advisors

Kak, Purdue University.

Subject Area

Computer Engineering
