Spectrum-based fault localization using machine learning
Affiliation: Computing and Information Systems
Document Type: PhD thesis
Access Status: Open Access
© 2017 Dr Neelofar
Debugging is critical in the production of reliable software. One effective bug localization technique is Spectrum-based Fault Localization (SBFL). It locates a buggy statement by applying an evaluation metric to program spectra and ranking program components by the scores the metric computes. Most SBFL research to date has focused on single-bug programs, for which many good evaluation metrics are available. The same is true for deterministic bugs (those that cause a test case to fail whenever they are executed). However, multiple bugs have become the norm in large software systems, and debugging metrics that perform well for such software are consequently in high demand. In this thesis, we propose a parametric class of metrics that adapts itself to single, multiple and deterministic bugs. The parameter values are learnt using optimization methods such as Genetic Programming or Simulated Annealing. We call the proposed class "Hyperbolic" metrics because their contours resemble hyperbolas. We evaluate the proposed metrics on both real programs and model programs with single bugs, multiple bugs, deterministic bugs and non-deterministic bugs, and find that they perform as well as or better than the previous best-performing metrics over a broad range of data.
SBFL is lightweight, but its accuracy is limited by the small amount of information it uses to localize faults: it depends solely on the execution information of program components in passed and failed test cases. In this thesis, we propose a debugging approach that is a hybrid of SBFL and static analysis, which provides SBFL with extra information about the programs under test. Program statements are categorized into various types, and each type is given a weight based on how likely that category is to be related to a bug. We evaluate the performance of our technique both on small programs from the Siemens Test Suite (STS) and on the larger Space program. Results show that our technique improves the performance of a wide variety of fault localization metrics on single-bug and multiple-bug data.
Statement type is one of many program features that can give valuable clues about the location of a bug. Other features include statement length, nesting depth and cyclomatic complexity. However, it is very expensive, and thus impractical, to plug each of these features into our proposed technique to find out how effective it is for fault localization. We devise a statistical method that evaluates the importance of a program feature for fault localization without using the feature in a machine learning method. The similarity between the results obtained by our statistical method and those obtained by actually implementing the feature in machine learning techniques demonstrates the accuracy of the method in identifying feature importance.
When two or more well-performing techniques are combined, the combination usually performs no better than the techniques do individually. In our case, however, combining the hyperbolic class of metrics with the statement weighting technique further improves the performance of the hyperbolic class of metrics. The improvement is statistically significant for many single-bug and multiple-bug datasets.
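As a rough illustration of the basic SBFL step the abstract describes (scoring each statement from program spectra, then ranking), the following minimal sketch may help. It uses the well-known Ochiai metric purely as a stand-in; it is not the Hyperbolic metric proposed in the thesis, whose formula is not given here. The function names, the coverage matrix layout and the toy data are all assumptions made for this example.

```python
import math


def ochiai(ef, ep, nf):
    """Ochiai suspiciousness score (a standard SBFL metric, used here as a stand-in).

    ef: failed tests that executed the statement
    ep: passed tests that executed the statement
    nf: failed tests that did not execute the statement
    """
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0


def rank_statements(coverage, outcomes):
    """Rank statements from most to least suspicious.

    coverage[t][s] is True if test t executed statement s (hypothetical layout).
    outcomes[t] is True if test t passed, False if it failed.
    Returns a list of (statement_index, score) pairs sorted by descending score.
    """
    total_failed = sum(1 for passed in outcomes if not passed)
    scores = []
    for s in range(len(coverage[0])):
        ef = sum(1 for t, row in enumerate(coverage) if row[s] and not outcomes[t])
        ep = sum(1 for t, row in enumerate(coverage) if row[s] and outcomes[t])
        nf = total_failed - ef
        scores.append((s, ochiai(ef, ep, nf)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy spectra: 4 tests (rows) over 3 statements (columns).
coverage = [
    [True, True, False],   # test 0, passed
    [True, False, True],   # test 1, failed
    [True, False, True],   # test 2, failed
    [True, True, False],   # test 3, passed
]
outcomes = [True, False, False, True]

ranking = rank_statements(coverage, outcomes)
for statement, score in ranking:
    print(f"statement {statement}: {score:.3f}")
```

In this toy data, statement 2 is executed only by failing tests and so ranks first, while statement 1 is executed only by passing tests and ranks last.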
Keywords: SBFL; Debugging and Verification; Software Testing; Static and Dynamic Analysis