Noise Handling for Improving Machine Learning-based Test Case Selection: brownbag seminar from November 1
SC doctoral candidate Khaled Al-Sabbagh presents his research on noise handling for machine learning-based test case selection, in preparation for his Licentiate seminar.
Recording of the presentation on YouTube:
https://www.youtube.com/watch?v=vseujcza10k
Background:
Continuous integration is a modern software engineering practice in which new code changes are integrated and tested as soon as they are submitted to the project repository. One challenge in continuous integration is selecting the subset of test cases with the highest probability of revealing faults in each integration cycle. The large amounts of data about code changes and executed test cases available in continuous integration systems present an opportunity to design data-driven approaches that effectively select test cases for regression testing.
Objective: The objective of this thesis is to create a method for selecting the test cases that have the highest probability of revealing faults in the system, given new code changes pushed to the codebase. Using historically committed source code and the test cases executed against it, we apply textual analysis and machine learning to design a method, called MeBoTS, that learns to select test cases.
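To make this concrete, the following is a minimal sketch (in Python, using scikit-learn) of how a text-based test selection model could be trained from commit history. The file name, column names, and classifier choice are illustrative assumptions, not the actual MeBoTS implementation:

# Sketch: learn from historical code changes and test outcomes which test
# cases are likely to fail for a new change. Assumes a hypothetical CSV with
# columns "change_text" (the committed code lines) and "test_failed"
# (1 if the associated test case failed on that change).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

history = pd.read_csv("commit_test_history.csv")  # hypothetical file name

# Textual analysis: turn the raw source code of each change into token features.
vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_]\w+", max_features=5000)
X = vectorizer.fit_transform(history["change_text"])
y = history["test_failed"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Test cases predicted as likely to fail would be selected for the next cycle.
print("F1-score:", f1_score(y_test, model.predict(X_test)))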
Method: To address this objective, we carried out two design science research cycles and two controlled experiments. A combination of quantitative and qualitative data collection methods was used, including test execution and code commit data, surveys, and a workshop, to evaluate and improve the effectiveness of MeBoTS in selecting effective test cases.
Results: The main findings of this thesis are that: 1) using an elimination and a re-labelling strategy for handling class noise in the data increases the performance of MeBoTS from 25% to 84% (F1-score), 2) eliminating attribute noise from the training data does not improve the predictive performance of a test selection model (the F1-score remains unchanged at 66%), and 3) memory management changes in the source code should be tested with performance, load, soak, stress, volume, and capacity tests, while algorithmic complexity changes should be tested with the same tests plus maintainability tests.
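The class noise handling strategies in the first finding can be illustrated with a generic cross-validation noise filter that flags training points whose recorded label disagrees with a predicted label. This sketch shows the general idea under that assumption; it is not the exact procedure used in the thesis:

# Sketch of two class-noise handling strategies: elimination (drop suspected
# mislabeled points) and re-labelling (flip their labels to predicted ones).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

def handle_class_noise(X, y, strategy="eliminate"):
    """Clean a training set (X, y) with a cross-validation noise filter."""
    y = np.asarray(y)
    filter_clf = RandomForestClassifier(n_estimators=100, random_state=0)
    predicted = cross_val_predict(filter_clf, X, y, cv=5)
    noisy = predicted != y  # points whose label the filter disagrees with

    if strategy == "eliminate":
        return X[~noisy], y[~noisy]              # remove suspected noisy points
    if strategy == "relabel":
        return X, np.where(noisy, predicted, y)  # flip suspected labels
    raise ValueError(f"unknown strategy: {strategy}")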
Conclusion: Our first conclusion is that textual analysis of source code can be effective for test case selection if a class noise handling strategy is applied to curate incorrectly labeled data points in the training data. Secondly, test orchestrators do not need to handle attribute noise in the data, since doing so does not improve the performance of MeBoTS. Finally, we conclude that the performance of MeBoTS can be improved by building a tool that automatically associates specific types of code changes with the test cases that depend on them, for use in training.