Welcome to the Software Center lunch seminar organized by theme 3:
Comparing Input Prioritization Techniques for Testing Deep Learning Algorithms
Speaker: Vasilii Mosin, Volvo Cars, Associated Project DeVeLoP
Deep learning (DL) systems are becoming increasingly popular in modern software because of their ability to solve complex problems. They are used, for example, in safety-critical applications such as camera perception in self-driving cars. Such DL systems must be tested thoroughly by verifying their correctness on a predefined test set. This is challenging in itself, because the test set can grow over time as new data is acquired, which makes it time-consuming to test the system on all test inputs. Input prioritization can reduce the testing time, because prioritized test inputs are more likely to reveal erroneous behavior of a DL system early in test execution. Moreover, when testing only on the prioritized test inputs, the rest of the test set does not need to be labeled, which reduces the total labeling cost.
In this study, we compare test input prioritization techniques of different types in terms of their effectiveness and efficiency. In particular, we consider surprise adequacy, autoencoder-based, and similarity-based input prioritization approaches, using the example of testing a DL image classification algorithm applied to the MNIST, Fashion-MNIST, CIFAR-10, and STL-10 datasets. We operationalize effectiveness with a modified APFD (Average Percentage of Faults Detected) as the test input prioritization performance measure, and efficiency with the setup and execution time. We observe that the surprise adequacy approach is the most effective, with APFD values from 0.785 to 0.914. The autoencoder-based and similarity-based approaches are less effective, with APFD values from 0.532 to 0.744 and from 0.579 to 0.709, respectively. At the same time, the similarity-based approach is the most efficient and the surprise adequacy approach the least efficient. These findings show the trade-off between the considered input prioritization techniques and help in understanding their practical applicability for testing DL algorithms.
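The abstract names surprise adequacy and APFD without defining them. The Python sketch below shows one way the two can fit together, under stated assumptions. It uses distance-based surprise adequacy (DSA), which is only one variant of surprise adequacy; the announcement does not say which variant the study used. It computes the classic APFD formula, APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of test inputs, m the number of faults and TF_i the position of the first input revealing fault i, and it counts each misclassified input as one fault. This is not the study's modified APFD, whose definition is not given here. The function names (dsa, prioritize, apfd), the 2-D activation traces, the labels and the ground truth are all hypothetical toy values.

```python
import math
from typing import List, Sequence, Tuple

Trace = Tuple[float, ...]  # hypothetical activation trace of one input


def dsa(trace: Trace, predicted: int,
        train_traces: Sequence[Trace], train_labels: Sequence[int]) -> float:
    """Distance-based surprise adequacy (DSA) of one test input."""
    same = [t for t, y in zip(train_traces, train_labels) if y == predicted]
    other = [t for t, y in zip(train_traces, train_labels) if y != predicted]
    # Nearest training trace that belongs to the predicted class.
    x_a = min(same, key=lambda t: math.dist(trace, t))
    dist_a = math.dist(trace, x_a)
    # Distance from that reference trace to the closest trace of any other class.
    dist_b = min(math.dist(x_a, t) for t in other)
    return dist_a / dist_b


def prioritize(scores: Sequence[float]) -> List[int]:
    """Indices of test inputs ordered from most to least surprising."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def apfd(misclassified_in_order: Sequence[bool]) -> float:
    """Assumption: classic APFD, one fault per misclassified input (not the study's modified APFD)."""
    n = len(misclassified_in_order)
    # 1-based positions at which a fault (misclassification) is revealed.
    positions = [i + 1 for i, bad in enumerate(misclassified_in_order) if bad]
    m = len(positions)
    if m == 0:
        raise ValueError("APFD is undefined when no input is misclassified")
    return 1 - sum(positions) / (n * m) + 1 / (2 * n)


# Hypothetical toy data: 2-D activation traces, training labels and predictions.
train_traces = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
train_labels = [0, 0, 1, 1]
test_traces = [(0.2, 0.0), (5.0, 0.0), (10.0, 1.0)]
test_predictions = [0, 0, 1]
test_misclassified = [False, True, False]  # hypothetical ground truth

scores = [dsa(t, p, train_traces, train_labels)
          for t, p in zip(test_traces, test_predictions)]
order = prioritize(scores)
score = apfd([test_misclassified[i] for i in order])
print(order, round(score, 3))  # prints [1, 2, 0] 0.833
```

Here the input lying between the two training clusters gets the highest DSA (4/9), so it is executed first. Because it is also the only misclassified input, the resulting APFD is 5/6. A random or reversed order would score lower, which is the kind of difference the APFD measure is meant to capture.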
_____________________________________________________
Microsoft Teams meeting
______________________________________________________