Software Center brown bag seminar from August 30, organized by the AI engineering theme. The speaker is Aiswarya Raj, PhD student at Chalmers:
Abstract: A large volume of high-quality data is mission-critical for real-world AI applications. Data pipelines consolidate data from disparate sources into one common destination, enable quick data accessibility, and ensure consistent data quality, which is crucial for AI applications. Companies across all domains experience data quality issues, and practitioners report that they spend a significant amount of time on data pre-processing. Validating the quality of data is therefore critical for establishing the trustworthiness of data pipelines, but it remains a challenging task in complex real-world applications. State-of-the-practice solutions require explicit domain expertise and are often implemented as rule-based fault detection systems. However, because data evolves over time (data drift), it is difficult to anticipate all data-related faults that may occur in the pipeline. Therefore, in this talk, the main focus will be on employing AI in the data pipeline to detect such faults and invoke corresponding mitigation strategies. We use a four-stage model to implement AI-powered fault-tolerant data pipelines. From a research perspective, the main benefit of the project is to enable companies to develop data pipelines that deliver high-quality data products. For companies, the project results will help reduce the amount of data pre-processing work done by data scientists, data analysts and other AI practitioners, which in turn accelerates the entire AI system development process.
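To illustrate the contrast the abstract draws between hand-written, rule-based fault detection and checks that cope with data drift, here is a minimal, hypothetical Python sketch. It is not the four-stage model presented in the talk; the function names, thresholds, and the simple mean-shift test are illustrative assumptions only.

```python
# Illustrative sketch (hypothetical, not the speaker's method):
# a rule-based validator, which only catches faults someone anticipated and encoded,
# versus a simple statistical drift check that compares a new batch to reference data.
import numpy as np


def rule_based_checks(batch: np.ndarray, min_value: float, max_value: float) -> list[str]:
    """Rule-based validation: every fault type must be anticipated and hard-coded."""
    faults = []
    if np.isnan(batch).any():
        faults.append("missing values")
    if (batch < min_value).any() or (batch > max_value).any():
        faults.append("values outside expected range")
    return faults


def drift_check(batch: np.ndarray, reference: np.ndarray, threshold: float = 3.0) -> bool:
    """Statistical validation: flag a batch whose mean deviates from the reference
    sample by more than `threshold` standard errors (a deliberately simple drift test)."""
    std_err = reference.std(ddof=1) / np.sqrt(len(batch))
    return abs(batch.mean() - reference.mean()) > threshold * std_err


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.normal(loc=0.0, scale=1.0, size=10_000)  # historical "healthy" data
    drifted = rng.normal(loc=0.5, scale=1.0, size=500)       # new batch with a shifted mean

    print(rule_based_checks(drifted, min_value=-5, max_value=5))  # []: the hand-written rules pass
    print(drift_check(drifted, reference))                        # True: the drift is flagged
```

In a fault-tolerant pipeline of the kind the abstract describes, a flagged batch would then trigger a mitigation strategy (for example, quarantining the batch or alerting a data engineer) rather than silently propagating to downstream AI models.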