mlidea: Interactively Improving ML Data Preparation Code via "Shadow Pipelines"

Stefan Grafberger; Paul Groth; Sebastian Schelter

doi:https://doi.org/10.14778/3750601.3750671

Authors	Stefan Grafberger; Paul Groth; Sebastian Schelter
Publication date	08-2025
Journal	Proceedings of the VLDB Endowment
Volume | Issue number	18 | 12
Pages (from-to)	5359–5362
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Data scientists develop ML pipelines in an iterative manner: they repeatedly screen a pipeline for potential issues, debug it, and then revise and improve its code according to their findings. However, this manual process is tedious and error-prone. To address this challenge, we propose to assist data scientists with automatically derived interactive suggestions for pipeline improvements during this development cycle. We demonstrate mlidea, a library to generate interactive suggestions with so-called shadow pipelines, hidden variants of the original pipeline that modify it to auto-detect potential issues, try out modifications for improvements, and suggest and explain these modifications to the user. Our system uses incremental view maintenance to enable data scientists to quickly iterate on their code and to ensure low-latency maintenance of the shadow pipelines. We demonstrate how our system improves code for various domains with three interactive shadow pipelines: fixing mislabeled rows, enhancing robustness against data quality problems, and improving pipeline performance on data slices with subpar predictions.
Document type	Article
Language	English
Published at	https://doi.org/10.14778/3750601.3750671 (Final published version)
Downloads	mlidea (Final published version)