mlidea: Interactively Improving ML Data Preparation Code via "Shadow Pipelines"

Open Access
Authors
Publication date 08-2025
Journal Proceedings of the VLDB Endowment
Volume | Issue number 18 | 12
Pages (from-to) 5359–5362
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Data scientists develop ML pipelines in an iterative manner: they repeatedly screen a pipeline for potential issues, debug it, and then revise and improve its code according to their findings. However, this manual process is tedious and error-prone. To address this challenge, we propose to assist data scientists with automatically derived interactive suggestions for pipeline improvements during this development cycle. We demonstrate mlidea, a library to generate interactive suggestions with so-called shadow pipelines, hidden variants of the original pipeline that modify it to auto-detect potential issues, try out modifications for improvements, and suggest and explain these modifications to the user. Our system uses incremental view maintenance to enable data scientists to quickly iterate on their code and to ensure low-latency maintenance of the shadow pipelines. We demonstrate how our system improves code for various domains with three interactive shadow pipelines: fixing mislabeled rows, enhancing robustness against data quality problems, and improving pipeline performance on data slices with subpar predictions.
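The shadow-pipeline idea described in the abstract can be illustrated with a minimal sketch for the mislabeled-rows case. This is not the mlidea API: every name below (`pipeline`, `drop_suspect_labels`, `run_shadow`, `Suggestion`) is hypothetical, the "model" is a toy one-dimensional threshold classifier, and the data is made up for illustration. The sketch also re-runs the variant from scratch, whereas the abstract states that the actual system relies on incremental view maintenance to keep this cheap.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Row = Tuple[float, int]  # (feature value, binary label); illustrative schema


def train_threshold(rows: List[Row]) -> float:
    """Toy 'model': the midpoint between the two class means."""
    xs0 = [x for x, y in rows if y == 0]
    xs1 = [x for x, y in rows if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2


def accuracy(threshold: float, rows: List[Row]) -> float:
    """Fraction of rows where 'x > threshold' matches the label."""
    return sum(1 for x, y in rows if int(x > threshold) == y) / len(rows)


def pipeline(train: List[Row], test: List[Row]) -> float:
    """The user's original pipeline: train, then score on held-out data."""
    return accuracy(train_threshold(train), test)


def drop_suspect_labels(train: List[Row]) -> List[Row]:
    """Hypothetical shadow modification: drop training rows whose label
    disagrees with a model trained on the full (possibly noisy) data."""
    t = train_threshold(train)
    return [(x, y) for x, y in train if int(x > t) == y]


@dataclass
class Suggestion:
    description: str
    baseline: float  # score of the original pipeline
    shadow: float    # score of the hidden variant


def run_shadow(
    train: List[Row],
    test: List[Row],
    modify: Callable[[List[Row]], List[Row]],
    description: str,
) -> Optional[Suggestion]:
    """Run the original pipeline and a hidden variant on modified input;
    surface a suggestion only if the variant scores strictly better."""
    baseline = pipeline(train, test)
    shadow = pipeline(modify(train), test)
    if shadow > baseline:
        return Suggestion(description, baseline, shadow)
    return None


# Made-up data: the row (20.0, 0) is deliberately mislabeled.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (20.0, 0), (8.0, 1), (9.0, 1), (10.0, 1)]
test = [(2.0, 0), (4.0, 0), (6.0, 1), (7.0, 1), (9.0, 1)]

suggestion = run_shadow(train, test, drop_suspect_labels,
                        "Drop training rows with suspect labels")
if suggestion is not None:
    print(f"{suggestion.description}: "
          f"{suggestion.baseline:.2f} -> {suggestion.shadow:.2f}")
```

The design point the sketch tries to capture is that the variant stays hidden: the user's own pipeline code is left untouched, and a modification is reported only when it measurably helps.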
Document type Article
Language English
Published at https://doi.org/10.14778/3750601.3750671