Towards data-centric what-if analysis for native machine learning pipelines

Open Access
Authors
Publication date 2022
Book title Proceedings of the Sixth Workshop on Data Management for End-to-End Machine Learning
Book subtitle in conjunction with the 2022 ACM SIGMOD/PODS Conference, Philadelphia, PA, USA
ISBN (electronic)
  • 9781450393751
Event 6th Workshop on Data Management for End-to-End Machine Learning, DEEM 2022 - In conjunction with the 2022 ACM SIGMOD/PODS Conference
Article number 3
Number of pages 5
Publisher New York, NY: Association for Computing Machinery
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

An important task for data scientists is to understand how sensitive their models are to changes in the data they are trained and tested on. Currently, conducting such data-centric what-if analyses requires significant, costly manual development and testing, with a corresponding risk of introducing bugs. We discuss the problem of data-centric what-if analysis over whole ML pipelines (including data preparation and feature encoding), propose optimisations that reuse trained models and intermediate data to reduce the runtime of such analyses, and conduct preliminary experiments on three complex example pipelines, where our approach reduces the runtime by a factor of up to six.

Document type Conference contribution
Language English
Published at https://doi.org/10.1145/3533028.3533303
Other links https://www.scopus.com/pages/publications/85133167380