Towards data-centric what-if analysis for native machine learning pipelines
| Authors | |
|---|---|
| Publication date | 2022 |
| Book title | Proceedings of the Sixth Workshop on Data Management for End-to-End Machine Learning |
| Book subtitle | in conjunction with the 2022 ACM SIGMOD/PODS Conference, Philadelphia, PA, USA |
| ISBN (electronic) | |
| Event | 6th Workshop on Data Management for End-To-End Machine Learning, DEEM 2022 - In conjunction with the 2022 ACM SIGMOD/PODS Conference |
| Article number | 3 |
| Number of pages | 5 |
| Publisher | New York, NY: Association for Computing Machinery |
| Organisations | |
| Abstract | An important task of data scientists is to understand the sensitivity of their models to changes in the data that the models are trained and tested upon. Currently, conducting such data-centric what-if analyses requires significant and costly manual development and testing with the corresponding chance for the introduction of bugs. We discuss the problem of data-centric what-if analysis over whole ML pipelines (including data preparation and feature encoding), propose optimisations that reuse trained models and intermediate data to reduce the runtime of such analysis, and finally conduct preliminary experiments on three complex example pipelines, where our approach reduces the runtime by a factor of up to six. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/3533028.3533303 |
| Other links | https://www.scopus.com/pages/publications/85133167380 |
| Downloads | 3533028.3533303 (Final published version) |
