DORIAN in action: Assisted Design of Data Science Pipelines

Authors
Publication date 08-2022
Journal Proceedings of the VLDB Endowment
Volume | Issue number 15 | 12
Pages (from-to) 3714–3717
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Existing automated machine learning solutions and intelligent discovery assistants are popular tools that assist end-users in designing data science (DS) pipelines. However, their applicability to many real-world use cases and application domains is limited by (a) limited support for DS tasks; (b) a small, static set of available operators; and (c) a restriction to evaluation processes with quantifiable loss functions. We demonstrate DORIAN, a human-in-the-loop approach for the assisted design of data science pipelines that supports a large and growing set of DS tasks and operators, as well as arbitrary user-defined evaluation processes. Based on the user query, i.e., a dataset and a DS task, DORIAN computes a ranked list of candidate pipelines that the end-user can choose from, alter, execute, and evaluate. It stores executed pipelines in an experiment database and uses similarity-based search to identify relevant previously run pipelines. DORIAN also takes user interaction into account to improve its suggestions over time. We show how users can interact with DORIAN to create and compare DS pipelines on various real-world DS tasks without writing any code.
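The retrieval step described in the abstract (a query consisting of a dataset and a DS task, answered with a ranked list of previously run pipelines) can be illustrated with a minimal sketch. This is not DORIAN's implementation. The record layout, the use of numeric dataset meta-features, cosine similarity as the similarity measure, and the weighting by a stored user score are all assumptions made for illustration, and every name below is hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class ExecutedPipeline:
    """One record in a hypothetical experiment database."""
    task: str                    # DS task label, e.g. "classification"
    operators: list[str]         # ordered operator names of the pipeline
    meta_features: list[float]   # assumed numeric description of the dataset it ran on
    user_score: float            # assumed user-defined evaluation result in [0, 1]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(database, task, query_features, top_k=3):
    """Return up to top_k stored pipelines for the given task, best first.

    Assumption: a candidate's rank is the similarity between the query
    dataset and the dataset the pipeline ran on, weighted by the score
    the user gave that earlier run.
    """
    scored = []
    for record in database:
        if record.task != task:
            continue  # only pipelines that solved the same DS task are candidates
        similarity = cosine_similarity(query_features, record.meta_features)
        scored.append((similarity * record.user_score, record))
    # Sort on the numeric score only, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:top_k]]
```

In this sketch, calling `rank_candidates(database, "classification", [1.0, 0.0])` returns the stored classification pipelines whose datasets most resemble the query dataset, and appending a newly executed pipeline with its user score to `database` changes later rankings, which loosely mirrors the feedback loop the abstract mentions.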
Document type Article
Language English
Related publication Assisted design of data science pipelines
Published at https://doi.org/10.14778/3554821.3554882