Directions Towards Efficient and Automated Data Wrangling with Large Language Models

Open Access
Authors
Publication date 2024
Book title 2024 IEEE 40th International Conference on Data Engineering Workshops
Book subtitle ICDEW 2024 : 13-17 May 2024, Utrecht, Netherlands : proceedings
ISBN
  • 9798350384048
ISBN (electronic)
  • 9798350384031
  • 9798350317152
Event 40th IEEE International Conference on Data Engineering Workshops, ICDEW 2024
Pages (from-to) 301-304
Publisher Los Alamitos, California: IEEE Computer Society
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

Data integration and cleaning have long been a key focus of the data management community. Recent research indicates the potential of large language models (LLMs) for such tasks. However, scaling and automating data wrangling with LLMs for real-world use cases poses additional challenges. Manual prompt engineering, for example, is expensive and hard to operationalise, while full fine-tuning of LLMs incurs high compute and storage costs. Following up on previous work, we evaluate parameter-efficient fine-tuning (PEFT) methods for efficiently automating data wrangling with LLMs. We conduct a study of four popular PEFT methods on differently sized LLMs for ten benchmark tasks, and find that PEFT methods achieve performance on par with full fine-tuning, and that we can leverage small LLMs with negligible performance loss. However, even though such PEFT methods are parameter-efficient, they still incur high compute costs at training time and require labeled training data. We explore a zero-shot setting to further reduce deployment costs, and propose our vision for ZEROMATCH, a novel approach to zero-shot entity matching. It is based on maintaining a large number of pretrained LLM variants from different domains and intelligently selecting an appropriate variant at inference time.
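
The abstract does not say how a variant is selected at inference time. As a rough illustration only, the sketch below assumes one possible heuristic: each variant is described by a small domain vocabulary, and the variant whose vocabulary overlaps most (by Jaccard similarity) with the two records being matched is chosen. The registry `DOMAIN_VOCAB`, the domain names, and the functions `tokenize`, `jaccard` and `select_variant` are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical registry: domain name -> sample vocabulary for that domain.
# Assumption for illustration; the paper's abstract does not define this.
DOMAIN_VOCAB = {
    "products": {"price", "brand", "model", "gb", "laptop", "camera"},
    "citations": {"proceedings", "journal", "vol", "pages", "conference"},
    "restaurants": {"street", "cuisine", "phone", "avenue", "grill"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_variant(record_a, record_b, registry=DOMAIN_VOCAB):
    """Return the name of the domain whose vocabulary best overlaps the pair.

    Stand-in for the 'intelligent selection' step: a real system would
    presumably use a learned or embedding-based similarity instead.
    """
    tokens = tokenize(record_a) | tokenize(record_b)
    return max(registry, key=lambda name: jaccard(tokens, registry[name]))
```

In such a setup, the returned name would be used to look up the corresponding fine-tuned variant before running the matching prompt, so that no labeled data from the target dataset is needed.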

Document type Conference contribution
Language English
Published at https://doi.org/10.1109/ICDEW61823.2024.00044
Other links https://www.proceedings.com/75058.html