From Text to Knowledge: Leveraging LLMs and RAG for Relationship Extraction in Ontologies and Thesauri

Open Access
Authors
Publication date 2025
Host editors
  • C. Badenes-Olmedo
  • I. Novalija
  • E. Daga
  • L. Stork
  • R.G. Pillai
  • L. Dierickx
  • B. Kruit
  • V. Degeler
  • J. Moreira
  • B. Zhang
  • R. Alharbi
  • Y. He
  • A. Graciotti
  • A. Morales Tirado
  • V. Presutti
  • E. Motta
Book title Joint Proceedings of Posters, Demos, Workshops, and Tutorials of the 24th International Conference on Knowledge Engineering and Knowledge Management (EKAW-PDWT 2024)
Book subtitle co-located with 24th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2024) : Amsterdam, Netherlands, November 26-28, 2024
Series CEUR Workshop Proceedings
Event Posters, Demos, Workshops, and Tutorials of the 24th International Conference on Knowledge Engineering and Knowledge Management
Number of pages 16
Publisher Aachen: CEUR-WS
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Ontologies, vocabularies, and thesauri provide a shared conceptualisation for a domain. Manually maintaining and updating such knowledge systems as knowledge changes does not scale for large domains such as biomedicine. Recently, large language models (LLMs) have been increasingly used as tools in knowledge engineering processes, offering new possibilities for the automatic creation and maintenance of knowledge systems. This work explores how LLMs can be leveraged for the automated extension of such knowledge systems.
Specifically, we build on the DRAGON-AI framework, which integrates Retrieval-Augmented Generation (RAG) to provide LLMs with access to external knowledge sources for more faithful outputs. We investigate the ability of the framework to predict relationships between a given knowledge system and a novel concept. We do so for both an ontology and a thesaurus, and analyse the impact of (i) enriching prompts with contextual information as well as clearer instructions, (ii) an alternative retrieval approach, and (iii) using a conversational model versus an instruction-following model. For all models and approaches, generations for the ontology are of higher quality than those for the thesaurus. The two models perform variably across the experiment configurations; only the conversational model shows a notable F1 improvement, and only for the ontology with the custom retrieval approach.
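As a rough illustration of the RAG-style setup the abstract describes, the sketch below retrieves the existing concept labels most similar to a novel concept and assembles a relationship-extraction prompt from them. The bag-of-words cosine retrieval, the example concept labels, and the prompt wording are illustrative assumptions, not the DRAGON-AI implementation.

```python
from collections import Counter
from math import sqrt


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two label strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = sqrt(sum(c * c for c in va.values()))
    nb = sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(novel: str, labels: list[str], k: int = 3) -> list[str]:
    """Return the k existing concept labels most similar to the novel concept."""
    return sorted(labels, key=lambda lab: bow_cosine(novel, lab), reverse=True)[:k]


def build_prompt(novel: str, context: list[str]) -> str:
    """Assemble a relationship-prediction prompt enriched with retrieved context."""
    lines = "\n".join(f"- {c}" for c in context)
    return (
        f"Existing concepts:\n{lines}\n\n"
        f"Propose is-a or related-to links for the novel concept '{novel}'."
    )


# Hypothetical biomedical labels standing in for an ontology's concepts.
labels = ["heart disease", "lung cancer", "skin infection", "heart failure"]
ctx = retrieve("congenital heart defect", labels)
print(build_prompt("congenital heart defect", ctx))
```

In the paper's setting, the retrieved context and prompt would be sent to a conversational or instruction-following LLM; here the prompt is only printed.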
Document type Conference contribution
Language English
Published at https://ceur-ws.org/Vol-3967/ELMKE_2024_paper_4.pdf
Other links https://ceur-ws.org/Vol-3967/
Downloads
ELMKE_2024_paper_4 (Final published version)