A hybrid data harmonization workflow using word embeddings for the interlinking of heterogeneous cross-domain clinical data structures

Authors
  • S.W. Van der Laan
  • F. Lamers
  • T. Lehtimäki
  • W. März
  • D.I. Fotiadis
Publication date 2021
Book title IEEE BHI-BSN 2021
Book subtitle 2021 BHI conference proceedings : virtual conference, July 27-30, 2021
ISBN
  • 9781665447706
ISBN (electronic)
  • 9781665403580
Series IEEE-EMBS International Conference on Biomedical and Health Informatics
Event 2021 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2021
Pages (from-to) 88-91
Number of pages 4
Publisher Piscataway, NJ: IEEE
Organisations
  • Faculty of Social and Behavioural Sciences (FMG) - Psychology Research Institute (PsyRes)
Abstract

Retrospective data harmonization is an open issue in healthcare due to the emerging need to interlink data from multiple clinical centers in the absence of standardized data collection protocols. In this work, we present an automated data harmonization workflow which utilizes lexical and semantic analysis based on word embeddings and relational modeling to detect terminologies with a common lexical and conceptual basis. The method is built on top of a knowledge base to enable the interlinking of heterogeneous cross-domain data. A case study was conducted in two clinical domains, namely cardiovascular disease (CVD) and mental disorders, where the proposed method yielded matched terminologies with 85% precision in less execution time than the application of lexical analysis and manual mapping, which yielded 10% less precision.
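The hybrid idea in the abstract (try a lexical match first, then fall back to embedding-based semantic similarity) can be illustrated with a minimal sketch. This record does not describe the authors' actual implementation, so everything below is an assumption made for illustration: the function names, the thresholds, the toy three-dimensional vectors standing in for real word embeddings, and the use of Python's `difflib` for lexical similarity. The relational modeling and knowledge base steps are omitted.

```python
import math
from difflib import SequenceMatcher

# Toy vectors standing in for real word embeddings (illustrative assumption).
TOY_EMBEDDINGS = {
    "heart attack": [0.90, 0.10, 0.00],
    "myocardial infarction": [0.88, 0.15, 0.02],
    "systolic blood pressure": [0.10, 0.95, 0.05],
    "depression": [0.00, 0.20, 0.95],
}


def lexical_similarity(a, b):
    """Character-level similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_terms(source_terms, reference_terms, embeddings,
                lexical_threshold=0.85, semantic_threshold=0.80):
    """Map each source term to (reference_term, match_kind) or None.

    Thresholds are hypothetical defaults, not values from the paper.
    """
    matches = {}
    for term in source_terms:
        # Step 1: lexical pass, which catches typos and spelling variants.
        best_ref = max(reference_terms,
                       key=lambda ref: lexical_similarity(term, ref))
        if lexical_similarity(term, best_ref) >= lexical_threshold:
            matches[term] = (best_ref, "lexical")
            continue

        # Step 2: semantic pass, which catches synonyms with different spelling.
        matches[term] = None
        term_vec = embeddings.get(term.lower())
        if term_vec is None:
            continue  # no embedding available, so the term stays unmatched
        best_score = semantic_threshold
        for ref in reference_terms:
            ref_vec = embeddings.get(ref.lower())
            if ref_vec is None:
                continue
            score = cosine_similarity(term_vec, ref_vec)
            if score >= best_score:
                best_score = score
                matches[term] = (ref, "semantic")
    return matches


reference = ["myocardial infarction", "systolic blood pressure", "depression"]
source = ["Systolic blood presure", "heart attack", "shoe size"]
for term, match in match_terms(source, reference, TOY_EMBEDDINGS).items():
    print(term, "->", match)
```

In this sketch the misspelled "Systolic blood presure" is resolved by the lexical pass, "heart attack" is resolved only by the semantic pass, and "shoe size" remains unmatched, which is the kind of case that would be left for manual review.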

Document type Conference contribution
Language English
Published at https://doi.org/10.1109/BHI50953.2021.9508484
Other links https://www.scopus.com/pages/publications/85125466146