A hybrid data harmonization workflow using word embeddings for the interlinking of heterogeneous cross-domain clinical data structures
| Authors | |
|---|---|
| Publication date | 2021 |
| Book title | IEEE BHI-BSN 2021 |
| Book subtitle | 2021 BHI conference proceedings : virtual conference, July 27-30, 2021 |
| ISBN | |
| ISBN (electronic) | |
| Series | IEEE-EMBS International Conference on Biomedical and Health Informatics |
| Event | 2021 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2021 |
| Pages (from-to) | 88-91 |
| Number of pages | 4 |
| Publisher | Piscataway, NJ: IEEE |
| Organisations | |
| Abstract | Retrospective data harmonization remains an open issue in healthcare because of the emerging need to interlink data from multiple clinical centers in the absence of standardized data collection protocols. In this work, we present an automated data harmonization workflow that uses lexical and semantic analysis, based on word embeddings and relational modeling, to detect terminologies with a common lexical and conceptual basis. The method is built on top of a knowledge base to enable the interlinking of heterogeneous cross-domain data. A case study in two clinical domains, cardiovascular disease (CVD) and mental disorders, showed that the proposed method yielded matched terminologies with 85% precision in less execution time than lexical analysis and manual mapping, which yielded 10% lower precision. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1109/BHI50953.2021.9508484 |
| Other links | https://www.scopus.com/pages/publications/85125466146 |
| Permalink to this page | |
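
The hybrid matching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy embedding vectors, the function names, the 0.8 threshold, and the rule of taking the larger of the lexical and semantic scores are all assumptions made for illustration, and the relational modeling and knowledge base steps are omitted.

```python
from difflib import SequenceMatcher
from math import sqrt

# Hypothetical toy word vectors; a real workflow would load pretrained embeddings.
TOY_EMBEDDINGS = {
    "heart": [0.9, 0.1, 0.0],
    "cardiac": [0.85, 0.15, 0.05],
    "rate": [0.1, 0.9, 0.0],
    "frequency": [0.15, 0.85, 0.1],
    "depression": [0.0, 0.1, 0.9],
    "score": [0.2, 0.3, 0.5],
}


def lexical_similarity(a, b):
    """Character-level similarity in [0, 1] between two terms."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def embed(term):
    """Average the vectors of the known tokens; None if no token is known."""
    tokens = term.lower().replace("_", " ").split()
    vectors = [TOY_EMBEDDINGS[t] for t in tokens if t in TOY_EMBEDDINGS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def semantic_similarity(a, b):
    """Embedding-based similarity; 0.0 when either term has no known token."""
    u, v = embed(a), embed(b)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


def match_terms(source_terms, reference_terms, threshold=0.8):
    """Map each source term to its best reference term, or None below threshold.

    Assumption: the hybrid score is the larger of the lexical and semantic scores.
    """
    matches = {}
    for source in source_terms:
        best_term, best_score = None, 0.0
        for reference in reference_terms:
            score = max(
                lexical_similarity(source, reference),
                semantic_similarity(source, reference),
            )
            if score > best_score:
                best_term, best_score = reference, score
        matches[source] = best_term if best_score >= threshold else None
    return matches


result = match_terms(
    ["cardiac frequency", "Heart_Rate", "smoking"],
    ["heart rate", "depression score"],
)
```

In this toy setting, "Heart_Rate" is close to "heart rate" lexically, while "cardiac frequency" can only be linked to it through the embedding score, which is the kind of case a purely lexical matcher would miss. A term with no sufficiently similar reference, such as "smoking" here, is left unmatched.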