Privacy-Preserving Record Linkage with Spark

O. Valkering; A. Belloum

doi:https://doi.org/10.1109/CCGRID.2019.00058

Authors	O. Valkering; A. Belloum
Publication date	2019
Book title	Proceedings 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
Book subtitle	CCGrid 2019, Cyprus
ISBN	9781728109138
ISBN (electronic)	9781728109121
Event	19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2019
Pages (from-to)	440-448
Number of pages	9
Publisher	IEEE Computer Society
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Privacy considerations obligate careful and secure processing of personal data. This is especially true when personal data is linked against databases from other organizations. During such endeavors, privacy-preserving record linkage (PPRL) can be utilized to prevent needless exposure of sensitive information to other organizations. With the increase of personal data that is being gathered and analyzed, scalable PPRL capable of handling massive databases is much desired. In this work, we evaluate Apache Spark as an option to scale PPRL. Not only is it valuable to have a scalable PPRL implementation, but one based on Spark would also be commonly deployable and could take advantage of further development of the ecosystem. Our results show that a PPRL solution based on Spark outperforms alternatives when it comes to handling multiple millions of records, can scale to dozens of nodes, and is on par with regular record linkage implementations in terms of achieved results.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1109/CCGRID.2019.00058 (Final published version)
Other links	https://www.scopus.com/pages/publications/85069434168

UvA-DARE

Digital Academic Repository