Computing Web-scale Topic Models using an Asynchronous Parameter Server

R. Jagerman; C. Eickhoff; M. de Rijke

doi:https://doi.org/10.1145/3077136.3084135

Authors	R. Jagerman; C. Eickhoff; M. de Rijke
Publication date	2017
Book title	SIGIR'17 : proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
Book subtitle	August 7-11, 2017, Shinjuku, Tokyo, Japan
ISBN (electronic)	9781450350228
Event	40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2017
Pages (from-to)	1337-1340
Number of pages	4
Publisher	New York, NY: Association for Computing Machinery
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Topic models such as Latent Dirichlet Allocation (LDA) have been widely used in information retrieval for tasks ranging from smoothing and feedback methods to tools for exploratory search and discovery. However, classical methods for inferring topic models do not scale up to the massive size of today's publicly available Web-scale data sets. The state-of-the-art approaches rely on custom strategies, implementations and hardware to facilitate their asynchronous, communication-intensive workloads. We present APS-LDA, which integrates state-of-the-art topic modeling with cluster computing frameworks such as Spark using a novel asynchronous parameter server. Advantages of this integration include convenient usage of existing data processing pipelines and eliminating the need for disk writes as data can be kept in memory from start to finish. Our goal is not to outperform highly customized implementations, but to propose a general high-performance topic modeling framework that can easily be used in today's data processing pipelines. We compare APS-LDA to the existing Spark LDA implementations and show that our system can, on a 480-core cluster, process up to 135× more data and 10× more topics without sacrificing model quality.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1145/3077136.3084135
