Etude - Evaluating the Inference Latency of Session-Based Recommendation Models at Scale

Open Access
Publication date 2024
Book title 2024 IEEE 40th International Conference on Data Engineering
Book subtitle ICDE 2024 : 13-17 May 2024, Utrecht, Netherlands : proceedings
ISBN
  • 9798350317169
ISBN (electronic)
  • 9798350317152
Event IEEE 40th International Conference on Data Engineering
Pages (from-to) 5177-5183
Publisher Los Alamitos, California: IEEE Computer Society
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Session-based recommendation (SBR) targets a core scenario in e-Commerce: given a sequence of interactions of a visitor with a selection of items, we want to recommend the next item(s) of interest to interact with. Unfortunately, SBR models are difficult to deploy in practice, as (i) session-based recommendations cannot be precomputed offline, but must be inferred online for ongoing user sessions with low latency, and (ii) there is a huge variety of SBR models available, typically designed by academic researchers, whose inference performance and deployment cost are unclear. As a result, data scientists must typically prototype and evaluate different deployment options in collaboration with DevOps teams, a tedious and costly process that does not scale to multiple use cases. To alleviate this, we present Etude, an end-to-end benchmarking framework that enables data scientists to automatically evaluate the inference performance of SBR models under different deployment options. With Etude, data scientists can declaratively specify workload statistics, hardware options, as well as latency and throughput constraints. Based on these, Etude automatically deploys and runs an inference benchmark in Kubernetes with a synthetically generated click workload. Subsequently, Etude provides the data scientists with measurements of the achieved throughput and latency, as a basis for deciding on feasible and cost-efficient deployment options. We detail the design of Etude and present an experimental study of ten different SBR models in challenging settings resembling real-world workloads encountered at the large European e-Commerce platform bol.com. We determine performant and cost-efficient deployment options in terms of models and cloud instance types for a variety of online shopping use cases (ranging from grocery shopping to large e-Commerce platforms).
Moreover, we identify severe performance bottlenecks in the open source TorchServe inference server from the PyTorch ecosystem and in the implementation of four SBR models from the open source RecBole library. We make the source code of our framework and experimental results publicly available.
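The abstract describes a workflow where a data scientist declares workload statistics and latency/throughput constraints, and the framework reports whether a deployment option satisfies them. The following is a minimal, purely illustrative Python sketch of that idea; all field names (`p99_latency_ms`, `min_throughput_rps`, etc.) are hypothetical and do not reflect Etude's actual configuration format or API.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkSpec:
    # Hypothetical declarative spec, loosely mirroring what the abstract
    # describes: model, hardware, workload, and performance constraints.
    model: str                  # SBR model to benchmark, e.g. "gru4rec"
    instance_type: str          # cloud instance type for the deployment
    sessions_per_second: float  # synthetic click-workload arrival rate
    p99_latency_ms: float       # latency constraint (99th percentile)
    min_throughput_rps: float   # throughput constraint (requests/second)

def meets_constraints(spec: BenchmarkSpec,
                      measured_p99_ms: float,
                      measured_rps: float) -> bool:
    """Decide whether measured latency and throughput satisfy the
    declared constraints of the benchmark spec."""
    return (measured_p99_ms <= spec.p99_latency_ms
            and measured_rps >= spec.min_throughput_rps)

spec = BenchmarkSpec(model="gru4rec", instance_type="m5.xlarge",
                     sessions_per_second=500.0,
                     p99_latency_ms=100.0, min_throughput_rps=450.0)
print(meets_constraints(spec, measured_p99_ms=80.0, measured_rps=480.0))  # True
print(meets_constraints(spec, measured_p99_ms=120.0, measured_rps=480.0)) # False
```

In the paper's workflow, such a spec would drive an automated Kubernetes deployment and load generation; the sketch only captures the final constraint check on the measured results.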
Document type Conference contribution
Language English
Published at https://doi.org/10.1109/icde60146.2024.00389
Other links https://www.proceedings.com/75189.html