Etude - Evaluating the Inference Latency of Session-Based Recommendation Models at Scale

Open Access
Publication date 2024
Book title 2024 IEEE 40th International Conference on Data Engineering
Book subtitle ICDE 2024 : 13-17 May 2024, Utrecht, Netherlands : proceedings
ISBN
  • 9798350317169
ISBN (electronic)
  • 9798350317152
Event IEEE 40th International Conference on Data Engineering
Pages (from-to) 5177-5183
Publisher Los Alamitos, California: IEEE Computer Society
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Session-based recommendation (SBR) targets a core scenario in e-Commerce: given a sequence of interactions of a visitor with a selection of items, we want to recommend the next item(s) of interest to interact with. Unfortunately, SBR models are difficult to deploy in practice, as (i) session-based recommendations cannot be precomputed offline, but must be inferred online for ongoing user sessions with low latency, and (ii) there is a huge variety of SBR models available, typically designed by academic researchers, whose inference performance and deployment cost are unclear. As a result, data scientists must typically prototype and evaluate different deployment options in collaboration with DevOps teams, a tedious and costly process that does not scale to multiple use cases. To alleviate this, we present Etude, an end-to-end benchmarking framework that enables data scientists to automatically evaluate the inference performance of SBR models under different deployment options. With Etude, data scientists can declaratively specify workload statistics, hardware options, as well as latency and throughput constraints. Based on these, Etude automatically deploys and runs an inference benchmark in Kubernetes with a synthetically generated click workload. Subsequently, Etude provides the data scientists with measurements of the achieved throughput and latency, as a basis for deciding on feasible and cost-efficient deployment options. We detail the design of Etude and present an experimental study of ten different SBR models in challenging settings resembling real-world workloads encountered at the large European e-Commerce platform bol.com. We determine performant and cost-efficient deployment options in terms of models and cloud instance types for a variety of online shopping use cases (ranging from grocery shopping to large e-Commerce platforms).
Moreover, we identify severe performance bottlenecks in the open source TorchServe inference server from the PyTorch ecosystem and in the implementation of four SBR models from the open source RecBole library. We make the source code of our framework and experimental results publicly available.
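The abstract describes a workflow where a data scientist declares workload statistics and latency/throughput constraints, and the framework reports whether a deployment option satisfies them. The following is a minimal, purely illustrative Python sketch of that idea; all field names (`p99_latency_ms`, `min_throughput_rps`, etc.) are hypothetical and do not reflect Etude's actual configuration format or API.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkSpec:
    # Hypothetical declarative spec, loosely mirroring what the abstract
    # describes: model, hardware, workload, and performance constraints.
    model: str                  # SBR model to benchmark, e.g. "gru4rec"
    instance_type: str          # cloud instance type for the deployment
    sessions_per_second: float  # synthetic click-workload arrival rate
    p99_latency_ms: float       # latency constraint (99th percentile)
    min_throughput_rps: float   # throughput constraint (requests/second)

def meets_constraints(spec: BenchmarkSpec,
                      measured_p99_ms: float,
                      measured_rps: float) -> bool:
    """Decide whether measured latency and throughput satisfy the
    declared constraints of the benchmark spec."""
    return (measured_p99_ms <= spec.p99_latency_ms
            and measured_rps >= spec.min_throughput_rps)

spec = BenchmarkSpec(model="gru4rec", instance_type="m5.xlarge",
                     sessions_per_second=500.0,
                     p99_latency_ms=100.0, min_throughput_rps=450.0)
print(meets_constraints(spec, measured_p99_ms=80.0, measured_rps=480.0))  # True
print(meets_constraints(spec, measured_p99_ms=120.0, measured_rps=480.0)) # False
```

In the paper's workflow, such a spec would drive an automated Kubernetes deployment and load generation; the sketch only captures the final constraint check on the measured results.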
Document type Conference contribution
Language English
Published at https://doi.org/10.1109/icde60146.2024.00389
Other links https://www.proceedings.com/75189.html