When stakes are high: balancing accuracy and transparency with Model-Agnostic Interpretable Data-driven suRRogates

doi:https://doi.org/10.1016/j.eswa.2022.117230

Authors	R. Henckaerts, K. Antonio, M.-P. Côté
Publication date	15-09-2022
Journal	Expert Systems With Applications
Article number	117230
Volume | Issue number	202
Number of pages	13
Organisations	Faculty of Economics and Business (FEB) - Amsterdam School of Economics Research Institute (ASE-RI) Faculty of Economics and Business (FEB)
Abstract	Technological advancements make it possible to develop high-performance black box predictive models. However, strictly regulated industries (like banking and insurance) require transparent decision-making algorithms. We therefore present a procedure to develop a Model-Agnostic Interpretable Data-driven suRRogate (maidrr) suited for structured tabular data. Knowledge is extracted from a black box via partial dependence effects. These are used to perform smart feature engineering by grouping variable values. This results in a segmentation of the feature space with automatic variable selection. A transparent generalized linear model (GLM) is fit to the features in categorical format and their relevant interactions. This GLM serves as a global surrogate to the original black box and replaces it in production. We demonstrate our R package maidrr with a case study on general insurance claim frequency modeling for six publicly available datasets. Our maidrr GLM closely approximates a gradient boosting machine (GBM) black box and outperforms both a linear and a tree surrogate as benchmarks.
Document type	Article
Language	English
Published at	https://doi.org/10.1016/j.eswa.2022.117230
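
The first two steps named in the abstract, extracting partial dependence effects from a black box and grouping variable values with similar effects, can be illustrated with a minimal Python sketch. This is not the maidrr package or its API: the `black_box` function, the synthetic portfolio, the helper names and the tolerance-based merging of adjacent values are all assumptions made for illustration, and the article's own grouping procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    """Hypothetical stand-in for a fitted black box (e.g. a GBM).

    Returns a predicted claim frequency from driver age and vehicle power.
    """
    age, power = X[:, 0], X[:, 1]
    return 0.05 + 0.10 * (age < 25) + 0.03 * (age > 70) + 0.002 * power

# Synthetic portfolio (assumption): integer ages 18..90, vehicle power 4..15.
X = np.column_stack([
    rng.integers(18, 91, 5000),
    rng.integers(4, 16, 5000),
]).astype(float)

def partial_dependence(predict, X, j):
    """Average prediction when feature j is forced to each observed value."""
    values = np.unique(X[:, j])
    effect = np.empty(len(values))
    for k, v in enumerate(values):
        Xv = X.copy()
        Xv[:, j] = v          # overwrite feature j for every observation
        effect[k] = predict(Xv).mean()
    return values, effect

def group_by_effect(effect, tol):
    """Merge adjacent values whose partial dependence differs by less than tol.

    A simplified grouping rule chosen for this sketch only.
    """
    labels = np.zeros(len(effect), dtype=int)
    for k in range(1, len(effect)):
        starts_new_group = abs(effect[k] - effect[k - 1]) >= tol
        labels[k] = labels[k - 1] + int(starts_new_group)
    return labels

values, effect = partial_dependence(black_box, X, j=0)
labels = group_by_effect(effect, tol=0.01)
n_groups = int(labels.max()) + 1

# Map each age to its group label; this categorical version of the feature
# is what a surrogate GLM would then be fit on.
age_to_group = dict(zip(values.astype(int).tolist(), labels.tolist()))
```

In this toy setup the age effect is piecewise constant, so the grouping recovers the segments of the hypothetical black box. The resulting labels could be passed as a categorical feature to any GLM routine to obtain the transparent surrogate the abstract describes.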