When stakes are high: balancing accuracy and transparency with Model-Agnostic Interpretable Data-driven suRRogates
| Authors | |
|---|---|
| Publication date | 15-09-2022 |
| Journal | Expert Systems With Applications |
| Article number | 117230 |
| Volume / Issue number | 202 |
| Number of pages | 13 |
| Organisations | |
| Abstract | Technological advances make it possible to develop high-performing black box predictive models. However, strictly regulated industries (such as banking and insurance) require transparent decision-making algorithms. We therefore present a procedure to develop a Model-Agnostic Interpretable Data-driven suRRogate (maidrr) suited for structured tabular data. Knowledge is extracted from a black box via partial dependence effects. These effects are used to perform smart feature engineering by grouping variable values, which results in a segmentation of the feature space with automatic variable selection. A transparent generalized linear model (GLM) is then fit to the features in categorical format and their relevant interactions. This GLM serves as a global surrogate for the original black box and replaces it in production. We demonstrate our R package maidrr with a case study on general insurance claim frequency modeling for six publicly available datasets. Our maidrr GLM closely approximates a gradient boosting machine (GBM) black box and outperforms both a linear and a tree surrogate used as benchmarks. |
| Document type | Article |
| Language | English |
| Published at | https://doi.org/10.1016/j.eswa.2022.117230 |
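The first two steps described in the abstract (extracting partial dependence effects from a black box, then grouping variable values with similar effects) can be illustrated with a minimal Python sketch. This is not the maidrr R package and does not reproduce its API or its grouping algorithm. The `black_box` function, the `age` and `power` features, and the "cut at the largest jumps" grouping rule are all assumptions chosen for illustration only. In the procedure the abstract describes, the resulting groups would then enter a GLM as categorical features; that step is omitted here.

```python
import numpy as np

def black_box(X):
    """Hypothetical stand-in for a fitted black box (e.g. a GBM).

    Returns a made-up predicted claim frequency from two features:
    column 0 = driver age, column 1 = vehicle power.
    """
    age, power = X[:, 0], X[:, 1]
    return 0.05 + 0.10 * (age < 25) + 0.03 * (age > 70) + 0.01 * (power > 100)

def partial_dependence(predict, X, j, grid):
    """Average prediction when feature j is forced to each value in grid."""
    pd_vals = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v  # overwrite feature j for every observation
        pd_vals.append(predict(Xv).mean())
    return np.array(pd_vals)

def group_by_effect(pd_vals, k):
    """Assign consecutive grid values to k groups.

    Simplified assumption: cut at the k-1 largest absolute jumps in the
    partial dependence curve. The paper's own grouping method may differ.
    """
    labels = np.zeros(len(pd_vals), dtype=int)
    if k <= 1:
        return labels
    jumps = np.abs(np.diff(pd_vals))
    cuts = np.sort(np.argsort(jumps)[-(k - 1):]) + 1
    for c in cuts:
        labels[c:] += 1  # every value from the cut onward moves up a group
    return labels

# Synthetic portfolio (assumed data, not one of the paper's datasets).
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(18, 91, size=500),   # age
    rng.integers(40, 201, size=500),  # power
]).astype(float)

grid = np.arange(18, 91)
pd_age = partial_dependence(black_box, X, 0, grid)
labels = group_by_effect(pd_age, k=3)

# Print the age range covered by each group.
for g in np.unique(labels):
    ages = grid[labels == g]
    print(f"group {g}: ages {ages.min()} to {ages.max()}")
```

Because the toy black box is piecewise constant in age, the three groups recover its built-in age bands. With a real black box the partial dependence curve is noisier, and the number of groups would have to be chosen by some data-driven criterion rather than fixed by hand.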