The ramifications of data handling for computational models

J.G. Nevin

Authors	J.G. Nevin
Supervisors	P.T. Groth
Cosupervisors	M.H. Lees
Award date	04-12-2024
ISBN	9789493391598
Series	SIKS Dissertation series, 2024-37
Number of pages	196
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	Many computational models rely on real-world data, and their successful application depends on access to accurate and representative datasets. With increasingly sophisticated models and data, the steps required in moving from data collection to model output are becoming more complex. The effects of data handling steps such as cleaning and integration on the modelling and simulation process have generally not been addressed in the literature. This thesis investigates these issues and introduces frameworks for how best to reason about such problems. The first part of the thesis is focused on network diffusion models. These models are used to simulate spreading processes (such as the spread of disease or information) over networks. The outputs of such models are highly sensitive to the topology of the network on which they are run. From both theoretical and practical perspectives, we show how sensitive these models can be to data handling and suggest how results can be reported to support transparent, holistic conclusions. In the second part, we expand to other data handling problems and model types. We first illustrate how data preprocessing decisions can change the structure of word co-occurrence networks. Such networks are frequently used in the social sciences, where decisions behind network construction are often not justified. Second, we show how mismatched training and test data cleaning pipelines can affect the performance and selection of regression models. Such mismatches can have surprising consequences, which have strong implications for practice.
Document type	PhD thesis
Language	English

UvA-DARE

Digital Academic Repository
