Data augmentation for vehicle detection with diffusion-based object inpainting

Authors
  • Sebastiaan B. Snel
  • Thijs A. Eker
  • Ella P. Fokkinga
  • A. Visser
  • Klamer Schutte
  • Friso G. Heslinga
Publication date 2025
Host editors
  • H.J. Kuijf
  • R. Prabhu
  • Y. Yitzhaky
Book title Artificial Intelligence for Security and Defence Applications III
Book subtitle 16-18 September 2025, Madrid, Spain
ISBN
  • 9781510692978
ISBN (electronic)
  • 9781510692985
Series Proceedings of the SPIE
Event Artificial Intelligence for Security and Defence Applications III
Article number 136790V
Number of pages 14
Publisher Bellingham, Washington: SPIE
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Automated vehicle detection in video footage captured by Unmanned Aerial Vehicles (UAVs) is a critical capability in security and defense domains, especially for environments where communication is jammed. Development of deep learning-based object detectors for this purpose typically requires large-scale datasets, which can be hard to obtain due to limited access to relevant environments. To address this challenge, synthetic data has been proposed as a supplementary source of training data, introducing additional variations in the appearance and positioning of objects. One promising strategy for generating synthetic data is inpainting, where objects of interest are seamlessly integrated into various backgrounds. However, traditional inpainting techniques lack spatial and contextual awareness, limiting their effectiveness for data augmentation. Recent advancements in generative AI, specifically diffusion models, have demonstrated improvements in object harmonization and spatial control for object inpainting, enabling realistic foreground-background matching with a high level of diversity. In this work, we explore the value of diffusion-based inpainting as a data augmentation technique. We use the inpainting model AnyDoor to enrich a small subset (1,000 frames) of the VisDrone training dataset with inpainted versions of minority-class objects (buses, vans, trucks). We train YOLOX detectors on datasets with increasing amounts of synthetic vehicles (1x, 5x, 10x, and 20x) and analyze the impact on detection performance. Results show that zero-shot inpainting can substantially improve detection for buses up to an augmentation factor of 10x, with no improvements at 20x. Effects for vans and trucks are mixed and sometimes negative. Fine-tuning AnyDoor provided limited additional benefit under the tested conditions. Overall, diffusion-based inpainting shows potential as a data augmentation strategy in low-resource UAV scenarios.
Future work should explore strategies to increase contextual diversity, such as adding multiple synthetic objects per image or incorporating automated quality control for synthetic samples.
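To make the augmentation-factor setup concrete, the following is a minimal sketch of the planning step implied by the abstract: pairing minority-class object crops with background frames so that each class reaches a target multiple (e.g., 5x) of its real instances. All names here (`build_augmentation_plan`, the crop/frame identifiers) are illustrative assumptions, not the authors' actual pipeline; the diffusion model (AnyDoor) would consume each planned (frame, crop) pair to produce the inpainted image.

```python
import random

# Minority classes targeted for augmentation in the abstract.
MINORITY_CLASSES = ["bus", "van", "truck"]

def build_augmentation_plan(crops, frames, factor, seed=0):
    """Return a list of (frame_id, class_name, crop_id) inpainting jobs.

    crops:  dict mapping class name -> list of object-crop identifiers
    frames: list of background frame identifiers
    factor: target multiple of real instances (factor=5 means 4 extra
            synthetic copies are generated per real instance)
    """
    rng = random.Random(seed)  # fixed seed for a reproducible plan
    plan = []
    for cls in MINORITY_CLASSES:
        crop_ids = crops.get(cls, [])
        n_synthetic = len(crop_ids) * (factor - 1)
        for _ in range(n_synthetic):
            # Sample a background frame and a crop to inpaint into it.
            plan.append((rng.choice(frames), cls, rng.choice(crop_ids)))
    return plan

# Toy example: 2 bus crops and 1 van crop at factor 5
# -> 2*4 + 1*4 = 12 synthetic placements.
plan = build_augmentation_plan(
    {"bus": ["b0", "b1"], "van": ["v0"]}, ["f0", "f1"], factor=5
)
print(len(plan))  # → 12
```

Each job in the plan would then be passed to the inpainting model, and the resulting synthetic frames added to the detector's training set.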
Document type Conference contribution
Language English
Published at https://doi.org/10.1117/12.3070068
Other links https://spie.org/spie-sensors-imaging/presentation/Data-augmentation-for-vehicle-detection-with-diffusion-based-object-inpainting/13679-31
Downloads
SPIE_2025_SI__GenAI_Inpainting_as_Data_Augmentation (Embargo up to 2026-09-17) (Submitted manuscript)
136790V (Embargo up to 2026-04-27) (Final published version)