Data augmentation for vehicle detection with diffusion-based object inpainting

Authors
  • Sebastiaan B. Snel
  • Thijs A. Eker
  • Ella P. Fokkinga
  • A. Visser
  • Klamer Schutte
  • Friso G. Heslinga
Publication date 2025
Host editors
  • H.J. Kuijf
  • R. Prabhu
  • Y. Yitzhaky
Book title Artificial Intelligence for Security and Defence Applications III
Book subtitle 16-18 September 2025, Madrid, Spain
ISBN
  • 9781510692978
ISBN (electronic)
  • 9781510692985
Series Proceedings of the SPIE
Event Artificial Intelligence for Security and Defence Applications III
Article number 136790V
Number of pages 14
Publisher Bellingham, Washington: SPIE
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Automated vehicle detection in video footage captured by Unmanned Aerial Vehicles (UAVs) is a critical capability in security and defense domains, especially for environments where communication is jammed. Development of deep learning-based object detectors for this purpose typically requires large-scale datasets, which can be hard to obtain due to limited access to relevant environments. To address this challenge, synthetic data has been proposed as a supplementary source of training data, introducing additional variations in the appearance and positioning of objects. One promising strategy for generating synthetic data is inpainting, where objects of interest are seamlessly integrated into various backgrounds. However, traditional inpainting techniques lack spatial and contextual awareness, limiting their effectiveness for data augmentation. Recent advancements in generative AI, specifically diffusion models, have demonstrated improvements in object harmonization and spatial control for object inpainting, enabling realistic foreground-background matching with a high level of diversity. In this work, we explore the value of diffusion-based inpainting as a data augmentation technique. We use the inpainting model AnyDoor to enrich a small subset (1,000 frames) of the VisDrone training dataset with inpainted versions of minority-class objects (buses, vans, trucks). We train YOLOX detectors on datasets with increasing amounts of synthetic vehicles (1x, 5x, 10x, and 20x) and analyze the impact on detection performance. Results show that zero-shot inpainting can substantially improve detection for buses up to an augmentation factor of 10x, with no improvements at 20x. Effects for vans and trucks are mixed and sometimes negative. Fine-tuning AnyDoor provided limited additional benefit under the tested conditions. Overall, diffusion-based inpainting shows potential as a data augmentation strategy in low-resource UAV scenarios.
Future work should explore strategies to increase contextual diversity, such as adding multiple synthetic objects per image or incorporating automated quality control for synthetic samples.
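To make the augmentation-factor setup concrete, the following is a minimal sketch of the planning step implied by the abstract: pairing minority-class object crops with background frames so that each class reaches a target multiple (e.g., 5x) of its real instances. All names here (`build_augmentation_plan`, the crop/frame identifiers) are illustrative assumptions, not the authors' actual pipeline; the diffusion model (AnyDoor) would consume each planned (frame, crop) pair to produce the inpainted image.

```python
import random

# Minority classes targeted for augmentation in the abstract.
MINORITY_CLASSES = ["bus", "van", "truck"]

def build_augmentation_plan(crops, frames, factor, seed=0):
    """Return a list of (frame_id, class_name, crop_id) inpainting jobs.

    crops:  dict mapping class name -> list of object-crop identifiers
    frames: list of background frame identifiers
    factor: target multiple of real instances (factor=5 means 4 extra
            synthetic copies are generated per real instance)
    """
    rng = random.Random(seed)  # fixed seed for a reproducible plan
    plan = []
    for cls in MINORITY_CLASSES:
        crop_ids = crops.get(cls, [])
        n_synthetic = len(crop_ids) * (factor - 1)
        for _ in range(n_synthetic):
            # Sample a background frame and a crop to inpaint into it.
            plan.append((rng.choice(frames), cls, rng.choice(crop_ids)))
    return plan

# Toy example: 2 bus crops and 1 van crop at factor 5
# -> 2*4 + 1*4 = 12 synthetic placements.
plan = build_augmentation_plan(
    {"bus": ["b0", "b1"], "van": ["v0"]}, ["f0", "f1"], factor=5
)
print(len(plan))  # → 12
```

Each job in the plan would then be passed to the inpainting model, and the resulting synthetic frames added to the detector's training set.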
Document type Conference contribution
Language English
Published at https://doi.org/10.1117/12.3070068
Other links https://spie.org/spie-sensors-imaging/presentation/Data-augmentation-for-vehicle-detection-with-diffusion-based-object-inpainting/13679-31
Downloads
SPIE_2025_SI__GenAI_Inpainting_as_Data_Augmentation (Embargo up to 2026-09-17) (Submitted manuscript)
136790V (Embargo up to 2026-04-27) (Final published version)