SimPLR: A Simple and Plain Transformer for Scaling-Efficient Object Detection and Segmentation
| Authors | |
|---|---|
| Publication date | 02-2025 |
| Journal | Transactions on Machine Learning Research |
| Article number | 3114 |
| Volume | |
| Issue number | 2025 |
| Number of pages | 17 |
| Organisations | |
| Abstract | The ability to detect objects in images at varying scales has played a pivotal role in the design of modern object detectors. Despite considerable progress in removing hand-crafted components and simplifying the architecture with transformers, multi-scale feature maps and pyramid designs remain a key factor for their empirical success. In this paper, we show that shifting the multi-scale inductive bias into the attention mechanism can work well, resulting in a plain detector, 'SimPLR', whose backbone and detection head are both non-hierarchical and operate on single-scale features. We find through our experiments that SimPLR with scale-aware attention is a plain and simple architecture, yet remains competitive with multi-scale vision transformer alternatives. Compared to the multi-scale and single-scale state-of-the-art, our model scales better with bigger-capacity (self-supervised) models and more pre-training data, allowing us to report consistently better accuracy and faster runtime for object detection, instance segmentation, as well as panoptic segmentation. |
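The abstract's core idea of folding the multi-scale inductive bias into the attention mechanism itself can be sketched roughly as follows. This is a hypothetical NumPy illustration, not the paper's actual implementation: a single-scale feature map is average-pooled at a few strides, and each query attends jointly over keys from all scales, so no feature pyramid is needed in the backbone or head.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pool(feat, stride):
    # average-pool an (H, W, C) feature map by `stride` to mimic a coarser scale
    H, W, C = feat.shape
    Hs, Ws = H // stride, W // stride
    return feat[:Hs * stride, :Ws * stride].reshape(Hs, stride, Ws, stride, C).mean(axis=(1, 3))

def scale_aware_attention(queries, feat, strides=(1, 2, 4)):
    # queries: (N, C); feat: (H, W, C) single-scale feature map.
    # Keys are gathered from several pooled versions of the same map, so the
    # multi-scale bias lives inside the attention, not in a feature pyramid.
    C = feat.shape[-1]
    keys = np.concatenate([pool(feat, s).reshape(-1, C) for s in strides], axis=0)
    attn = softmax(queries @ keys.T / np.sqrt(C), axis=-1)  # (N, total_keys)
    return attn @ keys  # (N, C) scale-mixed output per query
```

For example, with an 8x8 map and strides (1, 2, 4), each query attends over 64 + 16 + 4 = 84 keys spanning three effective scales of the same single-scale input.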
| Document type | Article |
| Language | English |
| Published at | https://doi.org/10.48550/arXiv.2310.05920 |
| Published at | https://openreview.net/forum?id=6LO1y8ZE0F |
| Other links | https://github.com/kienduynguyen/SimPLR https://jmlr.org/tmlr/papers/index.html |
| Downloads | 3114_SimPLR_A_Simple_and_Plain (Final published version) |
| Permalink to this page | |
