Attention to the Branches: A Comparative Analysis of FairMOT with Transformers on Fish Dataset
| Authors | |
|---|---|
| Publication date | 2025 |
| Host editors | |
| Book title | Multi-disciplinary Trends in Artificial Intelligence |
| Book subtitle | 17th International Conference, MIWAI 2024, Pattaya, Thailand, November 11–15, 2024: Proceedings |
| ISBN | |
| ISBN (electronic) | |
| Series | Lecture Notes in Computer Science |
| Volume | |
| Issue number | I |
| Pages (from-to) | 64–76 |
| Publisher | Singapore: Springer |
| Organisations | |
| Abstract | The application of Transformers in computer vision has gained momentum, to the point that the Vision Transformer (ViT) proposes abandoning CNNs, or more precisely replacing CNN backbones with Transformer-based ones. This research evaluates the efficiency of such backbones when incorporated into a re-ID-based model such as FairMOT, which is traditionally trained with a CNN backbone. We investigate how Transformer-based feature extraction affects tracking performance, particularly for small and occluded objects such as fish in video data. Our findings indicate that while ViT backbones offer promising features, they do not yet surpass CNN-based methods in tracking accuracy within the FairMOT approach. This study highlights the need for further optimization of Transformer architectures. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1007/978-981-96-0692-4_6 |
| Downloads | 978-981-96-0692-4_6 (Final published version) |