Motion-consistent temporal fusion for UAV detection and tracking
DOI: https://doi.org/10.17721/AIT.2025.1.03
Keywords: UAV, Object Tracking, Object Detection, Fusion Algorithm, Motion Filter, RT-DETR, ByteTrack
Abstract
Background. Detecting and tracking Unmanned Aerial Vehicles (UAVs) in video streams is essential for modern airspace monitoring, yet it remains challenging because UAVs are small, fast and easily confused with birds or background clutter. Conventional detectors produce noisy, frame-wise boxes, while standard trackers still suffer from false positives and identity switches. The purpose of this study is to stabilize UAV detections by adding a motion-aware temporal fusion method to a mainstream detector-tracker pipeline.
Methods. A detection-tracking pipeline was constructed using an RT-DETR (Real-Time Detection Transformer) and ByteTrack baseline, extended with a lightweight, training-free motion-consistent fusion (MCF) method. The method (i) aggregates bounding-box history over five frames, (ii) averages spatial and confidence values, and (iii) penalizes tracks whose short-term velocity or angular change exceeds empirically chosen thresholds. No appearance features or additional learning are required, so the solution runs in real time on a single GPU.
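The three MCF steps described above can be illustrated with a minimal per-track sketch. It assumes the tracker (ByteTrack in the paper) has already assigned a track ID, so one history object is kept per ID. Only the five-frame window comes from the abstract. The threshold values, the `(x1, y1, x2, y2)` box format, the centre-displacement definition of velocity, and the use of a multiplicative confidence penalty are all hypothetical choices made for illustration, because the paper's actual thresholds and penalty rule are not given here. The names `TrackHistory`, `MAX_SPEED_PX`, `MAX_TURN_RAD` and `PENALTY` are likewise illustrative, not the authors' code.

```python
import math
from collections import deque
from dataclasses import dataclass, field

WINDOW = 5  # five-frame history, as stated in the abstract

# Hypothetical thresholds: the paper's empirically chosen values are not given.
MAX_SPEED_PX = 40.0          # max centre displacement per frame, in pixels
MAX_TURN_RAD = math.pi / 2   # max heading change between consecutive steps
PENALTY = 0.5                # hypothetical confidence multiplier for inconsistent motion

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


@dataclass
class TrackHistory:
    """Sliding-window history for one track ID (illustrative sketch)."""
    boxes: deque = field(default_factory=lambda: deque(maxlen=WINDOW))
    scores: deque = field(default_factory=lambda: deque(maxlen=WINDOW))

    def update(self, box: Box, score: float) -> tuple[Box, float]:
        # (i) aggregate history: deque(maxlen=WINDOW) drops the oldest entry
        self.boxes.append(box)
        self.scores.append(score)
        n = len(self.boxes)
        # (ii) average box coordinates and confidence over the window
        fused_box = tuple(sum(b[i] for b in self.boxes) / n for i in range(4))
        fused_score = sum(self.scores) / n
        # (iii) down-weight tracks whose short-term motion looks implausible
        if not self._motion_consistent():
            fused_score *= PENALTY
        return fused_box, fused_score

    def _motion_consistent(self) -> bool:
        centres = [((b[0] + b[2]) / 2, (b[1] + b[3]) / 2) for b in self.boxes]
        steps = [(x1 - x0, y1 - y0)
                 for (x0, y0), (x1, y1) in zip(centres, centres[1:])]
        # Velocity check: per-frame centre displacement in pixels
        for dx, dy in steps:
            if math.hypot(dx, dy) > MAX_SPEED_PX:
                return False
        # Angular check: heading change between consecutive steps, wrapped to [0, pi]
        for (ax, ay), (bx, by) in zip(steps, steps[1:]):
            if math.hypot(ax, ay) < 1e-6 or math.hypot(bx, by) < 1e-6:
                continue  # heading is undefined for a stationary step
            turn = abs(math.atan2(by, bx) - math.atan2(ay, ax))
            turn = min(turn, 2 * math.pi - turn)
            if turn > MAX_TURN_RAD:
                return False
        return True
```

For a target moving smoothly at 10 px per frame, five calls to `update` return the window-mean box with the confidence unchanged. A jump of 200 px, or a sudden reversal of direction, halves the confidence under these hypothetical settings. Note that a plain mean trails a moving target by roughly half the window (two frames here), so a real implementation may prefer a velocity-compensated or weighted average.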
Results. Experiments on a labelled UAV-video dataset show that the proposed method increases Multiple Object Tracking Accuracy (MOTA) from 0.533 to 0.591 and precision from 73 % to 84 %, and reduces identity switches from 60 to 28 (a 53 % reduction, i.e. improved ID stability). Recall decreases from 90 % to 76 %, reflecting a deliberate trade-off: the system filters out unstable or non-UAV motion to improve track consistency and suppress false positives. The evaluation covered more than 1,000 video sequences spanning diverse flight environments.
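The reported figures mix relative and absolute changes, which the short calculation below makes explicit. It uses the standard CLEAR MOT definition, MOTA = 1 − (FN + FP + IDSW) / GT, summed over all frames. The abstract does not report the FN, FP and GT counts, so the `mota` helper is shown only as a definition and the paper's MOTA values are not recomputed. Because FN, FP and ID switches are weighted equally, the lower recall (more false negatives) is consistent with a higher MOTA as long as the drop in false positives and ID switches is larger.

```python
def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, with counts summed over frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


def relative_reduction(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# Figures as reported in the abstract
id_switch_drop = relative_reduction(60, 28)  # relative change: the "53 %" figure
precision_gain_pp = 84 - 73                  # absolute change in percentage points
recall_drop_pp = 90 - 76                     # absolute change in percentage points
mota_gain = 0.591 - 0.533                    # absolute change in MOTA

print(f"ID switches: -{id_switch_drop:.1%}")       # ID switches: -53.3%
print(f"Precision: +{precision_gain_pp} pp")       # Precision: +11 pp
print(f"Recall: -{recall_drop_pp} pp")             # Recall: -14 pp
print(f"MOTA: +{mota_gain:.3f}")                   # MOTA: +0.058
```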
Conclusions. The motion-consistent fusion method substantially improves both accuracy and temporal coherence while adding only minor computational cost. It can be integrated into existing detection-tracking systems and is particularly suited for real-time UAV surveillance applications, although performance may degrade if drones perform extremely abrupt maneuvers that exceed the predefined motion thresholds.
References
Aharon, N., Orfaig, R., & Bobrovsky, B.-Z. (2022). BoT-SORT: Robust associations multi-pedestrian tracking. arXiv. https://doi.org/10.48550/arXiv.2206.14651
Do, N.-T., Nguyen, N. N.-Y., Nguyen, D.-P., & Do, T.-H. (2024). Ramots: A real-time system for aerial multi-object tracking based on deep learning and big data technology. In 2024 16th International Conference on Knowledge and System Engineering (KSE) (pp. 1–6). VNU University of Engineering and Technology. https://doi.org/10.1109/KSE63888.2024.11063545
Fu, C., Lei, X., Zuo, H., Yao, L., Zheng, G., & Pan, J. (2024). Progressive representation learning for real-time UAV tracking. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 5072–5079). School of Electrical and Electronic Engineering, Nanyang Technological University. https://doi.org/10.1109/IROS58592.2024.10803050
Jiang, N., Wang, K., Peng, X., Yu, X., Wang, Q., Xing, J., Li, G., Zhao, J., Guo, G., & Han, Z. (2021). Anti-UAV: A large multi-modal benchmark for UAV tracking. arXiv. https://doi.org/10.48550/arXiv.2101.08466
Reis, D., Kupec, J., Hong, J., & Daoudi, A. (2023). Real-time flying object detection with YOLOv8. arXiv. https://doi.org/10.48550/arXiv.2305.09972
Wang, A., Chen, H., Liu, L., Chen, K., Lin, Z., Han, J., & Ding, G. (2024). YOLOv10: Real-time end-to-end object detection. Advances in Neural Information Processing Systems, 37, 107984–108011. https://proceedings.neurips.cc/paper_files/paper/2024/hash/c34ddd05eb089991f06f3c5dc36836e0-Abstract-Conference.html
Wang, S., Xia, C., Lv, F., & Shi, Y. (2025). RT-DETRv3: Real-time end-to-end object detection with hierarchical dense positive supervision. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (pp. 1628–1636). Johns Hopkins University. https://doi.org/10.1109/WACV61041.2025.00166
Yu, Q., Ma, Y., He, J., Yang, D., & Zhang, T. (2023). A unified transformer based tracker for anti-UAV tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops (pp. 3036–3046). IEEE Computer Society; Computer Vision Foundation. https://openaccess.thecvf.com/content/CVPR2023W/Anti-UAV/html/Yu_A_Unified_Transformer_Based_Tracker_for_Anti-UAV_Tracking_CVPRW_2023_paper.html
Zhang, P., Zhao, J., Wang, D., Lu, H., & Ruan, X. (2022). Visible-thermal UAV tracking: A large-scale benchmark and new baseline. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 8886–8895). IEEE Computer Society; Computer Vision Foundation. https://openaccess.thecvf.com/content/CVPR2022/html/Zhang_Visible-Thermal_UAV_Tracking_A_Large-Scale_Benchmark_and_New_Baseline_CVPR_2022_paper.html
Zhang, Y., Sun, P., Jiang, Y., Yu, D., Weng, F., Yuan, Z., Luo, P., Liu, W., & Wang, X. (2022). ByteTrack: Multi-object tracking by associating every detection box. In S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, & T. Hassner (Eds.), Computer Vision – ECCV 2022 (pp. 1–21). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-20047-2_1
Zhao, J., Zhang, J., Li, D., & Wang, D. (2022). Vision-based anti-UAV detection and tracking. IEEE Transactions on Intelligent Transportation Systems, 23(12), 25323–25334. https://doi.org/10.1109/TITS.2022.3177627
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright (c) 2025 Advanced Information Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.