Scientific Data volume 12, Article number: 185 (2025)
We present an air-to-air multi-sensor and multi-view fixed-wing UAV dataset, MMFW-UAV, in this work. MMFW-UAV contains a total of 147,417 images of fixed-wing UAVs captured by multiple types of sensors (zoom, wide-angle, and thermal imaging sensors), displaying the flight status of fixed-wing UAVs of different sizes, appearances, structures, and stabilized flight velocities from multiple aerial perspectives (top-down, horizontal, and bottom-up views), aiming to cover the full range of perspectives with multi-modal image data. A three-stage quality control process of semi-automatic annotation, manual checking, and secondary refinement was applied to every image. To the best of our knowledge, MMFW-UAV is the first one-to-one multi-modal image dataset for fixed-wing UAVs with high-quality annotations. Several mainstream deep learning-based object detection architectures are evaluated on MMFW-UAV, and the experimental results demonstrate that MMFW-UAV can be utilized for fixed-wing UAV identification, detection, and monitoring. We believe that MMFW-UAV will contribute to a wide variety of fixed-wing UAV research and applications.
Airspace safety has become an increasingly critical issue in modern society, not only for social stability but also for national security and economic development1. After several upgrades and improvements, unmanned aerial vehicle (UAV) technology has demonstrated great value in industrial, commercial and civil applications2,3,4. However, with the rapid increase in the number of UAVs, various airspace safety problems have also arisen5,6,7,8,9. Although management practices such as no-fly zones, flight altitude restriction, and visual line of sight rules have mitigated these issues, the supervision of unauthorized, unresponsive, or uncommunicative UAVs is still faced with many challenges10,11.
Vision-based UAV detection and tracking is an efficient, convenient, and economical solution to the above challenges12,13. Meanwhile, the booming development of deep learning has also facilitated innovations and advancements in computer vision technologies. With outstanding performance across various vision tasks, high-performance architectures such as Convolutional Neural Networks (CNNs) and Vision Transformers have become the common paradigm for such requirements14,15,16. However, as data-driven approaches, the quantity, quality, and feature richness of image data all inevitably affect model training and deployment. In addition, while there are already several public image datasets for quadrotor UAVs, such as USC-GRAD-STDdb17, DroneRF18, Aircraft Context Dataset19, DUT Anti-UAV20, Anti-UAV21, and others22,23,24, open-source image/video datasets for fixed-wing UAVs remain scarce.
In current research and engineering projects, fixed-wing UAV image datasets are mainly grouped into three categories: ground-to-air, air-to-air, and random viewpoints, depending on the deployment location of the camera sensor, as shown in Fig. 1. These datasets are mostly collected from real experiments, Internet or simulation system synthesis, and mainly serve different kinds of vision tasks such as detection, tracking, and pose estimation. Specifically, while most of the datasets can be utilized for fixed-wing UAV detection, tasks such as tracking and pose estimation mostly require sequential images, i.e., consecutive images of the flying fixed-wing UAV over a period of time. In addition, the basic information of these public datasets is also summarized, as shown in Table 1.
Open access fixed-wing UAV datasets visualization.
Through the analysis of data sources, capture views, dataset sizes and image resolutions, it is not hard to conclude that there are still many challenges in conducting and implementing vision technologies in this field, as shown in Fig. 2:
Scarcity of real-world air-to-air image data: Air-to-air data refers to images or videos of operating fixed-wing UAVs captured from aerial perspectives. However, most currently available data are derived from the Internet or synthesized by simulation systems; the sources of the former are difficult to trace, while the latter struggles to reflect the real-world operational scenarios of fixed-wing UAVs. Therefore, vision models trained on such data are more prone to stability and robustness problems such as missed detections, false detections, and recognition failures.
Limited or unstructured image capture views: Image acquisition views are one of the most important factors for subsequent research, and images with rich view perspectives can better characterize the mechanical structures and operational status of fixed-wing UAVs. However, most existing datasets are collected from single views or from unstructured viewpoints, which cannot cover multiple perspectives of the same fixed-wing UAV. As a result, such limitations make it difficult to explore a broader range of vision tasks, restricting the diversity of fixed-wing UAV vision research.
Lack of multi-modal image data: Different data modalities suit different task scenarios; for example, infrared images are better suited to low-light or nighttime identification and detection. However, limited by imaging technology and data acquisition devices, current public datasets are mostly composed of visible light images and lack multi-modal data, especially one-to-one corresponding image pairs. As a result, fixed-wing UAV monitoring in low-light environments still requires the assistance of radar systems, sonic detection, or radio spectrum monitoring, which unavoidably increases task complexity and equipment cost.
Challenges in current fixed-wing UAV image dataset.
To overcome the challenges above, we present a new air-to-air multi-sensor and multi-view fixed-wing UAV dataset, MMFW-UAV. To the best of our knowledge, MMFW-UAV is the first one-to-one multi-modal image dataset for fixed-wing UAVs with high-quality annotations. In addition, the acquisition process, organization standards, annotation procedures and application validation experiments of MMFW-UAV will be detailed subsequently.
The MMFW-UAV dataset was acquired through a self-developed air-to-air data collection platform and passed through several key processes, such as data extraction, data cleansing, data labeling, and quality control, all of which will be detailed in this section.
The air-to-air data collection platform consists of an image capture UAV (DJI M30T) and 12 sorties of fixed-wing UAVs with different sizes, structures and paint coatings. As shown in Table 2, the DJI M30T is equipped with three different sensor devices:
A 48 megapixel (MP) optical RGB zoom sensor with 1/2″ Complementary Metal-Oxide-Semiconductor (CMOS) is equipped with a 21-75 mm zoom lens, which is capable of capturing and storing visible image data at a resolution of 3840 × 2160 pixels with 30 frames-per-second (FPS).
A 12 MP optical RGB wide-angle sensor with 1/2″ CMOS is equipped with a 4.5 mm prime lens, thereby providing a diagonal field of view (DFOV) of 84°, which is capable of capturing and storing visible image data at a resolution of 3840 × 2160 pixels with 30 FPS.
An uncooled vanadium oxide (VOx) thermal imaging sensor is equipped with a 9.1 mm prime lens (the DFOV is 61°), which is capable of capturing and storing infrared image data at a resolution of 1280 × 1024 pixels with 30 FPS in super-resolution mode. In addition, this thermal sensor measures temperature in either spot or region mode over a range of −20 °C to +150 °C.
The three sensors are integrated into a data acquisition module mounted on a three-axis camera gimbal with controllable pan and tilt ranges of ±90° and −120° to +45°, respectively, and an angular jitter of ±0.01°. The cost of this image capture UAV is approximately USD 10,000.
The appearance and basic information of each of the 12 captured fixed-wing UAVs are shown in Fig. 3 and Table 3. Specifically, different sorties of fixed-wing UAVs differ noticeably in size, casing color, paint coating, and mechanical structure. For example, 6 fixed-wing UAVs retain the raw casing appearance without secondary modification (Fig. 3, left side), while the other 6 are equipped with solar panel casings to simulate a solar power system on fixed-wing UAVs (Fig. 3, right side). In addition, the collected fixed-wing UAVs range from the smallest size of 95 mm × 122 mm × 23 mm to the largest size of 126 mm × 255 mm × 58 mm, with steady flight speeds of 5–20 m/s.
Demonstration of 12 sorties of captured fixed-wing UAVs.
Based on the aforementioned data collection platform, more than 100 missions were conducted during March 2024 at 37°58′41″N, 114°31′16″E in Shijiazhuang, Hebei Province, China. The image capture UAV collected data from top-down, horizontal, and bottom-up views for each fixed-wing UAV multiple times, guaranteeing the variety of viewpoints and the coverage of the target. In each mission, the image capture UAV took off first and waited for the fixed-wing UAV to enter its field of view (FOV). The drone operator then manually controlled the image capture UAV to follow the fixed-wing UAV at a roughly fixed distance for data acquisition. In addition, the fixed-wing UAVs followed a predefined flight path approximating the one shown in Fig. 4, with flight altitudes ranging from 25 m to 100 m. This flight path was verified several times and minimizes various potential risks in the data collection process. Both the image capture UAV and fixed-wing UAV operators in these experiments hold the Civil Remote Pilot License issued by the Civil Aviation Administration of China (CAAC) and strictly adhered to local UAV management regulations to ensure the safety of the UAV operation.
Schematic of the predefined fixed-wing UAV flight path, with background from the Google Earth topographic map (June 6, 2022) at 37°58′41″N, 114°31′16″E in Shijiazhuang, Hebei Province, China.
Benefiting from the temporal coherence and spatial consistency between consecutive video frames, the rapid changes during the flight of the fixed-wing UAV are effectively captured and preserved. Also, lighting conditions and shading typically do not change drastically between consecutive frames, ensuring the quality of the collected images as well as their suitability for a wide range of vision tasks. A total of 759 minutes of raw video was acquired by the air-to-air data collection platform, i.e., about 253 minutes each for the zoom, wide-angle, and thermal imaging sensors, with a stabilized frame rate of about 30 FPS, resulting in a total of 1,366,200 raw frames. The resolutions of the visible (captured by the zoom and wide-angle sensors) and infrared (captured by the thermal imaging sensor) video frames were 3840 × 2160 pixels and 1280 × 1024 pixels, respectively, and both had a bit depth of 8.
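As a quick consistency check, the reported raw-frame total agrees with the stated duration and frame rate:

```python
# Consistency check of the figures reported above: 759 minutes of raw video
# at a stabilized 30 FPS, split evenly across the three sensors.
TOTAL_MINUTES = 759
FPS = 30
total_frames = TOTAL_MINUTES * 60 * FPS
minutes_per_sensor = TOTAL_MINUTES // 3
print(total_frames)        # matches the stated 1,366,200 raw frames
print(minutes_per_sensor)  # matches the stated ~253 minutes per sensor
```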
The data cleaning process mainly consists of two steps: redundant data removal and defective data elimination. Redundant data refers to consecutive frames with only small target motion changes and/or frames in which no fixed-wing UAV target was captured. To remove these data, the MMFW-UAV dataset was manually checked several times, and a total of 169,872 images were finally retained. Meanwhile, the defective data in the MMFW-UAV dataset were all eliminated, such as blurred images (caused by camera shake), overexposed images (caused by view angle problems), and artifact images (caused by video compression). It should be noted that during the data cleaning process, the removal of redundant or defective images was performed simultaneously across the three types of sensor data, thereby preserving the one-to-one correspondence. Finally, after the data cleaning process, MMFW-UAV contains a total of 147,417 images, i.e., 49,139 images for each type of sensor.
The current open-source MMFW-UAV dataset mainly serves for object detection tasks, and it adopts a three-stage annotation process of semi-automatic annotation, manual check, and secondary refinement.
In the semi-automatic annotation stage, the Label Studio toolbox25 was used for data annotation; it is an open-source data annotation tool that provides a flexible and extensible interface for multiple types of image annotation tasks, such as bounding boxes, polygons, and key points. In addition, we implemented semi-automatic annotation by integrating the Segment Anything Model (SAM)26 into the Label Studio backend, which greatly accelerates annotation and improves its efficiency. Specifically, SAM is a zero-shot, prompt-based model that can automatically identify targets and generate masks from simple clicks or interactive selections. The official “sam_vit_b_01ec64.pth” weights were used for integration with Label Studio.
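A binary mask produced by SAM can be reduced to a detection bounding box by taking the extremes of its foreground pixels. A minimal sketch of that conversion (the function name is illustrative, not part of the Label Studio or SAM APIs):

```python
import numpy as np

def mask_to_bbox(mask: np.ndarray) -> tuple:
    """Convert a binary segmentation mask (e.g., produced by SAM) into a
    Pascal VOC-style (xmin, ymin, xmax, ymax) bounding box."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no foreground pixels")
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy example: a 6x6 mask with a 2x3 foreground blob.
mask = np.zeros((6, 6), dtype=bool)
mask[2:4, 1:4] = True
print(mask_to_bbox(mask))  # (1, 2, 3, 3)
```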
In the manual checking phase, rigorous annotation validation processes and quality control criteria are set up to ensure image annotation quality. First, the data annotators cross-check the annotation results, and incorrect or problematic annotation files are returned to the original annotators for correction. Second, images that are challenging to annotate or that contain truncated objects (e.g., UAVs located at the edge of the image) are annotated only after repeated reviews and confirmations. Finally, the secondary refinement is carried out by referencing the results of the previous manual check and the following quality control criteria, aiming to improve annotation quality through repeated iterations:
Whether there are missing objects or multiple bounding boxes in the image, such as overlapping bounding boxes.
Whether the size of the bounding boxes is reasonable, such as leaving excessive white gaps.
Whether the labeling category is correct, i.e., the accuracy and consistency of the annotation.
MMFW-UAV is publicly available at Science Data Bank27 and is designed to be open and accessible to all UAV (especially fixed-wing UAV) researchers and professionals. The technical validation data are available in28. This section introduces the repository structure, file naming, and data properties of the MMFW-UAV dataset and provides a systematic analysis of potential users, which will help them use this dataset in their scientific research and engineering projects.
The MMFW-UAV repository is built in a five-level tree structure, as shown in Fig. 5. Specifically, there is a “README.md” file, a “Tools” subdirectory, and a “MMFW-UAV-DATASET” subdirectory under the MMFW-UAV root directory. “README.md” contains the basic information and usage guidelines of the dataset, the “Tools” subdirectory contains data processing tools, and “MMFW-UAV-DATASET” holds the multi-view and multi-sensor data of the 12 sorties of fixed-wing UAVs, with one folder per sortie named “Fixed-wing-UAV-X” (X denotes the sortie number). Within each folder, the multi-view data is categorized into “Top_Down”, “Horizontal”, and “Bottom_Up” subfolders, each of which contains the image data captured by the zoom, wide-angle, and thermal imaging (infrared) sensors and their corresponding annotation files, named according to the sensor type plus the suffix “_Imgs” or “_Anns”.
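The layout above can be navigated programmatically. A minimal sketch follows; note that the exact sensor-folder prefixes (here assumed to be `Zoom`, `Wide_Angle`, and `Thermal`) are an assumption, so the repository's “README.md” should be consulted for the actual names:

```python
from pathlib import Path

VIEWS = ("Top_Down", "Horizontal", "Bottom_Up")
SENSORS = ("Zoom", "Wide_Angle", "Thermal")  # assumed prefixes, see README.md

def sortie_paths(root: str, sortie: int):
    """Yield (image_dir, annotation_dir) pairs for one fixed-wing UAV sortie,
    following the five-level tree structure described in the text."""
    base = Path(root) / "MMFW-UAV-DATASET" / f"Fixed-wing-UAV-{sortie}"
    for view in VIEWS:
        for sensor in SENSORS:
            yield base / view / f"{sensor}_Imgs", base / view / f"{sensor}_Anns"

pairs = list(sortie_paths("MMFW-UAV", 1))
print(len(pairs))  # 9 directory pairs: 3 views x 3 sensors
```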
Data structure of the MMFW-UAV repository.
The image files in MMFW-UAV are named according to the following format: “T_W_NNNNNN”, where T denotes the capture time (“0” for morning, “1” for afternoon), W denotes the weather condition (“0” for sunny, “1” for cloudy), and NNNNNN denotes the serial number of the image (starting from “000000”). Meanwhile, the annotation files in MMFW-UAV share the naming format of their corresponding images, maintaining a one-to-one correspondence for each set of data.
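The naming scheme is simple to decode. A small sketch of a parser for the “T_W_NNNNNN” file stems (the function name is illustrative, not a tool shipped with the dataset):

```python
import re

FILENAME_RE = re.compile(r"^(?P<t>[01])_(?P<w>[01])_(?P<n>\d{6})$")
TIME = {"0": "morning", "1": "afternoon"}
WEATHER = {"0": "sunny", "1": "cloudy"}

def parse_filename(stem: str) -> dict:
    """Decode an MMFW-UAV image/annotation file stem of the form T_W_NNNNNN."""
    m = FILENAME_RE.match(stem)
    if m is None:
        raise ValueError(f"not a valid MMFW-UAV file stem: {stem!r}")
    return {
        "time": TIME[m.group("t")],
        "weather": WEATHER[m.group("w")],
        "serial": int(m.group("n")),
    }

print(parse_filename("1_0_000042"))
# {'time': 'afternoon', 'weather': 'sunny', 'serial': 42}
```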
MMFW-UAV provides users with two data formats, Pascal VOC29 and MS COCO30, which are suitable for the training, validation, and application of most mainstream object detection architectures. Specifically, the labeling bounding box in the XML annotation file of Pascal VOC adopts the standard format of (xmin, ymin, xmax, ymax), where (xmin, ymin) and (xmax, ymax) denote the top-left and bottom-right corner points of the bounding box in image coordinates (origin at the top-left, y-axis pointing down), respectively. In contrast, the labeling bounding box in the JSON annotation file of MS COCO adopts the format of (xc, yc, w, h), where (xc, yc), w, and h denote the coordinates of the center point of the bounding box, its width, and its height, respectively. To facilitate reuse, we have also developed conversion tools for XML and JSON annotation files based on (xc, yc, w, h) = ((xmin + xmax)/2, (ymin + ymax)/2, xmax − xmin, ymax − ymin), aiming to make the MMFW-UAV dataset applicable to a broader range of vision tasks. In addition, the annotated category for fixed-wing UAVs is defined as “Fixed_Wing_UAV” in both Pascal VOC and MS COCO annotation files.
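The conversion formula and its inverse can be written out directly (an illustrative sketch mirroring the formula in the text, not the dataset's own conversion tool):

```python
def voc_to_center(xmin, ymin, xmax, ymax):
    """(xmin, ymin, xmax, ymax) -> (xc, yc, w, h), per the formula in the text."""
    return ((xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin)

def center_to_voc(xc, yc, w, h):
    """Inverse mapping: (xc, yc, w, h) -> (xmin, ymin, xmax, ymax)."""
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)

box = (100, 50, 300, 150)                 # a VOC-style corner box
center = voc_to_center(*box)
print(center)                             # (200.0, 100.0, 200, 100)
print(center_to_voc(*center))             # (100.0, 50.0, 300.0, 150.0)
```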
The capture time, weather, view, and sensor type inevitably affect the quality of the collected image data and, consequently, the richness and diversity of the image features of the target of interest. For example, the visible images will contain more fine-grained textures in sunny environments, while the infrared images can provide more distinctive feature information on cloudy days. In addition, the number of target instances and the target’s position in the image coordinate system also greatly affect feature extraction and model training. On the one hand, large instance samples will allow the vision model to extract more representative information about the appearance and state of the fixed-wing UAV. On the other hand, truncated objects at image edge positions represent the special cases where the fixed-wing UAV is about to fly out of the field of view. Therefore, the basic information in the MMFW-UAV dataset mentioned above has been statistically analyzed, as shown in Fig. 6.
Distribution statistics for the MMFW-UAV dataset: (a) Distribution of image capture time; (b) Distribution of image capture weather; (c) Distribution of image capture view; (d) Distribution of image capture sensor type; (e) Distribution of the number of fixed-wing UAV images; (f) Distribution of target positions in image coordinate system.
In addition, the image feature differences and diversity of different types of data in MMFW-UAV are also analyzed. As shown in Fig. 7, the image data captured by optical RGB zoom sensor, optical RGB wide-angle sensor, and thermal imaging sensor are denoted as general optical image data, wide-angle optical image data, and infrared image data, respectively. With the difference in spectral sensing range and field of view, their image features are obviously different. Specifically, optical image data contains rich color and texture information, while infrared image data primarily represents the heat distributions and thermal properties, which in turn provides contour information about the target. In addition, due to the broad coverage of environmental scenes, wide-angle optical image data typically provide more contextual (background) information compared to other types of images.
Demonstration of fixed-wing UAV images captured by different types of sensors.
In real applications, all three types of images are applicable for object detection, and each of them has some typical purposes. For example, fine-grained features in general optical images are suitable for pixel-level tasks like image segmentation, scene information in wide-angle optical images play an important role in scene understanding and analysis, and infrared images are mostly applied to thermal anomaly diagnosis, image feature fusion, and object detection in low-light environments (e.g., early morning and nighttime). While we only assessed the feasibility and applicability of the MMFW-UAV dataset for object detection in this report, the application performance of this open dataset for other special tasks mentioned above deserves to be explored by future users.
Object detection is one of the fundamental tasks of computer vision, which recognizes the object of interest in images, identifies the category, and determines its relative position in the image coordinate system. With this background, MMFW-UAV is designed to serve recognition and detection tasks of fixed-wing UAV from an air-to-air perspective.
A series of evaluation metrics for computer vision tasks are used to measure the performance of the mainstream object detection architectures discussed above, including the standard metrics Recall and Precision, as well as the combined metrics F1-Score and Average Precision (AP). Specifically, Recall (Equation (1)) measures the model’s capability to detect true objects, while Precision (Equation (2)) measures the accuracy of the model’s predictions. In contrast, F1-Score (Equation (3)) and AP (Equation (4)) provide a more comprehensive evaluation of model detection performance: F1-Score balances the contributions of Precision and Recall in the evaluation, and AP further demonstrates the model’s detection accuracy across different confidence thresholds.
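The referenced equations follow the standard definitions of these metrics and are reconstructed here for completeness:

\(\mathrm{Recall}=\frac{TP}{TP+FN}\)  (1)

\(\mathrm{Precision}=\frac{TP}{TP+FP}\)  (2)

\(\mathrm{F1\text{-}Score}=\frac{2\times P\times R}{P+R}\)  (3)

\(AP={\int }_{0}^{1}P(R)\,dR\)  (4)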
where TP and FN denote true objects (i.e., fixed-wing UAVs) detected and missed by the model, respectively, FP denotes false detections, i.e., non-target regions incorrectly predicted as fixed-wing UAVs, and P and R denote Precision and Recall, respectively.
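These metrics can be sketched in a few lines of Python. This is an illustrative re-implementation, not the authors' evaluation code; the all-point interpolation shown for AP is one common variant:

```python
def detection_metrics(tp: int, fp: int, fn: int):
    """Compute Precision, Recall, and F1-Score from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def average_precision(recalls, precisions):
    """All-point interpolated AP: area under the precision-recall curve,
    with precision made monotonically non-increasing from right to left."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

pr = detection_metrics(tp=90, fp=10, fn=10)   # precision=0.9, recall=0.9
ap = average_precision([0.5, 1.0], [1.0, 0.5])
print(pr, ap)
```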
Tables 4, 5, and 6 present the accuracy of the benchmarking models on the test sets of the zoom, wide-angle, and thermal imaging sensor sub-datasets, respectively. The experimental results show that Yolo V4-Tiny, Yolo V7-Tiny, and Yolo X demonstrate outstanding detection performance across all three types of images. For example, the single-stage detection model Yolo V7-Tiny achieves an average precision (AP) of 99.98%, 99.54%, and 99.80% across the three sub-datasets, while the anchor-free model Yolo X reaches an AP of 95.68%, 99.25%, and 98.91%, respectively. Additionally, in terms of inference speed, we found that Yolo V4-Tiny achieves over 410 frames per second (FPS) on an NVIDIA RTX 4090 GPU while maintaining excellent detection performance (99.98% AP on the zoom sensor sub-dataset, 99.82% AP on the wide-angle sensor sub-dataset, and 99.08% AP on the thermal imaging sensor sub-dataset), indicating a favorable trade-off between accuracy and efficiency.
However, the experimental results also indicate that the detection performance of the two-stage architecture Faster-RCNN and the transformer-based architecture Detr in the wide-angle sensor sub-dataset and thermal imaging sensor sub-dataset is still unsatisfactory. Specifically, the AP of Faster-RCNN in these two sub-datasets is only 45.77% and 58.30%, while the AP of Detr is only 9.70% and 32.67%, respectively. Considering the image data types and the characteristics of the detected objects, this degradation in detection accuracy might be caused by the small target size (i.e., fewer pixels occupied) or the lack of distinct image features (i.e., weaker discriminative capacity). In particular, fixed-wing UAV targets typically occupy very few pixels in wide-angle sensor images, making it easy for models like Detr to miss them, leading to significant drops in AP. Additionally, the minimal visual differences between the background and detected targets in the thermal imaging sensor sub-dataset also make it difficult for these detection models to learn discriminative image features, thereby increasing the detection challenges.
Based on the above technical validation results and the characteristics of each type of image, we reached the following main conclusions.
The one-stage architectures Yolo V4-Tiny, Yolo V7-Tiny, and the anchor-free architecture Yolo X exhibit outstanding detection performance in the multi-sensor sub-datasets, indicating that all three types of image data in the MMFW-UAV Dataset are suitable for fixed-wing UAV object detection.
Since the test sets consist of multi-view fixed-wing UAV images, the results demonstrate that multi-view image data can be effectively utilized for most downstream air-to-air fixed-wing UAV detection tasks.
The detection performance and inference speed of some commonly used benchmarking models in the MMFW-UAV Dataset still require improvement, which may be closely related to image feature processing mechanisms and model complexity.
In general, all current mainstream object detection models can be trained, tested, and validated on the MMFW-UAV Dataset. Since the MMFW-UAV Dataset contains multi-sensor and multi-view fixed-wing UAV image data, it can better support a wider range of downstream air-to-air vision tasks.
The MMFW-UAV is publicly available at Science Data Bank27. This dataset currently provides image and annotation files for object detection tasks with both Pascal VOC and MS COCO annotation formats, enabling users to work with this dataset conveniently and efficiently. MMFW-UAV offers multi-sensor and multi-view images of fixed-wing UAVs in air-to-air scenarios, featuring a broad span of capture times and a rich variety of shooting environments. Overall, there are several important academic research and engineering applications of MMFW-UAV.
Multi-sensor and multi-view images in MMFW-UAV are mainly fit for mainstream object detection model training, deployment and application. In particular, thermal imaging sensor data is highly suited for fixed-wing UAV detection missions in nighttime or poor lighting conditions. Therefore, models trained with MMFW-UAV will be applicable to full-time and all-weather UAV search, surveillance, and reconnaissance applications.
The image data of each fixed-wing UAV in MMFW-UAV are temporally consistent, i.e., the visible and infrared images are captured at the same moment and from the same viewpoint. Therefore, such multi-modal data are well-suited for image fusion research at the pixel level (i.e., each pixel of an image) or feature level (i.e., extracted features in the image). These studies further improve the performance and adaptability of vision models in complex environments.
Although the MMFW-UAV dataset currently only provides annotation files for object detection, it can also be utilized for more kinds of vision tasks, such as image generation, object tracking, and image super-resolution. In addition, users can also apply MMFW-UAV to vision tasks like semantic segmentation, pose estimation, and anomaly detection by secondary annotation of object contours, key points, and flight states. The expansions above greatly enhance the application value of MMFW-UAV in research and engineering fields.
To reproduce the technical validation experiments, the selected experimental data are available in the Science Data Bank repository28 and the source code is available in the “Detection_Codes” folder. In addition, the supporting data processing code is available in the “Tools” folder and mainly provides the following functions:
• Tools/voc2coco.py converts between XML (Pascal VOC) and JSON (MS COCO) annotation files.
• Tools/visualization.py visualizes images and bounding boxes in MMFW-UAV.
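For readers without the repository at hand, a minimal sketch of what such a visualization utility does (this is not the repository's Tools/visualization.py; it simply draws a VOC-style box on an image with Pillow):

```python
from PIL import Image, ImageDraw

def draw_bbox(image, box, label="Fixed_Wing_UAV"):
    """Draw a Pascal VOC-style (xmin, ymin, xmax, ymax) box with its label."""
    draw = ImageDraw.Draw(image)
    draw.rectangle(box, outline="red", width=2)
    draw.text((box[0], max(box[1] - 12, 0)), label, fill="red")
    return image

img = Image.new("RGB", (320, 240), "gray")   # placeholder frame
img = draw_bbox(img, (100, 80, 220, 160))
# A pixel on the left edge of the box should now be red.
print(img.getpixel((100, 120)))
```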
Roychoudhury, I. et al. Real-time monitoring and prediction of airspace safety. NASA Tech. Rep. https://ntrs.nasa.gov/api/citations/20180006637/downloads/20180006637.pdf (2018).
Menouar, H. et al. UAV-Enabled Intelligent Transportation Systems for the Smart City: Applications and Challenges. IEEE Commun. Mag. 55(3), 22–28, https://doi.org/10.1109/MCOM.2017.1600238CM (2017).
Rahnemoonfar, M., Chowdhury, T. & Murphy, R. RescueNet: a high resolution UAV semantic segmentation dataset for natural disaster damage assessment. Sci. Data 10, 913, https://doi.org/10.1038/s41597-023-02799-4 (2023).
Xiang, T. Z., Xia, G. S. & Zhang, L. Mini-Unmanned Aerial Vehicle-Based Remote Sensing: Techniques, applications, and prospects. IEEE Geosci. Remote Sens. Mag. 7(3), 29–63, https://doi.org/10.1109/MGRS.2019.2918840 (2019).
Weibel, R. E. & Hansman, R. J. Safety considerations for operation of unmanned aerial vehicles in the national airspace system. ICAT2005-01, http://hdl.handle.net/1721.1/34912 (2006).
Loh, R., Bian, Y. & Roe, T. UAVs in civil airspace: Safety requirements. IEEE Aerosp. Electron. Syst. Mag. 24(1), 5–17, https://doi.org/10.1109/MAES.2009.4772749 (2009).
Dalamagkidis, K., Valavanis, K. P. & Piegl, L. A. On unmanned aircraft systems issues, challenges and operational restrictions preventing integration into the National Airspace System. Prog. Aerosp. Sci. 44, 503–519, https://doi.org/10.1016/j.paerosci.2008.08.001 (2008).
Bauranov, A. & Rakas, J. Designing airspace for urban air mobility: A review of concepts and approaches. Prog. Aerosp. Sci. 125, 100726, https://doi.org/10.1016/j.paerosci.2021.100726 (2021).
DeGarmo, M. & Nelson, G. Prospective unmanned aerial vehicle operations in the future national airspace system. AIAA 4th Aviation Technol. Integration Oper. Forum 6243, https://doi.org/10.2514/6.2004-6243 (2004).
He, D. et al. A Friendly and Low-Cost Technique for Capturing Non-Cooperative Civilian Unmanned Aerial Vehicles. IEEE Netw. 33(2), 146–151, https://doi.org/10.1109/MNET.2018.1800065 (2019).
Roychoudhury, I. et al. Predicting real-time safety of the national airspace system. AIAA Infotech@ Aerospace 2131, https://doi.org/10.2514/6.2016-2131 (2016).
Wu, X., Li, W., Hong, D., Tao, R. & Du, Q. Deep Learning for Unmanned Aerial Vehicle-Based Object Detection and Tracking: A survey. IEEE Geosci. Remote Sens. Mag. 10(1), 91–124, https://doi.org/10.1109/MGRS.2021.3115137 (2022).
Cai, Y. et al. Guided Attention Network for Object Detection and Counting on Drones. Proc. 28th ACM Int. Conf. Multimedia 709–717, https://doi.org/10.1145/3394171.3413816 (2020).
Zhao, Z. Q., Zheng, P., Xu, S. T. & Wu, X. Object Detection With Deep Learning: A Review. IEEE Trans. Neural Netw. Learn. Syst. 30(11), 3212–3232, https://doi.org/10.1109/TNNLS.2018.2876865 (2019).
Zou, Z., Chen, K., Shi, Z., Guo, Y. & Ye, J. Object Detection in 20 Years: A Survey. Proc. IEEE 111, 257–276, https://doi.org/10.1109/JPROC.2023.3238524 (2023).
Han, K. et al. A Survey on Vision Transformer. IEEE Trans. Pattern Anal. Mach. Intell. 45(1), 87–110, https://doi.org/10.1109/TPAMI.2022.3152247 (2023).
Bosquet, B., Mucientes, M. & Brea, V. STDnet: A ConvNet for Small Target Detection. British Mach. Vis. Conf. (BMVC) 253, http://bmvc2018.org/contents/papers/0897.pdf (2018).
Al-Sa’d, M. F., Al-Ali, A., Mohamed, A., Khattab, T. & Erbad, A. RF-based drone detection and identification using deep learning approaches: An initiative towards a large open source drone database. Future Gener. Comput. Syst. 100, 86–97, https://doi.org/10.1016/j.future.2019.05.007 (2019).
Steininger, D., Widhalm, V., Simon, J., Kriegler, A. & Sulzbacher, C. The Aircraft Context Dataset: Understanding and Optimizing Data Variability in Aerial Domains. IEEE/CVF Int. Conf. Comput. Vis. Workshops (ICCVW) 3816-3825, https://doi.org/10.1109/ICCVW54120.2021.00426 (2021).
Zhao, J., Zhang, J., Li, D. & Wang, D. Vision-Based Anti-UAV Detection and Tracking. IEEE Trans. Intell. Transp. Syst. 23(12), 25323–25334, https://doi.org/10.1109/TITS.2022.3177627 (2022).
Jiang, N. et al. Anti-UAV: A Large-Scale Benchmark for Vision-Based UAV Tracking. IEEE Trans. Multimedia 25, 486–500, https://doi.org/10.1109/TMM.2021.3128047 (2023).
Schumann, A., Sommer, L., Klatte, J., Schuchert, T. & Beyerer, J. Deep cross-domain flying object classification for robust UAV detection. 14th IEEE Int. Conf. Adv. Video Signal Based Surveillance (AVSS) 1-6, https://doi.org/10.1109/AVSS.2017.8078558 (2017).
Munir, A., Siddiqui, A. J. & Anwar, S. Investigation of UAV Detection in Images With Complex Backgrounds and Rainy Artifacts. Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV) Workshops 221-230, https://doi.org/10.1109/WACVW60836.2024.00031 (2024).
Rozantsev, A., Sinha, S. N., Dey, D. & Fua, P. Flight Dynamics-Based Recovery of a UAV Trajectory Using Ground Cameras. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR) 2482-2491, https://doi.org/10.1109/CVPR.2017.266 (2017).
Tkachenko, M., Malyuk, M., Holmanyuk, A. & Liubimov, N. Label Studio: Data Labeling Software. Available at: https://github.com/heartexlabs/label-studio (2020–2022).
Kirillov, A. et al. Segment anything. Int. Conf. Comput. Vis. (ICCV) 3992-4003, https://doi.org/10.1109/ICCV51070.2023.00371 (2023).
Liu, Y. et al. MMFW-UAV Dataset: Multi-Sensor and Multi-View Fixed-Wing UAV Dataset For Air-to-Air Vision Tasks. Sci. Data Bank V4, https://doi.org/10.57760/sciencedb.07839 (2024).
Liu, Y. et al. Technical Validation Experiments for ’MMFW-UAV Dataset: Multi-Sensor and Multi-View Fixed-Wing UAV Dataset For Air-to-Air Vision Tasks’. Sci. Data Bank V1, https://doi.org/10.57760/sciencedb.07878 (2024).
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J. & Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 88, 303–338, https://doi.org/10.1007/s11263-009-0275-4 (2010).
Lin, T. Y. et al. Microsoft COCO: Common Objects in Context. Comput. Vis. ECCV 740-755, https://doi.org/10.1007/978-3-319-10602-1_48 (2014).
Bochkovskiy, A., Wang, C.-Y. & Liao, H.-Y. M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv preprint arXiv:2004.10934, https://doi.org/10.48550/arXiv.2004.10934 (2020).
Wang, C.-Y., Bochkovskiy, A. & Liao, H.-Y. M. Scaled-YOLOv4: Scaling Cross Stage Partial Network. 2021 IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) 13024–13033, https://doi.org/10.1109/CVPR46437.2021.01283 (2021).
Redmon, J. & Farhadi, A. YOLOv3: An Incremental Improvement. arXiv preprint arXiv:1804.02767, https://doi.org/10.48550/arXiv.1804.02767 (2018).
Wang, C.-Y., Bochkovskiy, A. & Liao, H.-Y. M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. 2023 IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR) 7464-7475, https://doi.org/10.1109/CVPR52729.2023.00721 (2023).
Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149, https://doi.org/10.1109/TPAMI.2016.2577031 (2017).
He, K., Zhang, X., Ren, S. & Sun, J. Deep Residual Learning for Image Recognition. IEEE Conf. Comput. Vis. Pattern Recognit. 770–778, https://doi.org/10.1109/CVPR.2016.90 (2016).
Ge, Z., Liu, S., Wang, F., Li, Z. & Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv preprint arXiv:2107.08430, https://doi.org/10.48550/arXiv.2107.08430 (2021).
Carion, N. et al. End-to-End Object Detection with Transformers. Comput. Vis. ECCV 213-229, https://doi.org/10.1007/978-3-030-58452-8_13 (2020).
Çintaş, E., Özyer, B. & Şimşek, E. Vision-Based Moving UAV Tracking by Another UAV on Low-Cost Hardware and a New Ground Control Station. IEEE Access 8, 194601–194611, https://doi.org/10.1109/ACCESS.2020.3033481 (2020).
Zheng, X. & Hu, T. Air2Land: A deep learning dataset for unmanned aerial vehicle autolanding from air to land. IET Cyber-Syst. Robot. 4(2), 77–85, https://doi.org/10.1049/csy2.12045 (2022).
Wang, Y., Huang, Z., Laganière, R., Zhang, H. & Ding, L. A UAV to UAV tracking benchmark. Knowl.-Based Syst. 261, 110197, https://doi.org/10.1016/j.knosys.2022.110197 (2023).
This work was supported in part by the National Natural Science Foundation of China National Science Fund for Distinguished Young Scholars under Grant 62025301, the National Natural Science Foundation of China Basic Science Center Program under Grant 62088101, the China Scholarship Council under Grant 202406030027, the Natural Science Foundation of Hebei Province under Grant F2024208002, the Science and Technology Project of Hebei Education Department under Grant QN2024205, and the BIT Research and Innovation Promoting Project under Grant 2024YCXZ023.
National Key Lab of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology, Beijing, 100081, China
Yang Liu, Zhihao Sun, Lele Zhang, Wei Dong, Chen Chen, Maobin Lu, Hailing Fu & Fang Deng
School of Electrical Engineering, Hebei University of Science and Technology, Shijiazhuang, 050018, China
Chongqing Innovation Center, Beijing Institute of Technology, Chongqing, 401120, China
Chen Chen, Maobin Lu & Fang Deng
Yang Liu conceived the work, collected the data, and wrote the paper. Zhihao Sun and Lele Xi designed the experiment platform and conducted the experiments. Lele Zhang and Wei Dong annotated the images and analyzed the results. Chen Chen and Maobin Lu helped design the structure of the research and supervised the unmanned aerial vehicle (UAV) control. Hailing Fu guided the selection of sensor hardware. Fang Deng edited the paper, supervised the work, and provided the research funding. All authors reviewed the manuscript.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Liu, Y., Sun, Z., Xi, L. et al. MMFW-UAV dataset: multi-sensor and multi-view fixed-wing UAV dataset for air-to-air vision tasks. Sci Data 12, 185 (2025). https://doi.org/10.1038/s41597-025-04482-2
Scientific Data (Sci Data) ISSN 2052-4463 (online)