Drone vs Bird Competition Announcement
As OBSS AI, we are happy to announce that we took 1st place in the WOSDETC Drone vs Bird Challenge 2021, the most prestigious competition on drone detection.
What is the Drone vs Bird Challenge?
The Drone vs. Bird Detection Challenge was launched in 2017, during the first edition of the International Workshop on Small-Drone Surveillance, Detection and Counteraction Techniques (WOSDETC), held as part of the 14th edition of the IEEE International Conference on Advanced Video and Signal based Surveillance (AVSS). The challenge aims to address the technical issues of discriminating between drones and birds. Drones can easily be confused with birds, particularly at long distances, which makes the surveillance task even more challenging. Video analytics can address the issue, but effective algorithms are needed that can operate under unfavorable conditions such as weak contrast, long range, and low visibility.
In 2021, the Drone vs Bird Detection Challenge was held for the 3rd time, with a more comprehensive test set. For more information, visit the challenge website: https://wosdetc2021.wordpress.com/drone-vs-bird-detection-challenge/
For the Drone vs Bird Detection Challenge 2021, 77 different video sequences have been made available as training data. These video sequences originate from the previous installments of the challenge and were collected by the Fraunhofer IOSB research institute as MPEG4-coded videos. The sequences were recorded with both static and moving cameras, and the resolution varies between 720×576 and 3840×2160 pixels. In total, 8 different types of drones appear in the dataset, 3 with fixed wings and 5 rotary ones. For each video, a separate annotation file is provided, containing the frame number and the bounding box (expressed as [topx topy width height]) for the frames in which drones enter the scene.
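As an illustration, the following is a minimal sketch of how such an annotation file could be parsed. The exact layout (whitespace-separated values, one line per annotated frame, possibly several boxes on the same line) is an assumption here and may differ from the official files.

```python
# Minimal sketch: parse a Drone-vs-Bird style annotation file.
# Assumed layout (not confirmed by the challenge docs): one whitespace-separated
# line per annotated frame, "frame_id topx topy width height", possibly with
# several [topx topy width height] quadruples repeated on the same line.

from pathlib import Path

def parse_annotations(path):
    """Return {frame_id: [(topx, topy, width, height), ...]}."""
    boxes_per_frame = {}
    for line in Path(path).read_text().splitlines():
        parts = line.split()
        if not parts:
            continue
        frame_id, values = int(parts[0]), list(map(float, parts[1:]))
        # Group the remaining numbers into [topx, topy, width, height] quadruples.
        boxes = [tuple(values[i:i + 4]) for i in range(0, len(values), 4)]
        boxes_per_frame.setdefault(frame_id, []).extend(boxes)
    return boxes_per_frame

if __name__ == "__main__":
    annotations = parse_annotations("drone_video_001.txt")  # hypothetical file name
    print(len(annotations), "annotated frames")
```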
Our Solution
Our method approaches the drone detection problem by fine-tuning a YOLOv5 model on real and synthetically generated data, then using a Kalman-based object tracker to boost detection confidence. Our results indicate that augmenting the real data with an optimal subset of synthetic data can increase performance, and that temporal information gathered by object tracking can improve it further.
As the base object detector we used YOLOv5. YOLOv5 was selected because it was the fastest sufficiently accurate anchor-based object detector as of 2021. In addition, it supports automated anchor optimization, which is proven to improve performance, and multi-level feature stages that can handle objects at different scales.
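As a rough sketch, a fine-tuned YOLOv5 checkpoint can be loaded through PyTorch Hub and run frame by frame; the checkpoint and video file names below are placeholders, not the actual competition artifacts, and the confidence threshold is chosen purely for illustration.

```python
# Illustrative sketch: running a (hypothetically fine-tuned) YOLOv5 model on
# video frames via PyTorch Hub. "drone_yolov5.pt" and "test_video.mp4" are
# placeholder names, not the actual competition files.

import cv2
import torch

model = torch.hub.load("ultralytics/yolov5", "custom", path="drone_yolov5.pt")
model.conf = 0.25  # detection confidence threshold; value chosen for illustration

cap = cv2.VideoCapture("test_video.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame[..., ::-1])           # BGR -> RGB
    detections = results.xyxy[0].cpu().numpy()  # rows of [x1, y1, x2, y2, conf, class]
    # ... feed `detections` to the tracker described below ...
cap.release()
```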
Object tracking algorithms are used to provide continuity of object detections over time. While tracking the objects is not directly required, it provides temporal information about the objects in the video that can further improve performance. A simple Kalman-based tracking method is applied over the predictions of the object detection network, with the tracking parameters optimized for drone tracking under moving-camera scenarios. Furthermore, by assigning the maximum confidence value in a track to all of its per-frame predictions, we apply track-based confidence boosting, which further contributes to the overall AP score.
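The confidence boosting step itself is simple once detections have been grouped into tracks. The sketch below illustrates the idea under the simplifying assumption that a tracker has already produced tracks as lists of per-frame detections; the actual tracker and data structures may differ.

```python
# Sketch of track-based confidence boosting: every detection in a track
# inherits the maximum confidence observed anywhere in that track.

from dataclasses import dataclass

@dataclass
class Detection:
    frame: int
    box: tuple          # (topx, topy, width, height)
    confidence: float

def boost_track_confidences(tracks):
    """tracks: list of tracks, each a list of Detection objects."""
    for track in tracks:
        best = max(det.confidence for det in track)
        for det in track:
            det.confidence = best
    return tracks
```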
The use of synthetic data in deep learning is helpful in scenarios where data is scarce or unavailable. Although synthetic data alone cannot match the performance of real data, it has been shown to increase performance when used alongside real data. For the drone detection problem, a method for creating labeled, randomized compositions by positioning 3D drone objects in front of 2D backgrounds was designed. This approach was chosen because creating a fully randomized 3D environment for drone detection is challenging, while a location-independent object such as a drone composes naturally with 2D backgrounds. To generate the dataset, 3D drone models were rendered under varying conditions such as position, rotation, and lighting, with post-process effects applied over randomized background images.
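For intuition, the core 2D composition step can be sketched as pasting a pre-rendered drone image (with an alpha channel) onto a random background at a random position and scale, recording the resulting bounding box as the label. The file names and parameter ranges below are placeholders; the actual rendering and randomization pipeline is more involved.

```python
# Rough sketch of the 2D composition step for synthetic data generation.
# File names are placeholders; scale/position ranges are illustrative only.

import random
from PIL import Image

def compose(background_path, drone_path):
    bg = Image.open(background_path).convert("RGB")
    drone = Image.open(drone_path).convert("RGBA")

    # Random scale and placement for the drone sprite.
    scale = random.uniform(0.05, 0.3)
    w, h = int(drone.width * scale), int(drone.height * scale)
    drone = drone.resize((w, h))
    x = random.randint(0, max(0, bg.width - w))
    y = random.randint(0, max(0, bg.height - h))

    bg.paste(drone, (x, y), mask=drone)  # alpha-composite the drone onto the background
    label = (x, y, w, h)                 # [topx topy width height]
    return bg, label

image, bbox = compose("background_0001.jpg", "drone_render_0001.png")
```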
Refer to our published papers for more technical details: