Enhanced aerial vehicle system techniques for detection and tracking in fog, sandstorm, and snow conditions

Unmanned aerial vehicles (UAVs) are rapidly being adopted for surveillance and traffic monitoring because of their high mobility and their capacity to cover regions at various elevations and positions. Detecting vehicles is challenging due to their varied shapes, textures, and colors, and one of the hardest tasks is correctly detecting and counting aerial-view vehicles in real time for traffic monitoring using aerial images and videos. In this research, we present strategies for improving the detection ability of self-driving vehicles in tough conditions, as well as for traffic monitoring and vehicle surveillance, covering classification, trajectory tracking, and movement calculation under challenging fog, sandstorm (dust), and snow conditions. First, image enhancement methods are applied to improve unclear road images. The enhanced images are then passed to an object detection and classification algorithm to detect vehicles. Finally, new methods (Corrected Optical Flow / Corrected Kalman Filter) were evaluated to obtain trajectories with the least error. Features such as vehicle count, vehicle type, tracking trajectories (by Optical Flow, Kalman Filter, and Euclidean Distance), and relative movement are also extracted from the coordinates of the observed objects. These techniques aim to improve vehicle detection, tracking, and movement estimation over aerial views of roads, especially in bad weather. For aerial-view vehicles in bad weather, our proposed method yields an error of less than 5 pixels from the actual value and gives the best results, improving detection and tracking performance in bad weather conditions.

Deep learning, applied to computer vision tasks, has resulted in an impressive improvement in classification and object recognition accuracy. In this regard, Liu et al. [7] solved the vehicle detection problem from Google Earth images using a technique based on a hybrid deep convolutional neural network (HDNN) with a sliding window search. To detect and count vehicles in high-resolution UAV images of urban areas, Bazi et al. [8] employed a pre-trained CNN in conjunction with a linear support vector machine (SVM) classifier. Several CNN algorithms and architectures have been proposed, including Yolo and its variations [9][10][11] and R-CNN and its variations [12][13][14]. Darrell et al. [12] proposed R-CNN, a region-based CNN that combines the region-proposals system with a CNN. The same authors later enhanced their technique, overcoming R-CNN's limitation by constructing the convolutional feature map from the whole image rather than from the regions as input; the convolutional feature map is then used to identify the region proposals. Wang [15] presents a vehicle detection method using the SSD model after an HSV transformation. The authors of [6] discussed the difficulties associated with using aerial images for car detection, notably the issues of small objects and complicated backgrounds. To overcome the problem, they suggested a Multi-task Cost-sensitive Convolutional Neural Network based on Faster R-CNN. Other researchers tackled the problem by using deep learning techniques on aerial images in a number of scenarios, such as object recognition and classification [16,17], semantic segmentation [18][19][20], and generative adversarial networks (GANs) [21]. Hoang [22] used Yolov5 in vehicle detection for speed identification in smart cities. In [23], Rhizma et al. explored the topic of automated car counting in CCTV photos obtained from four datasets of varying resolutions. They investigated both classic image processing algorithms and deep learning neural networks using Yolov2 [11] and FCRN [24].
Their findings suggest that deep learning algorithms produce significantly superior detection outcomes when applied to higher-resolution datasets. However, little work has been done on aerial-view vehicle detection and tracking in bad weather.
Humayun et al. [25] detected vehicles in multiple weather scenarios, including haze, dust and sandstorms, and snowy and rainy weather, in both day and nighttime, using YOLOv4 with a spatial pyramid pooling network. The proposed architecture uses CSPDarknet53 as the baseline, modified with a spatial pyramid pooling (SPP-NET) layer and reduced batch normalization layers. They augmented the dataset with several techniques, including hue, saturation, exposure, brightness, darkness, blur, and noise; this not only increases the size of the dataset but also makes detection more challenging. The model obtained a mean average precision of 81% during training and detected the smallest vehicle present in the image.
Punagin et al. [26] performed vehicle detection on unstructured roads based on transfer learning, using the Berkeley Deep Drive (BDD) dataset with a YOLOv2 network. The dataset covers different weather conditions, such as snowy, rainy, and foggy, in distinct scene types such as highway, residential, and city streets. The model gives an mAP of 72.4% on BDD; when the same model is tested in an unstructured environment using the Indian Driving Dataset (IDD), it gives an mAP of 56.76%.
In 2020, Hnewa et al. [27] studied object detection under rainy conditions for autonomous vehicles, using Faster R-CNN and YOLOv3 together with a generative adversarial network called "DeRaindrop" to remove raindrops from images. A key shortcoming they identify is the lack of data, and especially annotated data, that captures the truly diverse nature of rainy conditions for moving vehicles, which is arguably the most critical and fundamental issue in this area. The mAP is 52.62% for YOLO and 44.45% for Faster R-CNN.
Ajinkya et al. [28] presented car detection using the Yolo algorithm in normal conditions to detect five classes (cars, trucks, pedestrians, traffic signs, traffic lights). The neural network was trained for 120 epochs, which gives an mAP of 46.6%; the accuracy of this algorithm could be increased by training on bigger and more diverse datasets that cover different weather and lighting conditions. Uzar et al. [29] presented a performance analysis of YOLO versions for automatic vehicle detection from UAV images, performed for three classes (car, bus, and minibus). YOLOv5m provides the highest mAP0.5 value of 84%. The drawbacks were incorrect or missing vehicle detections when trees covered the vehicles, errors due to shadows, and failures when dark-colored vehicles such as black and gray glow under sunlight and share the same pixel gray value as the road.
In this study, we look to improve the detection ability of self-driving vehicles in tough conditions and to support traffic monitoring, vehicle surveillance, and tracking; a drone is used to recognize and count vehicles from aerial video streams. A new method for detecting small objects from the UAV perspective is presented, based on an enhanced Yolov5 method. Our contribution is to combine color correction techniques with deep learning to improve object detection in bad weather, which is then applied to improve trajectory tracking precision and relative movement calculation. The current work improves on the literature in several ways:
• We used two Retinex image/video enhancement methods with two different input sizes of Yolov5 for vehicle detection in tough weather conditions like fog, sandstorm (dust), and snow, using several hyperparameter settings, and compared the results in detail by displaying the several testing weather conditions for video and images and comparing the tradeoffs between Yolov5 models and Retinex images.
• We evaluated new correction methods (Corrected Optical Flow / Corrected Kalman Filter) to obtain trajectories with the least error.
• We calculated the movement of each vehicle on the road in both bad and enhanced weather conditions.
• We employed three datasets with varied features for training and testing, whereas the earlier research stated above evaluated their approaches on a single proprietary dataset. We show that annotation mistakes in the dataset have a large influence on detection performance.
The rest of this paper is organized as follows. Section 2 defines our system's methodology. Section 3 describes the experiments that were carried out and the results obtained. Finally, Sect. 4 concludes the paper.

Methodology
This section goes over our approach (MSRCR-Yolov5) in detail. Our method combines the multi-scale Retinex (MSR) color enhancement methodology [30] with the Yolov5 algorithm to improve vehicle recognition. Once vehicles have been detected, they are tracked to extract features such as vehicle trajectories. Finally, vehicle trajectories are combined with Yolov5 to obtain the movement of each detected vehicle.

Dataset
Three datasets are used with augmentation to improve aerial view vehicle detection. The proposed method is for improving foggy, sandstorm (dust), and snowy road images.

Data collection
The dataset was collected from different resources with varied features. The datasets contain sequences of images for each scene. Due to the limited resources of the Colab training platform, we took several images from each scene and annotated them carefully. We then increased the images by augmentation, as shown in Table 1.
Aerial-view vehicle videos in bad weather are rarely available for tracking, so we collected aerial-view vehicle images from many datasets and applied Yolov5 (You Only Look Once), which can be trained and tested on images or videos. We test on images and videos of bad weather.
The datasets are: (1) images from the UAV-benchmark-S dataset [31], captured by UAVs in a variety of complex scenarios; (2) the PSU aerialcar-dataset, generated from images captured by a UAV flying above the Prince Sultan University campus [32]; and (3) the Stanford dataset, a dataset of aerial images of a university campus [33]. The collected training data contains different flying altitudes, referring to the UAV flying heights (high, medium, low), with different camera views (side, front, bird-view); see Fig. 1. The approximate UAV flight height ranges from less than 30 m to more than 70 m.
After collecting all the images, they were annotated. The annotation format is (Xcenter, Ycenter, classId, height, width). The images were then resized to 640 × 640.
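For illustration, converting one annotation from this (Xcenter, Ycenter, classId, height, width) pixel format into a normalized Yolov5 label line might look as follows (the function name and the 640 × 640 default image size are assumptions for this sketch):

```python
def to_yolo_label(x_center, y_center, class_id, height, width,
                  img_w=640, img_h=640):
    """Convert one annotation in the paper's (Xcenter, Ycenter, classId,
    height, width) pixel format to the normalized Yolov5 label line
    'classId x_center y_center width height' (values in [0, 1])."""
    return "{} {:.6f} {:.6f} {:.6f} {:.6f}".format(
        class_id,
        x_center / img_w, y_center / img_h,
        width / img_w, height / img_h)
```

One such line is written per object into a .txt file that shares its name with the image, which is the label layout Yolov5 expects.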

Data augmentation
The number of images and objects is insufficient to build a robust model. As a result, one of the most important methods for enhancing performance is data augmentation, which aims to increase variance in the training data. We use augmentation techniques like flip, rotate, blur, and mosaic. The dataset has a total of 2418 images, with 70% training (1692) and 30% validation (726), and has been augmented to a total of 5709 images. In the augmentation method, we did not add any brightness, saturation, or hue changes, so the model was not affected by color changes other than the Retinex enhancement.
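The flip/rotate part of such an augmentation can be sketched as follows (blur and mosaic, which also require remapping the bounding boxes, are omitted; function and variable names are illustrative):

```python
import numpy as np

def geometric_augment(img):
    """Return simple geometric variants of an image array: horizontal
    flip, vertical flip, and a 90-degree rotation. In a full pipeline,
    the bounding-box annotations must be transformed the same way."""
    return [np.fliplr(img), np.flipud(img), np.rot90(img)]
```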

Testing data
Testing data (videos/images) was collected from the UAVDT [31] and also from searching Google images for images that contain special weather conditions that affect the appearance and representation of vehicles. It consists of fog, sandstorm (dust), snow, and a night scene with dim streetlamp lighting that provides little texture information. Meanwhile, frames captured in fog and bad weather lack sharp details, causing object contours to vanish in the background and making it hard for Yolov5 to detect objects.

Data pre-processing
The primary goal of this approach is to improve aerial view vehicle detection and tracking. The dataset was enhanced with Retinex color enhancement methods. We had three separate datasets for training and validation by the Yolov5 technique (Original dataset and two Retinex datasets), as shown in Fig. 2. The flowchart of tracking methods is also shown. First, the images were enhanced with Retinex MSRCR and MSRCP techniques. Second, the objects in the Original and enhanced images were processed and detected with Yolov5. Lastly, we applied vehicle tracking techniques and calculated relative movement as described in the next sections:

Retinex enhancement of image
Improving the videos/images in the system is beneficial because it clarifies unclear road scenes. To accomplish this, we employ the MSR algorithm (multi-scale Retinex) [30]. The Retinex idea was first proposed by Land and McCann [34]. Retinex is an image enhancement method founded on a model of the human perception of light and color. Image sharpening and color consistency improvement, as well as dynamic range compression, are possible through this process. Retinex filtering is based on Land's image perception theory, which was proposed to explain the perceived color constancy of objects under varying lighting conditions. There are several approaches to implementing the Retinex principles; the multi-scale Retinex is one of them, and it is utilized to bridge the gap between color images and human scene observation [35]. The steps of this technique are as follows. First, the image is processed by the Single-Scale Retinex (SSR), which subtracts the logarithm of the Gaussian-filtered image from the logarithm of the image itself:

R_SSR(x, y) = log I(x, y) − log[F(x, y, σ) ∗ I(x, y)],

where I(x, y) is the initial image, F(x, y, σ) is the Gaussian filter with scale σ, and ∗ denotes convolution. The second and final step sends the result to the Multi-Scale Retinex, a weighted sum of SSR outputs over several scales:

R_MSR(x, y) = Σ_{n=1}^{N} w_n R_n(x, y),

where N is the number of scales, F_n represents the nth-scale filter, R_n is the corresponding SSR output, and w_n its weight. MSRCR stands for multi-scale Retinex with color restoration, which applies MSR to each spectral channel. It has proved to be a very flexible automated image enhancing system that combines color constancy and dynamic range compression with local contrast enhancement to render images in a manner similar to how human vision is thought to operate, and it generates images with much better contrast and sharper colors. There is also the MSRCP algorithm, another version of MSR that is applied to the intensity channel without color restoration and is known as MSR with chromaticity preservation [30].
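Under these definitions, the SSR/MSR pipeline can be sketched with plain numpy (the default sigma values below are common choices from the Retinex literature, not necessarily the paper's settings; MSRCR's color restoration step is omitted):

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur using numpy only (zero-padded borders)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, 'same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, 'same'), 0, rows)

def single_scale_retinex(img, sigma):
    """SSR: log of the image minus log of its Gaussian-blurred surround."""
    img = img.astype(float) + 1.0          # avoid log(0)
    return np.log(img) - np.log(gaussian_blur(img, sigma))

def multi_scale_retinex(img, sigmas=(15, 80, 250), weights=None):
    """MSR: weighted sum of SSR outputs at several scales."""
    if weights is None:
        weights = [1.0 / len(sigmas)] * len(sigmas)
    return sum(w * single_scale_retinex(img, s)
               for w, s in zip(weights, sigmas))
```

For MSRCR, this MSR output would be applied per spectral channel and followed by the color restoration step; for MSRCP, it would be applied to the intensity channel only.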

Processing with Yolov5
Following image enhancement, the first step is to detect a vehicle using the Yolov5 algorithm. The vehicle is then tracked to finally combine vehicle trajectories with Yolov5 to obtain each detected vehicle movement.

Yolov5: deep learning object detection model
Yolov5 is the first Yolo model designed with the PyTorch framework, and it is more lightweight, simpler to use, and faster than prior Yolo models. For real-time object detection, Yolov5 is built on a smart CNN. The algorithm divides the image into regions and calculates the bounding boxes and probabilities for each region; the bounding boxes are weighted based on the predicted probabilities. The approach needs only one forward propagation run through the neural network to produce predictions, thus it "only looks once" at the image. It outputs known objects together with bounding boxes after non-max suppression (which ensures that the object detection algorithm detects each object only once). Yolov5 comes in four versions: Yolov5s, Yolov5m, Yolov5l, and Yolov5x, available at [36]. The current implementation employs the smallest model, Yolov5s, and the next-largest model, Yolov5m. The performance of the network may increase as it grows in size, but at the cost of longer processing times [37]; we exclude the largest models due to Colab's time and space constraints. The data was divided into two sets, training and validation, and the collected testing data was then added. The images were enhanced using the image enhancement techniques (MSRCR and MSRCP) and then converted to the Yolov5 PyTorch format. After that, the data were described in a data.yaml file, which defines the classes that will be used to train, validate, and test the model. The models were then trained for a total of 200 epochs on a Google Colab virtual machine.
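The data.yaml layout expected by Yolov5 is roughly as follows; the paths, class count, and class names here are placeholders, not the paper's actual values:

```yaml
train: ../dataset/train/images   # training images (labels found alongside)
val: ../dataset/valid/images     # validation images

nc: 2                            # number of classes
names: ['car', 'truck']          # class names, indexed by classId
```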

Performance evaluation of Yolov5 detected objects
Evaluating the effectiveness of an object detector is a difficult challenge, since a bounding box must be drawn around each identified object in the image. Precision, recall, and mAP, defined in Eqs. (3) through (5), are among the most widely used metrics for evaluating detection performance.
A true positive (TP) is an accurate detection of an actual object in an image. A false positive (FP) is an incorrect object detection, occurring when the model marks an object in the image that does not exist. An object that is visible in the image but is not recognized by the model is referred to as a false negative (FN). AP and mAP are the average and mean average precisions, N is the number of classes, and AP_i is the AP value of class i.
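In their standard form, consistent with these definitions, Eqs. (3) through (5) are:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i
```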
The intersection over union (IoU) method in object detection calculates the overlap region between the predicted bounding box and the ground truth bounding box of the real item. When the IoU is compared to a certain threshold, detection is categorized as correct or wrong.
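As a sketch, IoU for two boxes in corner format can be computed as:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as
    (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

A detection is then counted as correct when its IoU with a ground-truth box meets the chosen threshold (commonly 0.5).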

Vehicle tracking methods
Object tracking is a critical task in computer vision. There are three crucial processes in video analysis: detecting interesting moving objects, tracking such objects from frame to frame, and evaluating object tracks to recognize their behaviors. The complexity of object tracking is caused by image noise, changes in scene illumination, complex object motion, and partial and complete object occlusion [38]. Several techniques for tracking multiple objects have been proposed. In this section, we describe three trajectory-estimation techniques: Kalman filtering, optical flow, and Euclidean distance.

Tracking using Kalman filters
Linear quadratic estimation, also known as the Kalman filter, is a set of mathematical equations that provide an effective computational method for determining the state of a process from its prior state by minimizing the mean squared error. Prediction and correction are the two steps in the Kalman filter [39]. This filter can estimate future states even when the nature of the modeled system is unclear. In the prediction stage, the filter first calculates the current state variables with their uncertainties; after the following measurement is received, these estimates are updated as weighted averages [40], with more weight given to estimates with higher certainty. The filter's technique is recursive, so it can operate in real time using only the current input measurements and the previously established state with its associated uncertainty matrix. These filters are based on linear operators perturbed by Gaussian noise. The Kalman filter uses feedback control to estimate process parameters: at each step, the filter estimates the process state and then receives feedback in the form of noisy measurements.
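As a concrete illustration, a minimal constant-velocity predict/correct loop might look as follows; the matrix values are illustrative defaults, not the paper's tuned settings, and the update equations are the standard Kalman forms:

```python
import numpy as np

class ConstantVelocityKalman:
    """Minimal predict/correct Kalman filter for a 2-D point with state
    [x, y, vx, vy]. Measurements are (x, y) detections."""

    def __init__(self, dt=1.0):
        self.x = np.zeros(4)                       # state estimate
        self.P = np.eye(4) * 500.0                 # error covariance P
        self.A = np.array([[1, 0, dt, 0],          # transition matrix A
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], float)
        self.H = np.array([[1, 0, 0, 0],           # we observe x, y only
                           [0, 1, 0, 0]], float)
        self.Q = np.eye(4) * 0.01                  # process noise Q
        self.R = np.eye(2) * 1.0                   # measurement noise R

    def predict(self):
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.x[:2]

    def correct(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ (np.asarray(z, float) - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

In a tracker, each Yolo-detected center would be fed to `correct`, with `predict` supplying the trajectory point for the next frame.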

Tracking using optical flow
Optical flow is the movement of objects between consecutive frames of a sequence, induced by the relative movement of the object and camera. Chen et al. [41] propose a new optical flow-based approach for tracking any moving object. Tracking an object's contour in complicated scenes is always tough: they first employ a method to obtain the velocity vector, then obtain the object contour by calculating the location of moving pixels between frames, and finally use the location values to calculate the object's position and speed. Optical flow techniques are classified into two types: sparse and dense [42]. Sparse optical flow provides flow vectors for some "interesting features" within the image and only needs to process a subset of the pixels; this is the variant used in this paper. Dense optical flow provides the flow at all points in the frame and is slower but more accurate; sparse accuracy may be sufficient for real-time applications.
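A sparse, single-point Lucas-Kanade estimate can be sketched with numpy alone (a didactic stand-in for library routines such as OpenCV's pyramidal implementation; the synthetic frames below are purely illustrative):

```python
import numpy as np

def lucas_kanade(prev, curr, point, win=9):
    """Estimate the (dx, dy) flow of one point between two frames via
    the classic Lucas-Kanade least-squares solution over a small window."""
    x, y = point
    h = win // 2
    Ix = np.gradient(prev, axis=1)                 # spatial gradients
    Iy = np.gradient(prev, axis=0)
    It = curr.astype(float) - prev.astype(float)   # temporal gradient
    sl = (slice(y - h, y + h + 1), slice(x - h, x + h + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow                                    # (dx, dy)

# synthetic demo: a Gaussian blob shifted 1 px to the right
yy, xx = np.mgrid[0:32, 0:32]
frame1 = np.exp(-((xx - 15) ** 2 + (yy - 15) ** 2) / 8.0)
frame2 = np.exp(-((xx - 16) ** 2 + (yy - 15) ** 2) / 8.0)
dx, dy = lucas_kanade(frame1, frame2, (15, 15))    # dx close to 1, dy near 0
```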

Tracking using Euclidean distance
The Euclidean Distance method determines the shortest distance between two points. In the Euclidean technique, unlike Kalman and Optical flow, no prediction is made. We can only create a trajectory between two points by using the Euclidean method to obtain the shortest distance. The Euclidean approach is used to find the new positions of points in this procedure. The points detected by Yolo are originally stored in a 2D-array. The next positions of these points are determined after a specified number of frames. The array is updated by using the Euclidean Method to discover the shortest distance between old and new points, and a trajectory is drawn between them.

Results and discussion
This system aims to reduce traffic officers' costs associated with manual tasks and to assist them with labor- and time-intensive tasks such as manually evaluating vehicle paths, so that they can focus on other safety-related traffic solutions. We describe a vehicle monitoring system for difficult weather conditions (fog, sandstorm (dust), and snow) based on a combination of algorithms for detecting vehicle count and trajectories. To begin, we enhance road scene images using the multi-scale Retinex algorithm, making subsequent steps easier. Then, to detect vehicles, we use the Yolov5 model trained on our own dataset. Finally, vehicle trajectories are obtained by merging the Yolov5 object detection with the Kalman filter/Optical Flow/Euclidean Distance algorithms to track vehicle activities in each frame of the video. When capturing videos, the weather condition affects the appearance and representation of objects. Small and medium Yolov5 models are used for training; training took 200 epochs with batch size 32 on a Google Colab virtual machine with 16 GB RAM. Yolov5s, the smallest and fastest version, was chosen for the proposed system's first experiment, and Yolov5m, the medium version, was employed in the second experiment.

Performance of enhanced images with Yolov5 models
Original images are images before enhancement, while MSRCR and MSRCP are the enhanced versions; we therefore consider three separate datasets (Original, MSRCR, and MSRCP). Table 2 displays the metric results produced with the YOLOv5s model for all classes, and Table 3 shows the same metrics for the second model, Yolov5m. The second column displays the number of known targets to be detected; the detector's precision, recall, and mean average precision on the (Original, MSRCR, and MSRCP) datasets are shown in the following columns. As the tables show, Yolov5m performs better than Yolov5s, and the MSRCR enhancement method is much better than MSRCP. The metrics and validation losses of the models are shown in Fig. 3. Among the Original, MSRCR, and MSRCP datasets, there are no major differences between the performance of MSRCR and Original images, with a slight advantage for the Original dataset, and both obtain higher mAP, precision, and recall than MSRCP. Training and validation are done in clear weather conditions, which is why the Original dataset's accuracy is slightly better than the MSRCR dataset's: the Original images are clear with clear objects. The opposite holds for bad-weather images and videos. Comparing the performance of the Yolov5s and Yolov5m models in Table 3 and Fig. 3 gives the advantage to the medium model (Yolov5m), because larger models correspond to lower loss and higher mAP, and it was pretrained with a larger number of parameters. Figure 4 shows the mAP50:95 performance; it follows the same pattern as mAP50, with the Original mAP better than the MSRCR performance.

Relative movement calculation of objects in video
Velocity is used to determine how fast or slow a vehicle is going, but measuring velocity requires a camera fixed in one position and knowledge of characteristics such as distance and time. We therefore used a different approach to calculate the relative movement of objects, as mentioned in Fig. 2, by measuring how much a point moves within a certain number of frames. We can establish whether an object is moving quickly or slowly with respect to the other objects in the frame.
In both Optical Flow and Kalman, the movement is shown as a number: the distance between the new points and the old points, calculated after a fixed number of frames. Given a large number, we can assume that the car is moving faster than the other detected cars, as shown in Fig. 11.
In the next sections, we will describe in detail how we applied the trajectory part.

Optical flow tracking trajectories with Yolov5
The cars detected by Yolo in the first frame are stored in a 2D matrix where the rows represent the number of points, and the columns represent the central (x, y) coordinates.
The storage of the Yolo-detected points of the first frame in a 2D array is shown in Eq. (7). These points are then fed into the Optical Flow as old points; in return, the Optical Flow predicts new points for the next frame. This process is repeated consecutively until a new point is detected by Yolo, in which case the new point is added to the Optical Flow point array. No correction is made in this process; only the trajectory predicted by the Optical Flow is displayed, as shown in Fig. 12b, d, where the trajectory lines appear as straight lines. New points are added by searching along the x-axis and y-axis. A point gap of 30 px is chosen to accommodate the difference between the Optical Flow predicted points and the Yolo detected points. If any Yolo point discovered during the search is not in the Optical Flow predicted points array, that point is vertically stacked into the Optical Flow points array.
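The point-gap search described above can be sketched as follows (the 30 px value mirrors the text; array shapes and names are illustrative):

```python
import numpy as np

POINT_GAP = 30.0   # tolerance between predicted and Yolo-detected points

def merge_new_detections(tracked_pts, yolo_pts, gap=POINT_GAP):
    """Vertically stack any Yolo-detected point that is farther than
    `gap` pixels from every currently tracked point, i.e. a genuinely
    new vehicle. Both inputs are (N, 2) arrays of (x, y) centers."""
    for p in np.atleast_2d(yolo_pts):
        if tracked_pts.size == 0:
            tracked_pts = p[None, :].astype(float)
            continue
        dists = np.linalg.norm(tracked_pts - p, axis=1)
        if dists.min() > gap:                      # no nearby track found
            tracked_pts = np.vstack([tracked_pts, p])
    return tracked_pts
```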

Corrected optical flow tracking trajectories with Yolov5
After every 6th frame, the positions of the Optical Flow points are re-adjusted to equal the Yolo points; in other words, the error is set to 0 after every 6th frame. New points are added by searching in the same way as before. Deletion of points is also implemented in the Corrected Optical Flow: if a point discovered during the search has not been repeated within 35 frames, it is deleted from the Optical Flow point array. The result is shown in Fig. 11a, c, where the trajectory lines appear as zigzag lines due to re-adjustment and smoothing.

Kalman tracking trajectories with Yolov5
The Kalman prediction is in the form of the mean and variance of a Gaussian distribution, where the mean is the value we want to measure and the variance is the confidence level. The Kalman state variables are the values we want to measure based on the values we are receiving. Position and velocity in the x and y coordinates make up the state matrix: x and y are the vehicle positions to be predicted by the Kalman filter and are initialized to zero, while ẋ and ẏ are the corresponding velocities, which depend on the video frame rate.

Error covariance matrix P
This matrix changes during the filter processing. The covariance matrix controls how fast the filter converges to the correct measured values. It is initialized based on the accuracy of the sensor (which in our case is Yolo): if the sensor is very accurate, small values should be used here; otherwise, large values should be used.

Transition matrix A
The core of the filter is this transition matrix. It is a dynamic matrix. It depends upon the video frames per second, and it is assumed to be constant during filter calculations.

dt = frames per second
The dot product of the transition matrix and the state matrix gives us the predicted value; this is the state matrix prediction step.
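With a constant-velocity model, the state vector, transition matrix, and prediction step take the standard form below (a reconstruction from the description above, not necessarily the paper's exact matrices):

```latex
\mathbf{x} = \begin{bmatrix} x \\ y \\ \dot{x} \\ \dot{y} \end{bmatrix},
\qquad
A = \begin{bmatrix}
1 & 0 & dt & 0\\
0 & 1 & 0 & dt\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix},
\qquad
\hat{\mathbf{x}}_{k} = A\,\mathbf{x}_{k-1}
```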

Process noise covariance matrix Q
This matrix describes how the system state can jump from one step to the next. It introduces noise into the system coming from different physical conditions or parameters, such as a changing camera fps. The matrix is a covariance matrix containing the elements given in [47], and the values of the process noise covariance matrix used in our observations are taken accordingly.

Measurement noise co-variance matrix R
This matrix represents measurement uncertainty. The sensor (Yolov5) is accurate, so small values should be used here.
Kalman object detection is implemented by adding the Yolov5 points to the Kalman point array. In normal Kalman detection, the first 10 Yolo-detected point values are initially fed into the Kalman algorithm; predictions are then taken.

Corrected Kalman tracking trajectories with Yolov5
As in the Corrected Optical Flow, the prediction error is set to 0 every 6th frame: the positions of points in the Kalman array are made equal to the Yolo-detected points. New points are searched for and added every 6th frame, and points that are not repeated within 25 frames are deleted. This deletion removes points that are falsely detected by Yolo or that are no longer present in the video, i.e., cars going out of the frame. Corrected Kalman trajectories are shown in Fig. 13a, c.

Error calculation for tracking trajectories methods
The error graph is plotted using the predictions made by the Optical Flow and Kalman methods. The Euclidean distance of these predicted values is taken with respect to Yolo's detected points. The error is measured after 10 frames in both the Optical Flow and Kalman methods: the summation of the Euclidean norm of all the points predicted by Optical Flow and Kalman is taken with respect to the Yolo-detected points in that 10th frame. To keep the number smaller, the error value is divided by the frame count, which is 10 in our case. As seen in Fig. 14a, the Optical Flow method works better in an enhanced-weather video, where the error remains less than 10 (red and green lines). In both the enhanced- and bad-weather cases (Fig. 14a, b), the Corrected Optical Flow method has less error, remaining almost equal to 5 (green line in (a) and (b) of Fig. 14); the green graph has a distance error of 5 pixels from the actual value (Yolo). The behaviour of the Corrected Kalman method can also be seen in the line graph of Fig. 14a. Overall, the best method for drawing trajectories with minimum error is the Corrected Optical Flow method (green line), followed by the Optical Flow method (red line), for both enhanced weather (Fig. 14a) and bad weather (Fig. 14b).
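The error metric described above reduces to a few lines (function and variable names are illustrative):

```python
import numpy as np

def trajectory_error(predicted, detected, frame_count=10):
    """Average trajectory error as described in the text: the summed
    Euclidean norm between predicted points (Optical Flow / Kalman) and
    the Yolo detections at the measurement frame, divided by the frame
    count (10 in the paper's setup)."""
    predicted = np.asarray(predicted, float)
    detected = np.asarray(detected, float)
    total = np.linalg.norm(predicted - detected, axis=1).sum()
    return total / frame_count
```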
As Table 4 shows, in the average error calculation over the 40 frames of Fig. 14, the lowest errors belong to the Corrected Optical Flow (green) plot, with 1.53 for enhanced weather and 1.76 for bad weather. The enhanced weather has the least error because its trajectories are more accurate than those in bad weather.

Tracking trajectories with Euclidean distance
Using the Euclidean Distance vehicle tracking algorithm, each identified vehicle's center point in each frame is sent to an array that holds the coordinates from the previous frame. The distance between the vehicle's current coordinates and the previous-frame coordinates is then determined (the Euclidean distance), and the identified vehicle either gets a new id or keeps the one from the previous frame. Yolov5 finds the objects in the image and returns each object's coordinates. We built object tracking using the previous frame as the reference: the object coordinates from the previous frame are recorded and compared to the current frame. The straight-line distance between two locations in Euclidean space is known as the "Euclidean distance" or "Euclidean metric". For each object, the Euclidean distance between the current frame and the reference frame is calculated; if the distance between two object values is less than 50 pixels, it is the same object, otherwise a new label is created and applied, as shown in Fig. 15. A comparative trajectory line between enhanced- and bad-weather videos is shown in Fig. 16, where the enhanced video has better and longer trajectories due to the higher number and stability of detected vehicles. Figure 13a, c depicts trajectory lines: the Corrected Optical Flow trajectory lines of enhanced-weather videos have the least error, with smooth, unscattered lines, and give better tracking for detected vehicles.
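A minimal sketch of this id-assignment step, assuming the 50-pixel threshold from the text (class and variable names are illustrative):

```python
import math

MATCH_DIST = 50.0   # same-object threshold, per the text

class EuclideanTracker:
    """Assign ids to detections by nearest-center matching against the
    previous frame: a sketch of the Euclidean-distance tracking step."""

    def __init__(self):
        self.prev = {}        # id -> (x, y) center from the previous frame
        self.next_id = 0

    def update(self, centers):
        current = {}
        for (cx, cy) in centers:
            matched = None
            for obj_id, (px, py) in self.prev.items():
                if math.hypot(cx - px, cy - py) < MATCH_DIST:
                    matched = obj_id          # same vehicle: keep its id
                    break
            if matched is None:               # new vehicle: assign a new id
                matched = self.next_id
                self.next_id += 1
            current[matched] = (cx, cy)
        self.prev = current                   # current frame becomes reference
        return current
```

Calling `update` once per frame with the Yolov5 centers yields a stable id per vehicle, from which trajectory lines can be drawn.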
As a result, in bad weather, enhanced MSRCR images and videos improve object detection and trajectory lines; in particular, when using MSRCR, the Yolov5m model, and Corrected Optical Flow tracking trajectories, the error is less than 5 pixels from the actual Yolo value.

Conclusions
The current work aims to develop a deep learning neural network capable of detecting, counting, and tracking vehicles from aerial video streams, enhancing self-driving vehicles' recognition rate, vehicle surveillance, traffic monitoring, and tracking in bad weather conditions by combining the Retinex color enhancement algorithm and the Yolov5 object detection algorithm in a new manner. Three separate datasets were collected to train the model. Two Yolov5 versions were also examined, and it was concluded that MSRCR-enhanced Yolov5m performs better than MSRCR-enhanced Yolov5s for the intended detection problem. Yolov5m with the MSRCR Retinex enhancement proved able to recognize aerial-view vehicle types correctly with the largest count in test images and videos. Corrected Optical Flow trajectories on enhanced-weather videos have the least error and give better tracking for detected vehicles. These combinations enable the best vehicle detection and trajectory drawing. All the tests and results suggest that the proposed technique is reliable enough to be used in tough weather conditions like fog, sandstorm (dust), and snow. The significant improvements make it possible for the model to perform well in real-world road and traffic applications. Overall, the research presented in this paper has increased self-driving cars' recognition abilities in adverse weather. In the future, we will run the system online, expand the detection target types, extract more features from the video frames to get better detection and accuracy, and increase the dataset with more training data.
Author contributions All authors contributed equally to the writing.
Funding Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).

Availability of data and materials
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflict of interest
The authors declare that they have no conflicts of interest.

Ethical approval Yes.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.