Camera-Based Tracking of Floating Objects using Fixed-wing UAVs

This article concerns tracking of floating objects using fixed-wing UAVs with a monocular thermal camera. Target tracking from an agile aerial vehicle is challenging because uncertainty in the UAV pose negatively affects the accuracy of the measurements obtained through thermal images. Consequently, the accuracy of the tracking estimates is degraded if navigation uncertainty is neglected. This is especially relevant for the estimated target covariance since inconsistency is a likely consequence. A tracking system based on the Schmidt-Kalman filter is proposed to mitigate navigation uncertainty. Images gathered with an uncertain UAV pose are weighted less than images captured with a reliable pose. The UAV pose is estimated independently in a multiplicative extended Kalman filter where the estimated covariance matrix is a measure of the uncertainty. The method is compared experimentally with two traditional alternatives based on the extended Kalman filter. The results show that the proposed method performs better with respect to consistency and accuracy.


Introduction
Target tracking is key in mapping and surveillance of the sea surface, and in collision avoidance for autonomous vehicles. It is also important for floating debris detection in seismic operations at sea, and for search and rescue operations. A common requirement is the need for a robust system that can estimate the target states precisely and deliver consistent covariance information.
Tracking systems use a single or several tracking sensors to observe targets. These observations are fused with prior information about how objects are expected to behave. The fusion typically involves filtering using variations of the Kalman filter where a prior target model is combined with a measurement model to estimate the posterior target states. The measurement model depends on the sensor, but a recurrent requirement is the need for accurate pose information (sensor position and orientation). This is particularly challenging when the sensor is mounted on an agile platform, such as fixed-wing unmanned aerial vehicles (UAVs) [12,14].
This research investigates tracking of objects on the sea surface using a fixed-wing UAV equipped with a monocular thermal camera. The architecture presented in this work is also applicable for visual spectrum cameras. The sensor choice depends on the target characteristics. Much research have been conducted on target tracking using optical sensors where the pose of the sensor is stationary, assumed perfectly known or can be estimated with high precision [1,15,23]. However, tracking of objects in an Earth-fixed coordinate frame using an airborne thermal camera with navigation uncertainty is a different challenge. The camera pose depends on the position and attitude of the UAV. Therefore, errors in the navigation system of the UAV negatively affect the tracking accuracy and can cause a significant loss in performance. This is especially troublesome in multi target tracking because accurate covariance estimation is needed for measurement association.
Target tracking using UAVs equipped with optical sensors has been studied previously [12,13,16,17,22,30]. It is desirable to obtain high accuracy while maintaining filter consistency. Reduced tracking accuracy is not necessarily critical, but only if the covariance estimates represent the true uncertainty. In practice, this means that the tracking system should automatically adapt to the uncertainty of the UAV navigation estimates. This is closely related to the idea in simultaneous localization and mapping (SLAM) [20,26,28]. However, SLAM is not preferred in this research because the navigation states can be estimated precisely with inertial sensors and a global navigation satellite system (GNSS). Moreover, erroneous measurement association is a challenge in visual SLAM and can potentially cause divergence in the estimated UAV pose. Consequently, observations requiring data association are a threat to the airworthiness and are not preferred in the navigation system.
Target tracking in the presence of navigation uncertainty has been studied for autonomous ships [6,31,32]. However, these articles have focused on a subset of the navigation space and used a radar as the tracking sensor. Moreover, experimental results are not presented. Fixed-wing UAVs have much faster dynamics than water surface vessels and can change attitude quickly. Moreover, studying the effects of navigation uncertainty in all degrees of freedom for a UAV is to the authors best knowledge not studied previously.
This article proposes a tracking system based on the Schmidt-Kalman filter [32]. A multiplicative extended Kalman filter (MEKF) [19,27] is used as UAV navigation system. Objects are detected automatically in thermal images using image-processing techniques [15]. The pixel position of detected objects is used as measurements in the tracking system where the uncertainty of the navigation estimates is incorporated. Several flight experiments have been carried out at sea to collect experimental data, and the results illustrate the benefit of incorporating navigation uncertainty in the tracking system. The Schmidt-Kalman filter is a variation of the extended Kalman filter and used as a proof of concept in this paper. Other filtering techniques such as the unscented Kalman filter could also benefit from the same inclusion of navigation uncertainty during tracking from an agile platform.
The rest of this article is organized as follows. Section 2 describes the navigation system and relevant coordinate frames. Section 3 describes tracking in the presence of navigation uncertainty and how the effect of navigation uncertainty is mitigated. Section 4 presents target motion and measurement models. Experimental results are described in Section 5 before the work is concluded in Section 6.

UAV Navigation using the Multiplicative Extended Kalman Filter
A navigation system estimates the position, velocity and attitude of a vehicle. Additional states such as acceleration, angular velocity and inertial sensor bias [8] can also be included. This section looks into UAV navigation since navigation uncertainty is incorporated in the tracking system in Section 3.

Preliminaries
The relevant coordinate frames are: -The north-east-down (NED) coordinate frame [9], assumed to be inertial locally and denoted {n}. -The body-fixed frame of the UAV [5], which is fixed to the UAV and denoted {b}. The vector p n nb represents the position of {b} relative to {n} decomposed in {n}, and is the NED positions of the UAV. The target operates on the sea surface and the position is described through NED coordinates p n nt . The coordinate systems are illustrated in Fig. 1.
The UAV states that are estimated in the navigation system are: -The UAV position in NED (p n nb ) -The UAV velocity in NED (v n nb ) -The UAV attitude (rotation between NED and body) parameterized by the unit quaternion (q n b ) Fig. 1 Illustration of coordinate frames J Intell Robot Syst (2021) 102: 80 -Accelerometer bias decomposed in body (b b acc ) -Angular rate bias decomposed in body (b b ars ) The attitude q n b is parameterized by the unit quaternion (Hamiltonian representation) and is a global and singularityfree representation of attitude [27]. The relationship between the quaternion and the corresponding rotation matrix is where q w is the real part of the quaternion, q v is the vector part of the quaternion, I is the identity matrix and S(·) is the skew-symmetric matrix representing a cross-product [9]. The rotation matrix must be reconstructed from the unit quaternion when target detections are integrated in the tracking system in Section 4. The quaternion product, used for composition of rotations and in the kinematic equations in Section 2.4, is defined as [27] where q 1 and q 2 are unit quaternions. In the inertial navigation problem (Section 2.5), the true attitude q can be expressed as q =q ⊗ δq where δq is the attitude error andq is the nominal estimated attitude. The attitude error is parameterized using the four times Modified Rodrigues parameters (MRP) δa [19]: The modified Rodriguez parameters represents the attitude error with the minimal number of parameters and is used to avoid potential singularities in the covariance matrix. Euler angles is also a feasible parameterization for the attitude error since the singularity at a pitch of 90 degrees [5] is an unlikely issue for fixed-wing UAVs.
The UAV position is also a combination of a nominal state estimatep and an error state δp. The true position is p =p + δp where the reference frame is neglected for simplicity. The relationship between the true states, nominal states and error states is also a sum for the velocity, accelerometer bias and angular rate bias.

Navigation Sensors
It is assumed that the UAV is equipped with the following sensors for navigation: -An inertial measurement unit (IMU), providing measurements of the acceleration (specific force) and angular rate at high frequency. -Two GNSS receivers with two antennas providing independent position measurements. One antenna is mounted in the front of the fuselage and one in the back of the fuselage.
The augmented real-time kinematic (RTK) GNSS service is used to obtain position measurements at centimeter level for both receivers using corrections from a local base station [10]. RTK functionality may not be required, but increases the estimation accuracy, particularly when the baseline between the antennas is small as on small UAVs. Using two receivers is beneficial since heading can be estimated based on GNSS position measurements [10]. Thus, it is not necessary to measure heading with other sensors such as a magnetic compass, which has significant weaknesses onboard small vehicles [10]. Heading estimation using a single GNSS receiver is challenging since non-zero angular and linear velocities are needed for observability.

Sensor Models
IMUs measure specific force (f b imu ), and angular rate (ω b imu ). The sensor models follow the definitions in [9]: where b b acc and b b ars are biases on the accelerometers and angular rate sensors, respectively. w b acc and w b ars are zeromean noise terms. The position measurements obtained with GNSS are modeled as: where w n gnss is a zero-mean noise term and r b bi is the lever arm between the origin of {b} and the position of antenna i.

Kinematics
The navigation system is defined by the geometrical relationship between the states (kinematics). The kinematic equations are based on the strapdown equations [27]: where f b nb is the specific force acting on the UAV and g n = [0, 0, g] is the gravity vector in NED [10]. Moreover, the angular velocity of {b} relative to {n} is ω b nb . Ω(ω b nb ) is a 4x4 matrix used to replace the quaternion product in Eq. 6:

Error-State Navigation System
The error-state formulation of the Kalman filter is used as navigation system in this research. The fundamental idea of the error-state formulation is to use the IMU to estimate high-frequency nominal state estimates and use low-frequency aiding measurements (GNSS) to estimate the error states. The error-state Kalman filter is described in detail in [27]. The nominal state vector iŝ (8) and includes the position, velocity, and attitude of the UAV. Moreover, the IMU biases are also estimated. The error-state vector is δx = δp n nb , δv n nb , δa, δb b acc , δb b ars (9) where each element in the nominal state vectorx has a corresponding error state. Note that δa is the attitude error represented by the four times modified Rodrigues parameters as described in Section 2.1. The true state x is expressed as x =x ⊗ δx where ⊗ is a generic composition used to describe the relationship between the true, nominal and error states. The nominal states are estimated through Eqs. (6) and (4): T acc and T ars are diagonal time constant matrices used to model the biases.
The error states are estimated using two GNSS position measurements. The Jacobian of Eq. 5 with respect to the error states is where 0 m×n is a matrix of zeros with m rows and n columns and I 3 is the identity matrix of dimension 3. The Jacobian is used to update the error states before the error is propagated back into the nominal estimates through the reset operation described in [27].

Target Tracking using the Schmidt-Kalman Filter
This section describes target tracking using the Schmidt-Kalman filter (SKF). The goal is to mitigate the effects of uncertainty in the sensor pose. The main issue when neglecting navigation uncertainty is inconsistency in the tracking filter. This is normally observed as the filter being optimistic and estimating a covariance that is much lower than the corresponding mean square estimation error. It is possible to counteract this behavior by increasing the covariance matrices for the noise, but this is an ad-hoc technique that disguises the underlying problem. Moreover, the filter parameters must be tuned towards a particular scenario or set of data. Therefore, it is hard to generalize this strategy to fit new and unknown data. The SKF is a variation of the extended Kalman filter (EKF) where navigation uncertainty is incorporated as states in the filter and where correlations between the navigation system and tracking estimates is maintained. Other tracking filters such as the unscented Kalman filter or the particle filter are viable alternatives to the EKF, particularly when the measurement model is highly nonlinear. Nevertheless, the motivation in this paper is to investigate how navigation uncertainty can be included in a filtering approach and the EKF is used as a proof of concept. The methodology can be generalized to fit into other filter structures such as the unscented Kalman filter.

Introduction to the Schmidt-Kalman Filter
The sensor pose is normally assumed perfectly known. This is reasonable in some applications (e.g., when only the relative position is of interest), but limits the performance of the tracking system in applications where the sensor pose is somewhat uncertain. This is particularly relevant when targets are tracked in an Earth-fixed coordinate frame from an agile fixed-wing UAV. Simultaneous localization and mapping [7] handles mapping in the presence of navigation uncertainty. A common assumption is that landmarks are static. Thus, the conventional SLAM architecture cannot cover a situation with moving targets. SLAM can be extended to consider moving landmarks as suggested in [29]. However, the fundamental idea in SLAM is that target (landmark) observations can influence the UAV pose. Consequently, erroneous data association, inaccurate measurements and the unknown target behavior degrade the accuracy of the UAV pose when using SLAM but might be necessary in certain situations such as in GNSS-denied applications. However, this is not desired in this research because measurements from optical sensors can be inaccurate. Moreover, the sparse set of distinct features on the sea surface makes SLAM architectures less viable for UAV operations at sea.
A more attractive option is to use a framework where the navigation system of the UAV operates independently, but where the uncertainty of the navigation estimates is allowed to influence the tracking system. This can be achieved with the SKF [8,21,25,32], where information can flow from the navigation system to the tracking system. The SKF is a sub-optimal approach, but more robust when considering model mismatch and erroneous data association.

Tracking System Model
Let x t k denote the target state vector at time k. A common way to estimate the target states is to use an EKF. The conventional EKF does not account for uncertainty in parameters affecting the motion and measurement models. Instead, all uncertainty is modeled as additive white noise. This is not a reliable way to model the uncertainty when navigation errors are present.
Consider a situation where the camera pose is defined as x o =x o ⊗ δx o wherex o represents a nominal known estimate and δx o is the error state (follows the notation of the error-state Kalman filter). Moreover, assume that the uncertainty of the error state is known through an estimated covariance matrix P o . The motivation behind this formulation is to augment the tracking system with δx o and let P o influence the target estimates. Only the error states that directly affect the tracking system are included. The augmented state transition model is written as where F t is the target model, and F o is the navigation error model (camera pose). v t k and v o k are additive white noise for the target and navigation error states, respectively. The state transition models of the error state and target are uncoupled because the target behavior should not affect the trajectory of the UAV and vice versa. The covariance matrix of the augmented system is where P t is the target covariance and C is the crosscovariance. Assume that the augmented tracking system has a measurement model that depends on both the target states and navigation states of the UAV: where z is the measurement vector and w is additive zero-mean white Gaussian noise. The nominal navigation estimatex o is known from the navigation system. When the measurement model is nonlinear, the Jacobian of Eq. 14 is used in the EKF. The idea in the SKF is to calculate the Jacobian with respect to both the target states x t and the error states δx o . Consequently, the linearized measurement model is where is the linearization point and Δ represents higher order terms that are neglected in the first-order filter. H t and H o are measurement Jacobians differentiated with respect to the target states and the navigation error states, respectively.

Update Equations for the Schmidt-Kalman Filter
The SKF state transition equations are based on the augmented system in Eqs. 12-13, and the Kalman filter equations. The a priori update is where Q t is the covariance matrix of the process noise in the target model. The covariance of the navigation error state, P o , is extracted from the navigation system at each time step. The Kalman gain is divided into two components, one that affects the target states and one that affects the navigation error: where the innovation covariance is defined as and R k is the covariance of the measurement noise. The SKF formulation appears by choosing a suboptimal gain where K o k is forced to be zero so that the error state, δx o k|k , is constant over the measurement update. The target gain is and the a posteriori state estimates are given as The a posteriori estimates of the covariance and the crosscovariance differ from the regular EKF equations because a suboptimal gain is chosen. Therefore, it is not possible to use the simplified formula for the a posteriori estimate. The Joseph form [3], which is valid for any gain, is used instead: Moreover, the following a posteriori estimates for the target covariance and the cross-covariance are acquired This concludes the main structure of the SKF. The key step is to use the suboptimal gain. The tracking system must maintain the target estimates and corresponding covariance in addition to the cross covariance. The navigation system maintains the covariance of the error state and the nominal state estimates. It is not influenced by the tracking system.

Navigation Error Models
When a monocular camera is used to track objects, only the position and the attitude of the camera influence the tracking system. Consequently, the error state δx o in the augmented system includes the position and attitude of the UAV: δp o is the error state in position. δa o is the error state in attitude and represented by the four times modified Rodriguez parameters (see Section 2). F o , in Eq. 16, is chosen as the identity matrix since the navigation error is estimated outside of the tracking system. This is a fundamental difference from SLAM where the error states are estimated within the same filter as the tracking estimates. The next part discusses how the equations in Section 3.3 are affected by the properties of the error states.

Zero-Mean and Uncorrelated Navigation State Errors
Assume that the navigation system is consistent without a bias. This gives uncertainty in the UAV pose that is zero mean and uncorrelated. Consequently, the measurement noise induced in the tracking filter is white. Moreover, the cross-correlation between the target estimates and the navigation error, in Eq. 13, is zero (C = 0). Therefore, the SKF equations can be simplified and the only effect on the tracking system is a so-called covariance inflation (also called consider covariance), where the innovation covariance is reduced to and the a posteriori target covariance estimate is The second term represents the covariance inflation. This is a viable strategy to mitigate the effect of navigation uncertainty, but one that neglects the cross-covariance. This strategy is called detuned EKF in the experimental validation in Section 5.

Constant Navigation State Bias
A second and more likely situation in experimental data is navigation errors that introduce a constant bias in the target observations. This can be caused by an unknown mounting misalignment between the IMU and the camera. It can also be caused by a constant bias that is estimated poorly in the navigation system. Moreover, experimental data are not necessarily obeying known models so it is likely that estimation errors are correlated in time [12]. Consequently, a nonzero cross-covariance between the target estimates and the navigation errors appears. This mindset gives the equations in Section 3.3 where F o is the identity matrix. This formulation is called SKF in the experimental validation in Section 5.

Other Navigation State Error Models
Error states that behave like slowly varying biases can be covered by the previous case, but higher-order models cannot. If it is assumed that the navigation system is initialized properly and has been running for a few minutes before the tracking objective starts, it is likely that sensor biases and other disturbances (in the navigation filter) have converged. Therefore, it is not considered to be beneficial to model the navigation error with higher-order models in this research. Higher-order models are more relevant in dead reckoning situations caused by GNSS dropout for example. In such a situation, SLAM architectures must be considered.

Camera-Based Tracking of Floating Objects
The general idea of tracking in the presence of navigation uncertainty was derived in Section 3. This section defines specific models for tracking of floating objects using monocular optical sensors.

Target State-Transition Model
Floating objects typically have slow variations in speed and course except for wave-induced motions that are not considered. The target dynamics can be covered by a state vector, x t = [p n nt , v n nt ] , where the north and east positions (p n nt ) and velocities (v n nt ), are included. It is common to choose the following uncoupled (near) constant-velocity target state transition model: where the subscript k refers to time step k, T is the sampling interval, and the covariance of the process noise is where σ 2 v I is the covariance of v, assumed to be equal in north and east. The constant velocity model is defined in discrete time and assumes that the motions in the north and east coordinates are uncoupled. The down coordinate is neglected in the target vector since targets remain on the sea surface at approximately the same altitude. More complex target state transition models are presented in [24]. Without prior knowledge about the target behavior, a simple model with a small amount of parameters is often the most reliable choice [14].

Measurement Model
Monocular cameras only provide bearing information. However, the altitude of the targets is known for objects on the sea surface. Target observations are acquired through object detection in thermal images [15,17]. The target detection algorithm returns the centroid of the target in the image plane. The target pixel position is related to the camera frame through the pinhole camera model [11]: where z 1 and z 2 are measurements, (u t , v t ) are the pixel coordinates of the detected target center and w cam is Gaussian zero-mean measurement noise. The pixel position of the target can be related to the target position in the camera-fixed frame through the pinhole camera model: where f is the focal length of the camera lens and (x c t , y c t , z c t ) = p c ct are the target coordinates in the camera-fixed coordinate frame. The camera-fixed position of a target is related to the camera pose and target position in NED through the following equation: where R c n is the rotation matrix between {c} and {n} and depends on the camera orientation. p n bc is the lever arm between the origin of {b} and the origin of {c}. These frames are assumed to coincide so p n bc = 0. Equation 30 can also be expressed with the navigation error states: where R c b is a known and constant rotation matrix between the camera-fixed frame and the body-fixed frame. R b b is a rotation matrix given by the error state of the attitude, and Rb n is the (nominal) estimated rotation matrix between the body frame and NED in the navigation system.p n nb is the nominal NED positions of the UAV estimated by the navigation system, and δp n nb is the error state in the UAV position. R b b represents the difference between the estimated body frame and the true body frame.
To simplify the Jacobian calculation in the SKF, it is beneficial to rewrite (31) as: where the final two expressions are equal. The last expression is beneficial when differentiating the measurement model with respect to δa, and the first expression is beneficial when differentiating with respect to δp n nb . Note that Eq. 32 needs to be inserted in Eq. 29 before the Jacobian is calculated. Finding the Jacobian theoretically is challenging, but is achievable with a computer program. Nevertheless, for real-time calculation, it is typically more efficient to find the Jacobian numerically.
In the SKF, the Jacobian is calculated with respect to the target position (p n nt ), the target velocity (v n nt ) and the UAV error states (δp n nb , δa). Hence, the Jacobian has two rows and ten columns where column three and four only have zeros because the measurement model does not depend on target velocity.

Field Experiments
The methods presented in Sections 2, 3 and 4 are evaluated using experimental data from several field tests. This section describes the UAV platform used to collect data, how the methods are evaluated, and several case studies presenting experimental results.

Experimental UAV Platform
A fixed-wing UAV with electrical propulsion (displayed in Fig. 2) was equipped with the following list of sensors for collecting relevant data: -FLIR Tau2 thermal camera with a resolution of 640x512 pixels, focal length of 19 mm and frame rate of 7.5 Hz. It is a longwave infrared camera sensitive to wavelengths from 7.5 μm to 13.5 μm. -A ThermalGrabber used to extract the digital image from the Tau2 thermal camera. -Pixhawk autopilot running Arduplane software. -SenTiBoard to synchronize the camera and navigation data [2]. -Odroid-XU4 as on-board computer for storing data.
-Analog Devices ADIS 16490 IMU measuring specific force and angular rate at 250 Hz -Dual-antenna RTK-GNSS based on uBlox NEO-M8T receivers measuring position at 5 Hz.
Three independent field tests were carried out where the goal was to gather data for tracking of the vessel displayed in Fig. 3. Parts of the data from these tests are used in three independent case studies presented in Sections 5.3 to 5.5. The field experiments were carried out on the same day at Raudstein in Norway. The weather was windy and sunny during all flights. The flight paths were predefined without customizing the path based on the weather conditions or

Experimental Evaluation
The purpose of the case studies is to evaluate the methods presented in this work. Sensor data from IMU and GNSS were used to estimate the pose of the UAV as described in Section 2. Navigation results are not presented in this article but have been compared and validated with navigation data from the autopilot. Target detection [15,17] was used to detect the target in thermal images using image processing. Target detections were included as measurements in the tracking system (Section 4) together with the camera pose. The entire pipeline is illustrated in Fig. 4. The green boxes  illustrate the parts of the system that are covered in this article. Object detection is covered in [17]. Figure 5 shows a typical normalized thermal image of the target. The dark background shows the sea surface, and the bright spot is the target vessel. Figure 6 shows another raw normalized thermal image with both the sea surface and the shore.
The thermal images used in the forthcoming case studies only included one target in each image. Consequently, measurement association is not of importance for the case studies, and the filtering part of target tracking is key. The methodology can be extended to include multiple targets by using the Nearest Neighbor Standard Filter [4] for data association due to the strong signal-to-noise ratio in the images. Moreover, false positives are rare [17]. For increased robustness, the SKF combined with the Probabilistic Data Association Filter [4] is a viable strategy.
The methods have been analyzed in post processing, but real-time processing is also feasible on the on-board computer. The object detection algorithm has been executed in real-time previously [17]. Moreover, the execution time of the SKF is comparable with the extended Kalman filter (EKF). The only increase in processing time (compared to the EKF) is the calculation of the cross covariance, and the inclusion of the cross-covariance in the gain and target covariance update. In addition, the Jacobian is extended with additional elements, but is not producing a significant increase in execution time.
The SKF is compared with two alternative tracking filters in the forthcoming case studies. The first alternative is an EKF where navigation uncertainty is neglected. The Jacobian of Eq. 32 is only calculated with respect to the target states and not the navigation states. The second alternative is an EKF where navigation uncertainty is used to inflate the covariance of the target. This solution is based on Eqs. 24 and 25 where the Jacobian is calculated both with respect to the target and navigation states as specified in Eq. 15. It is denoted detuned EKF where detuned refers to the covariance inflation in Eq. 24. The main difference between the SKF and detuned EKF is the cross-covariance, which is maintained in the SKF. All filters have been initialized by using direct georeferencing [12] to find the north and east positions of the first available detection. The true states of the target were measured with a singlefrequency GNSS receiver using standard GNSS service and used as reference. RTK corrections were not available for the target reference so the horizontal positioning accuracy is expected to be 2 m.

Case Study 1 -Tracking of High-Speed Vessel
The objective in the first case study is to detect and track a high-speed target. This case study is based on data from a field test where more than 8000 thermal images were captured in total together with navigation data from the same period. The target was visible and have been detected successfully in 264 images within a chosen tracking period of three minutes. A subset of 1350 images were captured in the tracking period. The target was outside the field of view of the camera in the periods without detections. The UAV moved past the target and had to turn to observe the target again as shown in Fig. 7. Consequently, major parts of the estimates are solely based on prediction.
The covariance of the measurement noise in Eq. 28 was chosen as (30 pixels) 2 along both image frame axes. These values were chosen quite large to also fit the data in case studies 2 and 3. The measurement noise covariance could have been reduced significantly here, but a generic tuning was desired to prove that the system works in several scenarios, and for a fair comparison between the SKF and the alternatives. Therefore, it was not specially optimized for any of the case studies. All filters used the state transition model from Section 4.1 with σ v from Eq. 27 equal to (0.05 m/s 2 ) 2 . The initial covariance in the north and east positions was chosen as (10 m) 2 and the corresponding velocities with covariance (10 m/s) 2 . The initial covariance in velocity is not very far from the values proposed in [18]  An interesting factor in the SKF and detuned EKF is the estimated covariance in the navigation states of the UAV. The mean estimated standard deviations for the north, east and down positions are 8 cm, 8 cm and 7 cm, respectively. The mean estimated standard deviations for the roll, pitch and yaw angles are 1.35°, 1.30°and 2.65°, respectively. The uncertainty in the position is obviously less influential than the attitude [12]. The estimated standard deviation is small The estimation errors in target position are shown in Fig. 8, and the main results of this case study are summarized in Table 1. The accuracy of the estimated position is comparable for the EKF and the SKF, but the SKF is more accurate overall. The mean estimation error is closer to zero and the average absolute estimation error is also smaller.
The estimated speed and course are displayed in Fig. 9. The SKF is more accurate initially, and this explains why the estimated positions of the EKF and detuned EKF drift more in the beginning (the first segment without measurements in Fig. 8).
The largest difference is seen in Fig. 10, which shows the normalized innovation squared (NIS) and normalized estimation error squared (NEES) in position [3]. NEES describes the estimation error squared normalized by the estimated covariance. In the same manner, the NIS describes the measurement innovation squared normalized by the innovation covariance. The NEES is above the upper confidence bound for the EKF in several time periods. This means that the EKF is too optimistic with a covariance estimate that is too small compared to the true estimation error. The SKF on the other hand has the opposite behavior. In fact, the SKF is too pessimistic, which means that the estimated covariance is larger than the corresponding estimation error. Therefore, the measurement noise covariance could have been reduced for the SKF in this case study. The detuned EKF falls somewhere in between but has a covariance that is too small in extended periods.
It is obviously possible to reduce the NEES in general by increasing the measurement noise covariance. However, the  measurement noise standard deviation must be increased to 60 pixels (from 30) to keep the NEES within the confidence bounds for the EKF. This is much larger than the expected measurement noise from the object detection algorithm, and a solution that must be tailored to every new set of data. Consequently, it is not robust in the same sense as the SKF even though such a solution does not necessarily affect the accuracy negatively. The results presented in this case study are further discussed in Section 5.6.

Case Study 2 -Tracking of Slowly-Moving Vessel
The objective in the second case study is to track a slowly moving boat. This case study is based on data from another independent field test. The target was visible and have been detected successfully in 600 images (corresponds to 80 seconds of continuous detections) spread over a tracking period of 750 seconds (5625 images). The detections are spread into 11 segments as shown in Fig. 12. The target was outside the field of view of the camera in the periods without detections. All filters were tuned in the same manner as in the previous case study. The true paths of the UAV and the Fig. 9 Estimat ed speed and course The mean estimated standard deviations for the UAV north, east and down positions are 43 cm, 43 cm and 68 cm, respectively. The mean estimated standard deviations for the roll, pitch and yaw angles are 1.22°, 1.24°and 2.63°, respectively. The standard deviation in position is larger than in the previous case study because the IMU was weighted more in the navigation filter. This is further explained by the flight pattern in Fig. 11, which was dominated by turns.
The estimation errors in target position are shown in Fig. 12. The main results of this case study are summarized in Table 2. The SKF is the most accurate filter. The EKF and detuned EKF drift in periods without measurements while the SKF predicts the position more accurately. Fig. 11 The paths of the UAV and target in case study 2 The estimated speed and course are displayed in Fig. 13. The EKF and detuned EKF manage to estimate the total speed accurately, but the estimated course is wrong and never converges in the same sense as for the SKF. This explains why the position estimates drift more for the EKF and detuned EKF, and similar behavior was also observed in the previous case study. Since all filters have the same state transition model and the same set of measurements, one can conclude that the SKF is more accurate in this case study.
The NIS and NEES (in position) are displayed in Fig. 14. The NIS is similar for both the SKF and EKF, but smaller for the detuned EKF. A few measurements are outside of the confidence bounds, but that is not necessarily an issue for consistency. The largest NIS values appear when it is a significant period between two consecutive measurements, which is the expected behavior.
The NEES is not similar. The EKF and detuned EKF have NEES values that are much larger than the upper confidence bound for major parts of the tracking period. This means that they are too optimistic, and that the estimated covariance in position is much smaller than the corresponding

Case Study 3 -Tracking Of Drifting Vessel
The third and final case study concerns tracking of a drifting vessel. This is based on data from a third independent field test. The target was visible and successfully detected in 2400 images (corresponds to 320 seconds with continuous detection) over tracking period of 1500 seconds (11250 images). The detections are spread into several segments as shown in Fig. 16. The target was outside the field of view The estimation errors in target position are shown in Fig. 16. The main results of this case study are summarized in Table 3. The performance, with respect to tracking accuracy, is comparable for all filters. The EKF and detuned EKF have mean estimation error closer to zero, but the SKF has the smallest average absolute estimation error. The average absolute estimation error is smaller in this case study overall. That can both be because of the reduced altitude of the UAV and the vessel speed, which was smaller during this field test. The large initial estimation error is due to the large initial covariance in velocity and the poor accuracy in the first few measurements. Moreover, only a small set of measurements was available initially, so the filters were not converging until new measurements arrived. A small set of initial measurements is particularly challenging when estimating the velocities which causes drift in position rapidly. Note that the vessel only moved approximately 200 m during the tracking period. Figure 17 shows the estimated speed and course. The estimated speed is a fraction smaller for the SKF and closer to the reference. All filters struggle to estimate the correct course. The SKF converges somewhat halfway into the tracking period and is more accurate than the EKF and detuned EKF. Note that the speed of the target was so small that the course reference from the GNSS receiver on the target could be inaccurate at times and not properly defined.  The course reference could have been smoothed, but the raw data were chosen because it has been used in the previous case studies. Nevertheless, these results show that the SKF estimates the speed and course more accurately, but that it does not affect the accuracy of the estimated position significantly because of the total number of measurements available in this case study. Figure 18 shows NIS and NEES (in position) for all filters. The NEES is almost never within the confidence bounds for the EKF and detuned EKF, which means that the estimated covariance is too small. This is a weakness observed in the previous case studies. The SKF on the other hand, has NEES within the confidence bounds for most of the time. This case study also shows a significant difference in NIS. The NIS is too large for the EKF for several measurements, and must be explained by a small, estimated innovation covariance. Thus, the covariance of the measurement noise should have been increased for the EKF. The SKF also has a few measurements outside of the confidence bound. The NIS and NEES are interesting for the detuned EKF. The small NIS values indicate that  the innovation covariance is too large. However, the NEES values indicate that the covariance is too small compared to the estimation error. The same effect is not observed for the SKF. Consequently, the cross-covariance between the navigation and tracking estimates, which is maintained in the SKF, has a positive effect for consistency and shows the advantage of the SKF. Similar behavior is observed in case studies 1 and 2, but in a less significant manner.

Discussion
A comprehensive amount of data has been used to evaluate the SKF in case studies 1-3. A few conclusions can be drawn from the results: Fig. 18 Normalized estimation error squared and normalized innovation squared -The SKF is more accurate and is especially better when it comes to estimating the speed and course. -The position estimates with the EKF and detuned EKF are accurate, but are weak for long periods solely based on prediction. -The most significant difference is observed for the NEES where the EKF and detuned EKF are overconfident. The estimated covariance is much smaller than the corresponding estimation error, which is an issue for data association in multi-target tracking. It is possible to increase covariance of the noise in the motion and measurement models, but it is hard to generalize such a solution to fit new data. Moreover, this solution does not fit well with the NIS values. The SKF works for a fixed set of parameters, and is more robust and reliable with respect to NEES. -Even though the SKF is superior in some means, the necessary covariance of the measurement noise is still larger than the expected uncertainty in the object detection algorithm [15]. Consequently, the errors affecting the tracking system are not mitigated perfectly. This can be caused by other error sources than navigation errors and effects not mitigated by the SKF. That can for instance be time-varying biases with a small time constant or unknown timing errors in sensor data. -The same tuning was used in all case studies and different results were achieved. The tuning fitted the data in the final case study better and it highlights the challenge when tracking unknown targets. The behavior is unknown beforehand, so it is impossible to find parameters that are ideal in many different scenarios unless multiple model approaches are considered. Nevertheless, it is possible to identify values that are acceptable in several scenarios as shown in this research. -The covariance of the navigation estimates significantly affects the performance. This is particularly observable in the NEES where the major difference is caused by the navigation uncertainty and how the cross covariance is handled. Moreover, the attitude is the critical factor and a mean estimated standard deviation of about 1°to 2°in roll, pitch and yaw seems to have the desired effect. The influence of the position uncertainty is negligible unless the standard deviation is large (5 meters or greater).

Conclusions
This research article has investigated tracking of floating objects using fixed-wing UAVs with a monocular thermal camera. Uncertainty in the UAV position and attitude has been incorporated in the tracking system with the Schmidt-Kalman filter. This design has been used to mitigate issues related to consistency that often occur in this type of UAV mission. A large amount of experimental data has been gathered and analyzed to compare several methods. The results demonstrate that the SKF performs better with respect to consistency and estimation accuracy. Moreover, the position of floating objects is estimated with a mean error below 10 m with the SKF when the UAV operated at altitudes of 200 m to 350 m. Similar performance is also demonstrated in extended periods without target measurements and shows the reliability of the system.

Acknowledgements
The authors are grateful for the excellent assistance from UAV operators Lars Semb and Pål Kvaløy.

Declarations
Ethics approval and consent to participate This paper does not report research that requires ethical approval. Consent to participate or consent to publish statements are accordingly also not required.

Competing interests
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.