1 Introduction

ATGMs are weapons designed to destroy heavily armored tanks and vehicles. They come in various forms, ranging from portable shoulder-launched missiles to larger tripod-mounted weapons and vehicle-mounted systems. Powerful portable ATGMs have given infantry the capability to engage tanks from long distances on the battlefield. Traditional first-generation weapons, such as the anti-tank RPG, have limited precision and range, with an effective range of only about 100 m. Third-generation weapons such as the Javelin are expensive and susceptible to electronic disruption, with a range of only 4 km (Lightweight CLU) and a cost of US $249,700 (Lightweight CLU only). To improve the performance of ATGMs, they need to become smart, autonomous, and inexpensive, with longer range, by incorporating artificial intelligence technologies such as object detection models and computer vision mechanisms.

1.1 Anti-Tank Guided Missiles

Anti-tank guided missile (ATGM) systems are designed to hit and destroy heavily armored tanks and other armored vehicles. These systems vary in size and include shoulder-launched weapons that can be carried by one soldier, larger weapons that require a squad or team to carry and fire, and missiles mounted on vehicles. The development of portable ATGM systems with large warheads has given infantry the power to defeat armored tanks from far away. The first generations of portable ATGM systems were either manually or semi-automatically controlled. Manual systems required an operator to steer the missile to its target, while semi-automatic systems required the operator to keep the sights on the target until impact, with guidance commands sent to the missile through wires, radio, or laser marking. Figure 1 illustrates the Russian 9M133 Kornet anti-tank missile. Third-generation ATGM systems, such as the Javelin, use a thermal seeker at the front of the missile for guidance and are “fire-and-forget”: once the target is identified, the missile does not require further operator input during flight. Fire-and-forget operation allows operators to retreat after firing, reducing their vulnerability. The portability of these systems allows infantry to change positions quickly, which is an advantage in urban fighting. However, due to frequent transportation, these systems can be more susceptible to damage. Damage to the ATGM airframe or control surfaces may affect performance, as precision sensors and guidance are required to fire these systems [1].

Fig. 1 Russian 9M133 Kornet anti-tank missile

Third-generation ATGMs are shown in Fig. 2. This generation of ATGMs has two types of folded fins: the front and rear fins. The front fins are static, while the rear fins are movable. In addition to the guidance and control system, these ATGMs consist of booster rocket components, a rocket sustainer, main and precursor warheads, and an infrared seeker or camera [2].

Fig. 2 ATGM and control system

The Javelin Weapon System is a portable, fire-and-forget anti-tank guided missile system that consists of two parts: a command launch unit (CLU) and a missile encased in a carbon fiber launch tube. The launch tube protects the missile from the environment and provides electrical connections. The missile has a 5-inch-diameter body and eight mid-body wings, as well as four control fins at the rear that are driven by pivot arms connected to actuators. The seeker at the front of the missile uses infrared to guide the missile to its target, and the autopilot is stored in the CLU. The operator can choose between a top attack and a direct attack trajectory, with the top attack trajectory being the default and most commonly used. Low-rate production of the Javelin began in 1996, and it was first deployed in 1997, seeing extensive use during Operation Iraqi Freedom. After the conflict, many of the missiles were returned to storage. Figure 3 illustrates the Javelin missile. The initial phase of flight for an anti-tank guided missile (ATGM) is known as pre-guidance, which occurs before the launch motor is ignited. During this phase, the system checks that the missile is a safe distance from the launch point before proceeding. The 6DOF simulation models pre-guidance by setting initial velocities and Euler angles. The next phase is called climb-out, in which the missile ascends at a constant rate until it reaches an altitude of 120 m or its line of sight (LOS) nears its maximum range. The second condition occurs only when targeting close-range objects; in that case, the missile enters terminal guidance before reaching 120 m. If the altitude reaches 120 m without the LOS criterion being met, the missile enters altitude hold mode. As the missile nears the target in altitude hold mode, the seeker angle becomes increasingly negative. The missile then transitions from altitude hold to terminal guidance when the seeker angle exceeds its negative limit. During terminal guidance, the missile's trajectory is adjusted to intercept the target by maintaining a constant seeker angle. Figure 4 depicts the Javelin trajectory described above [3].

Fig. 3 American FGM-148 Javelin ATGM

Fig. 4 Javelin trajectory

First-generation ATGMs have limited range and require the operator to keep the sights on the target until impact. Operators must be skilled to perform this task correctly, and remaining exposed while guiding the missile puts their lives at risk. In contrast, third-generation ATGMs use modern technology, such as lasers and electro-optical image seekers, to guide missiles toward targeted tanks, but they are affected by electronic disruptions and are expensive; the Javelin, for example, costs US $249,700 (Lightweight CLU only) and has a limited range of 4 km.

Using video streams and computer vision technologies to locate targets is significantly cheaper than using active sensors; for example, the control circuit designed and constructed here cost approximately US $700, while the Javelin control circuit costs US $249,700 (Lightweight CLU only). There is therefore a need for artificial intelligence weapons that are smart, long-range, and accurate, to decrease the number of casualties and the damage to civilian property. In addition, such smart weapons can be cheaper and have greater range than existing weapons.

1.2 YOLO Model Family

Object detection involves drawing a bounding box around specific objects in an image and assigning each a class label. The most commonly used object detection models are the region-based convolutional neural network (R-CNN) and You Only Look Once (YOLO). R-CNN is more accurate but slower, whereas YOLO is faster and can conduct object detection in real time. The YOLO family was introduced by Redmon et al. in 2015. YOLO utilizes the topmost feature maps to predict confidences, multiple categories, and bounding boxes.

1.2.1 YOLO

The YOLO model takes an image as input and uses a single CNN trained end-to-end. The model predicts bounding boxes and class labels for each bounding box. YOLO divides the input image into a grid of cells, with each cell responsible for predicting the object centered in that cell. Each cell predicts multiple bounding boxes and corresponding confidence scores. The class probabilities map and the bounding boxes with confidences are then combined to form a final set of bounding boxes and class labels. Figure 5 illustrates the YOLO model predictions [4].
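To make the grid idea concrete, the following minimal sketch decodes a YOLO-style output tensor. The shapes (7×7 grid, 2 boxes per cell, 20 classes) follow the original YOLO paper, and random numbers stand in for real network output; this is a conceptual illustration, not the actual YOLOv5 head.

```python
import numpy as np

# Conceptual YOLO-style decoding: an S x S grid where each cell predicts
# B boxes of (x, y, w, h, confidence) plus C class probabilities.
S, B, C = 7, 2, 20
pred = np.random.rand(S, S, B * 5 + C)          # stand-in for network output

boxes = pred[..., :B * 5].reshape(S, S, B, 5)   # per-box (x, y, w, h, conf)
class_probs = pred[..., B * 5:]                 # per-cell class probabilities

# Class-specific confidence = box confidence * class probability.
scores = boxes[..., 4:5] * class_probs[:, :, None, :]   # shape (S, S, B, C)

# Keep candidates above a threshold; non-maximum suppression would follow.
keep = scores.max(axis=-1) > 0.25
print(f"{keep.sum()} candidate boxes above threshold")
```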

Fig. 5 Summary of the predictions made using the YOLO model

1.2.2 YOLOv5

YOLOv5 was released on May 27, 2020, by Ultralytics LLC (Los Angeles, CA, USA). It balances detection accuracy and real-time performance, with a detection speed of up to 140 frames per second, and it is lightweight with high detection accuracy. YOLOv5 includes the following versions: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. YOLOv5s has the fastest detection speed and smallest network structure, but also the lowest accuracy, as shown in Fig. 6. To use YOLOv5, the data set must be converted to a specific annotation format: each image is given a .txt annotation file with the same name, with one line of annotation information per object. Here, the annotations were first created in XML format, containing the class_id and bounding box coordinates for each detected object in each image, and then exported in YOLO format, producing a .txt file for each image. The data set was then divided into a training set of approximately 500 images of Russian tanks and a validation set of approximately 150 images of Russian tanks. Finally, the YOLOv5 model was trained and used for prediction [5].
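As an illustration of how such a trained model can be used, the sketch below loads custom weights through the public Ultralytics YOLOv5 torch.hub interface. The weight file name 'best.pt', the image name 'tank.jpg', and the confidence threshold are placeholders, not values from the paper.

```python
import torch

# Load custom-trained YOLOv5 weights via the Ultralytics hub interface.
model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')
model.conf = 0.4                      # confidence threshold (assumed value)

results = model('tank.jpg')           # accepts a path, URL, PIL image, or array
results.print()                       # summary of detections and speed

# Each detection row: x1, y1, x2, y2, confidence, class index.
for *box, conf, cls in results.xyxy[0].tolist():
    print(f"class {int(cls)} at {box} (confidence {conf:.2f})")
```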

Fig. 6 Versions of YOLOv5

The purpose of this paper is to design and construct a wide-range, smart, and autonomous control circuit for ATGMs. The circuit detects targeted tanks using an object detection model, such as YOLOv5, and then uses computer vision technologies to locate the target and direct the missile toward it. Because the constructed control circuit is smart and autonomous, it is accurate, resulting in fewer casualties and less damage to civilian property. In addition, the designed control circuit is much cheaper than the Javelin and Lahat missile control circuits.

2 Methods

The following fundamentals and principles of computer vision were utilized in this paper.

2.1 Epipolar Geometry

Epipolar geometry is the intrinsic projective geometry between two views; it depends only on the relative pose of the cameras and their internal parameters. It involves two cameras whose centers are connected by a baseline. For example, a point X in 3D space is imaged in two views: at x_l in the first (left) view and at x_r in the second (right) view, where x_r is the point in the right camera image corresponding to X. Figure 7 illustrates the epipolar geometry [6].

Fig. 7 Epipolar geometry

2.2 Fundamental Matrix

$$ \mathbf{x}_r^{T} F \, \mathbf{x}_l = 0, $$
(1)

where x_l is the image of a world point X in the left camera and x_r is the corresponding point in the right camera image of X.

Equation (1) is the fundamental matrix constraint: multiplying a point in the left camera image by the fundamental matrix and then by the corresponding point in the right camera image yields zero. The fundamental matrix F thus captures the relationship between corresponding points in two images. To solve for F, we need the left point and the corresponding right point; each pair of corresponding points provides one equation, so eight point pairs are needed to find the eight unknowns, since the overall scale is not significant. This is a homogeneous system of equations, and F can only be determined up to scale [7].
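As a quick numerical illustration of Eq. (1), the snippet below evaluates the epipolar residual for one pair of homogeneous points. The matrix F and the coordinates are placeholders chosen for the example (the fundamental matrix of a camera translating purely along the x-axis with identity calibration), not values computed in the paper.

```python
import numpy as np

# Placeholder F: pure x-translation with identity calibration gives
# F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]].
F = np.array([[0.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])

x_l = np.array([120.0, 85.0, 1.0])   # homogeneous point in the left image
x_r = np.array([133.0, 85.0, 1.0])   # its match, shifted along x only

residual = x_r @ F @ x_l
print(f"epipolar residual: {residual:.4f}")   # 0.0 for a true correspondence
```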

2.3 Normalized Eight-Point Algorithm

The normalized eight-point algorithm is the simplest approach for calculating the fundamental matrix; the underlying eight-point algorithm was introduced by Longuet-Higgins in 1981. The first step of the normalized algorithm is to normalize the input points before forming the set of equations. To achieve a stable result, the points in each view are given a simple transformation (translation and scaling) before the linear equations are set up: each point is translated and scaled so that the centroid of the points is at the origin of the coordinate system and the root mean square distance of the points from the origin is √2. Figure 8 illustrates real test images with manually matched points.
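A minimal sketch of this normalization step is given below; it is one standard way to implement the transformation just described, and the function name and array layout are illustrative.

```python
import numpy as np

def normalize_points(pts):
    """Translate points so their centroid is at the origin, then scale so
    the root mean square distance from the origin equals sqrt(2).
    pts: (N, 2) array of pixel coordinates. Returns (normalized pts, T)."""
    centroid = pts.mean(axis=0)
    rms = np.sqrt(((pts - centroid) ** 2).sum(axis=1).mean())
    s = np.sqrt(2) / rms
    T = np.array([[s, 0, -s * centroid[0]],
                  [0, s, -s * centroid[1]],
                  [0, 0, 1]])
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])   # homogeneous coords
    return (T @ pts_h.T).T[:, :2], T
```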

Fig. 8 Real test images with manually matched points

2.4 Computing Fundamental Matrix in Practice

Figure 9 depicts the approach for computing the fundamental matrix. It starts by taking two images of the target tank; because the camera was in motion, two consecutive images of the target were captured. These images were then filtered to eliminate noise or converted to grayscale. The next step is feature detection, where salient points are detected using a mechanism such as the Harris, SIFT, or ORB detector. The next stage is feature matching, where matched points in the two target images are selected using a specific mechanism, such as the BF (brute-force) matcher. The matched points are then normalized, and finally the fundamental matrix is computed [8].
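The sketch below walks through this pipeline with standard OpenCV calls. The image file names are placeholders, and cv2.findFundamentalMat with the FM_8POINT flag performs the Hartley normalization internally.

```python
import cv2
import numpy as np

# Two consecutive frames of the target (placeholder file names).
img1 = cv2.imread('frame1.jpg', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('frame2.jpg', cv2.IMREAD_GRAYSCALE)

# Feature detection with ORB.
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matching with Hamming distance (suitable for ORB descriptors).
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)

# Extract the coordinates of the matched points.
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Eight-point estimate of F (points are normalized internally).
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_8POINT)
```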

Fig. 9 Computation of the fundamental matrix approach

2.5 Triangulation

Triangulation involves finding the three-dimensional coordinates of a point in space, given its image coordinates in two views taken by cameras with known calibrations and poses. Figure 10 illustrates a world point x in space: the image of point x in the first camera is located at y1, and the same point x appears at y2 in the second camera. The intersection of the two rays, y1o1 and y2o2, determines the position of point x. In practice, the rays may not intersect in 3D due to an imperfect relative orientation or imperfectly matched corresponding points. The projection of x in the left image is

$$ Z_1 \mathbf{y}_1 = M_1 \mathbf{x}. $$
(2)
Fig. 10 Triangulation method

The projection of x in the right image is

$$ Z_2 \mathbf{y}_2 = M_2 \mathbf{x}. $$
(3)

Here, M1 = [1 0 0 0; 0 1 0 0; 0 0 1 0] is the projection matrix of the first camera, and M2 = [R11 R12 R13 Tx; R21 R22 R23 Ty; R31 R32 R33 Tz] encodes the rotation and translation of view two with respect to view one. Since y1 and M1x are parallel, their cross product must be zero, and likewise for y2 and M2x:

$$ \mathbf{y}_1 \times M_1 \mathbf{x} = 0, $$
(4)
$$ \mathbf{y}_2 \times M_2 \mathbf{x} = 0. $$
(5)

Point x must satisfy both of the previous equations. Each cross product contributes two independent equations, giving a system of four equations that can be solved for the three unknowns (x, y, z) of point x [9].
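The following sketch solves this system with OpenCV's triangulation routine, which stacks the cross-product equations (4) and (5) and solves them by SVD. The relative pose R, t and the pixel coordinates are placeholders, not measured values.

```python
import cv2
import numpy as np

# Projection matrices as in the text: M1 = [I | 0], M2 = [R | t].
M1 = np.hstack([np.eye(3), np.zeros((3, 1))])
R = np.eye(3)                                    # placeholder rotation
t = np.array([[0.1], [0.0], [0.0]])              # placeholder translation
M2 = np.hstack([R, t])

pts1 = np.array([[120.0], [85.0]])               # 2xN points in view one
pts2 = np.array([[133.0], [85.0]])               # corresponding points in view two

# Solve the homogeneous system built from Eqs. (4) and (5).
X_h = cv2.triangulatePoints(M1, M2, pts1, pts2)  # 4xN homogeneous result
X = (X_h[:3] / X_h[3]).ravel()                   # dehomogenize to (x, y, z)
print("3D point:", X)
```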

2.6 The Constructed Circuit

The first constructed control circuit was a real anti-tank missile circuit, tested indoors, while the second was a simulator. The first control circuit consisted of the following components:

1. Anti-tank missile

2. Camera tripod

3. An NVIDIA Jetson TX2 Development Kit

4. Four servo motors

5. Four control surfaces

6. A PCA 9685 motor driver

Figure 11 illustrates the real anti-tank missile circuit.

Fig. 11 Real anti-tank missile circuit

The second control circuit consisted of the following components:

1. An NVIDIA Jetson TX2 Development Kit

2. Four servo motors

3. Four control surfaces

4. A PCA 9685 motor driver

5. A pan and tilt camera

6. A Raspberry Pi 4 single-board computer

7. A display

The two control circuits operated as follows: the camera of the NVIDIA Jetson TX2 Developer Kit took two consecutive images of a targeted tank. Then, the YOLOv5 model detected this targeted tank, after which the main program received the two images from one camera to compute the fundamental matrix. The fundamental matrix was determined by detecting the features in the two images of the targeted tank. An ORB detector was used to detect the features in each image, and a BF matcher was used to match the common features in the target images. Then, the coordinates of these matched features were extracted and normalized. The next step was to use the fundamental matrix and triangulation approach to determine the three-dimensional coordinates of the matched points. Then, these coordinates were used to calculate the angles at which one of these matched points was made with the x-axis, y-axis, and z-axis. Finally, the calculated angles were used to move the servomotors, which controlled the control surfaces, toward the targeted tank. This process was repeated continuously as long as the ATGM was moving toward the targeted tank. Figure 12 depicts the constructed simulation circuit, and Fig. 13 illustrates the flowchart of the program that runs on these control circuits.
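A hedged sketch of the final actuation step is shown below, assuming the four control-surface servos occupy channels 0-3 of the PCA 9685 driver and using the Adafruit ServoKit library. The mapping from the computed x/y angles to servo deflections is illustrative only, since the paper does not specify it.

```python
from adafruit_servokit import ServoKit

kit = ServoKit(channels=16)   # PCA 9685 driver, 16-channel variant

def steer(x_angle_deg, y_angle_deg):
    """Deflect the control surfaces toward the computed target direction.
    Assumes channels 0-1 drive one fin pair and 2-3 the other."""
    # Center the servos at 90 degrees and clamp to the mechanical range.
    a = max(0, min(180, 90 + y_angle_deg))
    b = max(0, min(180, 90 + x_angle_deg))
    kit.servo[0].angle = a
    kit.servo[1].angle = a
    kit.servo[2].angle = b
    kit.servo[3].angle = b
```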

Fig. 12 Simulator circuit

Fig. 13 Flowchart of the automated anti-tank missile system with a camera


3 Results

The aim herein was to design and build a wide-range, smart, and autonomous anti-tank control circuit that used object detection models and computer vision technologies to direct an ATGM toward a targeted tank. This was achieved by constructing the control circuit and applying the Suliman algorithm as follows. A T90 Russian toy tank was used as the target. The camera of an NVIDIA Jetson TX2 Developer Kit took two consecutive images of the targeted tank. The first step was object detection, where the YOLOv5 model detected the targeted tank, as shown in Figs. 14 and 15.

Fig. 14 Object detection of the targeted tank

Fig. 15 Object detection of a toy Russian tank

The YOLOv5 model detected the targeted tank in 0.4 s, while the YOLOv2 model detected it in 1.1 s. Moreover, Mask R-CNN is very slow on the NVIDIA TX2 board.

The second step was feature detection, where the salient points were detected using a Harris, SIFT, or ORB detector. The next stage was feature matching, where the matched points in both images were selected by a specific mechanism, such as a BF matcher, as shown in Fig. 16. The coordinates of these matched points were then extracted, as shown in Fig. 17. This was followed by normalization of the matched points; in other words, the matched points were translated and scaled so that their centroid was at the origin of the coordinate system.

Fig. 16 Feature matching process

Fig. 17 Extraction of the coordinates of the matched points

In addition, the root mean square distance between the matched points and the origin should equal √2, as shown in Fig. 18. The next stage was the computation of the fundamental matrix using the eight-point algorithm, as shown in Fig. 19.

Fig. 18 Normalization of the matched points

Fig. 19 Fundamental matrix

Then, the camera calibration matrix was determined using the chessboard approach, as shown in Fig. 20.
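A typical implementation of this chessboard calibration is sketched below; the 9×6 inner-corner pattern and the image paths are assumptions, not values from the paper.

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)   # inner corners of the chessboard (assumed size)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob('calib/*.jpg'):             # placeholder image paths
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K is the 3x3 camera calibration (intrinsic) matrix.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("Camera matrix K:\n", K)
```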

Fig. 20 Camera matrix

The essential matrix was then computed from the fundamental matrix and the camera matrix, since the essential matrix is related to the fundamental matrix by E = K^T F K, where K is the camera calibration matrix, as shown in Fig. 21.
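Continuing from the earlier sketches (F from the matching step and K from the calibration step, both assumed already computed), this relation can be applied directly, and OpenCV's recoverPose can then extract the relative rotation and translation used to form M2 for triangulation:

```python
import cv2

# E = K^T F K; F, K, pts1, and pts2 come from the sketches above.
E = K.T @ F @ K
retval, R, t, mask = cv2.recoverPose(E, pts1, pts2, K)
```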

Fig. 21 Essential matrix

Using the triangulation approach, the three-dimensional coordinates of the matched points were computed, as shown in Fig. 22.

Fig. 22 Three-dimensional coordinates of the matched points

The next step was choosing one of these matched points and calculating the angles that this point made with the x-, y-, and z-axes using trigonometric formulas, as given in Fig. 23. Finally, these angles were used to move the servomotors toward the targeted tank. This approach was executed repeatedly as long as the missile was moving toward the targeted tank.
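One plausible formulation of this trigonometric step uses the direction angles of the selected 3D point; the paper does not give the exact formulas, and the coordinates below are placeholders.

```python
import numpy as np

x, y, z = 0.8, 0.3, 5.0                  # placeholder 3D coordinates
r = np.sqrt(x**2 + y**2 + z**2)          # distance from the camera origin

alpha = np.degrees(np.arccos(x / r))     # angle with the x-axis
beta = np.degrees(np.arccos(y / r))      # angle with the y-axis
gamma = np.degrees(np.arccos(z / r))     # angle with the z-axis
print(alpha, beta, gamma)
```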

Fig. 23 Angles of the selected matched point

A pan and tilt camera with two servomotors was used; it can move along the x-axis and the y-axis. The servomotors of this pan and tilt camera were connected to a PCA 9685 motor driver, which was connected to the NVIDIA Jetson TX2 Developer Kit. First, the camera of the NVIDIA Jetson TX2 Developer Kit took a video of the targeted tank, and YOLOv5 detected the tank. Second, two images of the targeted tank were used to repeatedly determine its location using the fundamental matrix and triangulation approach. The calculated x and y angles were then used to direct the servomotors of the pan and tilt camera toward the targeted tank. Furthermore, the pan and tilt camera was connected to a Raspberry Pi 4 single-board computer, which was connected to a display. If the target was visible on the display, the pan and tilt camera was correctly directed toward the target; hence, the location determination approach is correct. Therefore, using object detection models to detect a targeted tank, together with the fundamental matrix and triangulation technologies to determine its location, resulted in an accurate and inexpensive autonomous ATGM control circuit.

The system was also installed on an anti-tank missile, tested indoors, and worked as follows. The NVIDIA Jetson TX2 Developer Kit was connected to a PCA 9685 motor driver, which was connected to four servo motors attached to the four control surfaces. First, the camera of the NVIDIA Jetson TX2 Developer Kit took a video of the targeted tank, and YOLOv5 detected the tank. Second, two images of the targeted tank were used repeatedly to determine its location using the fundamental matrix and triangulation approach. The calculated x and y angles were then used to rotate the servomotors, which moved the control surfaces toward the targeted tank.

When the targeted tank was moved to another location, the camera of the NVIDIA Jetson TX2 Developer Kit took another video of it, and the x- and y-angles were recalculated to rotate the servomotors, which subsequently moved the control surfaces toward the targeted tank. Hence, the location determination approach is correct, and the system worked correctly.

Therefore, using just two consecutive images and applying computer vision techniques, such as epipolar geometry, feature detection, feature matching, the fundamental matrix, and the triangulation approach, we were able to determine the location of the targeted tank and direct a missile toward it for destruction.

4 Discussion

This paper presents a breakthrough in the design and construction of intelligent weapons. This research designed and constructed a control circuit for wide-range, smart, and autonomous ATGMs. An object detection model and computer vision technologies were used to design and build this control circuit. Smart and autonomous ATGMs not only reduce the cost of these missiles, but also enhance their accuracy and increase their range to more than 50 km. Thus, the number of victims and the destruction of civilian buildings should decrease. Compared with the control circuit of the FGM-148 Javelin, which has a range of 4 km (Lightweight CLU) and a cost of US $249,700 (Lightweight CLU only), this smart and autonomous control circuit costs approximately seven hundred dollars. Furthermore, the FGM-148 Javelin uses an infrared seeker to track a targeted tank, whereas the intelligent control circuit presented here uses video frames, an object detection model, and computer vision technologies. However, the object detection model took approximately half a second to execute; it was therefore run only once, whereas the computer vision technologies are executed repeatedly as long as the missile is moving toward the targeted tank. In addition, outlier features can lead to incorrect locations.

Moreover, the system relies on two clear images of the targeted tank to determine its location, and may not work in harsh conditions, such as fog or rain. However, filters can be used to enhance the images.

In addition, stereo cameras should be used for long distances to avoid delays caused by a single camera (Table 1).

Table 1 Comparison between three different types of ATGM

This is an era, where advanced technologies are being used to create efficient, autonomous weapons capable of hitting targets precisely and at long distances.

5 Conclusion

ATGMs are commonly used in conflicts worldwide to protect countries and destroy enemy tanks and heavy vehicles. However, current ATGMs like the FGM-148 Javelin have limited range and are expensive, while traditional missiles like the 9M133 Kornet lack precision and also have limited range. To improve ATGMs, smart and autonomous control circuits were designed using object detection models, such as YOLOv5, and computer vision technologies. These smart, autonomous, and accurate ATGMs are cheaper and have a wider range, resulting in fewer casualties and less damage to civilian property. This is the era of intelligent weapons that use advanced technologies to create efficient and cost-effective weapons.