1 Introduction

Optical flow refers to the apparent motion of objects in a scene caused by the relative motion between the scene and the camera [10]. It can be used to address a number of problems, such as object tracking, video segmentation, structure from motion, and gesture tracking [13, 22, 30]. The underlying idea is that an optical flow algorithm computes the movement of objects across a sequence of images by examining the movement of pixels between frames [9].

The term “augmented reality” describes a category of interactive technology that combines elements of the virtual and physical worlds [11]. Its most prominent characteristics are the combination of real and virtual components, real-time or near real-time operation, and the precise three-dimensional positioning of virtual items [15]. The phrase “virtual reality”, on the other hand, refers to technology that simulates the complete surrounding environment: to simulate a user’s physical presence in a virtual setting, the entire world is generated virtually from the real world [6, 23].

One thing these two technologies have in common is the application of optical flow algorithms to compute 3D information about the surrounding environment. Using optical flow, the location of items in 3D space within a scene can be determined [1, 18, 26]. This information is valuable for augmented reality (AR) because it enables more accurate positioning of virtual objects in 3D space, and it can be used to generate depth data, which allows occlusions and other phenomena to be incorporated into AR. Optical flow can also drive real-time structure from motion, allowing the quick creation of 3D environments and other assets for use in virtual reality.

There are three fundamental ways in which a camera can record an environment: by moving in a circular, linear, or random pattern. Circular motion occurs when the camera is rotated about a fixed axis. Linear motion is movement of the camera in a straight line, parallel to the scene. Random motion moves the camera freely in all six degrees of freedom simultaneously. The behaviour of the optical flow is determined by the relative motion of the objects, which in turn depends on how the scene is filmed and how the camera moves. These camera motions can therefore affect the quality of the output produced by optical flow algorithms, which in turn affects augmented and virtual reality technologies [2, 27].

The primary objective of this research is to investigate how camera motions affect optical flow calculations and the accuracy of the flow details. Because there is little research on camera motion and optical flow analysis in the literature, we created our own data set in accordance with the research requirements. It covers a variety of camera motions and the optical flow algorithms under consideration, most of which are suitable for near real-time applications. The research therefore focuses on analyzing performance based on delay and the subjective quality of the results. In addition, since our target applications run on casual devices for near real-time processing, we have considered classical algorithm-based optical flow and excluded learning-based methods.

We collected four datasets for this investigation, each containing objects in a distinct context, lit by either natural or artificial light. Each dataset was captured using the three different camera motions. We implemented the most popular optical flow methods and evaluated them on these datasets. The results are then compared based on output quality as well as the time taken to compute each frame of a video. The aim is to determine which optical flow algorithms perform best, in terms of latency and image quality, for near real-time applications such as augmented and virtual reality.

The remainder of this paper is divided into four parts: literature survey, methodology, experimentation and results, and conclusion.

2 Literature survey

Only a few articles have been published on benchmarking the available optical flow algorithms [8, 29]. The vast majority of them use synthetic data, since it is simpler for optical flow algorithms to operate on. McCane et al. [14] made notable contributions to this field by testing optical flow algorithms on real-world data sets of complicated scenes. Most current research focuses on inferring camera motion from optical flow [2, 27]; the purpose of this work, in contrast, is to investigate the performance of optical flow algorithms under a variety of camera motions.

Butler et al. [5] attempted to create a naturalistic synthetic data set from the animated movie Sintel [19] for optical flow benchmarking. Their data set includes significant elements typically present in real-world video recordings, such as long sequences, vigorous motions, specular reflections, motion blur, defocus blur, and atmospheric effects.

The optical flow algorithms used in this paper are as follows. Deepflow [24]: despite the availability of numerous optical flow algorithms, handling large displacements in optical flow is still an open problem. The Deepflow algorithm specifically targets this problem in order to solve the optical flow of fast motion in a scene, and it is designed to handle the large displacements that occur in realistic captures.
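As an illustration, a minimal sketch of how DeepFlow can be invoked through OpenCV's optflow contrib module is shown below; the frame file names are placeholders, not files from our dataset.

```cpp
#include <opencv2/imgcodecs.hpp>
#include <opencv2/optflow.hpp>

int main() {
    // Two consecutive 8-bit grayscale frames (placeholder file names).
    cv::Mat prev = cv::imread("frame0.png", cv::IMREAD_GRAYSCALE);
    cv::Mat next = cv::imread("frame1.png", cv::IMREAD_GRAYSCALE);

    // DeepFlow is exposed as a DenseOpticalFlow instance in the contrib module.
    cv::Ptr<cv::DenseOpticalFlow> deepflow = cv::optflow::createOptFlow_DeepFlow();
    cv::Mat flow;  // CV_32FC2: per-pixel (dx, dy) displacement
    deepflow->calc(prev, next, flow);
    return 0;
}
```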

DISflow [12]: most optical flow algorithms available today focus on the accuracy of the result and neglect the time complexity of the algorithm. However, in real-life applications like tracking and pattern detection, time complexity plays an important role. DISflow, or Dense Inverse Search optical flow, focuses on producing a dense optical flow with very low time complexity while keeping the quality of the result comparable to its competitors. It is much faster than the most popular optical flow algorithms of similar quality, making it an ideal algorithm for real-time applications.
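A sketch of the same two-frame pattern with DIS, which is available in the video module of recent OpenCV versions; the preset choice below is our illustrative assumption, a speed/quality trade-off rather than a value prescribed by the DISflow authors.

```cpp
#include <opencv2/video.hpp>

// Assumes prev/next are consecutive 8-bit grayscale frames, as above.
void dis_example(const cv::Mat& prev, const cv::Mat& next, cv::Mat& flow) {
    // PRESET_FAST trades some accuracy for speed; PRESET_MEDIUM is more accurate.
    cv::Ptr<cv::DISOpticalFlow> dis =
        cv::DISOpticalFlow::create(cv::DISOpticalFlow::PRESET_FAST);
    dis->calc(prev, next, flow);  // dense CV_32FC2 flow field
}
```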

Farneback’s [7] algorithm: Farneback proposed a two-frame optical flow algorithm based on polynomial expansion, in which the neighbourhood of each pixel is approximated by a polynomial. The algorithm uses just two frames to calculate the optical flow, which can be used to compensate for background motion in a shaky video.
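A sketch of the Farneback call; the parameter values below are the commonly cited OpenCV example settings, not values tuned for our experiments.

```cpp
#include <opencv2/video.hpp>

// Assumes prev/next are consecutive 8-bit grayscale frames.
void farneback_example(const cv::Mat& prev, const cv::Mat& next, cv::Mat& flow) {
    cv::calcOpticalFlowFarneback(prev, next, flow,
                                 0.5,   // pyr_scale: scale between pyramid levels
                                 3,     // number of pyramid levels
                                 15,    // averaging window size
                                 3,     // iterations per pyramid level
                                 5,     // poly_n: neighbourhood for polynomial expansion
                                 1.2,   // poly_sigma: Gaussian std for the expansion
                                 0);    // flags
}
```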

PCAflow [25]: the PCAflow algorithm is based on sparse-to-dense estimation; it first computes sparse features and then interpolates them to a dense field. This can be done in minimal time because the algorithm is trained on 8 h of commercial movies to learn how sparse points translate to dense points, letting it estimate the dense optical flow from the sparse points very efficiently.
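A sketch of invoking PCAFlow through OpenCV's contrib module, which ships with a learned basis built in; loading a custom learned prior is also possible but omitted here.

```cpp
#include <opencv2/optflow.hpp>

// Assumes prev/next are consecutive grayscale frames.
void pcaflow_example(const cv::Mat& prev, const cv::Mat& next, cv::Mat& flow) {
    // Uses the default built-in flow basis learned offline.
    cv::Ptr<cv::DenseOpticalFlow> pca = cv::optflow::createOptFlow_PCAFlow();
    pca->calc(prev, next, flow);
}
```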

RLOF [20]: the most common optical flow methods are global algorithms that use a robust estimation framework, aiming for high accuracy compared to local estimation algorithms. But real-life applications like augmented reality, object tracking, and pattern detection require fast estimation, which is only possible with local methods. This algorithm therefore provides a Robust Local Optical Flow that is both fast and accurate.
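A sketch of the dense RLOF interface in OpenCV's contrib module, using its default parameters; note that, unlike most of the other algorithms here, it expects 3-channel colour input.

```cpp
#include <opencv2/optflow.hpp>

// Assumes prev/next are consecutive 8-bit 3-channel (colour) frames.
void rlof_example(const cv::Mat& prev, const cv::Mat& next, cv::Mat& flow) {
    // Default RLOF parameters with sparse-to-dense interpolation of the result.
    cv::optflow::calcOpticalFlowDenseRLOF(prev, next, flow);
}
```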

Simpleflow [21]: SimpleFlow is an algorithm whose running time increases sub-linearly in the number of pixels. It estimates motion probabilistically using only local estimation, without global optimization. It takes a dynamic approach: when the motion in a sample is small and smooth, it switches to a sparse model to save computational power and time.
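A sketch of the SimpleFlow call in OpenCV's contrib module; the three parameters below follow OpenCV's own sample and are not values tuned for our datasets.

```cpp
#include <opencv2/optflow.hpp>

// Assumes prev/next are consecutive 8-bit 3-channel colour frames.
void simpleflow_example(const cv::Mat& prev, const cv::Mat& next, cv::Mat& flow) {
    cv::optflow::calcOpticalFlowSF(prev, next, flow,
                                   3,   // pyramid layers
                                   2,   // averaging block size
                                   4);  // maximum flow magnitude
}
```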

PyrLK [3]: pyramidal Lucas-Kanade is an iterative, coarse-to-fine variant of optical flow. Pixel displacements in real-world scenarios are larger than a basic optical flow algorithm can handle. To solve this problem, the image is repeatedly reduced in size, which shrinks the pixel displacements at each level until the base algorithm (here, Lucas-Kanade) can easily work on them; the flow is then propagated upwards as the images are scaled back up. Its dense variant is also called the sparse-to-dense optical flow algorithm.
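A sketch of pyramidal Lucas-Kanade in OpenCV: corner features are detected and then tracked across a three-level pyramid. The dense variant used in our comparison, cv::optflow::calcOpticalFlowSparseToDense, interpolates such sparse tracks into a dense field.

```cpp
#include <opencv2/video.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

// Assumes prev/next are consecutive 8-bit grayscale frames.
void pyrlk_example(const cv::Mat& prev, const cv::Mat& next) {
    std::vector<cv::Point2f> prevPts, nextPts;
    cv::goodFeaturesToTrack(prev, prevPts, 500, 0.01, 7);  // corners to track

    std::vector<uchar> status;  // 1 if the feature was successfully tracked
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prev, next, prevPts, nextPts, status, err,
                             cv::Size(21, 21),  // search window per level
                             3);                // pyramid levels
}
```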

TVL1 [17, 28]: among the most successful algorithms for estimating the motion of objects between two images are variational optical flow methods. TVL1 is based on total variation regularization and the robust \(L^1\)-norm of the optical flow constraint. This lets the algorithm handle discontinuities in the flow field and gives it higher robustness against changes in illumination and noise.
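A sketch of the Dual TV-L1 solver, which in recent OpenCV versions lives in the optflow contrib module; the default regularization parameters are kept here.

```cpp
#include <opencv2/optflow.hpp>

// Assumes prev/next are consecutive 8-bit grayscale frames.
void tvl1_example(const cv::Mat& prev, const cv::Mat& next, cv::Mat& flow) {
    // Default lambda/theta/tau regularization settings.
    auto tvl1 = cv::optflow::createOptFlow_DualTVL1();
    tvl1->calc(prev, next, flow);
}
```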

The most commonly used optical flow algorithms have been considered in this work for comparative analysis. We captured our own datasets for experimentation and study purposes.

3 Methodology

3.1 Dataset creation

In this section, we discuss the different camera motions used in real-world capture.

Fig. 1 Camera capture in different ways

In real-world scenarios, there are several ways in which a camera can be moved to capture a scene. In general, camera motion can be classified into three groups: circular, linear, and random. Circular motion is when the camera is rotated around a fixed axis to capture a scene, as shown in Fig. 1a. In linear motion, shown in Fig. 1b, the camera is moved in a straight line parallel to the scene, and in random motion, shown in Fig. 1c, the camera is moved freely in any degree of freedom. The study takes all of these into account: linear motion (vertical and horizontal), circular motion (clockwise and anti-clockwise), and random motion, including diagonal, forward, and backward movement.

Depending on which way the pinhole camera is pointed, the movement of the camera affects the movement of the pixels. Objects are positioned at various depths in the real world, and the camera line connecting the pinhole to an object tilts and moves in a variety of ways. Under circular or linear motion this camera line stays parallel between frames; under random motion it may converge or diverge. The goal of this research is therefore to carry out a qualitative analysis of camera lines with respect to optical flow outcomes, as shown in Fig. 2.

Fig. 2 Camera line for different camera motions

The captured scene reflects real-world factors such as occlusions and the varying depths of the objects in the image. Occlusion is the phenomenon in computer vision in which a portion of a scene is covered, either by an object or by moving outside the bounds of the camera’s field of view. When elements in a scene are arranged at different depths, parts of the image can become concealed: as the camera moves, elements at greater depths are more likely to be hidden by elements situated closer to the camera.

3.2 Optical flow

The way the scene is recorded has a direct bearing on the effectiveness of optical flow: the movement of the camera influences both the speed and the quality of the algorithm’s output. In circular capture there is very little relative motion between objects held at varying depths in three-dimensional space, because the camera rotates around a fixed axis and all objects move in a direction perpendicular to the camera’s path. When there is no relative motion between objects, it is challenging for an optical flow algorithm to produce accurate results. When the camera moves in a linear motion parallel to the scene, there is relative motion between the captured objects because the camera itself is in motion; videos captured in linear motion are therefore easier for optical flow algorithms to handle, which results in improved outcomes. During random capture there is significantly more relative motion, along with a greater chance that objects will be obscured by other objects or exit the frame entirely; this produces high-quality optical flow results.

To carry out this experiment, we used a total of eight distinct optical flow algorithms. These are among the more common algorithms available in OpenCV and are used in a variety of applications. Both the time taken to process each frame and the accuracy of the resulting optical flow played a role in the selection. The following algorithms were applied in this article:

  • DISflow [12]

  • Deepflow [4, 24]

  • Simpleflow [21]

  • Farneback’s [7]

  • TVL1 [17]

  • PCAflow [25]

  • RLOF [20]

  • PyrLK/SparseToDense [3]

According to Donovan [16], the basic calculation of optical flow assumes that in the short time between \(t_1\) and \(t_2\) an object may change position, but its reflectivity and illumination stay constant. We must assume that objects are illuminated uniformly and that surface reflectance contains no specular highlights. Optical flow algorithms work on two frames taken at times \(t\) and \(t + \Delta t\). Keeping the brightness constancy constraint true, for a \(2D + t\) dimensional case, we get:

$$\begin{aligned} I( x, y, t ) = I( x + \Delta x, y + \Delta y, t + \Delta t ) \end{aligned}$$

where \(I(x, y, t)\) is the intensity, \((x, y, t)\) is the pixel location at time \(t\), and \(( \Delta x, \Delta y, \Delta t )\) is the change in location of the pixel over time \(\Delta t\).

Since the movements are small, using Taylor’s expansion, and ignoring the higher-order terms, we get:

$$\begin{aligned} \begin{aligned} I( x +\Delta x, y + \Delta y, t + \Delta t)&= I( x, y, t ) + \frac{\partial I}{\partial x}\Delta x \\&\quad + \frac{\partial I}{\partial y}\Delta y + \frac{\partial I}{\partial t}\Delta t \end{aligned} \end{aligned}$$

or

$$\begin{aligned} \begin{aligned} \frac{\partial I}{\partial x}\Delta x + \frac{\partial I}{\partial y}\Delta y + \frac{\partial I}{\partial t}\Delta t = 0 \end{aligned} \end{aligned}$$

Dividing by \(\Delta t\), we get:

$$\begin{aligned} \begin{aligned} \frac{\partial I}{\partial x}V_x + \frac{\partial I}{\partial y}V_y + \frac{\partial I}{\partial t} = 0 \end{aligned} \end{aligned}$$

Here, \(V_x\) and \(V_y\) are the \(x\) and \(y\) components of the velocity of the intensity \(I(x, y, t)\); they can also be called the \(x\) and \(y\) components of the optical flow. \(\partial I/\partial x\), \(\partial I/\partial y\), and \(\partial I/\partial t\) are the derivatives of the frame at position \((x, y, t)\) and can be written as \(I_x\), \(I_y\), and \(I_t\), respectively.

Therefore the final equation becomes:

$$\begin{aligned} \begin{aligned} I_x V_x + I_y V_y = -I_t \end{aligned} \end{aligned}$$

or

$$\begin{aligned} \begin{aligned} \nabla I \cdot \vec {V} = -I_t \end{aligned} \end{aligned}$$

This single equation contains two unknowns (\(V_x\) and \(V_y\)) and therefore cannot be solved on its own. This is called the aperture problem, and different optical flow algorithms solve it by making different additional assumptions.
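For example, the classical Lucas-Kanade assumption, that the flow is constant within a small window \(W\) around each pixel, turns this single constraint into an overdetermined system solved by least squares. This is a standard textbook formulation, shown here only to illustrate how one such assumption resolves the aperture problem:

$$\begin{aligned} \begin{bmatrix} \sum _{W} I_x^2 & \sum _{W} I_x I_y \\ \sum _{W} I_x I_y & \sum _{W} I_y^2 \end{bmatrix} \begin{bmatrix} V_x \\ V_y \end{bmatrix} = - \begin{bmatrix} \sum _{W} I_x I_t \\ \sum _{W} I_y I_t \end{bmatrix} \end{aligned}$$

The system is well conditioned only where the window contains gradient variation in more than one direction, which is why corner-like features are tracked most reliably.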

4 Experimentation and results

This experiment compares the efficiency of optical flow algorithms that can be used in real-time applications on easily available casual hardware. For this purpose, the dataset was captured using a smartphone. The specifications of the capture device are shown in Table 1.

Table 1 Capture device specifications

There are four separate datasets, each captured with the three camera motions: random, linear, and circular. Each uses a different location as its setting. The indoor table scene features an artificially lit table with a variety of objects kept at different distances from the camera; the inter-object occlusions contained in this dataset can be used to evaluate the robustness of the algorithms. The indoor plants scene tests the efficiency of the algorithms in confined spaces, using a number of potted plants arranged linearly in a well-lit room. The outdoor parking scene contains a few cars parked at varying distances from the camera; this dataset covers a large, dimly lit space, which allows occlusions not only between the objects themselves but also against the background. The outdoor garden scene shows several potted plants in daylight, with background elements placed at a considerable distance from both the foreground elements and the camera.

The optical flow algorithms were implemented in C++ using OpenCV, and a straightforward GUI was developed using Qt Creator. The complete module was developed and run on the device described in Table 2.

Table 2 Processing device specifications

4.1 Computational delay analysis

The time required by each algorithm, measured in milliseconds, is computed and saved to a file; this file is then read to generate a line graph comparing the processing times of the algorithms. In this evaluation, we estimate the time the various algorithms take to complete the computation for the different camera motions. Because optical flow computes the relative movement of pixels between frames, we analyze the influence that camera direction has on the computation. As Fig. 2 makes evident, the direction in which an object is captured affects the optical flow, even if the object itself is identical.
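A minimal sketch of such a per-frame timing harness is shown below, assuming the OpenCV/C++ setup described earlier; the clip name, log format, and choice of DISflow as the example algorithm are illustrative, not a transcript of our exact tool.

```cpp
#include <fstream>
#include <opencv2/core.hpp>     // cv::TickMeter
#include <opencv2/imgproc.hpp>
#include <opencv2/video.hpp>
#include <opencv2/videoio.hpp>

int main() {
    cv::VideoCapture cap("circular_indoor_table.mp4");   // placeholder clip
    cv::Ptr<cv::DenseOpticalFlow> algo = cv::DISOpticalFlow::create();
    std::ofstream log("timings_ms.csv");                 // one ms value per frame

    cv::Mat frame, gray, prev, flow;
    cv::TickMeter tm;
    while (cap.read(frame)) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        if (!prev.empty()) {
            tm.reset();
            tm.start();
            algo->calc(prev, gray, flow);   // time only the flow computation
            tm.stop();
            log << tm.getTimeMilli() << "\n";
        }
        prev = gray.clone();
    }
    return 0;
}
```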

In this section, the generated video datasets are categorized based on camera motion. All of the video datasets are used to compute optical flow with the various algorithms. Figure 3 compares the time taken by each algorithm for circular motion in different scenes; the outward-looking camera direction is analyzed in this experiment. Similarly, Fig. 4 compares the algorithms when the camera moves linearly, and Fig. 5 compares them under random motion.

Figures 6, 7, 8, 9, 10, 11, 12, and 13 show the computation delay of the DeepFlow, DISFlow, Farneback, PCAflow, RLOF, SimpleFlow, SparseToDense, and TVL1 optical flow algorithms, respectively.

Fig. 3 Performance of different optical flow algorithms in circular motion

Fig. 4 Performance of different optical flow algorithms in linear motion

Fig. 5 Performance of different optical flow algorithms in random motion

Fig. 6 Performance of Deepflow algorithm in different settings

Fig. 7 Performance of DISflow algorithm in different settings

Fig. 8 Performance of Farneback algorithm in different settings

Fig. 9 Performance of PCAflow algorithm in different settings

Fig. 10 Performance of RLOF algorithm in different settings

Fig. 11 Performance of Simpleflow algorithm in different settings

Fig. 12 Performance of SparseToDense algorithm in different settings

Fig. 13 Performance of TVL1 algorithm in different settings

4.2 Qualitative analysis

We compare the quality of the output of each optical flow algorithm across all four datasets and all three camera motions. The colour scheme of the outputs is based on the HSV colour wheel shown in Fig. 14.
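A sketch of the standard OpenCV recipe for this visualization, where hue encodes flow direction and value encodes normalized magnitude; it is shown to explain the colour scheme rather than to reproduce our exact plotting code.

```cpp
#include <opencv2/imgproc.hpp>

// Converts a dense CV_32FC2 flow field into a BGR image for display.
cv::Mat flow_to_bgr(const cv::Mat& flow) {
    cv::Mat parts[2];
    cv::split(flow, parts);                               // dx, dy
    cv::Mat mag, ang;
    cv::cartToPolar(parts[0], parts[1], mag, ang, true);  // angle in degrees
    cv::normalize(mag, mag, 0.0, 1.0, cv::NORM_MINMAX);   // magnitude -> [0,1]

    // For 32-bit float HSV images, OpenCV expects H in [0,360), S and V in [0,1].
    cv::Mat hsv_parts[3] = { ang,
                             cv::Mat::ones(flow.size(), CV_32F),
                             mag };
    cv::Mat hsv, bgr;
    cv::merge(hsv_parts, 3, hsv);
    cv::cvtColor(hsv, bgr, cv::COLOR_HSV2BGR);
    return bgr;
}
```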

Tables 3, 4, 5, and 6 provide additional detail on the subjective quality of the results obtained from the optical flow algorithms. Because the results are displayed as colour images, we kindly ask the reader to refer to the coloured print of this article. It is clear from the images that the quality of the flow details is unreliable under circular motion: here the camera line diverges more, which, as covered in the previous section, reduces the reliability of the estimated flows. The findings also demonstrate that random motion generates more accurate flow information than linear and circular motion.

Fig. 14 HSV colour wheel

Table 3 Performance of optical flow algorithms in outdoor garden scene
Table 4 Performance of optical flow algorithms in indoor plants scene
Table 5 Performance of optical flow algorithms in indoor table scene
Table 6 Performance of optical flow algorithms in outdoor parking scene

4.3 Observations

The results of this experiment indicate that circular camera motion produces flow details that are less reliable than those produced under linear and random motion; there is a correlation between the camera line and the generated optical flow. It is therefore essential to use capture based on random motion when working with 360-degree images to create 3D models. The effects of circular motion can be mitigated, and more accurate flow information obtained, by increasing the radius of the circular path so that the motion becomes locally more linear.

We can also observe the effect of camera motion on computation delay and on the quality of the optical flows generated in the experiments. From the results (Figs. 3, 4, 5) it is evident that the time taken under random motion is comparatively higher than under linear and circular motion. In random motion, the relative position of a pixel falls at a random location in the reference frame, so the time taken to locate it is high; circular and linear motion involve a horizontal pixel shift, so locating the pixel in the reference frame takes less time. Figure 6 demonstrates the time taken by the DeepFlow algorithm on the various datasets.

Similarly, we compared the results of the qualitative analysis. Tables 3, 4, 5, and 6 show the optical flow results for the various datasets. The results show that the flow is clearer when the camera motion involves random capture.

Figure 15 summarizes the overall findings of the experiment. The chart shows the average time per frame taken by each algorithm. The findings indicate that DISFlow performs well and is appropriate for applications that require near real-time processing.

In our experiment, we find that the TVL1 algorithm, when computed on a CPU, takes the longest time to calculate the flow for one frame. DISflow, on the other hand, shows the least computational delay among all eight algorithms. Simpleflow yields the most accurate edge and motion detection but is around 15 times slower than DISflow. The pyramidal Lucas-Kanade (SparseToDense) algorithm produces good-quality results in minimal time: it is almost 2 times slower than DISflow but produces significantly better quality results.

Fig. 15 Average time per frame taken by the optical flow algorithms for each data set

5 Conclusion

The movement of the camera affects optical flow computations because these computations depend on the position of the camera. We evaluated optical flow algorithms for technologies like augmented and virtual reality, which require optical flow to be computed with high precision and low latency. In this work, we considered various camera motions: circular, linear, and random. Using datasets covering each of the three motions, we determined which algorithms are best suited to the various scenarios. The results of this experiment indicate that circular camera motion produces less reliable flow details than linear and random motion. In future work, the most efficient algorithms will be used in augmented and virtual reality applications requiring shape recognition and analysis.