A two-level computer vision-based information processing method for improving the performance of human–machine interaction-aided applications

The computer vision (CV) paradigm is introduced to improve the computational and processing system efficiencies through visual inputs. These visual inputs are processed using sophisticated techniques for improving the reliability of human–machine interactions (HMIs). The processing of visual inputs requires multi-level data computations for achieving application-specific reliability. Therefore, in this paper, a two-level visual information processing (2LVIP) method is introduced to meet the reliability requirements of HMI applications. The 2LVIP method is used for handling both structured and unstructured data through classification learning to extract the maximum gain from the inputs. The introduced method identifies the gain-related features on its first level and optimizes the features to improve information gain. In the second level, the error is reduced through a regression process to stabilize the precision to meet the HMI application demands. The two levels are interoperable and fully connected to achieve better gain and precision through the reduction in information processing errors. The analysis results show that the proposed method achieves 9.42% higher information gain and a 6.51% smaller error under different classification instances compared with conventional methods.


Introduction
Computer vision (CV) is an artificial intelligence-aided paradigm that is used to detect the digital world via cameras, videos, and deep-learning methods [1]. It classifies objects in an appropriate manner and responds to them automatically. CV focuses on the 3D modeling of objects, multiple-camera geometric analyses, cloud processing, and inferences based on the motion [2]. It acquires input from the machine and processes the output based on specific knowledge, namely, object labels and synchronization. It also includes developing particular technologies, such as image recognition, visual recognition, and facial recognition [3]. Mostly, CV is used to achieve high-level understanding of digital processing. The task of CV is to capture, examine, and recognize digital objects and extracts in higher dimensions [4]. This task allows it to provide the scientific or representative information needed for the transformation of images used in geometry, physics, statistics, and learning theory [5].
Human-machine interactions (HMIs) involve communicating and cooperating with the machine through a user interface. These interactions are carried out between the user and the machine to control the machine's intuitive behaviors [6]. In recent years, specific sensors have been used to capture the normal, abnormal, and neural postures necessary for controlling the machine [7]. In HMIs, CV is used to acquire high-level photos and videos used in the understanding of these postures/images. CV is related to HMI in that they both detect and monitor objects as well as control them by determining the conditions under which they operate [8]. One of CV's upgraded applications is hand-gesture recognition, which monitors the postures used in HMIs and interacts with them. The goal of this application is to obtain vigorous non-specific vision [9]. It is used in automatic parking, controlling music with a gesture, eye tracking with multiple touches, and so forth. CV and HMI are commonly used together in virtual reality applications. In comparison to other baseline technologies, CV combined with HMI produces the smallest errors [10].
The data processing in CV is done by acquiring an image, then extracting the information that needs to be processed. CV obtains the input images using a high-level approach; based on this approach, classifications are attained that render the appropriate results [11]. Thus, recognition methods are developed to detect the images. The obtained images interact with the machine through HMI applications, such as assistance devices relying on voice and hearing [12], developed for specific platforms. Many prediction-based methodologies and machine-learning algorithms have been developed for CV detection that are related to HMI [13,14]. The current work addresses the errors that occur when combining structured and unstructured data. It uses the tree method for classification, which allows it to attain maximum gain, and regression is used to reduce the error in combining the structured and unstructured data, maximizing the precision. The main contributions of this paper are as follows:
• We maximize the structured and unstructured data classification accuracy by applying the classification and regression approach.
• We reduce the error rate using two-level visual information processing (2LVIP) techniques, which help to minimize the misclassification rate.
• Furthermore, we obtain the maximum information gain value for both the structured and unstructured data.
• Finally, we improve the precision by classifying the structured and unstructured data and reduce the error by combining the classification and regression methods.
The rest of the paper is arranged as follows. "Related works" describes the various research options in terms of vision-based information processing, "Proposed 2LVIP method" explains the 2LVIP working process, "Results and discussion" evaluates the efficiency of the 2LVIP system, and conclusions are made in "Conclusion".

Related works
Monocular robot vision (MRV) was proposed by Chan and Riek [15] for unseen salient object detection done in parallel with discovery prediction. Unsupervised foraging of objects (UFO) is the fastest and most accurate method for notable object discovery. It is done via the real-world perceptions of robots. The main intention is to improve autonomy and resolve robotic challenges while engaging in a salient object discovery process. The embedded computer-vision system in traffic surveillance was introduced by Mhalla et al. [16] for multi-object detection. This method, which is used for detecting traffic objects in traffic scenarios, consists of a robust detector that makes use of a generic deep detector and enhances detection accuracy.
Wang et al. [17] developed a scale-aware rotating-object detection system at low-level high resolution for obtaining high-level semantic information with aerial imagery. Intersection-over-union (IoU) loss coupled with scale diversity detects orientation. The proposed method is used to improve the accuracy of the rotating bounding box. Kulik et al. [18] addressed CV for intelligent robots by proposing a convolutional neural network for object detection that detects flags indicating unsatisfactory results for different objects. Training and the testing of objects is maintained throughout.
Shin et al. [19] equipped unmanned surface vehicles with object detection and tracking abilities to improve accuracy. The proposed system contributes to an extensive baseline stereo vision system designed to enhance sea surface estimation. It is applicable for long-range object detection, and the semantic segmentation required for detecting objects is done via the oblique convolution designed by Lin et al. [20]. The artificial intelligence developed for the CV uses pixel classification. The hourglass network analyzes the local extremes.
Maggipinto et al. [21] modeled two-dimensional data for CV virtual metrology (VM) using deep learning. The model is used for automatic feature extraction to improve the accuracy and scalability of the VM. The modeling is done using both spatial and time evolutions on real industrial data, including semiconductor manufacturing data. Luo et al. [22] introduced a vision-based detection system for a dynamic workspace involving workers on foot. Multiple detections are used for object tracking and action recognition, and the system determines two types of action data, namely, classes and locations. A density-based spatial clustering algorithm is then used to analyze the dynamic workspace.
Jiang et al. [23] proposed fusing spatiotemporal data for hydrological modeling with vision-based data. Three steps are used in this model. The first step involves the fusion of multi-source spatiotemporal data for the incorporation of big data. The second step is associated with short- and long-term forecasting, whereas the third step models the streamflow. A multi-object detection (MOD) method is introduced for autonomous vehicle applications that fuses three-dimensional light detection and ranging (LIDAR) and camera data. Zhao et al. [24] provided solutions for recognizing objects by identifying regions of interest (ROI) in the initial processing stage. Later, a convolutional neural network (CNN) is adopted for the recognition of objects. Sliding windows are used for the candidate object region detection of real-time autonomous vehicles. The introduced system maximizes the object detection region and minimizes the misclassification error rate. Liu et al. [25] implemented a reference frame Kanade-Lucas-Tomasi (RF-KLT) algorithm for extracting the features in fixed regions. The dimensions of the features are reduced to detect the class boundaries. This work was done in a real-time environment, allowing the efficiency of the system to be evaluated with the help of an augmented and actual video dataset. The system was able to classify anomalies successfully in a robust and cost-effective manner. Fang et al. [26] proposed integrating CV with an ontology that can be used to identify the hazards on construction sites using a knowledge graph. Shu et al. [27] introduced a human-computer interaction mode through the interactive design of intelligent machine vision. Their objective was to improve the accuracy of the functioning algorithm, and the point-and-click results are based on Fitts' Law.
In this work, two-level visual information processing is considered for performing semantic object detection. The efficiency of the system is evaluated against the methods used in [15], [24], and [25] because these methods are reported to detect objects and regions in a robust and cost-effective manner. In addition, these methods can be used to evaluate real-time datasets and improve the object and region recognition process.

Proposed 2LVIP method
CV is used for learning the instances of semantic objects in an automatic detection manner. This paper's objective is to improve the precision of classifying structured and unstructured data together, thus reducing the error introduced by combining these classification methods. It uses regression methods to do so; that is, the proposed method uses two-level visual information processing (2LVIP) to obtain the maximum gain from the inputs. Figure 1 depicts the proposed model, that is, the overall architecture of the HMI system with 2LVIP. The imaging devices are used to gather information from the environment. The collected information is then processed by the 2LVIP. The regression and classification techniques are incorporated into the 2LVIP to maximize unstructured and structured data processing. The proposed processing method identifies the gain-related feature information in the data in the first level and optimizes it to maximize the gain. In the second level, the errors in the classifications are determined, and they are removed via the regression process, thus stabilizing the precision to meet the HMI application's demands. The following equation represents the structured and unstructured images used as the input for the CV system: Equation (1) indicates the analysis l 0 of structured s r and unstructured u r images, where the structured image includes the size of the image and presentation, whereas the unstructured image includes variations in size, frame, and patterns. The structured image remains the same for the single input image when a number of images i 1 0 , i 2 0 , . . . ni 0 are used. In this case, ni 0 represents the number of images that are captured at the appropriate time t a . The time the image is captured is denoted as d t (c 0 ). After that, the classification of the structured and unstructured data is done using the tree model.
Equation (2a) below indicates the grouping of the trees: Object detection occurs for both structured and unstructured data, where 1 + i 0 ni 0 * s r represents the structured image data. The analysis of the unstructured data is denoted as i 0 ta l 0 + u r . The combination of these two parts contains the necessary information. Equation (2b) is used to obtain useful information: The extraction of useful information β from the data is the initial step for classification here; the analysis is done by evaluating Eq. (2b). The data processing is illustrated in Fig. 2.
In Fig. 2, the processing is carried out for the structured and unstructured data via ni 0 s r * (d t − t a ). Thus, it considers the two types of data in the classification, based on the mathematical computation 1+ c 0 +(u r −s r ) ni 0 , and examines the data that are necessary for automatic detection in a real-time environment.
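The idea of choosing a tree grouping that yields the maximum gain can be illustrated with a short sketch. It assumes the standard Shannon-entropy definition of information gain; the paper's own gain expressions are not reproduced here, and the function names and sample labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, groups):
    """Entropy of `labels` minus the size-weighted entropy of `groups`.

    `groups` is a partition of `labels`, e.g. the children of one tree split.
    This is the textbook definition, assumed here for illustration only.
    """
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted


# Hypothetical labels for four input images.
labels = ["structured", "structured", "unstructured", "unstructured"]

# A split that separates the two kinds of data perfectly.
perfect_split = [["structured", "structured"], ["unstructured", "unstructured"]]
# A split that leaves both children as mixed as the parent.
useless_split = [["structured", "unstructured"], ["structured", "unstructured"]]

perfect_gain = information_gain(labels, perfect_split)
useless_gain = information_gain(labels, useless_split)
```

Under this definition, a split that cleanly separates structured from unstructured inputs recovers the full entropy of the parent node, whereas a split that leaves the classes mixed gains nothing.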

Classification via the discrete finite value
The discrete finite value is used to derive the classifications for structured and unstructured data. It is based on data that are acquired at a particular time and indicates two sets, namely, the finite ∂ 0 and infinite ∂ . Thus, by combining Eqs. (2a) and (2b), Eq. (3) below is derived for identifying the data in f : The useful information extracted for the classification, α(β), is formulated above. The structured data s r ∈ f + c 0 is identified when the image is captured. The unstructured u r ∈ 1 + l a ni 0 is used in the analysis of a number of input images. The information is extracted in this step, and a discrete set of values is identified. Equation (4a) below is then used to determine the classification process for this discrete set of values: Datasets are either finite or infinite based on this classification scheme. The data belonging to finite sets are distinguished as ∂ 0 (i 0 + l 0 ); these are the data that are acquired at a particular time. If data are not gathered on time, then they are considered to be unstructured data. Equation (4b) is then used to represent the finite and infinite classifications: For discrete values e , the classification is carried out by observing if the data are processed at fixed times with the structure e s r + l 0 * f + (d t − t a ). The unstructured data that are not processed at the specific time are represented by l 0 * u r ni 0 + u r + e − t a . The classification is then carried out and attains maximum gain. The maximum gain is obtained by combining the structured data s r and unstructured data u r . The error that occurs during processing is l 0 + (s r + u r ). The error is due to the misclassification of discrete values (α 0 ), which is computed using Eq. (4c): A misclassification leads to a finite value e (∂ 0 ) that is not discrete, as observed in Eq. (4c). In these cases, l 0 + f ni 0 * e + β represents the discrete value that extracts useful information, although the identification is not made on time.
The classification process is illustrated in Fig. 3.
The computation of (l 0 + β) * u r results in an infinite set of values. Following this process, the accurate classification of the finite set is examined, and the error is reduced by deriving the regression algorithm.
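The timing rule described in this section, where data acquired at the expected time fall in the finite (structured) set and data not gathered on time are treated as unstructured, can be sketched as follows. This is only an illustration under stated assumptions: the `Capture` record, the `tolerance` parameter, and the sample values are hypothetical, and the check on the gap between d t and t a stands in for the paper's Eqs. (4a) and (4b), which are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    """One input image with its timing information (hypothetical record)."""
    image_id: str
    expected_time: float  # plays the role of t_a, the appropriate capture time
    capture_time: float   # plays the role of d_t, the actual capture time


def classify_capture(capture, tolerance=0.05):
    """Label a capture by whether it arrived on time.

    `tolerance` (in the same units as the timestamps) is an assumed
    parameter; the source does not state how "on time" is decided.
    """
    delay = abs(capture.capture_time - capture.expected_time)
    return "structured" if delay <= tolerance else "unstructured"


captures = [
    Capture("img-1", expected_time=1.00, capture_time=1.01),  # on time
    Capture("img-2", expected_time=2.00, capture_time=2.40),  # late
]
labels = {c.image_id: classify_capture(c) for c in captures}
```

Keeping the timing check in a single function makes it straightforward to swap in a different criterion once the exact form of the finite and infinite sets is known.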

Minimizing the errors with regression
The regression method has been used to predict the structured and unstructured data to reduce the error caused at the time of classification ∂ . This second level extracts better precision from the HMI application. The process obtained in the first level extracts the maximum gain by evaluating both s r and u r . During this process, some errors occur. A prediction is made by observing the training set that is processed in the preceding process. Equation (5a) below is used to evaluate the regression through prediction: The regression allows the prediction of ∂ 0 and ∂ . The dependent value is used to find the prediction q 0 used in f ni 0 + (β + α) to identify the unstructured data. Here, l 0 t a f + β represents the useful information that is extracted for further processing; the independent value is derived via Eq. (5b) below: Here, 1+ s r +u r ni 0 denotes the structured and unstructured data that are combined to attain the maximum gain. The prediction method is formulated in Eq. (6), as follows:

Fig. 4 Illustration of the regression process
Here, β(i 0 ) α * c 0 is used to gain information for the input image. Figure 4 illustrates the regression process.
It determines the (t a − β) targeted for extraction from the obtained gain at a particular time. The error o is used to examine the foregone and forthcoming data for the HMI applications g 0 and h 0 . In this manner, the error of the foregone data is compared with that for the forthcoming data to provide an optimal result. The objective of the second level is to reduce the errors made in information processing. Thus, the training set is used to compare and improve processing further, see Eq. (7).
The training data ∅ are used to formulate the 'if and otherwise' conditions for (l 0 + f ) * i 0 , thus β+e i 0 is used for identifying the discrete data. The foregone and forthcoming data are used to evaluate this prediction-based method. The foregone data is again trained and used for the forthcoming data based on the 'if' condition. The training calculation in Eq. (7) is used to make the comparison needed for the HMI application. The cost function calculated for the regression method minimizes the error between the predicted and actual values used for processing the data. Equation (8) below is used to obtain the cost function: The cost function is used to attain the data points j used to find the prediction value, then forthcoming data is subtracted, along with the prediction value. The β + p +q 0 i 0 is processed to obtain the actual value. The cost function is used for obtaining better results for the classification and regression method. Next, the prediction-based regression method is used to reduce the error with Eq. (9), as follows: The reduction in the error ρ made in Eq. (9) is used to evaluate the error. The cost function + β+e i 0 is used in the analysis of the regression method. The training set is used to determine the data from the maximum gain β+γ ∅ + (g 0 − h 0 ). It derives the lesser error. Afterwards, a final regression method is used to reduce the error from the classification of the structured and unstructured data, see Eq. (10): In this regression, (α + γ ) − d t has more errors at the time of computing. Figure 5 illustrates this regression-based detection method.
The second condition (β +γ ) * (t a −d t ) has a lesser error. In this equation, the first condition does not satisfy the objective, whereas the second condition satisfies the proposed method (2LVIP). If an object is not classified correctly, the derivative f +β ni 0 * l 0 is applied. Thus, the process is updated for better classification. As a result, the precision is enhanced, and error is reduced in the HMI application through the evaluation of (α + γ ) * δ(i 0 ).
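The role of a cost function that measures the gap between predicted and actual values, and of a regression fitted by driving that cost down, can be illustrated with a generic sketch. It assumes a one-variable linear model, a mean-squared-error cost, and plain gradient descent; none of these choices is confirmed by the source as the form of Eqs. (8) to (10), and all names and data are hypothetical.

```python
def mse_cost(weight, bias, xs, ys):
    """Mean squared error between predictions weight*x + bias and targets."""
    n = len(xs)
    return sum((weight * x + bias - y) ** 2 for x, y in zip(xs, ys)) / n


def fit_linear(xs, ys, learning_rate=0.05, steps=2000):
    """Fit weight and bias by gradient descent on the MSE cost."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [weight * x + bias - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

initial_cost = mse_cost(0.0, 0.0, xs, ys)
weight, bias = fit_linear(xs, ys)
final_cost = mse_cost(weight, bias, xs, ys)
```

On this toy data the fitted parameters approach the generating values and the cost falls well below its starting point, which mirrors the section's claim that the second level reduces the error left by the first.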

Results and discussion
This section discusses the performance evaluation of the proposed method that is done through a comparative study of different metrics. In this assessment, the metrics considered are precision, gain ratio, error, and processing time. The image input from [28] is used in this analysis. It consists of 102 sets of images, along with their annotations. A region's image with 45 training data points is verified in the analysis, and the results are discussed. The size of the training dataset is 149 MB, and the annotations fill nearly 7 MB. The numbers of classification and regression instances considered are 20 and 40, respectively. The RF-KLT [25], MOD-3D LIDAR [24], and UFO [15] methods are used in the comparison.
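For reference, two of the metrics named above can be computed from predicted and actual labels as sketched below. The sketch assumes the conventional definitions (precision as true positives over all predicted positives, error as the fraction of misclassified items); the source does not state which definitions it uses, and the sample labels are hypothetical.

```python
def precision(actual, predicted, positive):
    """True positives divided by all items predicted as `positive`."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if p == positive and a == positive)
    false_pos = sum(1 for a, p in pairs if p == positive and a != positive)
    predicted_pos = true_pos + false_pos
    return true_pos / predicted_pos if predicted_pos else 0.0


def error_rate(actual, predicted):
    """Fraction of items whose predicted label differs from the actual one."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Hypothetical labels for five inputs ("s" = structured, "u" = unstructured).
actual = ["s", "s", "u", "u", "s"]
predicted = ["s", "u", "u", "s", "s"]

structured_precision = precision(actual, predicted, positive="s")
overall_error = error_rate(actual, predicted)
```

Here three inputs are predicted as structured and two of them are correct, and two of the five predictions are wrong overall.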

Precision analysis
In Fig. 6a and b, the precisions of the classification instances and information gain ratios are compared. The precision for the proposed 2LVIP is higher due to the evaluation of i 0 ta l 0 , i.e., the data obtained from the classification tree. The analysis of the unstructured data 1 + i 0 ni 0 * s r + u r determines the resultant data (c 0 + ni 0 ). The analysis proceeds by introducing the regression methods to acquire the number of classification instances based on time ∂ 0 + s r +u r i 0 . The data so attained have higher classifications; thus, the gain also increases. The gain is determined by deriving α + f i 0 + β e , which represents an actual stage in the processing. The data processing in this 2LVIP method is analyzed by evaluating ∅ + i o * (g 0 + h 0 ). The foregone and forthcoming data are used in the analysis of the precision. By comparing the classification of the proposed method with those of the other three methods, it can be seen that a smaller precision value is obtained when the information gain decreases. Thus, if the classification is improved, then the gain also increases. Thus, the error reduces i 0 + α+ f β * t a −d t and vice versa. In Fig. 6, a higher precision is obtained by computing t a + f + β.

Information gain analysis
The gain for the proposed method is shown in Fig. 7, which indicates that as the classification increases, the gain increases. When comparing the 2LVIP method with the other three methods, namely, RF-KLT, MOD-3D LIDAR, and UFO, the 2LVIP method has the highest gain. The gain performance is found through 1 + c 0 +(u r −s r ) ni 0 , which acquires the image and classifies the data. Improvements in the gain can be analyzed by determining s r ∈ f + c 0 and u r ∈ 1 + l a ni 0 . The maximum gain is obtained when the classification of the structured and unstructured data is done correctly. The classification methods are derived in Eq. (4a), where β * (∂ 0 + ∂ ) represents the extraction of the useful information. The data belonging to finite sets are distinguished as ∂ 0 (i 0 + l 0 ). If the data are not collected promptly, then they are represented as unstructured data, namely, by l 0 + f ni 0 * e s r . These data are not processed at the specific time l 0 * u r ni 0 + u r + e − t a .

Error analysis
In the 2LVIP method, the error is reduced by increasing the number of correct classifications at the specific time l 0 * u r ni 0 + u r + e − t a . Thus, these classifications signify the useful information obtained and also the error that occurs during processing, namely, l 0 + (s r + u r ). The error is due to the misclassification of discrete values. The expression 1+ s r +u r ni 0 denotes the structured and unstructured data that are combined to attain the maximum gain. Next, the prediction γ +β(i 0 ) * α is determined to reduce the error in the forthcoming data. The analysis is done by computing the process at a specified time. Thus, the prediction is used to obtain errorless unstructured and structured data. Equation (9) is used to reduce the error, and + β+e information gain obtained from the classification increases, so that the error is reduced. The error is smaller for the 2LVIP method than the errors for the existing three methods, as seen by evaluating the o + f (i 0 ) (refer to Fig. 8a, b).

Analysis of the processing time
Although there is a high amount of data for classification when using the 2LVIP method, it has the smallest processing time, as shown in Fig. 9a. As the number of classifications increases, the analysis necessary increases. It is found by determining (t a * c 0 ) * l 0 * f ni 0 , which evaluates the specific time needed for processing. Using Eq. (4a), the appropriate classification is carried out for the acquired data.
If l 0 + f ni 0 * e + β represents the discrete value that extracts useful information and identification is not made on time, it must be improved. The data obtained from the classification reduces the error at the time of classification ∂ 0 , thus providing a better processing time for the HMI application. The classification analysis uses α(l 0 ) * (t a − d t ), which extracts the data at the appropriate time. The data belonging to finite sets are distinguished as ∂ 0 (i 0 + l 0 ). If the information gain increases, then the processing time for the proposed method, formulated with γ + β(l 0 ), holds constant (refer to Fig. 9a, b). The results of the above comparison are tabulated in Tables 1 and 2, respectively. The cumulative outcome of the proposed method shows that it improves precision and gain% by 3.61% and 9.42%, respectively. It also reduces the error and processing time by 6.51% and 21.75%, respectively.
In terms of the information gain, the proposed method achieves 4.54% higher precision, 7.65% less error, and 26.48% less processing time.

Analysis of the instances
In Fig. 10, the regression instances for the classified data increase, so the error decreases through the derivation of β(i 0 ) α * c 0 . It determines (t a − β) for extraction at a particular time from the obtained gain. The error o is used to examine the foregone and forthcoming data for the HMI applications g 0 and h 0 . Here, (l 0 + f ) * i 0 is used to identify the analysis of the input data. Thus, the β + p +q 0 i 0 are computed for better precision. The maximum gain β+γ ∅ + (g 0 − h 0 ) is used to reduce the error.
The maximum gain is increased by finding l 0 f + β − (d t − e ), which is used to decrease the error. In Fig. 11, if the number of classifications the proposed method makes increases, then its error is reduced. For the 2LVIP method, the (g 0 − h 0 ) + (α + β) obtained with the final regression method is used to reduce the error in the classification of structured and unstructured data. The reduction in the error ρ is found with Eq. (9), then used to evaluate the error. Given the number of instances, the regression produces more gain than the classification; hence, the error is reduced. The classification of the structured and unstructured data is evaluated with the regression equation δ(i 0 ). Equation (10) is used to reduce the error and determines the regression process based on the prediction γ + f +e i 0 (refer to Fig. 12).

Conclusion
In this paper, a two-level visual information processing method is discussed for improving the precision of human-machine interaction systems. The input obtained from the imaging devices is classified into structured and unstructured data based on the available information. The maximum information gain is then extracted, and regression is used to mitigate the errors in the information gained. The regression process uses the training set data to reduce the error and processing time via recursive discrete and finite value estimations. The regression analysis is recursively handled using predictive cost estimation. Training is performed to reduce the processing time used in extracting useful information from the visual input. By attuning the classification and regression processes, the proposed method is found to maximize the precision and information gain in detecting the objects and reduce the error and processing time. In the future, an optimization-based regression approach will be applied to classify the objects.