1 Introduction

Surfaces produced by different machining methods exhibit distinct textures. These textures are made up of repeated patterns whose attributes vary regularly. Surface texture identification is used in classification (categorization) as well as in industrial inspection. Additionally, from an engineering analysis standpoint, automatic identification of the machining method from the surface texture gives the designer access to crucial production data, viz., tool geometry and kinematics, and supports the detection of machining anomalies. Categorization is the procedure of generating a decision process from a set of known cases (labeled data). Based on training data, the learning algorithm generates a function f: X\(\to\)Y (where X and Y are the input and output spaces, respectively) that maps new, unseen data from the input to the output domain [1].

Image processing and machine vision-based studies are gaining a lot of attention from both academia and industry in various fields, viz., tool condition monitoring [2], surface defect identification [3], machine tool fault diagnosis [4], evaluation of tool wear and surface roughness [5] and chatter identification [6]. Machine vision-centered fault prognosis and condition monitoring are data-driven practices widely used in unmanned machining and 3D printing, ensuring periodic maintenance and operational reliability in an industrial management setup. Because of the availability of low-cost instruments for data collection [7], as well as advances in image processing, feature extraction [8, 9] and machine vision [10, 11], computer vision-centered condition monitoring has emerged as the best-suited method for on-machine surveillance. Surface texture identification through image processing and machine learning (ML)-based techniques has helped to monitor the machine tool and the machining process.

In the present age of big data, characterized by enormous data sets and high dimensionality, extracting usable information from vast amounts of data is a major hurdle. High-dimensional data sets contain many irrelevant and redundant features, which makes data processing, knowledge mining, and pattern categorization more challenging. In such cases, the best strategy is to apply dimensionality reduction techniques (DRT) to bring data down from the original high-dimensional space to the low-dimensional intrinsic space by filtering out noise and redundant information [12]. DRT can also promote linear separability by mapping the original high-dimensional feature space onto a reduced, lower-dimensional feature space. Therefore, dimensionality reduction is a helpful, important, and necessary strategy for trimming, distinguishing, and representing datasets by turning high-dimensional datasets into lower-dimensional ones [13]. In ML, the amount of data needed to perform a credible analysis grows exponentially as the dimensionality of the data increases; Bellman coined the phrase "curse of dimensionality" for this issue in the context of dynamic optimization [14]. The issue is best addressed by identifying appropriate techniques to reduce the number of features used in the analysis. In this regard, the two frequently employed methods are:

  1. Feature selection

  2. Feature extraction

Feature selection is a prominent method used in ML to reduce data dimensionality. It identifies significant features in the dataset and removes those that are redundant [15, 16]. The three strong points in favor of the feature selection technique are that it alleviates the curse of dimensionality, quickens the learning process, and reduces the storage space [17]. Conversely, the feature extraction technique typically relies on feature transformation, which projects high-dimensional data into a low-dimensional subspace. Feature extraction helps in uncovering the latent structure of the original data without changing the relative distances between the extracted features, thereby avoiding any significant data loss [12]. Principal component analysis (PCA) is a commonly used DRT based on feature extraction.

Image textures are represented using texture features, which are extracted using texture operators. The gray level co-occurrence matrix (GLCM) is one such texture operator. GLCM is a second-order statistical texture descriptor used to identify the surface texture or microstructure. GLCM is an important tool owing to its ease of calculation and use, and its ability to capture coarseness, roughness and orientation in a single computation [18, 19]. Further details of GLCM and its calculation are given in Sect. 3 of this article.

In this paper, GLCM-based texture features are proposed for analyzing machined surfaces and classifying them using different ML-based models. In the present work, 21 GLCM-based image features defined for texture analysis are extracted. The approach uses an image pre-processing stage comprising grayscale conversion, resizing, contrast enhancement, histogram equalization, and partitioning of each image into sixteen divisions. The partitioning increases the number of images and improves the generalization of the ML models. Further, different distance-based feature selection methods are investigated to identify the most sensitive features, which are used as inputs to the ML-based classifiers. The developed model thus effectively identifies and classifies machined surface images. The contributions of this paper are:

  • Synergizing image processing, GLCM feature extraction, dimensionality reduction and ML-based classification to identify the surface texture of machined surface images.

  • Selecting sensitive features from the images that form the most effective set of inputs to the ML classifiers, thereby reducing the computational complexity.

  • Evaluating the effectiveness of the threshold criterion function (TCF) in the DRT.

Figure 1 shows the scheme of the customized program developed to achieve the objectives of this paper. The image pre-processing steps are implemented in a customized MATLAB program, the dimensionality reduction in Microsoft Excel, and the ML-based classifiers in Python.

Fig. 1
figure 1

The overall methodology adopted in the current work

The rest of the paper is organized as follows: Sect. 2 provides a brief literature review on machined surface texture analysis and DRT. Section 3 introduces GLCM and feature extraction from it. The distance-based feature selection methods and the TCF used to select the most sensitive features are detailed in Sect. 4. The ML models used for classification are discussed in Sect. 5. The image data set and the image pre-processing steps are presented in Sect. 6, and Sect. 7 covers the selection of sensitive features. Section 8 presents the ML-based classification results. Finally, Sect. 9 draws the conclusions of this work.

2 Literature Survey

2.1 Machined Surface Texture Analysis

Texture is a recurring pattern consisting of local statistics that vary slowly. Image texture, in turn, is a function of the spatial variation in pixel intensities (gray values). Tuceryan and Jain [20] highlighted the usefulness of image texture, which has been extensively employed in various domains and has been extended to the field of machining to analyze machined surfaces.

In this regard, the past research work is summarized below. Dave et al. [21] performed categorization of machined surfaces by partitioning images into sixteen non-overlapping, equal sub-images. Wavelet-based features were extracted from the sub-images up to the fourth level of decomposition and fed to a logistic regression classifier. The classifier was able to categorize the machined surfaces with a training accuracy of 100% and a testing accuracy of 92.3%. Patel et al. [22] used GLCM-based features to classify machined surface images. With tenfold cross-validation, the artificial neural network (ANN) and random forest (RAF) models gave better classification results than the other classification models. Ashour et al. [1] used the PCA technique to reduce histogram-based feature vectors of images that were later given as inputs to various classifiers to classify the machined surfaces. The support vector machine (SVM) with a linear kernel outperformed all other SVM kernels as well as the ANN classifier. Manish et al. [23] used Canny edge detection and histogram analysis-based image processing techniques to inspect the surface finish and surface flaws of ground surfaces. Penumuru et al. [24] worked on identifying and classifying materials using ML and machine vision in the context of Industry 4.0. The red-green-blue (RGB) color components were used as input features to the ML models. The SVM model produced better results, and the authors recommended that the developed model could be deployed on robots commissioned in an intelligent manufacturing setup. John and Arunachalam [25] suggested that illumination compensation was necessary to suppress illumination variation and achieve consistency and accuracy in surface texture analysis. A similar study in [26] observed that measured and predicted surface roughness values correlated well under blue light illumination; an ANN model was used to predict the surface roughness values of 3D printed parts using GLCM features. Boudiaf et al. [27] worked on surface defect detection of steel strips using deep learning and SVM. The proposed model was able to identify the defects with an accuracy of 99.70%. A summary of the above literature is provided in Table 1.

Table 1 Summary of the literature on surface texture identification

2.2 Dimensional Reduction Techniques (DRT)

Distance-based feature selection methods use a statistical approach to identify the most important features and thus discriminate between sensitive and non-sensitive features for classification [28, 29]. The past literature reports many distance-based statistical measures that quantify the amount of dissimilarity between two probability distributions.

This section reviews the literature on distance-based DRT. To track the manufacturing process, Shao et al. [30] developed a novel feature selection method that combines forward feature selection with k-fold cross-validation and Fisher's discriminant ratio (FDR). The proposed method successfully picked the most relevant features and statistical process control limits. Chen et al. [31] applied Fisher's discriminant analysis (FDA) to identify the most sensitive features for chatter detection from multi-channel vibration signals; the suggested feature ranking was significant and increased the accuracy of the computational models. Sun et al. [32] suggested a feature selection technique wherein Fisher's ratio is integrated with a novel clustering overlap measure while classifying ultrasonically welded carbon fiber reinforced polymer composites based on weld attributes, joint performance and welding process parameters. The suggested method produced better classification results than those obtained using linear discriminant analysis or Fisher's ratio alone. On the other hand, Shao et al. [33] used Fisher's ratio to rank features extracted from the space and frequency domains of cross-sectional profiles on tool surfaces. The ranked features were classified using a quadratic classifier and were used to monitor tool wear in ultrasonic welding of lithium-ion batteries. In another instance, Bhat et al. [34] used Fisher's ratio to pick the prominent characteristics from a huge set of features extracted using GLCM, and those characteristics were used to classify the tool wear state using a hidden Markov model. The suggested model was worthwhile, with an average accuracy of 95%. Vijay et al. [35] compared DRTs such as PCA, Fisher's criterion (FC) and the separation index (SI) using a radial basis function neural network for bearing condition monitoring. The vibration signals generated from bearings under different conditions were denoised, and time-domain and frequency-domain features were extracted. The FC and SI techniques successfully improved the performance of the developed model by reducing the dimensionality of the feature vector to one-sixth. Zhu et al. [36] classified tropical forest from multi-temporal Sentinel-2 images of Hainan Island; feature selection using a weighted separation index and classification using RAF with an average accuracy of 93% showed the usefulness of the developed technique.

Momenzadeh et al. [37] evaluated different feature selection techniques, including the Bhattacharyya distance, to classify gene expression data. The authors concluded that an effective feature selection method resulted in robust classification from the hidden Markov model. Jiang and Zhu [38] proposed a dynamic Bhattacharyya bound-based feature selection technique for fault classification. The proposed method showed better performance compared to FDA, dynamic FDA and locality-preserving FDA. Shah and Dang [39] worked on automatic modulation classification using a Bhattacharyya distance (BD)-based feature selection method and an ANN classifier. The authors reported that the proposed feature selection method, for three different neural network architectures, showed superior performance with reduced computational complexity when compared to PCA. Sivakkolundu and Kavitha [40] proposed a Bhattacharyya coefficient-based feature selection technique for handling huge data from a warehouse; the BD-based attribute selection method efficiently dealt with big data with a minimal false positive rate and time consumption. Buchaiah and Shakya [41] used BD combined with SVM for fault diagnosis of bearings using 2D fault features extracted from vibration signals. The proposed methodology was able to classify single and combined bearing defects successfully, and the authors recommended using the same methodology in fault monitoring of shafts, gear motors, etc. Table 2 summarizes the above-cited literature.

Table 2 Summary of the literature on distance-based DRT

3 GLCM

The GLCM-based technique for image texture analysis was put forth by Haralick et al. [42] in 1973. Each element of the GLCM represents the number of co-occurrences of the corresponding pixel pair at a particular spacing (s) and direction \(\left( \theta \right)\) in an image matrix, and it can be written as (Eq. 1):

$${\text{GLCM}}\left( {i,j} \right)_{\theta } = \left| {\left\{ {\left( {p_{1} ,p_{2} } \right) \mid I\left( {p_{1} } \right) = i,\;I\left( {p_{2} } \right) = j} \right\}} \right|$$
(1)

where p1 and p2 represent the position of pixels in the image matrix M × N as shown in Fig. 2a and the construction of the GLCM matrix from the image matrix is depicted in Fig. 2b.

Fig. 2
figure 2

Construction of GLCM matrix (a) Image matrix (b) GLCM matrix

The image matrix (Fig. 2a) consists of a set of pixel values, which indicate the intensity of the gray level present in an image. In the image matrix, the intensity values eight and seven co-occur two times along the 0° direction with a spacing of one (highlighted with an ellipse, Fig. 2a). Thus, the value two in the GLCM (highlighted with a circle, Fig. 2b) in the third row and the second column indicates that the pixel value eight occurred two times together with the pixel value seven at a pixel spacing of one and a pixel pair direction of 0°. In the GLCM, i and j denote the gray level values of an image pixel. The rest of the GLCM is constructed in the same manner [43, 44]. Using the GLCM, twenty-one texture features can be extracted; they are given in Table 3. These features have been used by many researchers [42, 45, 46] for different applications, and those works can be referred to for more information. In the expressions of Table 3, P(i, j) is the (i, j)th entry of the normalized GLCM, µx and µy are the means of the rows and columns of the GLCM, \(\sigma_{x}\) and \(\sigma_{y}\) are the corresponding standard deviations, and G is the number of distinct gray levels in the quantized image matrix.

Table 3 GLCM features
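As a concrete illustration of the above construction, the following minimal Python sketch (not the authors' MATLAB implementation) builds a GLCM with scikit-image and extracts a handful of texture features. The built-in graycoprops covers only a subset of the 21 features in Table 3; the remaining ones (entropy is shown as an example) can be computed directly from the normalized matrix.

```python
# Minimal sketch, assuming scikit-image >= 0.19; function and variable names
# are illustrative and not taken from the authors' code.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_img, distance=1, angle=0.0, levels=256):
    """Return a small dictionary of GLCM features for one grayscale image."""
    glcm = graycomatrix(gray_img, distances=[distance], angles=[angle],
                        levels=levels, symmetric=True, normed=True)
    feats = {prop: graycoprops(glcm, prop)[0, 0]
             for prop in ("contrast", "dissimilarity", "homogeneity",
                          "energy", "correlation", "ASM")}
    # Entropy computed manually from the normalized matrix P(i, j)
    p = glcm[:, :, 0, 0]
    feats["entropy"] = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return feats

# Example with a random 8-bit "image" standing in for a 125 x 125 sub-image
img = np.random.randint(0, 256, size=(125, 125), dtype=np.uint8)
print(glcm_features(img))
```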

4 Feature Selection using Distance-Based Methods

The computation speed and generalization ability of an ML classification model improve when the feature vector space has lower dimensionality. The extracted feature matrix might contain correlated features that form similar or repeated inputs to the ML classifiers used for categorizing the machined surface images considered in the present research. Conversely, a unique feature input contains features that are uncorrelated. Also, some of the features might not form sensitive inputs to the classifier. A feature is said to be sensitive if the average distance between the feature classes is large while the average distance within each class is small. Thus, it is essential to select uncorrelated, sensitive features so that the classifier remains compact, viz., the dimensionality of the classifier input is reduced. Figure 3 shows the difference between 'class separability' and 'classification separability.' The eigenvalues along principal axis 2 are larger than those along principal axis 1. However, when the classes are mapped onto axis 2, the probability density functions overlap to a large extent. A small distance between the mean lines and large standard deviations make the feature measured along axis 2 unsuitable for classification purposes. If, instead, the classes are mapped onto axis 1, the probability density functions are well separated, with a large distance between their means and small standard deviations, making the feature measured along axis 1 the most suitable for classification. Thus, a distance measure is necessary to obtain a quantitative assessment of the separability of two classes. In this regard, three distance-based feature selection methods are explained in the following subsections.

Fig. 3
figure 3

Feature extraction for Classification

4.1 Fisher's Criterion (FC)

The FC relies on calculating the separation distance between the two categories of interest [48]. This separation distance depends on the means and standard deviations of the two categories and is mathematically expressed in Eq. 2, as given by Yen and Lin [49].

$$F_{k}^{a,b} = \frac{{\left| {Mean\left( {t_{k}^{a} } \right) - Mean\left( {t_{k}^{b} } \right)} \right|^{2} }}{{\left[ {std\left( {t_{k}^{a} } \right)} \right]^{2} + \left[ {std\left( {t_{k}^{b} } \right)} \right]^{2} }}$$
(2)

where \(F_{k}^{a,b}\) represents the Fisher's separation distance between the two categories a and b for the kth feature (a and b may each be the turned, ground or shaped surface). Mean () and std () represent the mean and standard deviation of the GLCM feature vector. Equation 2 is applicable to a two-class problem. For the three-class problem (turned, ground and shaped), Fisher's discriminant power (FDP) for the kth feature is expressed as Eq. 3.

$$FDP_{k} = F_{k}^{t,g} + F_{k}^{g,s} + F_{k}^{t,s}$$
(3)

where \(F_{k}^{t,g}\), \(F_{k}^{g,s}\) and \(F_{k}^{t,s}\) are the Fisher's separation distances between the turned-ground, ground-shaped and turned-shaped classes, respectively; t, g and s represent the turned, ground and shaped surfaces.
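A hedged sketch of Eqs. 2 and 3 is given below; the array names features (the 864 × 21 feature matrix) and labels (the class of each row) are illustrative and not taken from the authors' code.

```python
# Sketch of Fisher's separation distance (Eq. 2) and the three-class
# discriminant power FDP_k (Eq. 3) for every GLCM feature.
import numpy as np

def fisher_distance(xa, xb):
    """Eq. 2: squared mean difference over the sum of the class variances."""
    return (np.mean(xa) - np.mean(xb)) ** 2 / (np.std(xa) ** 2 + np.std(xb) ** 2)

def fisher_discriminant_power(features, labels):
    """Eq. 3: FDP_k = F_k^{t,g} + F_k^{g,s} + F_k^{t,s} for each feature k."""
    fdp = np.zeros(features.shape[1])
    for k in range(features.shape[1]):
        col = features[:, k]
        for a, b in (("t", "g"), ("g", "s"), ("t", "s")):
            fdp[k] += fisher_distance(col[labels == a], col[labels == b])
    return fdp
```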

4.2 Separation Index (SI)

The SI is the ratio of the difference between the means of two classes to the sum of their standard deviations [50]. It is a non-dimensional metric that quantifies the separation between two class groupings. It is mathematically expressed in Eq. 4.

$$S_{k}^{a,b} = \frac{{\left| {Mean\left( {t_{k}^{a} } \right) - Mean\left( {t_{k}^{b} } \right)} \right|}}{{std\left( {t_{k}^{a} } \right) + std\left( {t_{k}^{b} } \right)}}$$
(4)

where \(S_{k}^{a,b}\) gives the SI between two classes or groupings a and b for the kth characteristic (a and b may each be the turned, ground or shaped surface). All other terms remain the same as above. Kim et al. [50] highlighted that a better characteristic is one with a larger difference in mean values and smaller standard deviations between the classes of interest. The SI discriminant power (SIDP) for a specific feature k is given by Eq. 5.

$$SIDP_{k} = S_{k}^{t,g} + S_{k}^{g,s} + S_{k}^{t,s}$$
(5)

where \(SIDP_{k}\) is discriminant power for a specific feature k; \(S_{k}^{t,g}\), \(S_{k}^{g,s}\) and \(S_{k}^{t,s}\) are the separation distances between turned-ground, ground-shaped and turned-shaped classes, respectively.
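The corresponding sketch for Eqs. 4 and 5 differs from the Fisher case only in the distance definition (absolute mean difference over the sum of the standard deviations); array names remain illustrative.

```python
# Sketch of the separation index (Eq. 4) and SIDP_k (Eq. 5).
import numpy as np

def separation_index(xa, xb):
    """Eq. 4: |mean difference| divided by the sum of standard deviations."""
    return abs(np.mean(xa) - np.mean(xb)) / (np.std(xa) + np.std(xb))

def si_discriminant_power(features, labels):
    """Eq. 5: SIDP_k summed over the three class pairs, per feature."""
    sidp = np.zeros(features.shape[1])
    for k in range(features.shape[1]):
        col = features[:, k]
        for a, b in (("t", "g"), ("g", "s"), ("t", "s")):
            sidp[k] += separation_index(col[labels == a], col[labels == b])
    return sidp
```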

4.3 Bhattacharyya Distance (BD)

It is a measure of similarity between two continuous or discrete probability distributions. The BD between two classes for the kth feature can be determined from Eq. 6 [16].

$$b_{k}^{a,b} = \frac{1}{4}\ln \left( {\frac{1}{4}\left( {\left( {\frac{{\sigma_{k}^{a} }}{{\sigma_{k}^{b} }}} \right)^{2} + \left( {\frac{{\sigma_{k}^{b} }}{{\sigma_{k}^{a} }}} \right)^{2} + 2} \right)} \right) + \frac{1}{4}\left( {\frac{{\left( {\mu_{k}^{a} - \mu_{k}^{b} } \right)^{2} }}{{\left( {\sigma_{k}^{a} } \right)^{2} + \left( {\sigma_{k}^{b} } \right)^{2} }}} \right)$$
(6)

where \(\mu_{k}^{a}\), \(\mu_{k}^{b}\) and \(\sigma_{k}^{a}\), \(\sigma_{k}^{b}\) are the means and standard deviations of the two classes a and b, respectively. The term \(b_{k}^{a,b}\) represents the BD for a specific feature k between the two machined surfaces a and b. For the three-class problem, the Bhattacharyya distance discriminant power (BDDP) for the kth feature is given by Eq. 7.

$$BDDP_{k} = b_{k}^{t,g} + b_{k}^{g,s} + b_{k}^{t,s}$$
(7)

where \(b_{k}^{t,g}\), \(b_{k}^{g,s}\) and \(b_{k}^{t,s}\) are the BDs between the turned-ground, ground-shaped and turned-shaped classes, respectively, for the kth feature. The features with the highest FDPs, SIDPs and BDDPs form the sensitive features for the classifiers.
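A similar sketch for Eqs. 6 and 7, again with illustrative array names, is:

```python
# Sketch of the Bhattacharyya distance between two univariate class
# distributions (Eq. 6) and its three-class discriminant power BDDP_k (Eq. 7).
import numpy as np

def bhattacharyya_distance(xa, xb):
    """Eq. 6, written in terms of the class means and standard deviations."""
    mu_a, mu_b = np.mean(xa), np.mean(xb)
    sd_a, sd_b = np.std(xa), np.std(xb)
    term1 = 0.25 * np.log(0.25 * ((sd_a / sd_b) ** 2 + (sd_b / sd_a) ** 2 + 2))
    term2 = 0.25 * (mu_a - mu_b) ** 2 / (sd_a ** 2 + sd_b ** 2)
    return term1 + term2

def bd_discriminant_power(features, labels):
    """Eq. 7: BDDP_k summed over turned-ground, ground-shaped, turned-shaped."""
    bddp = np.zeros(features.shape[1])
    for k in range(features.shape[1]):
        col = features[:, k]
        for a, b in (("t", "g"), ("g", "s"), ("t", "s")):
            bddp[k] += bhattacharyya_distance(col[labels == a], col[labels == b])
    return bddp
```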

4.4 Threshold Criterion Function (TCF)

The TCF was proposed in [51] and is used for selecting the most sensitive features identified by the different methods (FC, SI and BD); it is expressed analytically in Eq. 8.

$$\theta = \frac{{\mathop \sum \nolimits_{k = 1}^{s} G_{k}^{*} }}{{\mathop \sum \nolimits_{k = 1}^{f} G_{k} }}$$
(8)

where Gk is the discriminant power of the kth feature (k = 1 to f), G* is the vector in which the elements of G are sorted in descending order, \(\theta\) is the threshold, s is the number of sensitive features and f is the total number of features (s < f). The selected value of \(\theta\) can vary in the range from 0.75 to 0.9. If \(\theta\) = 0.85, the cumulative sum of the discriminant powers of the s selected features is approximately equal to 85% of the cumulative sum of the discriminant powers of all f features. As a result, instead of all f features, only the selected s sensitive features are given as input to the classifier, viz., a dimensionality reduction from f to s is achieved [51]. The methodology for selecting the sensitive features using the TCF is given in Algorithm 1.

figure a
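The following Python sketch is one plausible reading of Algorithm 1 and Eq. 8. Because the paper reports cumulative ratios slightly below \(\theta\) in some cases (e.g., 79.8% for \(\theta\) = 0.8 in Sect. 7.1), the exact stopping rule used by the authors may differ from the simple one assumed here.

```python
# Hedged sketch of the TCF-based selection: rank features by discriminant
# power and keep the smallest set whose cumulative power reaches theta.
import numpy as np

def select_sensitive_features(G, theta=0.85):
    """Return indices of the s sensitive features given discriminant powers G."""
    order = np.argsort(G)[::-1]                      # indices by descending power
    cumulative = np.cumsum(G[order]) / np.sum(G)     # cumulative ratio of Eq. 8
    s = int(np.searchsorted(cumulative, theta) + 1)  # smallest s with ratio >= theta
    return order[:s]

# Usage (illustrative): with the Fisher discriminant powers fdp from the
# earlier sketch, select_sensitive_features(fdp, theta=0.8) would return the
# indices of the sensitive features discussed in Sect. 7.1.
```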

5 Machine Learning (ML) Models

The features selected using the three distance-based feature selection methods are fed to four ML classifiers, whose details are given below:

5.1 Decision Tree (DCT)

A DCT classifies an object by examining the values of a few of its characteristics (variables). It mainly consists of a root node, decision nodes and leaf nodes. The DCT begins at the root node, which represents the complete dataset; the dataset is then split into two or more homogeneous sets. The decision (intermediate) nodes are test nodes for the objects' characteristics, and the leaf (final) nodes are the decision or classification nodes of the DCT. The development of a DCT is a recursive process. The frequency with which an attribute appears in a tree indicates its importance. Attribute selection at the nodes depends on the information gain of each attribute; thus, the attribute with the highest gain (lowest entropy) is selected as the tree's root node. Further segmentation is performed hierarchically, usually in a binary manner, until a final node is reached [52].

5.2 K-Nearest Neighbor (KNN)

It is one of the most basic supervised ML algorithms. The KNN algorithm identifies similarities between new and existing data and assigns the new case to the group most similar to the available groupings. In other words, when new data arrive, they can easily be classified into the best-suited group by applying KNN. Although KNN can be applied to both regression and classification problems, it is most often used for classification. No assumptions are made about the underlying data because of the nonparametric nature of KNN. KNN does not build a model when the data set becomes available; instead, it stores the data and acts on them only during the classification process [53, 54].

5.3 Random Forest (RAF)

RAF is a reliable tree-based classification method that acts as an ensemble tool, constructing each DCT from a subset of the observations and variables of a data set. Each DCT in the forest is used to judge unmarked samples, and the majority vote of all the DCTs is used to predict the unmarked sample categories. The vote cast by each tree carries the same weight. The more trees in the forest, the greater the accuracy and the lower the chance of overfitting [22, 36]. The classification can be further improved by including many DCTs in the RAF built from bagged datasets. The goal of bagging is to reduce variance in a dataset while avoiding over-fitting.

5.4 Support Vector Machine (SVM)

The SVM is a supervised learning method capable of solving linearly and non-linearly separable two-class (binary) categorization problems. In n-dimensional space, independent data xi (i = 1…k) belong to the first or second class, with data labels of -1 for the first class and +1 for the second class. The main purpose of the SVM is to segregate the class data through a maximal-margin hyperplane. In other words, the SVM seeks to maximize the distance between the separating hyperplane and the data closest to it. The separating hyperplane for the training data set can be expressed mathematically as given in Eq. 9, provided the data are linearly separable [55].

$$f\left( x \right) = w^{T} x + b$$
(9)

where w represents the n-dimensional weight vector, b represents a scalar bias and x is an input feature vector. This hyperplane determines the maximum margin that can be used to distinguish the positive class from the negative class. For linearly separable training data, the margin constraint satisfied by the classified data is given in Eq. 10.

$$y_{i} \left( {w^{T} x_{i} + b} \right) \ge 1,\quad i = 1 \ldots k$$
(10)
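For illustration only, the four classifiers of this section can be instantiated with scikit-learn as shown below. The hyperparameter values are placeholders; the values actually used in this work are tuned by grid search (Sect. 8, Table 8).

```python
# Illustrative sketch, not the authors' exact configuration.
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

classifiers = {
    "DCT": DecisionTreeClassifier(criterion="entropy"),  # information-gain splits
    "KNN": KNeighborsClassifier(n_neighbors=5),          # majority vote of neighbors
    "RAF": RandomForestClassifier(n_estimators=100),     # ensemble of bagged trees
    "SVM": SVC(kernel="rbf", C=1.0),                     # maximal-margin hyperplane
}
```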

6 Image Dataset and Pre-Processing

To assess the performance of the ML techniques in classifying machined surface images and to check the feasibility of the different distance-based DRTs, a dataset consisting of images of turned, ground and shaped surfaces was considered. Each machined surface set consists of eighteen images; hence, there are three classes of machined surfaces to be classified. The dataset was provided by the Department of Mechanical Engineering, NMAMIT, Nitte. More information about the image capturing system is given in [56]. The steps followed in image pre-processing are depicted in Fig. 4. Initially, the eighteen images of each of the three classes were transformed from RGB to grayscale to simplify the classification process. The resulting grayscale images were resized from a 400 × 500-pixel resolution to a 500 × 500-pixel resolution. Then, contrast enhancement and histogram equalization were applied so that the surface texture could be correctly identified and analyzed. Finally, to improve the generalization of the classification model, each histogram-equalized image was partitioned into sixteen non-overlapping regions, as suggested in [57, 58]. This produced 288 (18 × 16 = 288) images per class and 864 (288 × 3 = 864) images for the three classes.

Fig. 4
figure 4

Image pre-processing methodology (a) RGB image (b) grayscale image (c) resized image (d) contrast-enhanced image (e) histogram equalized image (f) partitioned image
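The authors implemented this pipeline in MATLAB; the Python/OpenCV sketch below mirrors the same steps purely for illustration. Since the exact contrast-enhancement routine is not specified, min-max stretching is assumed here, and the function name is hypothetical.

```python
# Hedged sketch of the pre-processing pipeline of Fig. 4.
import cv2

def preprocess_and_partition(path, size=500, grid=4):
    img = cv2.imread(path)                                   # color image (BGR in OpenCV)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)             # grayscale conversion
    gray = cv2.resize(gray, (size, size))                    # 400 x 500 -> 500 x 500
    stretched = cv2.normalize(gray, None, 0, 255, cv2.NORM_MINMAX)  # assumed contrast enhancement
    equalized = cv2.equalizeHist(stretched)                  # histogram equalization
    step = size // grid
    # sixteen non-overlapping 125 x 125 sub-images
    return [equalized[r:r + step, c:c + step]
            for r in range(0, size, step) for c in range(0, size, step)]

# 18 images/class x 16 partitions = 288 sub-images per class (864 in total)
```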

7 Selection of Sensitive Features

From each of the partitioned images, the 21 GLCM features (depicted in Table 3) are extracted. The feature vectors are stored in a matrix of size (864 × 21). The DRTs viz., FC, SI and BD are used to select the sensitive features.

7.1 Selection of Sensitive Features using Fisher's Criterion

The Fisher's separation distance between turned-ground (\(F_{k}^{t,g}\)), ground-shaped (\(F_{k}^{g,s}\)) and turned-shaped (\(F_{k}^{t,s}\)) classes is computed using Eq. 2. The FDPs for 21 features are computed using Eq. 3. Figure 5 shows the FDPs for 21 GLCM features. It can be observed from Fig. 5 that FDPs of the features with label numbers 21, 1, 15, 6, 16, 20, 9, 19, 17 and 8 are relatively higher when compared to other features (features are numbered as per Table 3). These features are the ones with better classification separability and are suitable input candidates for the ML classifiers (as discussed in Sect. 4). Table 4 shows the use of three different threshold values, viz., \(\theta =\) 0.8, 0.85 and 0.9, in selecting the sensitive features under FC. The sensitive features are selected using TCF as per algorithm 1.

Fig. 5
figure 5

FDP of 21 GLCM features

Table 4 Features selection for different threshold values based on FC

With \(\theta =\) 0.8, the cumulative FDPs of the eight (s = 8) features 21, 1, 15, 6, 16, 20, 9 and 19 amount to 117.71, which is approximately 80% ((117.71 ÷ 147.48) × 100 = 79.8%) of the sum of the FDPs of all the 21 features. Thus, the dimensionality is reduced from 21 to 8, with a dimensionality reduction of 62% (((21–8) ÷ 21) × 100 = 61.9%). Thus, for \(\theta =\) 0.8, the DRT FC selects 8 sensitive features.

With \(\theta =\) 0.85, the cumulative FDPs of nine (s = 9) features 21, 1, 15, 6, 16, 20, 9, 19 and 17 amount to 126.81, which is approximately 85% ((126.81 ÷ 147.48) × 100 = 85.9%) of the sum of the FDPs of all the 21 features. Thus, the dimensionality is reduced from 21 to 9, with a dimensionality reduction of 57% (((21–9) ÷ 21) × 100 = 57.1%). Thus, for \(\theta =\) 0.85, the DRT FC selects 9 sensitive features.

Similarly with \(\theta =\) 0.9, the cumulative FDPs of the ten (s = 10) features 21, 1, 15, 6, 16, 20, 9, 19, 17 and 8 amount to 132.53, which is approximately 90% ((132.53 ÷ 147.48) × 100 = 89.6%) of the sum of the FDPs of all the 21 features. Thus, the dimensionality is reduced from 21 to 10, with a dimensionality reduction of 52% (((21–10) ÷ 21) × 100 = 52.4%). Thus, for \(\theta =\) 0.9, the DRT FC selects 10 sensitive features.

7.2 Selection of Sensitive Features using Separation Index

Using the similar procedure explained in Sect. 7.1, the SI distances between the three classes of the machined surfaces are computed using Eq. 4 and the SIDP is computed using Eq. 5.

Figure 6 shows the SIDP for the 21 features, and Table 5 shows the details of selecting the sensitive features for different threshold values, viz., \(\theta =\) 0.8, 0.85 and 0.9. With θ = 0.8, ten sensitive features (s = 10) out of 21 (21, 1, 15, 16, 6, 20, 9, 19, 17 and 8) are selected resulting in a dimensionality reduction of 52.4%. With θ = 0.85, twelve sensitive features (s = 12) out of 21 (21, 1, 15, 16, 6, 20, 9, 19, 17, 8, 3 and 18) are selected resulting in a dimensionality reduction of 42.85%. With θ = 0.9, thirteen sensitive features (s = 13) out of 21 (21, 1, 15, 16, 6, 20, 9, 19, 17, 8, 3, 18 and 14) are selected resulting in a dimensionality reduction of 38.1%.

Fig. 6
figure 6

SIDP of 21 GLCM features

Table 5 Features selection for different threshold values based on SI

7.3 Selection of Sensitive Features using Bhattacharyya Distance

Using the similar procedure explained in Sect. 7.1, the BDs between the three classes of the machined surfaces are computed using Eq. 6 and the BDDP is computed using Eq. 7.

Figure 7 shows the BDDP for the 21 features, and Table 6 shows the details of selecting the sensitive features for different threshold values, viz., \(\theta =\) 0.8, 0.85 and 0.9. With θ = 0.8, ten sensitive features (s = 10) out of 21 (21, 1, 15, 16, 6, 20, 9, 19, 17 and 8) are selected resulting in a dimensionality reduction of 52.4%. With θ = 0.85, eleven sensitive features (s = 11) out of 21 (21, 1, 15, 16, 6, 20, 9, 19, 17, 8 and 7) are selected resulting in a dimensionality reduction of 47.6%. With θ = 0.9, thirteen sensitive features (s = 13) out of 21 (21, 1, 15, 16, 6, 20, 9, 19, 17, 8, 7, 14 and 3) are selected resulting in a dimensionality reduction of 38.1%.

Fig. 7
figure 7

BDDP of 21 GLCM features

Table 6 Features selection for different threshold values based on BD

7.4 Selection of Sensitive Features Common to FC, SI and BD

By comparing the sensitive features selected using the three DRTs, the common feature set corresponding to each threshold is tabulated in Table 7. The percentage reduction in the dimensionality is also depicted in the table.

Table 7 The common features selected for three threshold values for all three methods

8 ML-Based Classification

The sensitive common features selected using the three different DRT (Table 7) are given as inputs to the ML classifiers. The data available is divided into two parts: 80% of the data is used for training, and the remaining 20% of data is used for testing. In this work, the classifiers are implemented using Python with the Jupyter Notebook editor on an ASUS laptop (AMD Ryzen 7 with 2.30 GHz processor, 8 GB RAM). The performance of the ML models depends on the tuning of the hyperparameters. Before the learning process can begin, several hyperparameters must be pre-configured. The Grid Search technique is used to tune the hyperparameters of the ML classifiers. Table 8 shows the hyperparameters selected for different ML classifiers and TCFs. The K-fold cross-validation technique is used to determine the performance of the ML classifier. After several trials, the number of folds is fixed as 10. In order to compare the effectiveness of the DRTs, the performance of the ML classifiers with all 21 features is compared against those with the selected sensitive feature set.

Table 8 The hyperparameters selected for different ML classifiers and TCFs
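A hedged sketch of this evaluation protocol (80/20 split, grid search, tenfold cross-validation and confusion matrix) is given below. The random data and parameter grid are placeholders only; the actual inputs are the common sensitive features of Table 7 and the actual grids are those in Table 8.

```python
# Illustrative sketch of the Sect. 8 protocol with scikit-learn.
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

rng = np.random.default_rng(0)
X = rng.random((864, 8))     # placeholder for the 864 x s sensitive-feature matrix
y = rng.integers(0, 3, 864)  # placeholder labels: 0 turned, 1 ground, 2 shaped

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

param_grid = {"n_estimators": [50, 100, 200], "max_depth": [None, 10, 20]}  # illustrative grid
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=10)
search.fit(X_train, y_train)

cv_scores = cross_val_score(search.best_estimator_, X_train, y_train, cv=10)  # tenfold CV
y_pred = search.best_estimator_.predict(X_test)
print(search.best_params_, cv_scores.mean())
print(confusion_matrix(y_test, y_pred), accuracy_score(y_test, y_pred))
```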

The details of the confusion matrices obtained for the different scenarios are provided in Appendix A. The diagonal elements of a confusion matrix indicate the number of correct classifications, whereas the off-diagonal elements give the number of misclassifications. The numbers 0, 1 and 2 along the axes of the confusion matrices denote the turned, ground and shaped surfaces, respectively. With all 21 features as input to the ML classifiers, the confusion matrices obtained by DCT, KNN, RAF and SVM are provided in Fig. 9 (a), (b), (c) and (d), respectively. With the common features corresponding to θ = 0.8 as input, the corresponding confusion matrices are provided in Fig. 10 (a), (b), (c) and (d); for θ = 0.85, in Fig. 11 (a), (b), (c) and (d); and for θ = 0.9, in Fig. 12 (a), (b), (c) and (d).

Table 9 shows the classification accuracies of the training and testing phases of the ML models using the common features and also using all 21 features. Figure 8 presents the testing accuracies of the different classification models. From Fig. 8, it can be seen that the use of DRT is beneficial in enhancing the test accuracy for all classifiers except the SVM. During training, the DCT, KNN and RAF models identify and classify the surfaces with 100% accuracy; during testing, the classification accuracy decreases slightly. The DCT model shows a decrease in classification accuracy as the feature set grows; its highest classification accuracy of 93% is noted for θ = 0.8 when compared to the other threshold values. The KNN model also shows a classification accuracy of 93% for θ = 0.8 and 0.9. Compared to all the classifiers, the RAF model performs best, with the highest testing accuracy of 95.3% at a threshold of 0.9, whereas the lowest testing accuracy of 90.75% is obtained for the KNN classifier with all the features. The SVM classifier shows a classification accuracy of 93.34% with all the features as input and for θ = 0.8; at θ = 0.85 and 0.9, it reduces to 93%. Overall, the DCT classifier performs the worst and the RAF model the best. For the same image set, wavelet-based machined surface recognition yielded a classification accuracy of 88.89% [56]; thus, a 7.2% improvement is observed over the previous study. Therefore, the proposed methodology of DRT with the TCF and the RAF classifier proves better for the classification of machined surfaces.

Table 9 Classification accuracy of ML models using the common features
Fig. 8
figure 8

Classification accuracy of ML model

From Table 7, it is clear that the dimensionality reductions of 62%, 57% and 52% for threshold values of \(\theta =\) 0.8, 0.85 and 0.9, respectively, significantly lower the computational burden. The time required for training the ML classifiers using the reduced feature sets and using all 21 features is given in Table 10. The training time clearly decreases as the number of input features decreases. Thus, the proposed dimensionality reduction methodology effectively selects a reduced set of the most sensitive features to be given as input to the ML classifiers, reducing the computational burden and time.

Table 10 Time for ML model development

Thus, the effective utilization of GLCM-based image features in the texture analysis of machined surfaces using machine vision technology is evident. Additionally, sensitive feature selection, i.e., dimensionality reduction of the feature vector, helps in improving the accuracy of the ML models by removing non-sensitive or repeated features. Verifying the proposed methodology with DRTs other than those used in the present study, and with other feature extraction techniques, is left as future work. Deep learning networks are data-hungry systems; because of the limited data set, deep learning models were not tried. With a larger set of machined surface images available, deep learning networks could be used as classifiers. The feature set may also be enhanced by considering newer image feature extraction techniques, and the classifiers' accuracy can be further improved by training with more images. Additionally, the proposed methodology can be extended in robotics to identify materials by analyzing surface textures, which will help manufacturing industries move toward the concept of Industry 4.0.

9 Conclusions

This study proposes a methodology to classify machined surface images using GLCM-based image features, accomplished with image processing and ML techniques. Images of turned, ground and shaped machined surfaces were subjected to gray-level conversion, contrast enhancement and histogram equalization. To improve the generalization of the inputs to the ML classifiers, the processed images were divided into sixteen equal non-overlapping regions. Further, distance-based DRTs, viz., FC, SI and BD, along with the TCF, were adopted to identify and select the most sensitive GLCM features. The features common to FC, SI and BD for the three threshold values of 0.8, 0.85 and 0.9 were used as inputs to the ML classifiers (DCT, KNN, RAF, SVM). Among all the classifiers, the RAF model classified with an accuracy of 95.3%, validated using the tenfold cross-validation technique. The effectiveness of the ML models in classifying the machined surfaces was found to be in the order RAF, SVM, KNN and DCT. Compared to the methodology described in a prior study, the proposed methodology showed a 7.2% improvement in classification accuracy. The overall findings show that, even with a significant reduction in the number of features, the training and testing classification accuracies of the ML classifiers with the reduced feature sets are superior to those obtained with all extracted features. Thus, it can be concluded that GLCM-based image features, along with the proposed dimensionality reduction method, can be effectively used to identify and classify machined surface images.