1 Introduction

With the rapid development of 5G communication, the Internet of Things, cloud computing, and mobile Internet technology, the smart city has become an active research topic [1,2,3,4]. Smart grid monitoring is an important component of smart city construction, and detecting the security of high-voltage power lines is one of the main tasks of the smart grid. The insulator, as the insulating equipment on overhead power lines, plays an important role in power transmission [5]. Owing to long-term exposure in the wild, insulators are easily damaged or degraded by lightning strikes, material aging, contamination, and similar causes, which may lead to serious accidents if the defects are not found and handled in time. Regular inspections are therefore necessary to ensure the reliable functioning of all power equipment. Traditional insulator inspections are performed entirely by hand, which is time-consuming, laborious, and risky because the power lines lie in harsh operating environments over a vast area. Instead, we collect large amounts of image data using a sensor network, the main technological foundation of the Internet of Things, and through image analysis we can realize automatic monitoring of the power grid.

In the past few years, machine learning methods [6] have been widely used for automatic object identification and have achieved good recognition rates. Some scholars have proposed identifying insulators automatically using machine learning. In [7,8,9], texture features of insulators are extracted using the wavelet transform and the discrete orthogonal S-transform, respectively, and the features are then fed into a classifier for training and identification. Because the background of an insulator is complex, including forests, rivers, sky, and various high-voltage lines, extracting a single type of texture feature, as in [8, 9], cannot fully describe the characteristics of the insulator; the recognition rate is poor and the miss rate is high.

A single feature cannot represent the image in detail, so some papers use different types of features to describe it. In [10], the color feature and the shape feature are extracted and combined into a high-dimensional feature to describe objects in images, and good retrieval results are obtained. In [11], the authors extract the color feature and the shape feature and combine them into a high-dimensional feature for traffic light recognition, achieving a recognition rate of 95%. In [12], the color feature and the elliptical shape feature are used for face localization and recognition. In addition, the texture feature and the color feature have also been used for image content retrieval with great results [13,14,15,16].

These researchers extract two types of features and combine them for automatic identification or image retrieval. The average recognition rate is higher than that obtained with a single feature type.

In fact, human beings perceive image information as a combination of three types of features: color, texture, and shape [17, 18]. To fully describe the characteristics of insulators, we extract these three types of features separately and combine them into a multi-type feature set.

Additionally, there are many different extraction methods for each feature type. For example, color feature extraction methods include the color histogram, color invariants, and color moments; texture feature extraction methods include the gray level co-occurrence matrix (GLCM), Tamura features, and the Gabor transform; and shape feature extraction methods include contour-based and region-based extraction. Features obtained by different extraction methods of the same type describe the image from different angles, which makes it difficult to pick a single suitable method. In this paper, we therefore apply a variety of extraction methods for each type and combine the results into a multi-type feature. This multi-type feature is then sent to the classifier for recognition, and the resulting recognition rate is higher than that reported in the existing literature.

Recently, the k-nearest neighbor (k-NN) algorithm has been widely used in pattern recognition because of its simplicity and good empirical success, so we choose it as the classifier to identify the insulator. First, we separately extract the color, texture, and shape features from the aerial image. Then, the three types of features are packed into a multi-type feature vector that is sent to the k-NN classifier to classify insulators. Experiments demonstrate that the multi-type feature is very effective at fully describing the characteristics of the insulator: compared with any single feature type, our method yields better classification performance.

The paper is organized as follows. In Section 2, we introduce the acquisition of the dataset. In Section 3, we describe the extraction of the color, texture, and shape features, and in Section 4 we introduce the k-nearest neighbor algorithm. Experiments are presented in Section 5, and we conclude with a discussion in Section 6.

2 Dataset acquisition

In this paper, 6548 images are collected, on the basis of the Internet of Things, using an unmanned aerial vehicle equipped with a Canon G15 camera in various environments including forests, gobi, fields, and cities (see Fig. 1 for examples). To ensure that the sample data covers different light sources, we also collect images under both sunny and cloudy conditions. After removing blurry and invalid data, the dataset contains a total of 1548 valid positive samples and 5000 negative samples.

Fig. 1 Examples of data samples in real working conditions

As shown in Fig. 1, the image background is complicated in practical applications. In images taken in forests, the color of the insulator is close to that of the trees, and a variety of complex light sources appear. The backgrounds of images taken in cities contain various objects such as houses, streets, fields, and rivers; since these objects have no uniform shape or color, it is difficult to separate the insulator from its background. In addition, the shape of the high-voltage tower is similar to that of the insulator, so the insulator cannot be identified by a single shape feature either.

In the experiments, we divide the samples into two parts: a training set containing 1000 positive and 2000 negative samples, and a test set containing 500 positive and 1000 negative samples. To improve the efficiency of the program, we normalize each sample to 100 × 100 pixels.
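
To make the preprocessing concrete, the following is a minimal sketch of loading and normalizing one sample (it assumes Python with Pillow and a hypothetical file name; the paper's own experiments were run in MATLAB):

```python
import numpy as np
from PIL import Image

def load_sample(path, size=(100, 100)):
    """Load one aerial image and resize it to the common 100 x 100
    resolution used for all samples in the experiments."""
    return np.asarray(Image.open(path).convert("RGB").resize(size))

# sample = load_sample("insulator_0001.jpg")   # hypothetical file name
```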

3 Multi-type feature extraction

Image features mainly comprise three types: color, shape, and texture. A single feature type cannot fully describe the characteristics of objects in aerial images with complex backgrounds. In this paper, we therefore extract all three types of features and combine them into a multi-type feature that is sent to the k-NN classifier to identify the insulator. Experiments show a high recognition rate.

3.1 Color feature extraction

The color feature is the most widely used visual characteristic in image retrieval and depends little on the size, direction, and viewing angle of the image, which gives it high robustness. In this paper, we extract the color feature using the color histogram, color moments, color coherence vector, and color correlogram. Because the color feature of an image is closely related to the physiological characteristics and experience of the observer's visual system, we also extract the mean, variance, standard deviation, and correlation coefficient of the three color components in different color spaces, namely RGB [19], HSI [20], YIQ [21], and HSV [22]. Altogether, these methods yield a 393-dimensional color feature.

3.1.1 Color histogram

The color histogram reflects the distribution of colors in the image, that is, the probability of each color appearing [23]. To calculate it, we first quantize the color space into a number of color cells, each of which becomes one bin of the histogram; the histogram is then obtained by counting the pixels falling into each bin. Because the color histogram relates to the proportion of each color in the whole image rather than the spatial position of the colors, it is a global color feature: it is insensitive to rotation, translation, and scale changes of the image, but it cannot express the spatial distribution of color.
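
As an illustration, the quantize-and-count procedure can be sketched as follows (the choice of 4 bins per channel is our own assumption for the example, not the paper's setting):

```python
import numpy as np

def color_histogram(img, bins_per_channel=4):
    """Quantize an RGB image into bins_per_channel**3 color cells and
    return the fraction of pixels falling into each cell."""
    q = (img.astype(np.int64) * bins_per_channel) // 256      # per-channel bin index
    codes = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins_per_channel ** 3)
    return hist / hist.sum()                                  # normalize to probabilities

img = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)  # stand-in sample
print(color_histogram(img).shape)                               # (64,)
```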

3.1.2 Color moments

Color moments [24], a global color feature proposed by Stricker and Orengo, are a simple and effective way to represent the color of an image. Any color distribution in the image can be expressed by its moments, and the color distribution information is concentrated mainly in the low-order moments, so we use only the first-, second-, and third-order moments to describe the color distribution. The color moments are defined as

$$ \left\{\begin{array}{l}{\mu}_i=\frac{1}{n}\sum \limits_{j=1}^n{h}_{ij}\\ {}{\sigma}_i={\left[\frac{1}{n}\sum \limits_{j=1}^n{\left({h}_{ij}-{\mu}_i\right)}^2\right]}^{\frac{1}{2}}\\ {}{s}_i={\left[\frac{1}{n}\sum \limits_{j=1}^n{\left({h}_{ij}-{\mu}_i\right)}^3\right]}^{\frac{1}{3}}\end{array}\right., $$
(1)

where hij is the probability that a pixel of the ith color component takes the gray value j and n is the number of gray levels. We obtain three low-order moments for each color component, and thus a total of nine color-moment features per image.
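
A minimal sketch of Eq. (1) in code (we use a signed cube root so the third moment stays real when it is negative, which the formula leaves implicit):

```python
import numpy as np

def color_moments(img):
    """First three moments (mean, standard deviation, skewness) of each
    of the three color channels: a 9-dimensional feature, as in Eq. (1)."""
    feats = []
    for c in range(3):
        h = img[..., c].astype(np.float64).ravel()
        mu = h.mean()                                  # first-order moment
        sigma = np.sqrt(((h - mu) ** 2).mean())        # second-order moment
        s = np.cbrt(((h - mu) ** 3).mean())            # third-order moment (signed root)
        feats.extend([mu, sigma, s])
    return np.array(feats)
```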

3.1.3 Color coherence vector

Pass et al. [25] proposed the color coherence vector (CCV) because the color histogram and color moments cannot describe the spatial information of color. The pixels in each bin of the color histogram are divided into two parts: if a pixel belongs to a connected region whose area is larger than a threshold, it is counted as coherent; otherwise, it is incoherent. Since the color coherence vector includes spatial information about color, it gives a good recognition rate.
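
A sketch of this coherent/incoherent split, shown on a single quantized channel for brevity (the bin count and the area threshold tau are illustrative assumptions; connected regions are found with SciPy's labeling):

```python
import numpy as np
from scipy import ndimage

def color_coherence_vector(gray, n_bins=8, tau=50):
    """For each quantized bin, count pixels in connected regions of at
    least tau pixels (coherent) and the remaining pixels (incoherent)."""
    q = (gray.astype(np.int64) * n_bins) // 256
    alpha, beta = np.zeros(n_bins), np.zeros(n_bins)
    for b in range(n_bins):
        labels, _ = ndimage.label(q == b)           # 4-connected regions of bin b
        sizes = np.bincount(labels.ravel())[1:]     # region sizes, skipping background
        alpha[b] = sizes[sizes >= tau].sum()        # coherent pixels
        beta[b] = sizes[sizes < tau].sum()          # incoherent pixels
    return np.concatenate([alpha, beta]) / gray.size
```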

3.1.4 Color correlogram

The color correlogram, proposed in [26], is another way to describe the spatial distribution of color, using the color correlation between pixels in an image.

For an image I, let Ic(i) be the set of all pixels whose color value is c(i); then the color correlogram is defined as

$$ {r}_{i,j}^{(k)}=\underset{p_1\in {I}_{c(i)},\ p_2\in {I}_{c(j)}}{\Pr}\left[p_2\in {I}_{c(j)},\left|p_1-p_2\right|=k\ |\ p_1\in {I}_{c(i)}\right], $$
(2)

where i and j are gray levels, k is the maximum distance between pixels considered when calculating the color correlogram, and |p1 − p2| is the distance between p1 and p2. The color correlogram thus expresses, for a pixel pair (i, j), the probability that a pixel of color value c(j) lies at distance k from a pixel of color value c(i).
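
As an illustration, the diagonal entries r(k) with i = j (the autocorrelogram, which is what is most often used in practice) can be estimated with array shifts; the level count and distance set below are our own example values:

```python
import numpy as np

def autocorrelogram(gray, n_levels=8, distances=(1, 3, 5)):
    """Estimate, for each quantized level i and distance k, the probability
    that a pixel offset by k along an image axis has the same level i."""
    q = (gray.astype(np.int64) * n_levels) // 256
    feats = []
    for k in distances:
        same, total = np.zeros(n_levels), np.zeros(n_levels)
        # pixel pairs at horizontal and vertical offset k
        for a, b in [(q[:, :-k], q[:, k:]), (q[:-k, :], q[k:, :])]:
            for i in range(n_levels):
                mask = a == i
                total[i] += mask.sum()
                same[i] += (mask & (b == i)).sum()
        feats.append(same / np.maximum(total, 1))   # avoid division by zero
    return np.concatenate(feats)
```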

3.2 Texture feature extraction

Texture is the spatial distribution of pixels in an image, which can be expressed by the correlation between neighboring pixels [27, 28]. Texture analysis is the process of qualitatively or quantitatively describing the texture extracted by image processing techniques. Because human visual perception is complicated and textures are diverse, it is difficult to find a mathematical model that accurately describes texture features, and no single existing method can describe the texture of all images.

We extract the texture feature using several methods: the Gabor transform, the GLCM, the gray-gradient co-occurrence matrix, the gray level difference analysis method, and the gray level run-length method.

3.2.1 Gabor transform

The Gabor filter [29] extracts texture features using Gabor wavelets at multiple scales and orientations; during feature extraction, the number of filters can be expanded to obtain richer texture features. The Gabor filter is defined as

$$ \mathrm{Gabor}\left(x,y\right)=\frac{1}{2{\pi \sigma}^2}\exp \left(-\frac{{\overline{x}}^2+{\overline{y}}^2}{2{\pi \sigma}^2}\right)\left(\cos \frac{2\pi \overline{x}}{l}+j\sin \frac{2\pi \overline{x}}{l}\right). $$
(3)

where x̄ and ȳ denote the image coordinates rotated by the filter orientation and l is the wavelength of the sinusoidal factor. The Gabor wavelet is sensitive to image edges and provides good selectivity in direction and scale. Since the Gabor filter is also insensitive to illumination, it is effective for extracting texture features from aerial images.
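
A sketch of a small Gabor filter bank following Eq. (3), with the texture feature taken as the mean and standard deviation of each response magnitude (the scales, orientations, and kernel size are assumptions for the example):

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(sigma=2.0, wavelength=4.0, theta=0.0, size=15):
    """Complex Gabor kernel as in Eq. (3); theta rotates the coordinates."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    norm = 1.0 / (2 * np.pi * sigma ** 2)
    envelope = norm * np.exp(-(xr ** 2 + yr ** 2) / (2 * np.pi * sigma ** 2))
    return envelope * np.exp(1j * 2 * np.pi * xr / wavelength)

def gabor_features(gray, scales=(2.0, 4.0), n_orient=4):
    """Mean and std of the filter response magnitude per scale/orientation."""
    feats = []
    for sigma in scales:
        for k in range(n_orient):
            kern = gabor_kernel(sigma, 2 * sigma, k * np.pi / n_orient)
            resp = np.abs(fftconvolve(gray.astype(np.float64), kern, mode="same"))
            feats.extend([resp.mean(), resp.std()])
    return np.array(feats)
```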

3.2.2 Gray level co-occurrence matrix

Haralick and Shanmugam [30] studied the gray level dependence of image texture from a mathematical point of view and constructed the gray level co-occurrence matrix from the azimuth and distance relationships between pixels in the image. The matrix describes the spatial structure and correlation of pixel pairs based on the spatial relationship of their gray values, and the texture is analyzed through statistics of the direction, distance, and variation range of the gray levels between pixel pairs. From this matrix we compute several texture descriptors: the angular second moment, contrast, correlation coefficient, entropy, difference entropy, difference moment, inverse difference moment, sum of means, sum of variances, sum of entropies, and difference variance.
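
A compact sketch of the co-occurrence matrix for one displacement, together with a few of the descriptors listed above (the 16-level quantization and the single offset are example choices):

```python
import numpy as np

def glcm(gray, offset=(0, 1), levels=16):
    """Normalized gray level co-occurrence matrix for one displacement
    (dy, dx), both assumed non-negative here for simplicity."""
    q = (gray.astype(np.int64) * levels) // 256
    dy, dx = offset
    a = q[:q.shape[0] - dy, :q.shape[1] - dx]       # reference pixels
    b = q[dy:, dx:]                                 # displaced neighbors
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)         # accumulate pair counts
    return m / m.sum()

def haralick_stats(p):
    """Angular second moment, contrast, entropy, inverse difference moment."""
    i, j = np.indices(p.shape)
    asm = (p ** 2).sum()
    contrast = ((i - j) ** 2 * p).sum()
    entropy = -(p[p > 0] * np.log(p[p > 0])).sum()
    idm = (p / (1.0 + (i - j) ** 2)).sum()
    return np.array([asm, contrast, entropy, idm])
```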

3.2.3 Gray-gradient co-occurrence matrix

The gray-gradient co-occurrence matrix [31] extracts the texture feature from the combined information of the gray level and the gradient of the image. The element H(x, y) of the matrix is the number of pixels that have gray value x in the normalized gray image F(i, j) and gradient value y in the normalized gradient image G(i, j), i.e., the number of elements in the set {(i, j) | F(i, j) = x ∩ G(i, j) = y, i, j = 0, 1, ⋯, N − 1}, where F(i, j) ∈ [0, L − 1] and G(i, j) ∈ [0, Lg − 1]. The gray-gradient co-occurrence matrix is then normalized as

$$ \overset{\wedge }{H}\left(x,y\right)=\frac{H\left(x,y\right)}{\sum \limits_{x=0}^{L-1}\sum \limits_{y=0}^{L_g-1}H\left(x,y\right)}. $$
(4)

The gradient expresses the maximum rate of change of the gray level; the gradient value at each pixel indicates how densely the iso-gray contours pass through that pixel. Therefore, the small and large gradient advantages reflect the intensity of gray level change: in an image whose gray level varies smoothly, most pixels have small gradients, whereas in an image whose gray level varies sharply, most pixels have large gradients.

In this paper, we compute descriptors such as the small gradient advantage and the large gradient advantage as texture features.
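
A sketch of the matrix of Eq. (4) and of one common definition of the two gradient advantages (the quantization levels L and Lg, and the use of np.gradient for the gradient magnitude, are our own choices):

```python
import numpy as np

def gray_gradient_features(gray, L=16, Lg=16):
    """Normalized joint histogram H(x, y) of quantized gray value and
    gradient magnitude, plus the small/large gradient advantages."""
    g = gray.astype(np.float64)
    gy, gx = np.gradient(g)
    grad = np.hypot(gx, gy)
    f = np.clip((g * L / (g.max() + 1e-9)).astype(np.int64), 0, L - 1)
    gq = np.clip((grad * Lg / (grad.max() + 1e-9)).astype(np.int64), 0, Lg - 1)
    H = np.zeros((L, Lg))
    np.add.at(H, (f.ravel(), gq.ravel()), 1)
    H /= H.sum()                                     # normalization of Eq. (4)
    col = H.sum(axis=0)                              # distribution over gradient levels
    y = np.arange(Lg, dtype=np.float64)
    small_adv = (col / (y + 1) ** 2).sum()           # small gradient advantage
    large_adv = (col * y ** 2).sum()                 # large gradient advantage
    return H, small_adv, large_adv
```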

3.2.4 Gray level difference analysis method

The gray level difference analysis method (GLDM) [32] is a common texture extraction method. Let (x, y) be a point in the image; the gray level difference between it and a nearby point (x + Δx, y + Δy) is gΔ(x, y) = g(x, y) − g(x + Δx, y + Δy). Moving this point pair over the whole image produces many difference values. Supposing the differences can take m levels, we build a histogram by counting the number of pixels at each level; the probability PΔ(i) of each level is then obtained from this histogram.

The texture is coarser when PΔ(i) is large for small i; conversely, a flat probability distribution indicates a fine texture.

In this paper, we compute descriptors such as the contrast, mean, and entropy of this distribution as texture features.
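
A sketch of the difference histogram and the three descriptors just mentioned (the offset and the 16-level quantization are example settings):

```python
import numpy as np

def gldm_features(gray, offset=(0, 1), levels=16):
    """Histogram P(i) of absolute gray level differences at a fixed offset,
    with its contrast, mean, and entropy."""
    q = (gray.astype(np.int64) * levels) // 256
    dy, dx = offset
    diff = np.abs(q[:q.shape[0] - dy, :q.shape[1] - dx] - q[dy:, dx:])
    p = np.bincount(diff.ravel(), minlength=levels).astype(np.float64)
    p /= p.sum()
    i = np.arange(levels)
    contrast = (i ** 2 * p).sum()
    mean = (i * p).sum()
    entropy = -(p[p > 0] * np.log(p[p > 0])).sum()
    return np.array([contrast, mean, entropy])
```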

3.2.5 Gray level run-length method

The gray level run-length method [33], proposed by Galloway, counts runs of pixels with the same gray value in a given direction. The run-length matrix element P(i, j), with i = 0, 1, ..., M − 1 and j = 0, 1, ..., N − 1, is the number of runs of gray level i with length j in that direction, where M is the number of gray levels and N is the maximum run length. The gray level run-length matrix is therefore an M × N matrix composed of the P(i, j).

Computing the run-length matrix over all 256 gray levels is expensive, so in this paper we quantize the image to 8, 16, and 32 gray levels to reduce the computation.
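
A sketch of the run-length matrix for the horizontal direction on an image quantized to a reduced number of levels, as described above (run-length descriptors such as short- and long-run emphasis would then be computed from P):

```python
import numpy as np

def run_length_matrix(gray, levels=16):
    """P[i, j] counts horizontal runs of gray level i with length j + 1."""
    q = (gray.astype(np.int64) * levels) // 256
    H, W = q.shape
    P = np.zeros((levels, W))            # a run can be at most one row wide
    for row in q:
        start = 0
        for col in range(1, W + 1):
            if col == W or row[col] != row[start]:
                P[row[start], col - start - 1] += 1   # close run of length col - start
                start = col
    return P
```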

3.3 Shape feature extraction

Since the insulator has a regular shape, the shape feature can improve the recognition rate against certain backgrounds.

This paper uses the moment invariants method [34] to extract the shape feature. Moments characterize the geometric properties of an image region, and because suitable combinations of them are invariant to rotation, translation, and scaling, they are known as moment invariants. The main idea is to use area-based moments that are insensitive to such changes as the shape feature; in image processing, moment invariants serve as an important feature representing the object and can then be used for image classification.

Given the image area f(x, y), its (p + q) order moment is defined as:

$$ {m}_{pq}=\sum \limits_{x=1}^M\sum \limits_{y=1}^N{x}^p{y}^qf\left(x,y\right). $$
(5)

The central moment is:

$$ {u}_{pq}=\sum \limits_{x=1}^M\sum \limits_{y=1}^N{\left(x-{x}_0\right)}^p{\left(y-{y}_0\right)}^qf\left(x,y\right), $$
(6)

where p, q = 0, 1, 2, …, n. As can be seen, mpq depends on the position of the region relative to the origin, so it is not translation invariant. In contrast, upq is computed relative to the centroid (x0, y0), so it is translation invariant but not rotation invariant. Hence, neither the raw moments nor the central moments are simultaneously invariant to rotation, translation, and scaling.

To obtain features invariant to translation, rotation, and scale, the central moments are normalized as follows:

$$ {y}_{pq}={u}_{pq}/{u}_{00}^r,\kern0.5em \mathrm{where}\kern0.5em r=\left(p+q+2\right)/2,\kern0.5em p+q=2,3,\dots $$
(7)

Hu moment invariants [35] give a high recognition rate and speed when identifying images with simple texture or large-scale structure. For images with rich texture, however, the Hu invariants use only low-order moments, so many texture details are not well represented and the recognition rate is poor.
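
A sketch of the moment computation of Eqs. (5)-(7) on a gray or binary image region (the set of orders is an example; the seven Hu invariants are fixed polynomial combinations of these normalized moments):

```python
import numpy as np

def normalized_central_moments(f, orders=((2, 0), (1, 1), (0, 2),
                                          (3, 0), (2, 1), (1, 2), (0, 3))):
    """Normalized central moments y_pq of Eq. (7); f is the image region
    and is assumed to have nonzero total mass."""
    f = f.astype(np.float64)
    y, x = np.mgrid[0:f.shape[0], 0:f.shape[1]]
    m00 = f.sum()                                      # zeroth-order moment, Eq. (5)
    x0, y0 = (x * f).sum() / m00, (y * f).sum() / m00  # centroid
    feats = []
    for p, q in orders:
        u = ((x - x0) ** p * (y - y0) ** q * f).sum()  # central moment, Eq. (6)
        r = (p + q + 2) / 2
        feats.append(u / m00 ** r)                     # normalization, Eq. (7)
    return np.array(feats)
```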

4 Classifier

The k-nearest neighbors algorithm (k-NN) [36, 37] is one of the most basic and simple machine learning methods for pattern recognition. Its basic principle is that a query point is assigned the type that occurs most often among the k samples nearest to it.

The k-NN algorithm proceeds as follows:

  1. Calculate the distance between the current point and each point of known type in the dataset;

  2. Select the k points with the smallest distances from the current point;

  3. Count the number of samples of each type among these k points;

  4. Return the type with the largest count among these k points as the type of the current point.

For the distance measurement, the Euclidean distance is the most common way to measure the distance between two points. In Euclidean space, the distance between two points x = (x1, ..., xn) and y = (y1, ..., yn) is

$$ d\left(x,y\right)=\sqrt{{\left({x}_1-{y}_1\right)}^2+{\left({x}_2-{y}_2\right)}^2+\dots +{\left({x}_n-{y}_n\right)}^2}=\sqrt{\sum \limits_{i=1}^n{\left({x}_i-{y}_i\right)}^2}. $$
(8)

We can implement the classifier directly from the procedure above.

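The following is a minimal NumPy sketch of this procedure (the variable names and the toy data are our own; the paper's original listing is given as a figure):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=1):
    """Steps 1-4 above: Euclidean distances (Eq. (8)), the k nearest
    neighbors, and a majority vote over their types."""
    d = np.sqrt(((X_train - x) ** 2).sum(axis=1))    # step 1
    nearest = np.argsort(d)[:k]                      # step 2
    types, counts = np.unique(y_train[nearest], return_counts=True)  # step 3
    return types[np.argmax(counts)]                  # step 4

# Toy example with 748-dimensional feature vectors, as in Section 5
X_train = np.random.rand(3000, 748)
y_train = np.random.randint(0, 2, 3000)              # 1 = insulator, 0 = background
print(knn_predict(X_train, y_train, np.random.rand(748), k=1))
```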

In this paper, we apply the k-NN classifier to identify insulators; the specific experimental process is discussed in detail in the next section.

5 Numerical results and discussions

In this section, we examine and verify the proposed approach from multiple angles and end with an additional discussion. The simulations are carried out on the dataset described in Section 2, using the MATLAB 2014a environment to implement the different recognition schemes and compare their performance.

Different values of k lead to different recognition rates in k-NN. To find the best value, we test different values of k with the color feature, the texture feature, the shape feature, and their combination. The recognition rate is highest at k = 1 for the color, texture, and combined features, and at k = 4 for the shape feature. As shown in Fig. 2, although the shape feature reaches its highest recognition rate at k = 4, that rate is only 72.8%, much lower than the other types. Regardless of k, the recognition rate of the shape feature is lower than that of the other feature types because the total number of shape features we extract is far smaller; consequently, the shape feature has little effect on the overall recognition rate. To obtain the best recognition rate, we set k = 1 in our experiments.

Fig. 2 Accuracy for different values of k in k-NN. The blue line, which is the highest, denotes the recognition rate of the combined features; the red line is the recognition rate of the color features; the green line is that of the texture features; and the purple line, the lowest, is that of the shape features

Our dataset is collected in various environments, including forests, gobi, fields, and cities. Real targets often have complex backgrounds, low contrast, and insufficient brightness, which degrade detection and recognition and may even cause targets to be missed. To test the robustness of our model, we divide the images into three categories according to their dominant background, namely buildings, sky, and forest, and test different values of k on each category. As shown in Fig. 3, images with building backgrounds achieve higher accuracy than the other categories when k is greater than 2. Because the color of the insulator is very similar to the background color, the accuracy for forest backgrounds is the lowest, only 70%. The recognition rate for sky backgrounds exceeds 90% regardless of k, because the object differs clearly from its background in color, texture, and shape. This analysis shows that further improving the robustness of the model requires improving the recognition rate on forest backgrounds.

Fig. 3 Accuracy for images with different background categories. The blue curve denotes the accuracy for images with building backgrounds, the green curve for sky backgrounds, and the red curve for forest backgrounds

The recognition rate is greatly improved by using a richer multi-type feature instead of a single feature type, which cannot fully describe the insulator in aerial images with diverse backgrounds and complex lighting. We randomly select different numbers of features from each of the four feature sets: color, texture, shape, and multi-type. The experiments show that the recognition rate grows with the number of features. As shown in Figs. 4, 5, 6, and 7, the recognition rate is highest for the multi-type feature; the rate for the color feature is close to that for the texture feature; and the rate for the shape feature is the lowest because of its small total number of features. In all cases, however, the recognition rate improves as the number of features increases.

Fig. 4 Recognition rates with different numbers of color features

Fig. 5 Recognition rates with different numbers of texture features

Fig. 6 Recognition rates with different numbers of shape features

Fig. 7 Recognition rates with different numbers of multi-type features

The multi-type feature proposed in this paper combines the color, texture, and shape features, reaching a total of 748 dimensions. Such rich features can describe the insulator against a variety of backgrounds and yield a high recognition rate. As shown in Table 1, the recognition rate using the multi-type feature is much higher than the others, reaching 99%. From Fig. 8, we also see that the multi-type feature achieves not only a high recognition rate but also a false-positive rate of 0, making it more effective than the other three feature types.

Table 1 Accuracy for different types of features

6 Conclusion

With the wide application of the Internet of Things, we can obtain large amounts of monitoring image data of power lines. Analyzing these data, we find that insulators are commonly embedded in complex backgrounds and that a single feature type is not enough to discriminate an insulator from its background in an aerial image. In this paper, we propose a multi-type feature combining the color, texture, and shape features, and we couple it with the k-NN algorithm to form a classification model for identifying insulators. As shown in the experiments, this model yields better classification performance than any single feature type. Moreover, our explorations indicate that the multi-type feature can fully meet the power sector's application requirements for automatic insulator identification.

Fig. 8 ROC curves. The purple curve is the ROC of the combined features, the blue curve is that of the color features, the green curve is that of the texture features, and the soil-green curve is that of the shape features

In the future, we believe that investigating more sophisticated techniques to further improve accuracy, such as deep convolutional neural networks [38], will be beneficial. Moreover, since fitting the model parameters is time-consuming, we plan to investigate sharing features between CNN-based object detectors to further improve the efficiency of model training.