1 Introduction

China has entered the information age: the "Internet +" model has become pervasive in work and daily life, and the distance between people has gradually narrowed as communication methods have changed [1, 2]. In recent years in particular, 5G technology has developed continuously and has gradually been applied in various fields [3]. With the further development of mobile networks, mobile applications and web pages have spread widely, and people are exposed to an ever larger amount of image information [4,5,6]. As the application scope of mobile networks continues to expand, the volume of image information grows accordingly. Faced with this explosive growth of image data, quickly retrieving images of interest from massive collections has become particularly important. Early image retrieval relied on text annotation: keywords were assigned to each image manually, and related images were then found by keyword search. However, as the number of images in mobile networks keeps increasing, manual annotation can no longer keep pace: it consumes substantial manpower and material resources, manual judgment is highly subjective, and the resulting labels are inconsistent and hard to standardize, so the retrieval effect is far from ideal. In recent years, researchers have therefore focused on content-based image retrieval. Drawing on computer vision, information retrieval, machine learning and related theories, they have developed techniques for extracting and retrieving visual image content and have made notable progress [7].

At present, scholars at home and abroad have studied search methods for network images in order to improve retrieval speed and quality, and have achieved certain results. For example, Unar et al. divided query images into two categories, text and non-text, according to their image type, and proposed a content-based image retrieval method that fuses visual and textual image features [8]: textual features are extracted for text images, visual features are extracted for non-text images, the two kinds of features are then fused, and the most similar images are retrieved. Although this method retrieves images with high quality, its retrieval speed is too slow. Singer et al. proposed a content-based image retrieval method using supervised learning and statistical moments [9], which completes retrieval faster but with poorer accuracy. This paper studies the quick search of key feature images in mobile networks. The key features of an image are extracted with a rotation-invariant local binary pattern, a product quantizer is then generated to compute distances between key features, and retrieval is performed with an approximate nearest neighbor search algorithm. The experimental results verify that the proposed method achieves a good image search effect: it retrieves target images quickly and improves retrieval performance.

2 Design of quick search method of key feature images

2.1 Image target key feature extraction based on rotation invariant local binary patterns

The local binary pattern is a nonparametric operator defined over the local neighborhood texture, used to describe the local spatial structure of an image; it can also be regarded as a grayscale-invariant texture operator. The basic idea is to take the gray value of the center pixel in a local region as a threshold and to express the local texture with the binary code obtained by comparing the gray values of its neighboring pixels against this threshold. In mobile networks, a rotation-invariant local binary pattern is used to extract key features from images [10, 11].

The basic local binary pattern covers only a small region of 8 neighboring pixels, and its biggest drawback as a texture representation is that it captures the correlation with neighboring pixels incompletely. In a mobile network, such defects in the expression of image texture features also affect the transmission of image data [12, 13]. The 3 × 3 neighborhood of the basic local binary pattern is therefore extended to an arbitrary neighborhood: the computation of local binary pattern features is no longer limited to a 3 × 3 neighborhood, but is instead carried out on a circular neighborhood of radius \(R\) centered at the pixel of interest [14, 15]. \(P\) points are sampled at equal intervals around the circle, and binarization is performed by comparing the gray values of these points with that of the center point.

The coordinate calculation equation of these \(P\) points is as follows:

$$\left(x_k,\;y_k\right)=\left(x_c+R\cos\frac{2\pi k}{P},\;y_c-R\sin\frac{2\pi k}{P}\right)$$
(1)

In Eq. (1), \(\left({x}_{k},{y}_{k}\right)\) represents the coordinates of the \(k\)-th sampling point, and \({x}_{c}\) and \({y}_{c}\) are the horizontal and vertical coordinates of the central pixel, respectively. It can be seen from Eq. (1) that the coordinates of the sampling points of an image in the mobile network are not necessarily integers, whereas the coordinates of pixels on a digital image must be integers; otherwise the pixel value of a sampling point cannot be looked up, which affects the security of image data in a mobile network [16, 17]. A bilinear interpolation algorithm is therefore used to compute the gray values at the non-integer coordinate points.

Assume that \(\left({x}_{k},{y}_{k}\right)\) is a point with non-integer coordinates in the image, expressed as \(\left(i+u,j+v\right)\), where \(i\) and \(j\) are non-negative integers and \(u\) and \(v\) are floating-point numbers in the interval \(\left[0,1\right)\). The pixel value \(g\left(i+u,j+v\right)\) of this point can be determined from the four pixel values at \(\left(i,j\right)\), \(\left(i+1,j\right)\), \(\left(i,j+1\right)\) and \(\left(i+1,j+1\right)\) in the original image, as shown in the equation:

$$g\left(i+u,j+v\right)=\left(1-u\right)\left(1-v\right)g\left(i,j\right)+\left(1-u\right)v\,g\left(i,j+1\right)+u\left(1-v\right)g\left(i+1,j\right)+uv\,g\left(i+1,j+1\right)$$
(2)

In Formula (2), \(g\left(i,j\right)\) represents the pixel value at point \(\left(i,j\right)\) of the original image.
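As a concrete illustration, the following Python sketch (a hypothetical helper, not taken from the original paper) evaluates Eq. (2) at a non-integer sampling point; the image is assumed to be a 2-D NumPy array of gray values.

```python
import numpy as np

def bilinear_interpolate(img, x, y):
    """Gray value at the non-integer point (x, y) via bilinear interpolation, Eq. (2).

    img is a 2-D array of gray values; (x, y) = (i + u, j + v) with
    i, j integers and u, v in [0, 1).
    """
    i, j = int(np.floor(x)), int(np.floor(y))
    u, v = x - i, y - j
    # Clamp so the four neighbours stay inside the image.
    i1, j1 = min(i + 1, img.shape[0] - 1), min(j + 1, img.shape[1] - 1)
    return ((1 - u) * (1 - v) * img[i, j]
            + (1 - u) * v * img[i, j1]
            + u * (1 - v) * img[i1, j]
            + u * v * img[i1, j1])
```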

Assume that the gray value of the center pixel of an image in the mobile network is \({g}_{c}\) and that the gray values of the \(P\) sampling points are \({g}_{1}\), \({g}_{2}\), …, \({g}_{P}\) in turn. The key feature value of the local binary pattern around the central pixel is then calculated as follows:

$$LBP_{P,R}={\sum }_{i=1}^{P}{2}^{i-1}S\left({g}_{i}-{g}_{c}\right)$$
(3)

where, \(S\left(x\right)=\left\{\begin{array}{c}1,x\ge 0\\ 0,x<0\end{array}\right.\).

Using formula (3) to calculate the local binary pattern for each pixel of the image, the texture distribution of the whole image can be obtained. If all pixel values of the image increase (or decrease) by the same fixed amount, the local binary value does not change, so the local binary pattern operator is robust to monotonic grayscale changes. As long as the relative positions of the pixels remain unchanged, the local binary pattern value does not change either, so the operator also has translation invariance.
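A minimal sketch of this computation is shown below, assuming the circular sampling of Eq. (1), the bilinear interpolation helper above, and grayscale input; the function name and parameters are illustrative rather than taken from the paper.

```python
def lbp_value(img, xc, yc, P=8, R=1.0):
    """Basic LBP_{P,R} code of the pixel at (xc, yc), following Eqs. (1) and (3)."""
    gc = img[xc, yc]
    code = 0
    for k in range(P):
        # Circular sampling point from Eq. (1).
        xk = xc + R * np.cos(2 * np.pi * k / P)
        yk = yc - R * np.sin(2 * np.pi * k / P)
        gk = bilinear_interpolate(img, xk, yk)
        # Threshold S(g_k - g_c) and accumulate the binary weights as in Eq. (3).
        code |= (1 if gk >= gc else 0) << k
    return code
```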

Combined with the node data sensing and scheduling method in the mobile network [18, 19], a cyclic shift is applied to the key feature data of the image to obtain a local binary pattern with rotation invariance, recorded as \(LB{P}_{P,R}^{ri}\), as shown in the equation:

$$LBP_{P,R}^{ri}=\min\left\{ROR\left(LBP_{P,R},k\right)\,\left|\,k=0,1,\cdots ,P-1\right.\right\}$$
(4)

where \(ROR\left(x,k\right)\) denotes the circular right shift of the \(P\)-bit binary number \(x\) by \(k\) positions \(\left(\left|k\right|\le P\right)\). By introducing this definition of rotation invariance, the local binary pattern operator not only becomes more robust to image rotation, but also further reduces the number of pattern types used to extract key features.
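The rotation-invariant code of Eq. (4) can then be obtained by taking the minimum over all cyclic shifts of the basic pattern, as in the following sketch (again a hypothetical helper built on the functions above).

```python
def lbp_rotation_invariant(img, xc, yc, P=8, R=1.0):
    """Rotation-invariant LBP^{ri}_{P,R} of Eq. (4): minimum over all P cyclic right shifts."""
    code = lbp_value(img, xc, yc, P, R)
    mask = (1 << P) - 1
    return min(((code >> k) | (code << (P - k))) & mask for k in range(P))
```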

2.2 Quick search of images in mobile network based on approximate nearest neighbor search algorithm

In order to improve retrieval efficiency and reduce the cost of searching key features, the image descriptors must be encoded into codes with a smaller data volume, that is, the descriptors are quantized for indexing [20]. In this paper, the approximate nearest neighbor search algorithm is used to encode and index the extracted key image features so as to achieve quick search. The search stage consists of two steps: coarse quantization and distance estimation.

2.2.1 Coarse quantization of the image to be queried

Before coarse quantization of the image to be queried, the image must be preprocessed, and its key information is quantized and encoded [21]. The key feature set space of the input original image is divided into smaller codebook spaces by a quantizer. Specifically, suppose the key feature set of the image is \(X\subset {\mathbb{R}}^{D}\) and the input is a vector \(x\in X\). The vector quantizer can be understood as a mapping function \(q\) that quantizes the vector \(x\) to \(q\left(x\right)\in C=\left\{{c}_{0},{c}_{1},\cdots ,{c}_{k-1}\right\}\), where \(C\) is the set of quantized values of the key feature set \(X\), also called the codebook; the codebook size is \(k\), and \({c}_{i}\), \(i=0,1,\cdots ,k-1\), are the centroids of the codebook. The set of vectors quantized to the same centroid is called a cluster, and the cluster \({V}_{i}\) corresponding to centroid \({c}_{i}\) is expressed as follows:

$${V}_{i}\overset{def}{=}\left\{x\in X\mid q\left(x\right)={c}_{i}\right\}$$
(5)

The \(k\) clusters generated by vector quantization constitute a partition of the key feature space of the image, and vectors assigned to the same cluster are represented by the same centroid \({c}_{i}\). Generally, for a \(D\)-dimensional key feature vector, obtaining a 64-bit code (each dimension accounting for 0.5 bits) would require a quantizer with \(k={2}^{64}\) cluster centers or visual words [22]. However, the number of image key features is large and the complexity of training a quantizer is several times that of \(k\), so it is difficult to train a quantizer over such a large number of key features. To address this problem, groups of dimensions of the feature vector can be quantized jointly.

The specific process is as follows. Let the input key feature vector be \(x\in {\mathbb{R}}^{D}\) and divide it into \(m\) low-dimensional sub-vectors \({u}_{j}\left(x\right)\in {\mathbb{R}}^{{D}^{^{\prime}}}\), \(j=0,1,\cdots ,m-1\), where \({D}^{^{\prime}}=D/m\) (it is assumed that \(D\) is divisible by \(m\)). The vector \(x\) is thus divided into \(m\) sub-vectors:

$$\underset{u_0(x)}{\underbrace{x_1,\cdots, {x}_{D^{\prime }}}},\cdots, \underset{u_{m-1}(x)}{\underbrace{x_{D-{D}^{\prime }+1},\cdots, {x}_D}}$$
(6)

Similarly, the original image set \(X\) is split into \(m\) subspaces denoted \({U}_{j}\subset {\mathbb{R}}^{{D}^{^{\prime}}}\), \(j=0,\cdots ,m-1\). Next, the feature vectors in the \(m\) subspaces are divided into clusters following the vector quantization procedure. Let the sub-vector of the input vector be \({u}_{j}\left(x\right)\in {U}_{j}\), let \({q}_{j}\) be the vector quantizer on subspace \({U}_{j}\), and let \({C}_{j}\) be the vector quantization codebook of that subspace. The quantized value of vector \(x\) under the \(m\) independent vector quantizers is as follows:

$$q\left(x\right)=\left({q}_{0}\left({u}_{0}\left(x\right)\right),\cdots ,{q}_{m-1}\left({u}_{m-1}\left(x\right)\right)\right)$$
(7)

where \({q}_{j}\) is the low-complexity sub-quantizer corresponding to the \(j\)-th sub-vector. The product quantizer maps the input key feature vector to a tuple of indices by quantizing the sub-vectors separately; the mapping result is the product quantization value of vector \(x\), in which each subspace centroid satisfies \({q}_{j}\left({u}_{j}\left(x\right)\right)\in {C}_{j}\). The codebook \({C}_{p}\) obtained by product quantization of the original image set \(X\) is the Cartesian product of the \(m\) independent vector quantization codebooks:

$${C}_{p}={C}_{0}\times \cdots \times {C}_{m-1}$$
(8)

The cluster centers of this set are concatenations of the \(m\) sub-quantizer centroids. Assuming that each sub-quantizer has \({k}_{s}\) codewords, \({k}_{s}\) is usually set to a small value in order to limit the complexity of assigning image key features. Even so, the codebook generated by the product quantizer through concatenating sub-quantizer codewords is still very large:

$$k={{k}_{s}}^{m}$$
(9)

In the extreme case \(m=D\), every dimension of the input key feature vector \(x\) is quantized separately. Only the \(m\times {k}_{s}\) codewords of all sub-quantizers need to be stored, that is, \(m\times {D}^{^{\prime}}\times {k}_{s}={k}_{s}\times D\) floating-point values.
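The following sketch illustrates this product quantization scheme in Python, training one k-means sub-quantizer per subspace and encoding a vector as a tuple of centroid indices; the use of scikit-learn's KMeans and all parameter values are assumptions for illustration, not details taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def train_product_quantizer(X, m=8, ks=256):
    """Train m independent sub-quantizers (codebooks C_0, ..., C_{m-1}) on an N x D feature matrix X."""
    D = X.shape[1]
    d_sub = D // m                       # D' = D / m, assuming m divides D
    codebooks = []
    for j in range(m):
        sub = X[:, j * d_sub:(j + 1) * d_sub]      # u_j(x) for every vector in X
        codebooks.append(KMeans(n_clusters=ks, n_init=4).fit(sub).cluster_centers_)
    return codebooks

def pq_encode(x, codebooks):
    """Encode x as a tuple of sub-quantizer indices (Eq. (7)); the full codebook is their Cartesian product."""
    d_sub = len(codebooks[0][0])
    code = []
    for j, cb in enumerate(codebooks):
        u_j = x[j * d_sub:(j + 1) * d_sub]
        code.append(int(np.argmin(np.linalg.norm(cb - u_j, axis=1))))
    return np.array(code, dtype=np.uint8)   # ks = 256 fits in one byte per sub-vector
```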

Let the size of each independent key feature vector quantization codebook \({C}_{j}\) be \({k}^{^{\prime}}\) (to balance the quantization effect, all subspace codebooks have the same size); the size of the product codebook is then \({\left({k}^{^{\prime}}\right)}^{m}\) [23], where \(m\) is the number of groups in the product quantization. When \(m=1\), product quantization degenerates into ordinary vector quantization. The product quantization error \(MSE\left({q}_{p}\right)\) is calculated as follows:

$$MSE\left({q}_{p}\right)={\sum }_{j}MSE\left({q}_{j}\right)$$
(10)

In Eq. (10), \(MSE\left({q}_{j}\right)\) is the quantization error of the vector quantization in subspace \(j\); \(MSE\left({q}_{p}\right)\) varies with the number of product quantization groups \(m\) and the subspace vector dimension \({D}^{^{\prime}}\). After the key information of the images has been quantized, the \(N\) database items \(X=\left\{{x}_{1},{x}_{2},\cdots ,{x}_{N}\right\}\) are divided into \(J\) mutually exclusive groups, each with a representative vector \({\mu }_{j}\in {\mathbb{R}}^{D}\). For each group, the residual \(x-{\mu }_{j}\) between each data item \(x\in {X}_{j}\) and the representative vector is computed, encoded into a product quantization code, and stored in an inverted list. Given a key feature vector \(y\in {\mathbb{R}}^{D}\) of an image to be queried, the group \({X}_{j}\) with the closest representative vector is selected and the residual \(y-{\mu }_{j}\) between the query vector and that representative vector is computed. Because coarse quantization restricts the search space of the mobile network image set, only the data items in the selected group need to be examined, which completes the coarse quantization of the image to be queried.

2.2.2 Distance estimation of key features of images to be retrieved

After the key features of the image have been quantized, the feature distances are estimated, realizing quick search of key images [24, 25]. Given a key feature vector of the image to be queried and a product quantization code, the approximate distance between the query vector and the original vector represented by the code can be calculated efficiently with the asymmetric distance computation method. First, let the query vector be \(y\in {\mathbb{R}}^{D}\) and the product quantization code be \({i}_{x}={\left[{i}^{1},{i}^{2},\cdots ,{i}^{m}\right]}^{T}\in {\left\{1,2,\cdots ,K\right\}}^{m}\). To compute the distance \(d{\left(y,x\right)}^{2}\) between the query vector and the original key feature vector represented by the code, an asymmetric distance \(\widetilde{d}{\left(\cdot ,\cdot \right)}^{2}\) is defined as the distance between the query vector and the decoded (reconstructed) vector \(\widetilde{x}\), namely:

$$d{\left(y,x\right)}^{2}\approx \widetilde{d}{\left(y,x\right)}^{2}=d{\left(y,\widetilde{x}\right)}^{2}$$
(11)

The above distance can be obtained by reconstructing the decoded vector in Eq. (11) and computing the distance directly, but doing so costs almost as much as an exhaustive scan of the original vector library, and the computation is too complex. The distance is therefore decomposed into sub-space distances, as described below.
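One standard way to exploit this decomposition, sketched below under the same assumptions as the earlier product quantization code, is to precompute for the query a table of squared distances to every sub-codebook centroid and then sum table entries for each database code; this is a common realization of asymmetric distance computation, not necessarily the exact projection-based variant derived in the following paragraphs.

```python
def adc_tables(y, codebooks):
    """Per-subspace lookup tables: tables[j][i] = ||u_j(y) - c_{j,i}||^2."""
    d_sub = len(codebooks[0][0])
    return [np.linalg.norm(cb - y[j * d_sub:(j + 1) * d_sub], axis=1) ** 2
            for j, cb in enumerate(codebooks)]

def adc_distance(code, tables):
    """Approximate squared distance d(y, x)^2 as the sum of table entries selected by the PQ code."""
    return sum(tables[j][idx] for j, idx in enumerate(code))
```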

After the approximate expression of the data set vectors has been obtained, the distance estimation between the query vector and the candidate vectors can be discussed on the basis of the subspace quantization. Let the query vector be \(x\) and the candidate vector set be \({L}_{c}\); the distance between \(x\) and a candidate vector \(y\in {L}_{c}\) is obtained from Eq. (12):

$$d{\left(x,y\right)}^{2}={\sum }_{p=1}^{P}d{\left({u}_{p}\left(x\right),{u}_{p}\left(y\right)\right)}^{2}$$
(12)

The squared distance between the image key feature vectors \(x\) and \(y\) in the mobile network can thus be decomposed into the sum of the squared distances between their sub-vectors in the second-layer product quantization space. In the \(p\)-th subspace of the second-layer product quantization, the distance between the sub-vector \({u}_{p}\left(x\right)\) of the query vector \(x\) and the sub-vector \({u}_{p}\left(y\right)\) of the candidate vector \(y\) is defined as:

$${d}_{p}\left(x,y\right)\overset{def}{=}d\left({u}_{p}\left(x\right),{u}_{p}\left(y\right)\right)$$
(13)

The point \({u}_{p}\left(y\right)\) is approximated by a projection point \(s\), so the distance \({d}_{p}\left(x,y\right)\) can be approximated by the distance \({h}_{p}\left(x,y\right)\overset{def}{=}d\left({u}_{p}\left(x\right),s\right)\) between \({u}_{p}\left(x\right)\) and the projection point \(s\). After applying the trigonometric relations:

$${h}_{p}{\left(x,y\right)}^{2}={{b}^{^{\prime}}}_{p}^{2}+{{\lambda }^{^{\prime}}}_{p}^{2}\cdot {{c}^{^{\prime}}}_{p}^{2}+{\lambda }_{p}^{^{\prime}}{{a}^{^{\prime}}}_{p}^{2}-{\lambda }_{p}^{^{\prime}}{{b}^{^{\prime}}}_{p}^{2}-{\lambda }_{p}^{^{\prime}}{{c}^{^{\prime}}}_{p}^{2}$$
(14)

where

$${{b}^{^{\prime}}}_{p}^{2}={a}_{p}^{2}+{\lambda }_{p}^{2}\cdot {c}_{p}^{2}-{\lambda }_{p}{a}_{p}^{2}-{\lambda }_{p}{b}_{p}^{2}-{\lambda }_{p}{c}_{p}^{2}$$
(15)
$${{c}^{^{\prime}}}_{p}^{2}={a}_{1p}^{2}+{\lambda }_{p}^{2}\cdot {c}_{p}^{2}-{\lambda }_{p}{a}_{1p}^{2}-{\lambda }_{p}{b}_{1p}^{2}-{\lambda }_{p}{c}_{p}^{2}$$
(16)
$${\lambda }_{p}={\widetilde{\lambda }}_{p}/\left(1-{\overline{\lambda }}_{p}\right)+{\widetilde{\lambda }}_{p}/{\overline{\lambda }}_{p}{\widetilde{\lambda }}_{p}$$
(17)

The final expression for the image key feature distance is as follows:

$$d{\left(x,y\right)}^{2}={\sum }_{p=1}^{P}\left({{b}^{^{\prime}}}_{p}^{2}+{{\lambda }^{^{\prime}}}_{p}^{2}\cdot {{c}^{^{\prime}}}_{p}^{2}+{\lambda }_{p}^{^{\prime}}{{a}^{^{\prime}}}_{p}^{2}-{\lambda }_{p}^{^{\prime}}{{b}^{^{\prime}}}_{p}^{2}-{\lambda }_{p}^{^{\prime}}{{c}^{^{\prime}}}_{p}^{2}\right)$$
(18)

After the key feature distances of the images have been obtained, the product quantization codes of the selected group are retrieved by traversing the inverted list, and quick image search is completed by computing the asymmetric distances between the residual vector \(y-{\mu }_{j}\) and these PQ codes.

2.2.3 A fast image retrieval method based on non-exhaustive search

Linear search based on asymmetric distance computation is much faster than direct linear search, but when \(N\) is large the speed is still insufficient. In order to handle searches over millions or even billions of items, a search method combined with an inverted index is used.

The indexing process for a key feature vector \(y\) in the mobile network image library is as follows (a sketch of this flow is given after the list):

  1. The coarse quantizer \({q}_{c}\) is used to quantize the key feature vector \(y\) to \({q}_{c}\left(y\right)\).

  2. The residual vector \(r\left(y\right)=y-{q}_{c}\left(y\right)\) of the image key feature vector \(y\) in the mobile network is calculated.

  3. The product quantizer \({q}_{p}\) quantizes the residual vector \(r\left(y\right)\) to \({q}_{p}\left(r\left(y\right)\right)\), that is, each \({u}_{j}\left(y\right)\) to \({q}_{j}\left({u}_{j}\left(y\right)\right)\).

  4. In the inverted list associated with \({q}_{c}\left(y\right)\), an entry is added containing the ID of the image key feature vector in the mobile network and its binary encoding, that is, the indices of the product quantizer.
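The following sketch, built on the hypothetical helpers defined above (including the KMeans import), illustrates this indexing flow with a coarse k-means quantizer and residual PQ codes stored in inverted lists; all names and parameter values are illustrative assumptions.

```python
from collections import defaultdict

def build_index(X, n_coarse=1024, m=8, ks=256):
    """Index a database X (N x D): coarse quantization, residual PQ encoding, inverted lists."""
    coarse = KMeans(n_clusters=n_coarse, n_init=4).fit(X)          # coarse quantizer q_c
    residuals = X - coarse.cluster_centers_[coarse.labels_]        # r(y) = y - q_c(y)
    codebooks = train_product_quantizer(residuals, m, ks)          # product quantizer q_p on residuals
    inverted_lists = defaultdict(list)
    for vec_id, (label, r) in enumerate(zip(coarse.labels_, residuals)):
        inverted_lists[label].append((vec_id, pq_encode(r, codebooks)))  # entry: (ID, PQ code)
    return coarse, codebooks, inverted_lists
```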

Querying the key feature vector \(x\) of an image and searching for its nearest neighbor vectors involves four steps:

  1. The key feature vector \(x\) of the query image is assigned to its \(w\) nearest cluster centers by the coarse quantizer.

  2. The squared distance \(d\left( {u_{j} \left( {r\left( x \right)} \right),c_{j,i} } \right)^{2}\) between each sub-vector of the residual \(r\left(x\right)\) and each cluster center \(c_{j,i}\) of the corresponding sub-quantizer is calculated and stored in a lookup table.

  3. The squared distance between the residual vector \(r\left(x\right)\) and each indexed vector on the inverted list is calculated: the sub-vector-to-centroid distances obtained in the previous step are summed to give the squared distance between the two vectors.

  4. The \(K\) key image feature vectors closest to the query feature vector \(x\) are selected.

In the process of querying the key feature vector \(x\) and searching for the nearest neighbor vectors, only step 3 depends on the size of the mobile network image library. Compared with the plain asymmetric distance method, this method has the additional step of quantizing the query feature \(x\) to \({q}_{c}\left(x\right)\), which requires \({k}^{^{\prime}}\) distance computations between \(D\)-dimensional feature vectors. Assuming the inverted lists are balanced, each query scans approximately \(n\times w/{k}^{^{\prime}}\) entries. Compared with the plain asymmetric distance method, the non-exhaustive search method therefore improves the efficiency of quick image retrieval significantly.
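A query-side sketch under the same hypothetical setup is given below; it visits the \(w\) nearest coarse centers, builds the per-subspace lookup tables for each visited list, and returns the \(K\) closest items.

```python
import heapq

def search(query, coarse, codebooks, inverted_lists, w=8, K=10):
    """Non-exhaustive search: scan only the inverted lists of the w nearest coarse centers."""
    center_dists = np.linalg.norm(coarse.cluster_centers_ - query, axis=1)
    candidates = []
    for label in np.argsort(center_dists)[:w]:
        residual = query - coarse.cluster_centers_[label]            # r(x) for this coarse center
        tables = adc_tables(residual, codebooks)                     # step 2: per-subspace lookup tables
        for vec_id, code in inverted_lists[label]:
            candidates.append((adc_distance(code, tables), vec_id))  # step 3: summed table entries
    return heapq.nsmallest(K, candidates)                            # step 4: K nearest (distance, ID) pairs
```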

3 Results

3.1 Test set preparation

The experiment takes images of a specific object as an example and uses a Sony a6400 mirrorless camera to collect the target image features. The image data include a test image data set, which is mainly used for performance testing of the method, as well as some images of complex scenes used to test performance on such objects. The parameters are shown in Table 1:

Table 1 Camera and image parameters
Fig. 1 Image test 1

Fig. 2 Image test 2

3.2 Functional test

3.2.1 Test image

The test image files are shown in Figs. 1 and 2:

3.2.2 Positioning result graph

Despite the interference of similar image features in the image files, the positioning results in Figs. 3 and 4 show that the method in this paper can accurately locate the image target features. It can be seen from the extracted key feature images that the retrieval success rate is very high when the background is not too complicated, and the success rate is also guaranteed under a complicated background.

Fig. 3 Positioning results of image test 1

Fig. 4 Positioning results of image test 2

3.3 Analysis of experimental results

In order to test the proposed method, which extracts key image features with the rotation-invariant local binary pattern, the methods in references [8] and [9] were selected as comparison methods, and the key feature extraction performance of the three methods was examined under different complex backgrounds. The statistical results are shown in Table 2.

Table 2 Key feature extraction performance

As can be seen from the experimental results in Table 2, the method in this paper, which uses the rotation-invariant local binary pattern to retrieve key feature images, has good extraction performance in different complex environments: it can effectively extract key features under scale change, image rotation, image blur and strong illumination, which verifies its good key feature extraction performance. Although the other two methods can also extract key image features, their feature extraction performance is poor under image rotation and image blur. The comparison results verify that the proposed method has superior key feature extraction performance and can be applied to key feature extraction of mobile network images.

The method in this paper uses the rotation-invariant local binary pattern to extract the target feature images of mobile network images, and the image extracted with the rotation-invariant local binary pattern is compared with the image extracted with the basic local binary pattern. The comparison results are shown in Fig. 5.

It can be seen from Fig. 5 that the method in this paper, using the rotation-invariant local binary pattern to extract the target feature image of the mobile network, can map fine image details to a single rotation-invariant pattern and thus effectively extract key features such as edges and highlights. The key features extracted by the proposed method are clear and intuitive, and the number of extracted key feature points is significantly higher than with the basic local binary pattern, whose extracted features contain more noise. The proposed method thus performs well in image key feature extraction with the rotation-invariant local binary pattern, which provides the basis for quick image retrieval.

Fig. 5 Comparison of key feature extraction histograms. (a) Local binary pattern, (b) Rotation-invariant local binary pattern

The retrieval performance of the retrieval algorithm is very important. The methods of literature [8] and literature [9] are selected as comparison methods, and the F-score of images quickly retrieved by the proposed method and the two comparison methods is calculated. The F-score is the harmonic mean of precision and recall, which balances the influence of the two and evaluates the retrieval results scientifically; the larger the F-score, the better the method. The statistical results of the F-score values of the three methods under different samples are shown in Fig. 6.

Fig. 6 Comparison result of F-score value

As can be seen from the data in Fig. 6, the distribution of F-score values for fast image retrieval in the mobile network using the proposed method is more uniform and significantly higher than those obtained by the other two methods. The F-score of quick image retrieval with the proposed method is above 0.9, floating between 0.90 and 0.98, so the retrieval effect is relatively stable. The F-score values of the fast image retrieval method in literature [8] are not uniformly distributed, floating between 0.35 and 0.82, while the F-score distribution of the quick image retrieval in literature [9] is relatively stable but stays below 0.8. These comparison results verify that the proposed method has high retrieval performance and can accurately retrieve images according to the query image features.

The average precision of quick image retrieval in the mobile network is also counted. The mean Average Precision (mAP) effectively addresses the one-sidedness of recall, precision and F-score. It also takes the ordering of similar images into account: the higher the rank of the similar key feature images returned by the retrieval method, the higher the mAP value. The mAP value reflects the average of all relevant results over each retrieval and is equivalent to the area under the precision-recall curve with precision as the ordinate and recall as the abscissa; when this area is 1, the accuracy of quick retrieval is the best. The average precision of quick image retrieval with the proposed method is counted and compared with the results of the two comparison methods. The statistical results are shown in Table 3.

Table 3 Accuracy comparison of different methods

According to the data in Table 3, the average precision of the method in this paper is higher than 0.93, while that of the other two methods is lower than 0.90; the average precision of the proposed method is thus significantly higher than that of the other two algorithms. When the method in this paper is used for quick image retrieval in the mobile network, the average precision reflects the comprehensiveness and accuracy of the positioning search, so the method has high quick-retrieval performance and high applicability.
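For reference, the following sketch shows one conventional way to compute precision, recall, F-score and average precision from a ranked retrieval result; the exact evaluation protocol used in the experiments is not detailed in the paper, so this is an illustrative assumption.

```python
def precision_recall_f1(retrieved, relevant):
    """Precision, recall and F-score (harmonic mean of precision and recall) for one query."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def average_precision(ranked, relevant):
    """Average precision of a ranked list: mean of the precision values at each relevant hit."""
    relevant = set(relevant)
    hits, precisions = 0, []
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0
```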

ANMRR (Average Normalized Modified Retrieval Rank) is an important index for evaluating retrieval performance: the smaller the ANMRR value, the better the retrieval performance of the algorithm. The ANMRR values of the images retrieved by the proposed method and the other two methods are counted, and the comparison results are shown in Fig. 7.

Fig. 7 Comparison results of ANMRR values

It can be seen from the experimental results in Fig. 7 that the method in this paper can obtain the desired image from the input key target image features and retrieve it quickly. Its average normalized modified retrieval rank is less than 0.2, significantly lower than that of the other two methods, which verifies that the method in this paper has high retrieval capability.

4 Conclusion

This paper studies the quick search of key feature images in the mobile network, using images on the Internet as the object of study and exploring quick image retrieval based on the extracted key features. Experiments show that the method can quickly retrieve images based on the extracted key image features. Compared with other methods, the distribution of F-score values obtained by the proposed method is relatively uniform and higher than 0.90, the ANMRR value is less than 0.2, and the average precision of image search is higher than 0.93, which is better than the comparison methods and shows better retrieval performance. The main work, advantages and disadvantages of the proposed method and the compared recent works are shown in Table 4. However, the method in this paper does not consider the impact of image clarity on retrieval performance, which needs to be further optimized in the future.

Table 4 Qualitative analysis form