Research on application of multimedia image processing technology based on wavelet transform
With the development of information technology, multimedia has become a common form of information storage. Traditional query techniques have struggled to keep pace with this new technology, so retrieving useful information from large volumes of multimedia data has become a hot topic in the development of search technology. This paper takes the image, a core component of multimedia information storage, as its research object, exploits the ability of the wavelet transform to separate a picture into low-frequency and high-frequency components, and establishes a multimedia processing technology model based on the wavelet transform. Simulation results on face, vehicle, building, and landscape images show that different wavelet basis functions and different numbers of decomposition layers yield different retrieval results and retrieval speeds. With four layers of wavelet decomposition and the cubic B-spline wavelet as the wavelet basis function, the classification result is optimal, with an accuracy rate of 89.08%.
Keywords: Image retrieval; Wavelet transform; Wavelet basis; Multimedia retrieval
Abbreviations: JPEG: Joint Photographic Experts Group; SVM: Support Vector Machine
Multimedia generally refers to images, graphics, text, and sound. As an important information carrier, images are intuitive and rich in content, making them an important means of expressing information, and image processing has therefore become a major part of multimedia processing technology. With the development of multimedia technology and the arrival of the information age, people are exposed to ever larger amounts of image information. How to effectively organize, manage, and retrieve large-scale image databases has become an urgent problem.
In the research on multimedia image retrieval technology, feature representation can basically be divided into three directions. (1) Retrieval based on the color features of the image. The color feature is the most widely used visual feature in image retrieval: it is the most intuitive and obvious, and it is one of the most important perceptual features of image vision. The main reason is that color is usually closely related to the objects or scenes contained in the image. In addition, compared with other visual features, the color feature depends less on the size, orientation, and viewing angle of the image itself, so it is more stable and robust, and it is simple to compute, which is why it is so widely used. Users can input the color features they want to query and match them against the information in the color feature library. At present, the main color feature extraction methods include the color histogram [1, 2], color moments [3, 4], color sets, the color coherence vector [6, 7], and the color correlogram. (2) Retrieval based on image texture features. Texture features are visual features that reflect the homogeneity of the image independently of color or brightness. Texture is a common intrinsic property of all surfaces, and texture features contain important information about the structure and arrangement of a surface and its relationship with the surrounding environment. Because of this, texture features are widely used in content-based image retrieval: by submitting an image containing some kind of texture, users can find other images with similar textures. Among texture retrieval methods, the co-occurrence matrix [9, 10, 11] and Gabor filters [12, 13, 14] are two commonly used approaches.
(3) Retrieval based on image shape features. The shape information of an image does not vary with its color and other characteristics, so it is a stable feature of the object; for graphics in particular, shape is the single most important feature. In general, there are two kinds of shape feature representations: contour features and region features. The former uses only the outer boundary of the object, while the latter covers the entire shape region. The most typical methods for these two types of shape features are Fourier shape descriptors [15, 16] and moment invariants [17, 18].
The concept of the wavelet transform was first proposed in 1974 by J. Morlet, a French engineer working on petroleum signal processing, who established the inversion formula through physical intuition and practical experience with signal processing. The essential difference between wavelet analysis and Fourier analysis is that Fourier analysis considers only the one-to-one mapping between the time domain and the frequency domain, representing a signal by a function of a single variable (time or frequency), while wavelet analysis uses a joint time-scale function to analyze non-stationary signals. The difference between wavelet analysis and time-frequency analysis is that time-frequency analysis represents a non-stationary signal in the time-frequency plane, whereas wavelet analysis also describes it in a two-dimensional plane, but on the so-called time-scale plane rather than the time-frequency plane. In the short-time Fourier transform, the signal is observed at a single resolution (that is, with a uniform window function), whereas in wavelet analysis, the signal is observed at different scales or resolutions. This multi-scale or multi-resolution view of signal analysis is the essential point of wavelet analysis.
The basic idea of wavelet analysis derives from Fourier analysis and represents a breakthrough development of it. It is not only a powerful analytical technique but also a fast computational tool, with both important theoretical significance and practical value. Wavelet analysis is a powerful tool for characterizing the internal correlation of signal data and is highly effective in data compression and numerical approximation. Owing to its "self-adaptive" and "mathematical microscope" properties, it has become a focus of attention in many disciplines.
In pattern recognition research, wavelet analysis can decompose a signal into its low-frequency and high-frequency components to represent the signal's characteristics, and it has been widely used as a frequency analysis method for feature analysis in many fields [19, 20, 21, 22, 23].
Wavelet analysis is also often used to analyze image features. In the research of Li et al., the fusion of multi-sensor images is realized by the wavelet transform. The goal of image fusion is to integrate complementary information from multi-sensor data, making the new image more suitable for human visual perception and computer processing, and for tasks such as segmentation, feature extraction, and object recognition. Their scheme performs better than Laplacian-pyramid-based approaches, and they recommend using specially generated test images for performance measurement, evaluating different fusion methods and comparing the merits of different wavelet transform kernels through extensive experimental results. Chang and Kuo exploited the advantages of the wavelet transform to propose a multi-resolution method based on an improved wavelet transform, called the tree-structured wavelet transform or wavelet packet. The motivation for this transform is that a large class of natural textures can be modeled as quasi-periodic signals whose dominant frequencies lie in the intermediate frequency channels: the transform can zoom into any desired frequency channel for further decomposition, whereas the conventional pyramid-structured wavelet transform performs further decomposition only in the low-frequency channel. On this basis they developed a progressive texture classification algorithm that is not only computationally attractive but also performs excellently.
In this paper, multimedia retrieval technology is studied with the image as the research object. Images are decomposed by the wavelet transform, image features are extracted, and the effects of different wavelet bases and different numbers of decomposition layers on the recognition results are compared during retrieval and analysis. Based on the recognition results for face, vehicle, building, and landscape images, the optimal wavelet basis function and the optimal number of layers are selected, and an image retrieval model based on wavelet decomposition is established. The main contributions are as follows:
- Design an image retrieval method based on wavelet decomposition.
- Analyze the influence of different wavelet bases on image retrieval and obtain the optimal wavelet basis for wavelet decomposition.
- Analyze the influence of different numbers of decomposition layers on image retrieval and obtain the optimal number of layers.
2 Proposed method
2.1 Wavelet theory
The continuous wavelet transform of a signal f(t) is defined as

WT_f(a, τ) = (1/√a) ∫ f(t) ψ*((t − τ)/a) dt

where ψ(t) is the mother wavelet, a is the scale factor, and τ is the translation factor.
In the past decade, wavelet analysis has made rapid progress in both theory and method. Researchers approach it from three different starting points: multi-resolution analysis, frames, and filter banks. At present, the characterization of function spaces, the construction of wavelet bases, cardinal interpolation wavelets, vector wavelets, high-dimensional wavelets, multi-band wavelets, and periodic wavelets are the main research directions and hotspots of wavelet theory. It is now recognized that multi-resolution processing in computer vision, subband coding in speech and image compression, non-stationary signal analysis on non-uniform sampling grids, and wavelet series expansion in applied mathematics are simply different views of the same underlying wavelet theory.
In application, wavelet analysis has quite an extensive application space due to its good time-frequency localization characteristics, scale variation characteristics, and directional characteristics. Its application areas include many disciplines of mathematics, quantum mechanics, theoretical physics, signal analysis and processing, image processing, pattern recognition and artificial intelligence, machine vision, data compression, nonlinear analysis, automatic control, computational mathematics, artificial synthesis of music and language, medical imaging and diagnosis, geological exploration data processing, fault diagnosis of large-scale machinery, and many other aspects. The scope of its application is constantly expanding. Wavelet analysis is used as an important analytical theory and tool in almost all subject areas, and fruitful results have been achieved in the research and application process.
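As a minimal illustration of the multi-scale idea described above, the sketch below implements one level of the discrete Haar wavelet transform and its inverse in plain Python. This is a hedged, illustrative choice: the Haar wavelet is used only because it is the simplest orthogonal wavelet, and the paper's own experiments run in MATLAB with other bases.

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    assert len(signal) % 2 == 0, "signal length must be even"
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of one Haar DWT level: perfectly reconstructs the signal."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)  # even-indexed sample
        out.append((a - d) / s)  # odd-indexed sample
    return out

signal = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 8.0, 0.0]
approx, detail = haar_dwt(signal)
reconstructed = haar_idwt(approx, detail)
```

Applying `haar_dwt` recursively to the approximation coefficients yields the multi-layer decomposition discussed later in the experiments.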
2.2 Wavelet basis
A good wavelet basis ψ(t) is generally required to satisfy the following properties:
- Finite support in the time domain, that is, ψ(t) has finite length, and its high-order moments vanish: ∫ t^p ψ(t) dt = 0 for p = 0, …, N. The larger N is, the longer the support of ψ(t).
- In the frequency domain, ψ(ω) has a zero of order N at ω = 0.
- ψ(t) and its integer translates are orthogonal.
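The vanishing-moment condition can be checked numerically. The sketch below approximates the moments ∫ t^p ψ(t) dt for the Haar mother wavelet with a midpoint Riemann sum; Haar is an illustrative assumption here, since it satisfies the condition only for p = 0 (i.e. N = 0).

```python
def haar_psi(t):
    """The Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def moment(p, n=100000):
    """Approximate the p-th moment of haar_psi by a midpoint Riemann sum on [0, 1]."""
    h = 1.0 / n
    return sum(((i + 0.5) * h) ** p * haar_psi((i + 0.5) * h) for i in range(n)) * h

m0 = moment(0)  # vanishes: the Haar wavelet has zero mean
m1 = moment(1)  # does NOT vanish (equals -1/4): Haar has only one vanishing moment
```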
2.3 Color characteristics of the image
Color features are the most widely used visual features in image retrieval. Color allows the human brain to distinguish the brightness and boundaries of objects. In image processing, color is based on well-established descriptions and models; each color system has its own characteristics and scope of use, and the color system can be chosen according to the requirements of the processing task. A color feature is a global feature that describes the surface properties of the scene corresponding to an image or image region. Color features are generally based on the characteristics of individual pixels, with every pixel belonging to the image or image region making its own contribution. Color is often closely related to the objects and scenes contained in the image, and compared with other visual features, the color feature depends less on the size, orientation, and viewing angle of the image itself and thus has higher robustness.
Since color is insensitive to changes in the orientation, size, etc. of an image or image region, color features do not capture the local features of objects in the image well. In addition, when only color features are used on a very large database, many unwanted images are often retrieved. Color histograms are the most commonly used method for expressing color features. Their advantage is that they are unaffected by image rotation and translation and, after normalization, by changes in image scale as well; their disadvantage is that they do not express the spatial distribution of color. The color histogram, used in many image retrieval systems, describes the proportions of different colors in the entire image without regard to the spatial position of each color, so it cannot describe particular objects in the image. Color histograms are particularly well suited to describing images that are difficult to segment automatically.
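A color histogram of the kind described above can be sketched in a few lines of Python. This is a toy illustration: the pixel list, bin count, and histogram-intersection similarity are assumptions chosen for brevity, not the paper's exact configuration.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized color histogram over a uniformly quantized RGB cube.

    `pixels` is a list of (r, g, b) tuples with values in 0..255.
    Normalizing by the pixel count makes the histogram scale-invariant."""
    step = 256 // bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1
    total = len(pixels)
    return [h / total for h in hist]

def intersection(h1, h2):
    """Histogram intersection, a common similarity measure for retrieval."""
    return sum(min(a, b) for a, b in zip(h1, h2))

pixels = [(255, 0, 0)] * 3 + [(0, 255, 0)]  # 3 red pixels, 1 green pixel
hist = color_histogram(pixels)
```

Note that the histogram records only color proportions, not positions, exactly the limitation discussed above.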
2.4 Image texture features
The so-called image texture reflects a local structural feature of the image: a certain variation of the gray level or color of the pixels within a neighborhood, where the variation is spatially statistically correlated. Texture consists of two elements, the texture primitives and their arrangement. Texture analysis methods include statistical methods, structural methods, and model-based methods.
A texture feature is also a global feature describing the surface properties of the scene corresponding to an image or image region. However, since texture is only a property of an object's surface and does not fully reflect the essential attributes of the object, high-level image content cannot be obtained from texture features alone. Unlike color features, texture features are not pixel-based; they require statistical computation over regions containing multiple pixels. In pattern matching, this regional character is a significant advantage, since matching does not fail because of local deviations. As statistical features, texture features often have rotational invariance and strong resistance to noise. However, texture features also have drawbacks. One obvious drawback is that when the resolution of the image changes, the computed texture may deviate considerably. In addition, because of the possible effects of illumination and reflection, the texture seen in the image is not necessarily the actual texture of the object's surface; for example, reflections in water or from smooth metal surfaces can cause apparent texture changes. Since these are not characteristics of the object itself, such false textures can sometimes "mislead" a texture-based search.
Texture features are an effective retrieval method for texture images with large differences in coarseness, density, and the like. However, when such easily distinguishable properties differ little between textures, the usual texture features struggle to accurately reflect differences between textures that human vision perceives as distinct.
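As an illustration of the co-occurrence matrix method cited above, the sketch below builds a gray-level co-occurrence matrix (GLCM) for a tiny grayscale image and derives the Haralick contrast statistic. The 4 × 4 image, single horizontal offset, and 4 gray levels are toy assumptions; real implementations aggregate several offsets and more statistics (energy, entropy, etc.).

```python
def glcm(image, levels, dx=1, dy=0):
    """Gray-level co-occurrence matrix for offset (dx, dy), normalized to sum to 1."""
    rows, cols = len(image), len(image[0])
    mat = [[0.0] * levels for _ in range(levels)]
    count = 0
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                mat[image[y][x]][image[y2][x2]] += 1
                count += 1
    return [[v / count for v in row] for row in mat]

def contrast(mat):
    """Haralick contrast: large when neighboring gray levels differ strongly."""
    return sum((i - j) ** 2 * p
               for i, row in enumerate(mat)
               for j, p in enumerate(row))

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
m = glcm(image, levels=4)
```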
3 Experimental results
3.1 Data sources
The first data set is the CAS-PEAL face database of the Institute of Computing Technology, Chinese Academy of Sciences. The database was built in 2003 and includes 1040 face samples. Its face images are complex, including faces in different poses, such as frontal and profile views, and samples taken in different time periods. To meet the requirement of sample diversity, the database includes samples of men and women of different ages, and the images cover a variety of backgrounds.
To verify the correctness and robustness of the method, the second data set comes from everyday life: pictures taken with an MI 4 phone, including vehicles, buildings, and landscapes, with 200 photos of each type. The picture size is 92 × 112, and the pictures are converted from JPEG (Joint Photographic Experts Group) format to BMP (Bitmap) format.
3.2 Experimental environment
The data processing in this paper is performed in MATLAB R2014b 8.4 software environment. The main parameters of the hardware environment are Intel Core i7-4710HQ quad-core processor, Kingston DDR3L 4G memory, and Windows 7 Ultimate 64-bit SP1 operating system.
3.3 Classification method
To guarantee the stability of the classification, this paper uses a Support Vector Machine (SVM) with a linear kernel function as the classifier. The test and training sets are divided by 10-fold cross-validation: the samples are split into 10 folds, one of which is used as the test set while the remaining nine are used for training.
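The 10-fold split described above can be sketched as follows. This is a minimal version that only produces index lists; the SVM training itself (done in MATLAB in this paper) is not reproduced.

```python
def k_fold_splits(n_samples, k=10):
    """Return (train_indices, test_indices) pairs for k-fold cross-validation.

    Each index appears in exactly one test fold; the other k-1 folds train."""
    indices = list(range(n_samples))
    # distribute any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits

splits = k_fold_splits(1040, k=10)  # e.g. the 1040 CAS-PEAL face samples
```

In practice the indices would be shuffled before splitting; they are left in order here so the example stays deterministic.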
4.1 Image preprocessing
As can be clearly seen from Fig. 1, after the wavelet transform, the picture becomes blurred, but the basic features of the face, such as the eyes, mouth, nose, cheeks, and eyebrows, remain very clear. The blurring indicates that the number of features is reduced, while the clarity of the basic facial features shows that this reduction does not harm feature extraction.
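The blurring effect described above can be reproduced with a single level of a 2-D Haar transform, whose LL (low-low) subband is a half-resolution, smoothed approximation of the image. The tiny constant matrix below is a stand-in assumption for the paper's 92 × 112 face images, and Haar again stands in for the wavelets actually compared.

```python
def haar2d_level(image):
    """One level of the 2-D Haar wavelet transform.

    Returns the four subbands (LL, LH, HL, HH); LL is the blurred
    approximation whose coefficients feed the next decomposition level."""
    rows, cols = len(image), len(image[0])
    half_r, half_c = rows // 2, cols // 2
    LL = [[0.0] * half_c for _ in range(half_r)]
    LH = [[0.0] * half_c for _ in range(half_r)]
    HL = [[0.0] * half_c for _ in range(half_r)]
    HH = [[0.0] * half_c for _ in range(half_r)]
    for i in range(half_r):
        for j in range(half_c):
            a = image[2 * i][2 * j]        # top-left of the 2x2 block
            b = image[2 * i][2 * j + 1]    # top-right
            c = image[2 * i + 1][2 * j]    # bottom-left
            d = image[2 * i + 1][2 * j + 1]  # bottom-right
            LL[i][j] = (a + b + c + d) / 2.0  # average -> blurred approximation
            LH[i][j] = (a - b + c - d) / 2.0  # horizontal detail
            HL[i][j] = (a + b - c - d) / 2.0  # vertical detail
            HH[i][j] = (a - b - c + d) / 2.0  # diagonal detail
    return LL, LH, HL, HH

flat = [[10.0] * 4 for _ in range(4)]  # a featureless region: no detail expected
LL, LH, HL, HH = haar2d_level(flat)
```

On a featureless region all detail subbands vanish, which is exactly why smooth areas of the face survive decomposition with few coefficients while edges (eyes, mouth) remain visible in the detail bands.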
4.2 The influence of the wavelet parameters on the classification
The factors affecting the recognition result and efficiency are mainly the wavelet basis and the number of wavelet decomposition layers. The choice of wavelet basis directly affects the quality of feature extraction and hence the final retrieval rate, while the number of layers determines the number of features used in recognition: the more layers, the more image features. This paper compares the effects of five wavelet bases, Daub(2), Daub(4), Daub(6), the cubic B-spline wavelet, and the orthogonal basis wavelet, on the recognition results, and likewise compares the effects of 1-, 2-, 3-, 4-, and 5-layer wavelet transforms on classification efficiency.
As can be seen from Fig. 6, when the cubic B-spline wavelet is used as the wavelet basis function with four layers of wavelet decomposition, the classification effect is optimal and the accuracy rate is 89.08%.
(Figure: image retrieval time versus number of decomposition layers.)
4.3 Recognition results of non-face images
Table 2 Recognition results with four-layer decomposition and the cubic B-spline wavelet (recognition rate, average retrieval time, and retrieval time variance)
The results in Table 2 show that, using four-layer wavelet analysis with the cubic B-spline wavelet as the wavelet basis, the recognition rates of all three types of pictures are high. The recognition rates for buildings and vehicles are higher than for landscapes, perhaps because the frequency characteristics of vehicles and buildings are pronounced while those of landscapes are not. Comparing training and retrieval times for the same number of samples, landscape images not only have the lowest retrieval accuracy but also consume the most time: their training time is 2.7 times that of the vehicle images and 2.4 times that of the building images, and the variance of their retrieval time is also relatively large, which shows that the method extracts features from landscape images poorly.
4.4 Low-frequency and high-frequency recognition rate
Multimedia resources have become a major way for people to obtain information, and intelligent querying of multimedia information is a new hotspot in data mining, with query algorithm design as one of its main aspects. Although the wavelet transform has been used successfully in image research, the problem of optimally selecting the number of decomposition layers and the wavelet basis function had not been solved for image retrieval. In this paper, wavelet analysis is used as an image feature query method to analyze face, vehicle, building, and landscape images. Images are decomposed with different wavelet basis functions and different numbers of decomposition layers, and, using accuracy and query speed as evaluation indicators, the effects of these choices on the results are compared and analyzed.
The authors thank the editor and anonymous reviewers for their helpful comments and valuable suggestions.
Availability of data and materials
Please contact author for data requests.
HGK is the corresponding author, and KS is the first author. Both authors read and approved the final manuscript.
301 Art Center Chung-Ang University 221 Heukseok-dong Dongjak-gu, Seoul, 156–756 Korea.
Kun Sui was born in Qingdao, Shandong, P.R. China, in 1982. Doctor of Technology Art, Lecturer. Graduated from the Korea Dong Yang University in 2009. Worked in Qingdao Agricultural University. His research interests include New Media Art and digital image processing.
*Author for correspondence:
Hyung-Gi Kim was born in Korea in 1960.
Doctor of Technology Art, Professor. Graduated from the Soongsil University in 2009. Worked in Graduate school of Advanced Imaging Science, Multimedia and Film Chung-Ang University, Seoul, Korea. He has held eleven successful solo Media Art Exhibitions and participated in many group exhibitions. His research focuses on 3D display systems, projection mapping, kinetic art, interactive media art, and media performance.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 2. H. Liu, F. Zhao, V. Chaudhary, Pareto-based interval type-2 fuzzy c-means with multi-scale JND color histogram for image segmentation. Digit. Signal Process. 76, 75–83 (2018)
- 3. L. Li, K. Liu, F. Cheng, An improved TLD with Harris corner and color moment. Proc. SPIE 225, 102251P (2017)
- 6. I.M. Stephanakis, G.C. Anastassopoulos, L. Iliadis, A self-organizing feature map (SOFM) model based on aggregate-ordering of local color vectors according to block similarity measures. Neurocomputing 107, 97–107 (2013)
- 8. D. Chai, K.N. Ngan, Face segmentation using skin-color map in videophone applications. IEEE Trans. Circuits Syst. Video Technol. 9(4), 551–564 (1999)
- 13. A.K. Jain, F. Farrokhnia, Unsupervised texture segmentation using Gabor filters, in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (1990), pp. 1167–1186
- 15. D. Navarro-Alarcon, Y.H. Liu, Fourier-based shape servoing: a new feedback method to actively deform soft objects into desired 2-D image contours. IEEE Trans. Robot. PP(99), 1–8 (2018)
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.