1 Introduction

Throat polyps are small fleshy growths that form on the vocal cords, mainly as a result of straining or overusing the voice, for example in public speaking. Professional singers, sports and fitness coaches, and actors are all prone to developing throat polyps. The most common symptoms are a hoarse or deeper voice, or a breathy voice similar to that heard with laryngitis.

The traditional methods of throat polyp diagnosis are the indirect laryngoscope, the video laryngoscope, and stroboscopic light [1]. These methods need special instruments and depend on the experience of the pathologists. The patients usually find the examination uncomfortable or painful. Because a change in voice is the most common symptom of throat polyps, it would be desirable to detect throat polyps from the patient's voice. In [1], Zhong et al. tried to detect throat polyps based on patient voices. Two fuzzy classifiers and a Bayesian classifier were designed for throat polyp detection based on the patient vowel voices /a:/ and /i:/. The experimental results showed that an interval type 2 fuzzy classifier performed best. In this paper, we use compressive sensing and the support vector machine (SVM) algorithm to detect throat polyps from the patient vowel voices /a:/ and /i:/ while reducing the burden of voice data collection and storage.

Compressive sensing (CS, also called compressed sampling) theory, proposed by Donoho and Candès in 2006 [2, 3], shows that a high-dimensional signal can be projected into a low-dimensional space with a random measurement matrix when the signal is sparse or compressible. The original signal can then be recovered from the low-dimensional measurements by solving an optimization problem. The provable success of CS for signal reconstruction suggests that the low-dimensional signal retains the main features of the original signal. Thus, the universality of CS theory can be leveraged in hypothesis testing problems to reduce the complexity of data computation [4].

Hypothesis testing in the compressed domain not only reduces the burden of data storage and transmission but also avoids a large amount of computation. In [5, 6], Budhaditya used compressed sensor network data for anomaly detection with a method based on spectral theory and obtained satisfactory detection results from a residual analysis of the compressed data. In [7, 8], random projection in conjunction with principal component analysis (PCA) was implemented for anomaly detection in the compressed domain, and an application of this methodology to detecting IP-level volume anomalies in computer network traffic suggested a high relevance to practical problems. In [9], an anomaly detection criterion based on the wavelet packet transform and statistical process control theory in the compressed domain was used for through-wall human detection. The experimental results showed that the proposed algorithm could effectively detect the presence of a human being from compressed signals.

Because classification in the compressed domain based on compressive sensing has advantages for big data over classification on the original data [10–15], a throat polyp detection algorithm based on compressive sensing and the support vector machine is proposed in this paper. The remainder of this paper is organized as follows: Section 2 introduces compressive sensing theory. Section 3 derives the throat polyp detection procedure based on CS and SVM. Section 4 presents the experimental results of throat polyp detection. Section 5 gives the conclusion and discussion.

2 Background on compressive sensing

Compressive sensing builds on a core tenet of signal processing and information theory: signals often contain some type of structure that enables intelligent representation and processing [16].

Suppose that an observer makes measurements of a signal x that can be expressed as follows:

$$x = \Phi\theta, \tag{1}$$

where θ ∈ R^N is the expansion coefficient vector under the orthonormal basis Φ. If θ has only K ≤ N nonzero coefficients, we say that the signal x is K-sparse.

The surprising result of CS is that a length-N signal that is K-sparse in some basis can be recovered exactly or approximately from a nonadaptive linear projection of the signal onto a random basis. In matrix notation, it can be described as follows [17–21]:

$$y = \Psi x, \tag{2}$$

where y is an M × 1 column vector and Ψ is an M × N random matrix. The appeal of CS is that we only need to collect M = O(K log(N/K)) random measurements to recover the signal x by solving the following l0-norm-constrained optimization problem:

$$\hat{\theta} = \arg\min_{\theta} \|\theta\|_0 \quad \text{s.t.} \quad y = \Psi\Phi\theta, \tag{3}$$

where the l0 norm ||θ||_0 counts the number of nonzero components of θ. However, solving Equation 3 is both numerically unstable and NP-complete. Instead of solving the l0 minimization problem, nonadaptive CS theory seeks to solve the ‘closest possible’ tractable minimization problem, i.e., the l1 minimization:

$$\hat{\theta} = \arg\min_{\theta} \|\theta\|_1 \quad \text{s.t.} \quad y = \Psi\Phi\theta. \tag{4}$$

Although M < N, the recovery of the signal x from the measurements y becomes possible and practical under the additional assumption of signal sparsity or compressibility. The provable success of CS for signal reconstruction indicates that the collected low-dimensional measurements contain the main features of the original signal. It therefore provides a novel procedure for hypothesis testing on big data, which can be carried out in the compressed domain.
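The measurement step of Equation 2 and the l1 recovery of Equation 4 can be illustrated with a minimal Python sketch. It assumes NumPy and SciPy are available, takes Φ as the identity (so the signal itself is sparse), and uses toy sizes N = 64, M = 32, K = 4; the helper names `l1_recover` and `demo` are hypothetical and do not come from the paper. The l1 problem is rewritten as a linear program by splitting θ into two nonnegative parts.

```python
import numpy as np
from scipy.optimize import linprog


def l1_recover(A, y):
    """Solve min ||theta||_1 s.t. A @ theta = y as a linear program.

    theta is split as u - v with u, v >= 0, so ||theta||_1 = sum(u + v).
    """
    n = A.shape[1]
    c = np.ones(2 * n)                # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])         # equality constraint: A @ (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]


def demo(N=64, M=32, K=4, seed=0):
    """Toy experiment: measure a K-sparse vector and recover it (sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)
    theta[support] = rng.choice([-1.0, 1.0], size=K) * rng.uniform(1.0, 2.0, size=K)
    Phi = np.eye(N)                                  # assumed sparsifying basis
    x = Phi @ theta                                  # Eq. 1
    Psi = rng.standard_normal((M, N)) / np.sqrt(M)   # Gaussian measurement matrix
    y = Psi @ x                                      # Eq. 2
    theta_hat = l1_recover(Psi @ Phi, y)             # Eq. 4
    return theta, theta_hat


theta, theta_hat = demo()
print("max abs recovery error:", np.max(np.abs(theta - theta_hat)))
```

With K much smaller than M, the recovered vector typically matches the original up to solver tolerance; this is a toy check of the principle, not the recovery routine used in the paper, which classifies directly in the compressed domain.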

3 Throat polyp detection procedure

The most common symptom of throat polyps, and typically the first to appear, is a general roughness or hoarseness of the voice, which may or may not be accompanied by a sore throat or a full feeling in the throat. In other words, the frequency components of the same sound, such as a vowel, change when a person suffers from throat polyps. Therefore, we can detect throat polyps by analyzing the frequency components of the voice signal.

3.1 Acquire the frequency component by WPT

The Fourier transform (FT) is the conventional tool for frequency spectrum analysis; it is a global transform and has low frequency resolution. Because the FT has difficulty recognizing tiny changes in the frequency spectrum, the wavelet packet transform (WPT) has become one of the most widely used tools in signal frequency analysis.

WPT is an extension of the wavelet transform (WT) that provides a complete level-by-level decomposition. It enables the extraction of features from signals that combine stationary and nonstationary characteristics with an arbitrary time-frequency resolution [22].

In this paper, we extract the features of the vowel voices /a:/ and /i:/ to detect throat polyps. According to the principle of WPT, the vowel voice signal x(t) is decomposed into j levels, and the node signals are reconstructed as x_j^i(t). The signal can then be expressed as follows:

$$x(t) = \sum_{i=1}^{2^j} x_j^i(t). \tag{5}$$

The node signal energy E_j^i can be defined as

$$E_j^i = \int_{-\infty}^{\infty} \left| x_j^i(t) \right|^2 dt = \left\| x_j^i(t) \right\|^2. \tag{6}$$

On the basis of WPT, Equation 6 illustrates that the node signal energy E_j^i stores the energy of a specific time-frequency window. In other words, E_j^i indicates the proportion of the corresponding frequency component in the original signal. Thus, according to the principle described above, throat polyp detection can be achieved by investigating the changing trend of E_j^i.

In order to eliminate the influence of volume on throat polyp detection, we further define ΔE_j^i as the ratio of the node signal energy to the total signal energy:

$$\Delta E_j^i = \frac{E_j^i}{\sum_{i=1}^{m} E_j^i}, \tag{7}$$

where m denotes the number of leading dominant nodes, which contain the main energy of the signal. Restricting the sum to these nodes eliminates the effect of noise on the low-energy nodes and improves the detection accuracy.
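The node energies of Equation 6 and the ratios of Equation 7 can be sketched in a few lines of Python. To stay self-contained, this sketch assumes only NumPy and substitutes a hand-written Haar wavelet packet for the ‘db10’ wavelet used in the experiments; the function names, the toy signal, and the choice of taking the first m nodes in natural (not frequency) order are all assumptions made for illustration. Because the Haar packet is orthonormal, the energy of a node's coefficients equals the energy of its reconstructed node signal.

```python
import numpy as np


def haar_wavelet_packet(x, level):
    """Return the 2**level terminal-node coefficient arrays of a Haar wavelet packet tree.

    Nodes come out in natural order. len(x) must be divisible by 2**level.
    """
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [x]
    for _ in range(level):
        next_nodes = []
        for c in nodes:
            even, odd = c[0::2], c[1::2]
            next_nodes.append((even + odd) / np.sqrt(2.0))  # low-pass branch
            next_nodes.append((even - odd) / np.sqrt(2.0))  # high-pass branch
        nodes = next_nodes
    return nodes


def node_energy_ratios(x, level, m):
    """Energy ratio of each of the first m nodes (Eq. 7), with Eq. 6 in discrete form."""
    nodes = haar_wavelet_packet(x, level)
    energy = np.array([np.sum(c ** 2) for c in nodes[:m]])
    return energy / energy.sum()


# Toy two-tone signal standing in for a vowel recording (not patient data).
t = np.arange(1024) / 1024.0
x = np.sin(2 * np.pi * 5 * t) + 0.3 * np.sin(2 * np.pi * 40 * t)

ratios = node_energy_ratios(x, level=4, m=8)
ratios_louder = node_energy_ratios(3.0 * x, level=4, m=8)
print(ratios)
```

Scaling the signal by a constant leaves the ratios unchanged, which is the volume invariance that motivates Equation 7.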

3.2 Throat polyp detection procedure with SVM

Support vector machines (SVM) are a popular machine learning method for classification, regression, and other learning tasks. In this method, one maps the data into a higher dimensional feature space and constructs an optimal separating hyperplane in this space [23, 24].

Given a training set of N data points (y_i, x_i), i = 1, 2, 3, …, N, where x_i ∈ R^n and y_i ∈ {1, -1}, the classifier is constructed as follows. One assumes that

$$\begin{cases} \omega^T \varphi(x_i) + b \ge 1, & \text{if } y_i = 1 \\ \omega^T \varphi(x_i) + b \le -1, & \text{if } y_i = -1 \end{cases} \tag{8}$$

which is equivalent to

$$y_i \left[ \omega^T \varphi(x_i) + b \right] \ge 1, \quad i = 1, 2, \ldots, N, \tag{9}$$

where φ is a nonlinear function which maps the input space into a higher dimensional space. However, the function φ is not explicitly constructed. In order to obtain the separating hyperplane in the higher dimensional space, slack variables ξ_i are introduced, and the following primal optimization problem is solved:

$$\min_{\omega, b, \xi} \; \frac{1}{2} \omega^T \omega + C \sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad y_i \left[ \omega^T \varphi(x_i) + b \right] \ge 1 - \xi_i. \tag{10}$$

From the training data, we obtain the support vectors and kernel parameters of the model used for prediction.
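Since φ is never constructed explicitly, it may help to recall how the support vectors and the kernel enter the prediction. The following is the standard textbook dual of the C-SVM primal problem and the resulting decision function, assuming the slack variables satisfy ξ_i ≥ 0; the multipliers α_i and the kernel K are standard notation introduced here and do not appear in the text above, and the kernel actually used in the experiments is not stated in the paper.

```latex
% The kernel replaces the inner product of the mapped points
K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)

% Dual of the C-SVM primal problem
\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
\quad \text{subject to} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{N} \alpha_i y_i = 0

% Decision function for a new feature vector x
f(x) = \operatorname{sign}\left( \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b \right)
```

The training points with α_i > 0 are the support vectors, so only they contribute to the prediction for a new feature vector.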

As we know, continuous speech usually consists of vowels and voiced consonants. The vocal cords do not vibrate when producing voiced consonants, which come from the vibration of the lips and teeth. Collecting these signals is therefore useless for detection and introduces interference in the later steps. Meanwhile, multiple vowels in one speech sample result in aliasing in the spectrum of the sample. Therefore, in this paper, we use only the vowels /a:/ and /i:/ to detect the throat polyps of the patients.

In this paper, we acquire the vowel voice signal by compressive sensing and extract features constructed from the frequency components of the compressed signals. Lastly, the SVM method is used to obtain the classification model for throat polyp detection. The procedure is depicted in Algorithm 1.
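The three stages described here (compressed acquisition, wavelet packet energy features, SVM classification) can be strung together in a rough Python sketch. This is not Algorithm 1 itself: it assumes NumPy and scikit-learn (whose `SVC` class wraps LIBSVM), substitutes a Haar wavelet packet for ‘db10’, and runs on synthetic noisy tones rather than patient recordings. The label convention (+1 for polyp, -1 for normal), the RBF kernel, the value of C, the signal sizes, and all function names are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC


def haar_packet_energies(y, level):
    """Energies of the 2**level terminal nodes of a Haar wavelet packet tree."""
    nodes = [np.asarray(y, dtype=float)]
    for _ in range(level):
        nodes = [part for c in nodes
                 for part in ((c[0::2] + c[1::2]) / np.sqrt(2.0),
                              (c[0::2] - c[1::2]) / np.sqrt(2.0))]
    return np.array([np.sum(c ** 2) for c in nodes])


def features(x, Psi, level=4, m=8):
    """Compress x with Psi, then return the energy ratios of the first m nodes."""
    y = Psi @ x                                # compressed measurements
    e = haar_packet_energies(y, level)[:m]     # node energies
    return e / e.sum()                         # energy ratios


def toy_voice(rng, n, polyp):
    """Synthetic stand-in for a vowel recording (NOT patient data)."""
    t = np.arange(n) / n
    clean = np.sin(2 * np.pi * 6 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
    noise_level = 0.8 if polyp else 0.1        # arbitrary toy values
    return clean + noise_level * rng.standard_normal(n)


def run(seed=0, n=512, ratio=0.5):
    rng = np.random.default_rng(seed)
    M = int(ratio * n)                              # 50% compression ratio
    Psi = rng.standard_normal((M, n)) / np.sqrt(M)  # Gaussian measurement matrix
    labels = np.array([1] * 13 + [-1] * 13)         # assumed: +1 polyp, -1 normal
    X = np.array([features(toy_voice(rng, n, lab == 1), Psi) for lab in labels])
    train = np.r_[0:8, 13:21]                       # 8 + 8 subjects for training
    test = np.r_[8:13, 21:26]                       # remaining 10 for testing
    clf = SVC(C=1.0, kernel="rbf", gamma="scale").fit(X[train], labels[train])
    pred = clf.predict(X[test])
    return X, pred, float(np.mean(pred == labels[test]))


X, pred, accuracy = run()
print("toy correct rate:", accuracy)
```

The correct rate printed by this sketch describes only the synthetic data and says nothing about the patient experiments reported in Section 4.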

4 Experimental results and analysis

In the experiments, vowel /a:/ and /i:/ voice signals of 26 patients were collected, of whom 13 had throat polyps and 13 did not. A Gaussian random measurement matrix was used to obtain the compressed data. The compressed voice signals were decomposed by an eight-level wavelet packet transform with the ‘db10’ wavelet, and the first 20 node signals were used to extract the features. The C-SVM program proposed by Dr. Lin was used to build the classification model and predict throat polyps [23].

In the first experiment, we used 7,000 samples of the original vowel /a:/ and /i:/ signals of each of the 26 patients to construct the features. The compression ratio was 50%, and the features of eight normal patients without throat polyps and eight abnormal patients with throat polyps were used for training to establish a classification model. The features of the other ten patients were used to test the performance of the proposed algorithm (Algorithm 1).

Figures 1 and 2 show the node energy ratios of the compressed vowel /a:/ and /i:/ signals of a normal patient without throat polyps and an abnormal patient with throat polyps. It can be seen that the frequency components of the two kinds of patients are different and that the low-frequency components of the vowel voice signals change more obviously when the patient has throat polyps. In other words, the frequency components of a patient's voice vary when he or she suffers from throat polyps. Thus, the frequency component energy of vowel voice signals can be used as the features for throat polyp detection.

Figure 1. Node energy ratio of the compressed vowel /a:/ voice signal for a normal and an abnormal patient.

Figure 2. Node energy ratio of the compressed vowel /i:/ voice signal for a normal and an abnormal patient.

Figure 3 shows the prediction results for throat polyp patients under different random measurement matrices using the proposed algorithm (Algorithm 1). The correct rate of prediction is about 50% with small fluctuations. This indicates that the features used for testing and training were similar even though they were obtained under different measurement matrices. We attribute the low correct rate of prediction and the small fluctuations to the small number of training samples.

Figure 3. Correct rate of throat polyp prediction under different random measurement matrices.

In the second experiment, we used different numbers of samples of the original vowel /a:/ and /i:/ signals of the 26 patients to construct the features with the same measurement matrix. The compression ratio, the training data, and the test data were the same as in the first experiment. The results are shown in Figure 4, where the correct rate of prediction for each number of samples is the mean over ten predictions. It can be seen that the correct rate of prediction was about 50% for the different numbers of samples; the fluctuations were again caused by the small amount of training data, from which a high-accuracy prediction model could not be constructed. Meanwhile, the results demonstrated that our classifier is able to detect throat polyp patients with a small number of samples.

Figure 4. Correct rate of throat polyp prediction under different numbers of samples.

5 Conclusions

Big data refers to large, diverse, complex, longitudinal, and distributed data sets. Core technologies such as classification are needed to solve big data problems. Compressive sensing theory provides a new approach for big data classification: it overcomes the limitation of the Nyquist sampling theorem and can sample and compress data simultaneously.

In this paper, we used compressive sensing theory to acquire compressed vowel voice signals for throat polyp detection. The frequency component energy ratios of the compressed data, obtained by the wavelet packet transform, were used as features. The support vector machine algorithm was then used to detect the existence of throat polyps. The experimental results showed that the prediction performance was stable but that the correct rate of prediction was low, owing to the small number of patient cases.