Abstract
Pneumonia kills over 1,800,000 children worldwide every year. The vast majority of these deaths occur in resource-poor regions such as sub-Saharan Africa and remote Asia. Prompt diagnosis and proper treatment are essential to prevent these unnecessary deaths. The reliable diagnosis of childhood pneumonia in remote regions is fraught with difficulties arising from the lack of field-deployable imaging and laboratory facilities as well as the scarcity of trained community healthcare workers. In this paper, we present a pioneering class of technology addressing both of these problems. Our approach is centred on the automated analysis of cough and respiratory sounds, collected via microphones that do not require physical contact with subjects. Cough is a cardinal symptom of pneumonia, but the current clinical routines used in remote settings do not make use of cough beyond noting its existence as a screening-in criterion. We hypothesized that cough carries vital information to diagnose pneumonia, and developed mathematical features and a pattern classifier system suited for the task. We collected cough sounds from 91 patients suspected of acute respiratory illness such as pneumonia, bronchiolitis and asthma. Non-contact microphones kept by the patient’s bedside were used for data acquisition. We extracted features such as non-Gaussianity and Mel Cepstra from cough sounds and used them to train a Logistic Regression classifier. We used the clinical diagnosis provided by the paediatric respiratory clinician as the gold standard to train and validate our classifier. The methods proposed in this paper could separate pneumonia from other diseases at a sensitivity and specificity of 94% and 75%, respectively, based on parameters extracted from cough sounds alone. The inclusion of other simple measurements such as the presence of fever further increased the performance.
These results show that cough sounds indeed carry critical information on the lower respiratory tract, and can be used to diagnose pneumonia. The performance of our method is far superior to that of the existing WHO clinical algorithms for resource-poor regions. To the best of our knowledge, this is the first attempt in the world to diagnose pneumonia in humans using cough sound analysis. Our method has the potential to revolutionize the management of childhood pneumonia in remote regions of the world.
References
Abaza, A. A., J. B. Day, J. S. Reynolds, A. M. Mahmoud, W. T. Goldsmith, W. G. McKinney, E. L. Petsonk, and D. G. Frazer. Classification of voluntary cough sound and airflow patterns for detecting abnormal pulmonary function. Cough 5:8, 2009.
Abeyratne, U. Blind Reconstruction of Non-minimum-phase Systems from 1-D Oblique Slices of Bispectrum. IET, 1999.
Bhutta, Z. A. Childhood pneumonia in developing countries: refinement of clinical algorithms is a priority. BMJ 333:612, 2006.
Cardoso, M. R. A., C. M. Nascimento-Carvalho, F. Ferrero, F. M. Alves, and S. N. Cousens. Adding fever to WHO criteria for diagnosing pneumonia enhances the ability to identify pneumonia cases among wheezing children. Arch. Dis. Child. 96:58, 2011.
Falade, A., H. Tschäppeler, B. M. Greenwood, and E. K. Mulholland. Use of simple clinical signs to predict pneumonia in young Gambian children: the influence of malnutrition. Bull. World Health Organ. 73:299, 1995.
Frosh, A. Pediatric airway: cry, stridor, and cough. JAMA 301:2392, 2009.
Gold, B., N. Morgan, and D. Ellis. Speech and Audio Signal Processing: Processing and Perception of Speech and Music. Wiley-Interscience, 2011.
Harari, M., V. Spooner, S. Meisner, M. Carney, F. Shann, and J. De Campo. Clinical signs of pneumonia in children. Lancet 338:928–930, 1991.
Hirschberg, J., I. Lelli, and Z. Garay. The Pediatric Airway: Cry, Stridor, and Cough. Plural Pub., 2009.
IMCI Integrated Management of Childhood Illness. World Health Organization, 2005.
Knocikova, J., J. Korpas, M. Vrabec, and M. Javorka. Wavelet analysis of voluntary cough sounds in patients with respiratory diseases. J. Physiol. Pharmacol. 59:331–340, 2008.
Kosasih, K., U. Abeyratne, and V. Swarnkar. High frequency analysis of cough sounds in pediatric patients with respiratory diseases. In: Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE. 2012. IEEE.
Lozano, J., M. Steinhoff, J. Ruiz, M. Mesa, N. Martinez, and B. Dussan. Clinical predictors of acute radiological pneumonia and hypoxaemia at high altitude. Arch. Dis. Child. 71:323–327, 1994.
Mendel, J. M. Tutorial on higher-order statistics (spectra) in signal processing and system theory: theoretical results and some applications. Proc. IEEE 79:278–305, 1991.
Mulholland, E., E. Simoes, M. Costales, E. McGrath, E. Manalac, and S. Gove. Standardized diagnosis of pneumonia in developing countries. Pediatr. Infect. Dis. J. 11:77, 1992.
Ng, A. K., T. S. Koh, E. Baey, T. H. Lee, U. R. Abeyratne, and K. Puvanendran. Could formant frequencies of snore signals be an alternative means for the diagnosis of obstructive sleep apnea? Sleep Med. 9:894–898, 2008.
Oppenheim, A. V., R. W. Schafer, and J. R. Buck. Discrete-Time Signal Processing. Vol. 1999. Englewood Cliffs, NJ: Prentice Hall, 1989.
Palafox, M., H. Guiscafré, H. Reyes, O. Muñoz, and H. Martínez. Diagnostic value of tachypnoea in pneumonia defined radiologically. Arch. Dis. Child. 82:41–45, 2000.
Piirilä, P., and A. Sovijärvi. Differences in acoustic and dynamic characteristics of spontaneous cough in pulmonary diseases. Chest 96:46–53, 1989.
Pneumonia Strategy Overview. Bill & Melinda Gates Foundation, November 2009.
Smith, J. A., J. E. Earis, and A. A. Woodcock. Establishing a gold standard for manual cough counting: video versus digital audio recordings. Cough 2:6, 2006.
Suttorp, N., T. Welte, and R. Marre. Community-Acquired Pneumonia. Springer, 2007.
Swarnkar, V., U. Abeyratne, A. Chang, Y. Amrulloh, A. Setyati, and R. Triasih. Automatic identification of wet and dry cough in pediatric patients with respiratory diseases. Ann. Biomed. Eng. 41, 2013. doi:10.1007/s10439-013-0741-6.
Wardlaw, T., E. W. Johansson, and M. Hodge. Pneumonia: The Forgotten Killer of Children. UNICEF/WHO, 2006.
Thorpe, W., M. Kurver, G. King, and C. Salome. Acoustic analysis of cough. In: Intelligent Information Systems Conference, The Seventh Australian and New Zealand 2001. 2001. IEEE.
WHO/UNICEF. Global Action Plan for the Prevention and Control of Pneumonia (GAPP). WHO Reference Number: WHO/FCH/CAH/NCH/09.04, 23 (2009).
Acknowledgments
This work was supported by the Bill & Melinda Gates Foundation, USA, under its Grand Challenges in Global Health Explorations Grant program (Grant ID# OPP1008199 GCE to Abeyratne).
Additional information
Associate Editor Merryn Tawhai oversaw the review of this article.
Appendix
Our method requires the computation of a number of mathematical features from cough sounds. This Appendix describes the features we computed from each sub-segment \( x_i \), i = 1, 2, 3, of a recorded cough sound x.
The Bispectrum Score (BGS)
The 3rd order spectrum of a signal is known as the bispectrum.² Unlike the power spectrum (the 2nd order spectrum based on the autocorrelation), the bispectrum preserves Fourier phase information. The bispectrum \( B_{xi} \left( {\omega_{1} ,\omega_{2} } \right) \) of the segment \( x_i \) can be estimated from (5) as,
\[ B_{xi} \left( {\omega_{1} ,\omega_{2} } \right) = \sum_{\tau_{1} = - \infty }^{ + \infty } \sum_{\tau_{2} = - \infty }^{ + \infty } C_{xi} \left( {\tau_{1} ,\tau_{2} } \right) W\left( {\tau_{1} ,\tau_{2} } \right) e^{ - j\left( {\omega_{1} \tau_{1} + \omega_{2} \tau_{2} } \right)} \qquad (5) \]
where \( W\left( {\tau_{1} ,\tau_{2} } \right) \) is a bispectrum window function such as the minimum bispectrum-bias supremum window¹⁴ used in this paper, \( C_{xi} \left( {\tau_{1} ,\tau_{2} } \right) \) is the third-order cumulant of \( x_i \) estimated with (6), and \( \omega_{1} ,\omega_{2} \) denote digital frequencies.
In (6), Q is the length of the 3rd order correlation lags considered and \( x_i \) is a zero-mean signal.
The bispectrum is a 2D signal. However, it can be proven² that for linear signals, any 1D oblique slice of the bispectrum, other than the slices parallel to the axes ω1 = 0, ω2 = 0 and ω1 + ω2 = 0, carries sufficient information to characterise the entire 2D bispectrum within a phase factor. In this work, we capture the information available in the bispectrum via the diagonal slice P(ω) defined by ω1 = ω2 = ω, i.e. \( P\left( \omega \right) = B_{xi} \left( {\omega , \omega } \right). \) Then the Bispectrum Score (BGS) is computed as defined in (7). In (7) we used k1 = 90 Hz, k2 = 5 kHz, k3 = 6 kHz and k4 = 10.5 kHz.
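The estimation chain described above (third-order cumulants, a windowed Fourier sum, then the diagonal slice) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the function names are hypothetical, the cumulants use a biased 1/N lag-product average, the window defaults to all ones instead of the minimum bispectrum-bias supremum window, and the frequency grid spans 0 to π. The sketch stops at P(ω) and does not compute the score in (7).

```python
import numpy as np

def third_order_cumulants(x, Q):
    """Biased third-order cumulant estimate C[t1, t2] for lags -Q..Q.

    Assumption of this sketch: normalisation by the segment length N.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # the estimate assumes a zero-mean signal
    N = len(x)
    lags = np.arange(-Q, Q + 1)
    C = np.zeros((2 * Q + 1, 2 * Q + 1))
    for a, t1 in enumerate(lags):
        for b, t2 in enumerate(lags):
            lo = max(0, -t1, -t2)         # keep k, k+t1 and k+t2 inside the segment
            hi = min(N, N - t1, N - t2)
            if hi > lo:
                k = np.arange(lo, hi)
                C[a, b] = np.sum(x[k] * x[k + t1] * x[k + t2]) / N
    return lags, C

def bispectrum_diagonal(x, Q, n_freq=256, window=None):
    """Diagonal slice P(w) = B(w, w) of the windowed cumulant transform."""
    lags, C = third_order_cumulants(x, Q)
    # Assumption: rectangular window unless a (2Q+1, 2Q+1) array is supplied.
    W = np.ones_like(C) if window is None else np.asarray(window, dtype=float)
    omega = np.linspace(0.0, np.pi, n_freq)
    tau_sum = lags[:, None] + lags[None, :]   # tau1 + tau2 for every lag pair
    P = np.array([np.sum(C * W * np.exp(-1j * w * tau_sum)) for w in omega])
    return omega, P
```

For example, `omega, P = bispectrum_diagonal(segment, Q=32)` returns the complex diagonal slice for one cough sub-segment; `Q=32` is an arbitrary illustrative lag length.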
Non-Gaussianity Score (NGS)
The NGS is a numerical measure of the non-Gaussianity of a given segment of data \( x_i \). The normal probability plot can be utilized to obtain a visual measure of the Gaussianity of a set of data, and the NGS is a way of quantifying the non-Gaussianity based on regression analysis. We used (8) to estimate the NGS, where p and q represent the normal probability plots of the reference normal data and the analysed data (\( x_i \)), respectively. The symbol N is the number of data points used in the probability plot.
Formant Frequencies
In speech analysis, formant frequencies (FF) are the resonances of the vocal tract.¹⁶ In cough analysis, it is reasonable to expect that the resonances of the overall airway that contribute to the generation of a cough sound will be represented in the formant structure. One classic example of this is wheeze. The existence of mucus can also change the acoustic properties of the airways. We included the first four formants (F1, F2, F3, F4) in our candidate feature set. We computed F1–F4 by peak picking the Linear Predictive Coding (LPC) spectrum of cough segments \( x_i \). For this work we used a 14th order LPC model with the parameters determined via the Levinson-Durbin recursive procedure.¹⁷
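The formant procedure described above (autocorrelation, Levinson-Durbin recursion, peak picking on the LPC spectrum) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the Hamming taper, the FFT length and the simple local-maximum peak picker are all choices made here for illustration, and the sampling rate `fs` must be supplied by the caller.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r[0..order].

    Returns the prediction-error filter a (with a[0] = 1) and the final error.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                       # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

def formants_from_lpc(x, fs, order=14, n_fft=4096, n_formants=4):
    """Return the lowest n_formants peak frequencies (Hz) of the LPC spectrum."""
    x = np.asarray(x, dtype=float)
    x = x * np.hamming(len(x))               # assumption: Hamming taper before analysis
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a, _ = levinson_durbin(r, order)
    spec = 1.0 / np.abs(np.fft.rfft(a, n_fft))   # all-pole envelope 1/|A(e^jw)|
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    # A peak is a bin that is strictly larger than both of its neighbours.
    peaks = [i for i in range(1, len(spec) - 1) if spec[i - 1] < spec[i] > spec[i + 1]]
    return freqs[peaks][:n_formants]
```

Peak picking on the envelope can return fewer than four values when the model spectrum has few distinct maxima, so callers may need to handle short outputs.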
Log Energy (LogE)
The log energy for every sub-segment \( x_i \) was computed using (9):
In (9) ε is an arbitrarily small positive constant added to prevent any inadvertent computation of the logarithm of 0.
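A log-energy computation with the ε guard described above might look like the following Python sketch. The decibel scaling (10 log10), the averaging over the segment length and the default value of `eps` are assumptions of this sketch; the function name is hypothetical.

```python
import numpy as np

def log_energy(x, eps=1e-12):
    """Log energy of a segment; eps keeps the logarithm finite for silent input."""
    x = np.asarray(x, dtype=float)
    # Assumption: mean-square energy expressed in decibels.
    return 10.0 * np.log10(eps + np.mean(x ** 2))
```

With this choice an all-zero segment yields a finite floor value rather than negative infinity.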
Zero Crossing (Zcr)
The number of zero crossings was counted for each sub-segment \( x_i \).
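Counting zero crossings can be sketched in Python as below. The handling of samples that are exactly zero (they are skipped, so a pass through zero counts once) is an assumption of this sketch, and the function name is hypothetical.

```python
import numpy as np

def zero_crossings(x):
    """Count sign changes between consecutive non-zero samples."""
    s = np.sign(np.asarray(x, dtype=float))
    s = s[s != 0]                 # assumption: exact zeros are ignored
    return int(np.sum(s[1:] != s[:-1]))
```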
Kurtosis (Kurt)
The kurtosis is a measure of how peaked the probability density distribution of \( x_i \) is. It is the fourth central moment of \( x_i \) normalized by \( \sigma^{4} \) and can be computed using (10), where μ and σ respectively denote the mean and the standard deviation of \( x_i \).
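A sample kurtosis following the description above can be sketched in Python as follows. The use of the population standard deviation (no bias correction) is an assumption of this sketch, and the function name is hypothetical. Under this definition a Gaussian segment has a kurtosis close to 3.

```python
import numpy as np

def kurtosis(x):
    """Fourth central moment divided by the fourth power of the standard deviation."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std()               # assumption: population form (ddof=0)
    return np.mean((x - mu) ** 4) / sigma ** 4
```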
Mel-Frequency Cepstral Coefficients (MFCC)
MFCCs have been widely used in speech recognition systems.⁷,⁹ They provide some resilience to the non-linguistic sources of variance in speech signals. They also provide orthogonal features, facilitating the training of the classifier. The computation of MFCCs involves the estimation of short-term power spectra, mapping to the Mel frequency scale and then computing the cepstral coefficients. In our work, we included 12 MFCCs in our feature set.
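The three stages named above (short-term power spectra, Mel mapping, cepstral coefficients) can be sketched in Python with NumPy alone. Everything beyond those three stages is an assumption of this sketch rather than a detail from the paper: the function names, the frame length and hop, the Hamming window, the 26 triangular filters, the common 2595 log10(1 + f/700) Mel formula, and the choice to drop the 0th coefficient and keep the next 12.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced evenly on the Mel scale from 0 Hz to fs/2."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):        # rising edge
            fb[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):       # falling edge
            fb[m - 1, k] = (right - k) / (right - centre)
    return fb

def mfcc(x, fs, n_coeffs=12, n_filters=26, frame_len=512, hop=256):
    """Return an array of shape (n_frames, n_coeffs) of MFCCs."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:                   # zero-pad very short segments
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    win = np.hamming(frame_len)
    frames = np.stack([x[i * hop:i * hop + frame_len] * win for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / frame_len
    fb = mel_filterbank(n_filters, frame_len, fs)
    log_mel = np.log(power @ fb.T + 1e-12)   # small constant avoids log(0)
    n = np.arange(n_filters)
    k = np.arange(1, n_coeffs + 1)           # assumption: skip the 0th coefficient
    dct = np.cos(np.pi * k[:, None] * (n[None, :] + 0.5) / n_filters)  # DCT-II basis
    return log_mel @ dct.T
```

Per-frame coefficients would still need to be summarised per sub-segment (for example by averaging) before being used as classifier features; that step is left out here.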
Cite this article
Abeyratne, U.R., Swarnkar, V., Setyati, A. et al. Cough Sound Analysis Can Rapidly Diagnose Childhood Pneumonia. Ann Biomed Eng 41, 2448–2462 (2013). https://doi.org/10.1007/s10439-013-0836-0
DOI: https://doi.org/10.1007/s10439-013-0836-0