A deep transfer learning approach for improved post-traumatic stress disorder diagnosis

  • Debrup Banerjee
  • Kazi Islam (Email author)
  • Keyi Xue
  • Gang Mei
  • Lemin Xiao
  • Guangfan Zhang
  • Roger Xu
  • Cai Lei
  • Shuiwang Ji
  • Jiang Li
Regular Paper


Abstract

Post-traumatic stress disorder (PTSD) is a traumatic-stressor-related disorder that develops after exposure to a traumatic or adverse environmental event that caused serious harm or injury. The structured interview is the only widely accepted clinical practice for PTSD diagnosis, but it suffers from several limitations, including the stigma associated with the disease. Diagnosing PTSD by analyzing speech signals has been investigated as an alternative in recent years: speech signals are processed to extract frequency features, and these features are then fed into a classification model. In this paper, we developed a deep belief network (DBN) model combined with a transfer learning (TL) strategy for PTSD diagnosis. We computed three categories of speech features and used the DBN model to fuse them. The TL strategy transfers knowledge learned from a large speech recognition database, TIMIT, to PTSD detection, where patient data are difficult to collect. We evaluated the proposed methods on two PTSD speech databases, each of which consists of audio recordings from 26 patients. In a comparison with other popular methods, the state-of-the-art support vector machine (SVM) classifier achieved an accuracy of only 57.68%, while the TL strategy boosted the accuracy of the DBN from 61.53% to 74.99%. Altogether, our method provides a pragmatic and promising tool for PTSD diagnosis. Preliminary results of this study were presented in Banerjee et al. (in: 2017 IEEE international conference on data mining (ICDM), IEEE, 2017).
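The transfer step described in the abstract can be pictured with a minimal sketch. It is not the authors' DBN: it swaps in a tiny feed-forward network trained by plain stochastic gradient descent, and the synthetic data, layer sizes, learning rate, and every function name (`train`, `make_data`, `init`, `accuracy`) are assumptions chosen for illustration only. The idea it shows is the general one: pretrain on a large source task, copy the hidden layer, and fine-tune only a new output layer on the small target set.

```python
import math
import random


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_w, out_w):
    """Return (hidden activations, output probability) for one feature vector."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in hidden_w]
    return h, sigmoid(sum(w * hi for w, hi in zip(out_w, h)))


def train(data, hidden_w, out_w, epochs=30, lr=0.5, freeze_hidden=False):
    """In-place SGD on cross-entropy loss; optionally leave the hidden layer fixed."""
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x, hidden_w, out_w)
            err = y - t  # loss gradient at the output pre-activation
            if not freeze_hidden:
                for j, row in enumerate(hidden_w):
                    g = err * out_w[j] * h[j] * (1.0 - h[j])
                    for i, xi in enumerate(x):
                        row[i] -= lr * g * xi
            for j, hj in enumerate(h):
                out_w[j] -= lr * err * hj


def init(rng, n_in, n_hidden):
    """Random small weights for a hidden layer and an output layer (hypothetical sizes)."""
    hidden_w = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    out_w = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
    return hidden_w, out_w


def make_data(rng, n, threshold):
    """Synthetic 4-feature samples (plus a constant bias input) with a binary label."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(4)] + [1.0]
        data.append((x, 1.0 if x[0] + x[1] > threshold else 0.0))
    return data


def accuracy(data, hidden_w, out_w):
    hits = sum((forward(x, hidden_w, out_w)[1] > 0.5) == (t == 1.0) for x, t in data)
    return hits / len(data)


rng = random.Random(0)
source = make_data(rng, 300, threshold=0.0)  # large, related source task (synthetic)
target = make_data(rng, 20, threshold=0.2)   # small target task (synthetic)

# 1) Pretrain the whole network on the source task.
src_hidden, src_out = init(rng, n_in=5, n_hidden=6)
train(source, src_hidden, src_out)

# 2) Transfer: copy the hidden layer, start a fresh output layer,
#    and fine-tune only the output layer on the small target set.
tl_hidden = [row[:] for row in src_hidden]
_, tl_out = init(rng, n_in=5, n_hidden=6)
train(target, tl_hidden, tl_out, freeze_hidden=True)

print("target training accuracy:", accuracy(target, tl_hidden, tl_out))
```

The printed figure is a training accuracy on made-up data and has no bearing on the accuracies reported in the abstract. Freezing the copied layer is one common design choice when the target set is very small; fine-tuning all layers with a smaller learning rate is another.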


Keywords

  • Speech based PTSD diagnosis
  • Deep belief network
  • Deep learning
  • Transfer learning



Acknowledgements

This research is partially supported by DOD under grant W81XWH-15-C-0099. The authors would like to thank UHCMC for providing the Ohio dataset. The support of NVIDIA Corporation for the donation of the TESLA K40 GPU used in this research is gratefully acknowledged.


References

  1. Banerjee D, Islam K, Mei G, Xiao L, Zhang G, Xu R, Ji S, Li J (2017) A deep transfer learning approach for improved post-traumatic stress disorder diagnosis. In: 2017 IEEE international conference on data mining (ICDM), IEEE, pp 11–20
  2. Bengio Y (2009) Learning deep architectures for AI. Found Trends® Mach Learn 2(1):1–127
  3. Bijleveld H-A (2015) Post-traumatic stress disorder and stuttering: a diagnostic challenge in a case study. Proc Soc Behav Sci 193:37–43
  4. Brown SM, Webb A, Mangoubi R, Dy JG (2015) A sparse combined regression-classification formulation for learning a physiological alternative to clinical post-traumatic stress disorder scores. In: AAAI, pp 1700–1706
  5. Calvo RA, D’Mello S (2010) Affect detection: an interdisciplinary review of models, methods, and their applications. IEEE Trans Affect Comput 1(1):18–37
  6. Deng L, Li J, Huang J-T, Yao K, Yu D, Seide F, Seltzer M, Zweig G, He X, Williams J, et al (2013) Recent advances in deep learning for speech research at Microsoft. In: 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 8604–8608
  7. Dieleman S, Schrauwen B (2014) End-to-end learning for music audio. In: 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 6964–6968
  8. Edwards AL (1948) Note on the correction for continuity in testing the significance of the difference between correlated proportions. Psychometrika 13(3):185–187
  9. Farrús M, Hernando J, Ejarque P (2007) Jitter and shimmer measurements for speaker recognition. In: Eighth annual conference of the international speech communication association
  10. Foa EB, Steketee G, Rothbaum BO (1989) Behavioral/cognitive conceptualizations of post-traumatic stress disorder. Behav Ther 20(2):155–176
  11. Friedman MJ (2007) PTSD history and overview. United States Department of Veterans Affairs
  12. Galatzer-Levy IR, Ma S, Statnikov A, Yehuda R, Shalev AY (2017) Utilization of machine learning for prediction of post-traumatic stress: a re-examination of cortisol in the prediction and pathways to non-remitting PTSD. Transl Psychiatr 7(3):e1070
  13. Galatzer-Levy IR, Karstoft KI, Statnikov A, Shalev AY (2014) Quantitative forecasting of PTSD from early trauma responses: a machine learning application. J Psychiatr Res 59:68–76
  14. Garofolo JS, Lamel LF, Fisher WM, Fiscus JG, Pallett DS, Dahlgren NL, Zue V (1993) TIMIT acoustic-phonetic continuous speech corpus, 1993. Linguistic Data Consortium, Philadelphia
  15. Grinage BD (2003) Diagnosis and management of post-traumatic stress disorder. Am Fam Phys 68(12):2401–2408
  16. Gulzar T, Singh A, Sharma S (2014) Comparative analysis of LPCC, MFCC and BFCC for the recognition of Hindi words using artificial neural networks. Int J Comput Appl 101(12):22–27
  17.
  18. Hansen JHL, Kim W, Rahurkar M, Ruzanski E, Meyerhoff J (2011) Robust emotional stressed speech detection using weighted frequency subbands. EURASIP J Adv Signal Process 2011(1):906789
  19. Hinton GE, Osindero S, Teh Y-W (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
  20. Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
  21. Hovens JE, Van der Ploeg HM, Klaarenbeek MTA, Bramsen I, Schreuder JN, Rivero VV (1994) The assessment of posttraumatic stress disorder: with the clinician administered PTSD scale: Dutch results. J Clin Psychol 50(3):325–340
  22. Kamishima T, Hamasaki M, Akaho S (2009) Trbagg: a simple transfer learning method and its application to personalization in collaborative tagging. In: Ninth IEEE international conference on data mining, 2009, ICDM’09, IEEE, pp 219–228
  23. Karstoft K-I, Galatzer-Levy IR, Statnikov A, Li Z, Shalev AY (2015) Bridging a translational gap: using machine learning to improve the prediction of PTSD. BMC Psychiatr 15(1):30
  24. Kessler RC, Rose S, Koenen KC, Karam EG, Stang PE, Stein DJ, Heeringa SG, Hill ED, Liberzon I, McLaughlin KA (2014) How well can post-traumatic stress disorder be predicted from pre-trauma risk factors? An exploratory study in the WHO world mental health surveys. World Psychiatr 13(3):265–274
  25. Kim J-H, Woodland PC (2001) The use of prosody in a combined system for punctuation generation and speech recognition. In: Seventh European conference on speech communication and technology
  26. Knoth B, Vergyri D, Shriberg E, Mitra V, Mclaren V, Kathol A, Richey C, Graciarena M (2018) Systems for speech-based assessment of a patient’s state-of-mind. US Patent WO2016028495 A1
  27. Krothapalli SR, Koolagudi SG (2013) Characterization and recognition of emotions from speech using excitation source information. Int J Speech Technol 16(2):181–201
  28. Kumaraswamy R, Odom P, Kersting K, Leake D, Natarajan S (2015) Transfer learning via relational type matching. In: 2015 IEEE international conference on data mining (ICDM), IEEE, pp 811–816
  29. Kunze J, Kirsch L, Kurenkov I, Krug A, Johannsmeier J, Stober S (2017) Transfer learning for speech recognition on a budget. ArXiv preprint arXiv:1706.00290
  30. Li X, Tao J, Johnson MT, Soltis J, Savage A, Leong KM, Newman JD (2007) Stress and emotion classification using jitter and shimmer features. In: IEEE international conference on acoustics, speech and signal processing, 2007, ICASSP 2007, vol 4. IEEE, pp IV–1081
  31. Litman DJ, Hirschberg JB, Swerts M (2000) Predicting automatic speech recognition performance using prosodic cues. In: Proceedings of the 1st North American chapter of the association for computational linguistics conference. Association for Computational Linguistics, pp 218–225
  32. Marinić I, Supek F, Kovačić Z, Rukavina L, Jendričko T, Kozarić-Kovačić D (2007) Posttraumatic stress disorder: diagnostic data analysis by data mining methodology. Croat Med J 48(2):185–197
  33. Muda L, Begam M, Elamvazuthi I (2010) Voice recognition algorithms using mel frequency cepstral coefficient (MFCC) and dynamic time warping (DTW) techniques. ArXiv preprint arXiv:1003.4083
  34. Omurca S, Ekinci E (2015) An alternative evaluation of post traumatic stress disorder with machine learning methods. In: 2015 International symposium on innovations in intelligent systems and applications (INISTA), IEEE, pp 1–7
  35. Ooi KEB, Low LSA, Lech M, Allen N (2012) Early prediction of major depression in adolescents using glottal wave characteristics and Teager energy parameters. In: 2012 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 4613–4616
  36.
  37. PTSD and symptoms (2018) Accessed 20 June 2018
  38. Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359
  39. Pitman RK (1989) Post-traumatic stress disorder, hormones, and memory. Biol Psychiatr 26(3):221–223
  40. Pratt LY (1993) Discriminability-based transfer between neural networks. In: Advances in neural information processing systems, pp 204–211
  41. Ramaswamy S, Madaan V, Qadri F, Heaney CJ, North TC, Padala PR, Sattar SP, Petty F (2005) A primary care perspective of posttraumatic stress disorder for the department of veterans affairs. Prim Care Compan J Clin Psychiatr 7(4):180
  42. Rozgic V, Vazquez-Reina A, Crystal M, Srivastava A, Tan V, Berka C (2014) Multi-modal prediction of PTSD and stress indicators. In: 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 3636–3640
  43. Scherer S, Lucas GM, Gratch J, Rizzo AS, Morency L-P (2016) Self-reported symptoms of depression and PTSD are associated with reduced vowel space in screening interviews. IEEE Trans Affect Comput 7(1):59–73
  44. Scherer S, Stratou G, Gratch J, Morency L-P (2013) Investigating voice quality as a speaker-independent indicator of depression and PTSD. In: Interspeech, pp 847–851
  45. Razavian AS, Azizpour H, Sullivan J, Carlsson S (2014) CNN features off-the-shelf: an astounding baseline for recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 806–813
  46. Sparr LF, Bremner JD (2005) Post-traumatic stress disorder and memory: prescient medicolegal testimony at the international war crimes tribunal? J Am Acad Psychiatr Law Online 33(1):71–78
  47. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
  48. van den Broek EL, van der Sluis F, Dijkstra T (2010) Telling the story and re-living the past: how speech analysis can reveal emotions in post-traumatic stress disorder (PTSD) patients. In: Sensing emotions, Springer, pp 153–180
  49. Vergyri D, Knoth B, Shriberg E, Mitra V, McLaren M, Ferrer L, Garcia P, Marmar C (2015) Speech-based assessment of PTSD in a military population using diverse feature classes. In: Sixteenth annual conference of the international speech communication association
  50. Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1(6):80–83
  51. Young A (1997) The harmony of illusions: inventing post-traumatic stress disorder. Princeton University Press, Princeton
  52. Zhang Q, Wu Q, Zhu H, He L, Huang H, Zhang J, Zhang W (2016) Multimodal MRI-based classification of trauma survivors with and without post-traumatic stress disorder. Front Neurosci 10:292
  53. Zhang W, Li R, Zeng T, Sun Q, Kumar S, Ye J, Ji S (2016) Deep model based transfer and multi-task learning for biological image analysis. In: IEEE transactions on big data
  54. Zhuang X, Rozgić V, Crystal M, Marx BP (2014) Improving speech-based PTSD detection via multi-view learning. In: Spoken language technology workshop (SLT), 2014 IEEE, pp 260–265

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  • Debrup Banerjee (1, 5)
  • Kazi Islam (1), Email author
  • Keyi Xue (1, 4)
  • Gang Mei (2)
  • Lemin Xiao (2)
  • Guangfan Zhang (2)
  • Roger Xu (2)
  • Cai Lei (3)
  • Shuiwang Ji (6)
  • Jiang Li (1)

  1. Department of ECE, Old Dominion University, Norfolk, USA
  2. Signal Processing and Control, Intelligent Automation, Inc., Rockville, USA
  3. School of EECS, Washington State University, Pullman, USA
  4. Panther Creek High School, Cary, USA
  5. Computer Science and Engineering Department, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, India
  6. Department of CSE, Texas A&M University, College Station, Texas, USA
