World Wide Web, Volume 22, Issue 5, pp 1893–1911

Multimodal deep learning based on multiple correspondence analysis for disaster management

  • Samira Pouyanfar
  • Yudong Tao
  • Haiman Tian
  • Shu-Ching Chen
  • Mei-Ling Shyu
Part of the following topical collections:
  1. Special Issue on Big Data for Effective Disaster Management


The fast and explosive growth of digital data on social media and the World Wide Web has created numerous opportunities and research directions in multimedia big data. Among these, disaster management applications have attracted considerable attention in recent years because of their impact on society and government. This study targets content analysis and mining for disaster management. Specifically, a multimedia big data framework based on advanced deep learning techniques is proposed. First, a video dataset of natural disasters is collected from YouTube. Then, two separate deep networks, a temporal audio model and a spatio-temporal visual model, are presented to effectively analyze the audio and visual modalities in video clips. Thereafter, the results of both models are integrated using the proposed fusion model based on the Multiple Correspondence Analysis (MCA) algorithm, which considers the correlations between data modalities and the final classes. The proposed multimodal framework is evaluated on the collected disaster dataset and compared with several state-of-the-art single-modality and fusion techniques. The results demonstrate the effectiveness of both the visual model and the fusion model compared to the baseline approaches. Specifically, the accuracy of the final multi-class classification using the proposed MCA-based fusion reaches 73% on this challenging dataset.
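The fusion step described above can be illustrated with a minimal sketch. The paper's actual MCA-based fusion derives modality weights from correspondence analysis of discretized feature-class tables; the snippet below substitutes a much simpler stand-in (the per-class reliability of each modality on held-out data) purely to show the shape of score-level late fusion of an audio model and a visual model. The function names, the weighting scheme, and the NumPy arrays are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fusion_weights(scores, labels, n_classes):
    """Per-class reliability of one modality on validation data.

    For each true class, measure how often the modality's top-scoring
    prediction is correct. This is a simple stand-in for the MCA-derived
    correlations between a modality and the final classes.
    """
    preds = scores.argmax(axis=1)
    weights = np.zeros(n_classes)
    for c in range(n_classes):
        mask = labels == c
        # Fall back to a neutral weight if the class is absent.
        weights[c] = (preds[mask] == c).mean() if mask.any() else 0.5
    return weights

def fuse(audio_scores, visual_scores, audio_w, visual_w):
    """Weighted score-level late fusion.

    Each modality's class-score matrix (n_samples x n_classes) is scaled
    by its per-class weight vector (broadcast across samples), summed,
    and the highest combined score wins.
    """
    combined = audio_w * audio_scores + visual_w * visual_scores
    return combined.argmax(axis=1)
```

In practice the weights would be estimated on a validation split, and the score matrices would be the softmax outputs of the temporal audio network and the spatio-temporal visual network before the final decision.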


Keywords: Multimodal deep learning · Multiple Correspondence Analysis (MCA) · Disaster information management



This research is partially supported by NSF CNS-1461926.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computing and Information Sciences, Florida International University, Miami, USA
  2. Department of Electrical and Computer Engineering, University of Miami, Coral Gables, USA
