Natural Hazards, Volume 94, Issue 2, pp 833–842

Real-time identification of urban rainstorm waterlogging disasters based on Weibo big data

  • Yang Xiao
  • Beiqun Li
  • Zaiwu Gong
Original Paper


Abstract

With urbanisation accelerating in China, preventing and reducing the economic losses and casualties caused by urban rainstorm waterlogging disasters has become a critical and difficult issue for the government. Because urban storms are sudden, clustered, and continuous and cause huge economic losses, emergency management is difficult. A more scientific method for real-time disaster identification would help prevent losses in a timely manner. Social media big data offer a feasible way to obtain on-site disaster data and carry out disaster risk assessments. This paper presents a real-time identification method for urban rainstorm disasters using Weibo data. Taking the June 2016 heavy rainstorm in Nanjing as an example, the collected Weibo data are split 8:2 into a training set and a test set. The text is pre-processed with the Jieba module for word segmentation. The term frequency–inverse document frequency (TF-IDF) method is then used to calculate feature-item weights and extract features, and hashing algorithms are introduced to handle the resulting high-dimensional sparse vector matrices. Finally, the naive Bayes, support vector machine, and random forest text classification algorithms are used to train models, which are evaluated on the test set to select the optimal classification algorithm. The experiments showed that the naive Bayes algorithm had the highest macro-average accuracy.
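The processing chain summarised in the abstract (feature hashing, TF-IDF weighting, classification, macro-averaged evaluation) can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy posts are hypothetical and already segmented (the paper segments Chinese text with Jieba), the hashed dimension `N_BUCKETS`, the IDF variant `ln(N/df) + 1`, the Laplace smoothing constant, and the reading of "macro-average" as per-class recall averaged over classes. Only a naive Bayes classifier is shown; the support vector machine and random forest models are omitted.

```python
import math
import zlib
from collections import Counter, defaultdict

N_BUCKETS = 2 ** 16  # assumed size of the hashed feature space


def hash_counts(tokens, n_buckets=N_BUCKETS):
    """Hashing trick: map each token to a bucket index and count occurrences.
    crc32 is used because, unlike built-in hash(), it is stable across runs."""
    counts = Counter()
    for token in tokens:
        counts[zlib.crc32(token.encode("utf-8")) % n_buckets] += 1
    return counts


def fit_idf(hashed_docs):
    """Inverse document frequency per bucket (assumed variant: ln(N/df) + 1)."""
    n_docs = len(hashed_docs)
    doc_freq = Counter()
    for counts in hashed_docs:
        doc_freq.update(counts.keys())
    return {b: math.log(n_docs / df) + 1.0 for b, df in doc_freq.items()}


def tfidf(counts, idf):
    """TF-IDF weights; buckets never seen during training are dropped."""
    return {b: c * idf[b] for b, c in counts.items() if b in idf}


def train_nb(vectors, labels, alpha=1.0, n_buckets=N_BUCKETS):
    """Multinomial naive Bayes over weighted features with Laplace smoothing."""
    weights = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for b, w in vec.items():
            weights[label][b] += w
    model = {}
    for label, n_docs in Counter(labels).items():
        total = sum(weights[label].values()) + alpha * n_buckets
        model[label] = (math.log(n_docs / len(labels)), weights[label], total)
    return model


def predict_nb(model, vec, alpha=1.0):
    """Return the class with the highest log posterior score."""
    def log_score(label):
        log_prior, w_class, total = model[label]
        return log_prior + sum(
            w * math.log((w_class.get(b, 0.0) + alpha) / total)
            for b, w in vec.items()
        )
    return max(model, key=log_score)


def macro_recall(y_true, y_pred):
    """Average of per-class recall, one possible macro-averaged score."""
    per_class = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        per_class.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(per_class) / len(per_class)


# Hypothetical, pre-segmented toy posts: 8 for training, 2 for testing (8:2).
train = [
    (["road", "flooded", "cars", "stuck"], "waterlogging"),
    (["subway", "station", "flooded", "water"], "waterlogging"),
    (["water", "knee", "deep", "street"], "waterlogging"),
    (["heavy", "rain", "street", "flooded"], "waterlogging"),
    (["nice", "dinner", "with", "friends"], "other"),
    (["new", "movie", "tonight"], "other"),
    (["rain", "makes", "me", "sleepy"], "other"),
    (["exam", "week", "coffee"], "other"),
]
test = [
    (["street", "flooded", "cars"], "waterlogging"),
    (["movie", "with", "friends"], "other"),
]

hashed_train = [hash_counts(tokens) for tokens, _ in train]
idf = fit_idf(hashed_train)
x_train = [tfidf(counts, idf) for counts in hashed_train]
y_train = [label for _, label in train]
model = train_nb(x_train, y_train)

y_test = [label for _, label in test]
predictions = [predict_nb(model, tfidf(hash_counts(t), idf)) for t, _ in test]
print(predictions, macro_recall(y_test, predictions))
```

Hashing fixes the feature dimension in advance, so new posts can be vectorised without storing or rebuilding a vocabulary, which suits a real-time setting. The cost is that distinct words may occasionally collide in the same bucket.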


Keywords

Urban rainstorm waterlogging disaster; Real-time identification; Micro-blogging data; Text classification



Acknowledgements

This research is partially supported by the Major Project of the National Social Science Foundation (grant no. 16ZDA047), the National Natural Science Foundation of China (71171115, 71571104), the Reform Foundation of Postgraduate Education and Teaching in Jiangsu Province (JGKT10034), the Six Talent Peaks Project in Jiangsu Province (2014-JY-014), the Top-notch Academic Programs Project of Jiangsu Higher Education Institutions, the Postgraduate Research & Practice Innovation Program, and the Major Project of Humanities and Social Sciences of the Anhui Education Department (SK2015ZD07).



Copyright information

© Springer Nature B.V. 2018

Authors and Affiliations

  1. School of Management and Engineering, Nanjing University of Information Science and Technology, Nanjing, China
