Real-time identification of urban rainstorm waterlogging disasters based on Weibo big data

  • Original Paper
Natural Hazards

Abstract

With the acceleration of urbanisation in China, preventing and reducing the economic losses and casualties caused by urban rainstorm waterlogging disasters has become a critical and difficult issue for the government. Because urban rainstorms are sudden, clustered, and continuous and cause huge economic losses, emergency management is difficult. Developing a more scientific method for real-time disaster identification will help prevent losses in a timely manner. Examining social media big data is a feasible method for obtaining on-site disaster data and carrying out disaster risk assessments. This paper presents a real-time identification method for urban rainstorm disasters using Weibo data. Taking the June 2016 heavy rainstorm in Nanjing as an example, the obtained Weibo data are split in an 8:2 ratio into a training data set and a testing data set. The text is pre-processed using the Jieba segmentation module for word segmentation. The term frequency–inverse document frequency (TF-IDF) method is then used to calculate the feature-item weights and extract the features, and hashing algorithms are introduced to process the resulting high-dimensional sparse vector matrices. Finally, the naive Bayes, support vector machine, and random forest text classification algorithms are used to train models, and the test set is used to evaluate them and select the optimal classification algorithm. The experiments showed that the naive Bayes algorithm had the highest macro-average accuracy.
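
The pipeline summarised in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The toy posts, the labels, the bucket count `N_BUCKETS`, and all function names are hypothetical; whitespace splitting stands in for Jieba segmentation; only naive Bayes is shown (not the support vector machine or random forest); and macro-averaged recall is assumed as one possible reading of "macro-average accuracy".

```python
import math
import random
import zlib
from collections import Counter

N_BUCKETS = 64  # hypothetical size of the hashed feature space


def tokenize(text):
    # Stand-in for Jieba segmentation: the toy posts are already space-separated.
    return text.lower().split()


def hashed_counts(text):
    # Hashing trick: a stable hash maps each token to one of N_BUCKETS columns.
    return Counter(zlib.crc32(t.encode("utf-8")) % N_BUCKETS for t in tokenize(text))


def fit_idf(count_vectors):
    # Smoothed inverse document frequency per bucket, from training data only.
    n = len(count_vectors)
    df = Counter(b for vec in count_vectors for b in vec)
    return {b: math.log((1 + n) / (1 + c)) + 1.0 for b, c in df.items()}


def tfidf(counts, idf):
    # Buckets never seen in training carry no weight and are dropped.
    return {b: c * idf[b] for b, c in counts.items() if b in idf}


def train_nb(vectors, labels, alpha=1.0):
    # Multinomial naive Bayes with additive smoothing over TF-IDF weights.
    model = {}
    for y, n_docs in Counter(labels).items():
        row = [alpha] * N_BUCKETS
        for vec, label in zip(vectors, labels):
            if label == y:
                for b, w in vec.items():
                    row[b] += w
        total = sum(row)
        model[y] = (math.log(n_docs / len(labels)), [math.log(w / total) for w in row])
    return model


def predict(model, vec):
    # Pick the class with the highest log prior plus weighted log likelihood.
    def score(y):
        prior, loglik = model[y]
        return prior + sum(w * loglik[b] for b, w in vec.items())
    return max(model, key=score)


def macro_recall(y_true, y_pred):
    # Mean of per-class recall, so every class counts equally.
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / y_true.count(c))
    return sum(recalls) / len(classes)


# Hypothetical, invented posts; real input would be segmented Weibo text.
posts = [
    ("road flooded near station cars stuck", "waterlogging"),
    ("heavy rain water on street knee deep", "waterlogging"),
    ("underpass flooded bus cannot pass", "waterlogging"),
    ("water rising in basement after rainstorm", "waterlogging"),
    ("street flooded drains blocked by rain", "waterlogging"),
    ("nice dinner with friends tonight", "other"),
    ("new phone arrived today very happy", "other"),
    ("watching a film at home this weekend", "other"),
    ("exam tomorrow need more coffee", "other"),
    ("great concert tonight in the city", "other"),
]
random.Random(0).shuffle(posts)
cut = len(posts) * 8 // 10  # 8:2 split into training and testing data
train, test = posts[:cut], posts[cut:]

train_counts = [hashed_counts(text) for text, _ in train]
idf = fit_idf(train_counts)
model = train_nb([tfidf(c, idf) for c in train_counts], [y for _, y in train])

y_true = [y for _, y in test]
y_pred = [predict(model, tfidf(hashed_counts(text), idf)) for text, _ in test]
score = macro_recall(y_true, y_pred)
print(y_pred, score)
```

Hashing fixes the feature dimension in advance, so a new post can be vectorised without storing a vocabulary, at the cost of occasional collisions between unrelated words. With only ten invented posts, the printed score says nothing about the performance reported in the paper.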

Acknowledgements

This research is partially supported by the Major Project of the National Social Science Foundation (grant no. 16ZDA047), the National Natural Science Foundation of China (71171115, 71571104), the Reform Foundation of Postgraduate Education and Teaching in Jiangsu Province (JGKT10034), a Six Talent Peaks Project in Jiangsu Province (2014-JY-014), the Top-notch Academic Programs Project of Jiangsu Higher Education Institutions, the Postgraduate Research & Practice Innovation Program, and the Major Project of Humanities and Social Sciences of Anhui Education Department (SK2015ZD07).

Author information

Corresponding author

Correspondence to Zaiwu Gong.

Cite this article

Xiao, Y., Li, B. & Gong, Z. Real-time identification of urban rainstorm waterlogging disasters based on Weibo big data. Nat Hazards 94, 833–842 (2018). https://doi.org/10.1007/s11069-018-3427-4
