Abstract
With urbanisation accelerating in China, preventing and reducing the economic losses and casualties caused by urban rainstorm waterlogging disasters has become a critical and difficult issue for the government. Because urban storms are sudden, clustered, and continuous and cause huge economic losses, emergency management is difficult. A more scientific method for real-time disaster identification will help reduce losses over time. Mining social media big data is a feasible way to obtain on-site disaster data and carry out disaster risk assessments. This paper presents a real-time identification method for urban storm disasters using Weibo data. Taking the June 2016 heavy rainstorm in Nanjing as an example, the collected Weibo data are split into a training set and a testing set at a ratio of 8:2. The text is pre-processed with the Jieba segmentation module for word segmentation. The term frequency–inverse document frequency (TF-IDF) method is then used to calculate the feature item weights and extract the features. Hashing algorithms are introduced to process the resulting high-dimensional sparse vector matrices. Finally, the naive Bayes, support vector machine, and random forest text classification algorithms are used to train models, and the test set is used to evaluate them and select the optimal classification algorithm. The experiments showed that the naive Bayes algorithm had the highest macro-average accuracy.
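The feature-weighting, hashing, and naive Bayes steps described above can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The smoothed IDF formula, the bucket count `N_BUCKETS`, the helper names (`bucket`, `fit_idf`, `hashed_tfidf`, `NaiveBayes`), and the toy posts are all assumptions made for illustration. Word segmentation (done with Jieba in the paper) is assumed to have already happened, and the support vector machine and random forest classifiers are omitted.

```python
import math
import zlib
from collections import Counter, defaultdict

N_BUCKETS = 2 ** 18  # hypothetical size of the hashed feature space


def bucket(token, n_buckets=N_BUCKETS):
    """Map a token to a column index; crc32 is stable across runs, unlike hash()."""
    return zlib.crc32(token.encode("utf-8")) % n_buckets


def fit_idf(docs):
    """Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1 per training token."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}


def hashed_tfidf(doc, idf, n_buckets=N_BUCKETS):
    """Return a sparse {column: weight} vector; tokens unseen in training are skipped."""
    vec = defaultdict(float)
    for tok, count in Counter(doc).items():
        if tok in idf:
            vec[bucket(tok, n_buckets)] += (count / len(doc)) * idf[tok]
    return dict(vec)


class NaiveBayes:
    """Multinomial naive Bayes over sparse weighted features, with Laplace smoothing."""

    def fit(self, vectors, labels, alpha=1.0, n_buckets=N_BUCKETS):
        self.classes = sorted(set(labels))
        self.log_prior, self.log_like, self.log_unseen = {}, {}, {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            totals = defaultdict(float)
            for row in rows:
                for col, w in row.items():
                    totals[col] += w
            # Smoothing adds alpha to every column, including columns never seen.
            norm = sum(totals.values()) + alpha * n_buckets
            self.log_prior[c] = math.log(len(rows) / len(vectors))
            self.log_like[c] = {col: math.log((t + alpha) / norm) for col, t in totals.items()}
            self.log_unseen[c] = math.log(alpha / norm)
        return self

    def predict(self, vec):
        def score(c):
            like = self.log_like[c]
            return self.log_prior[c] + sum(
                w * like.get(col, self.log_unseen[c]) for col, w in vec.items()
            )
        return max(self.classes, key=score)


# Toy posts, already segmented into words (invented data, not from the paper).
train_docs = [
    ["rainstorm", "road", "flooded", "car", "stuck"],
    ["heavy", "rain", "subway", "station", "flooded"],
    ["waterlogging", "road", "closed", "rainstorm"],
    ["lunch", "noodles", "delicious", "today"],
    ["new", "movie", "tonight", "friends"],
    ["exam", "week", "library", "coffee"],
]
train_labels = ["disaster", "disaster", "disaster", "other", "other", "other"]

idf = fit_idf(train_docs)
model = NaiveBayes().fit([hashed_tfidf(d, idf) for d in train_docs], train_labels)

prediction = model.predict(hashed_tfidf(["rainstorm", "road", "flooded", "again"], idf))
print(prediction)
```

Hashing keeps the feature space at a fixed width regardless of vocabulary size, at the cost of occasional collisions between unrelated words. Storing each vector as a sparse dictionary avoids materialising the mostly empty columns.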
Acknowledgements
This research is partially supported by the Major Project of the National Social Science Foundation (grant no. 16ZDA047), the National Natural Science Foundation of China (71171115, 71571104), the Reform Foundation of Postgraduate Education and Teaching in Jiangsu Province (JGKT10034), the Six Talent Peaks Project in Jiangsu Province (2014-JY-014), the Top-notch Academic Programs Project of Jiangsu Higher Education Institutions, the Postgraduate Research & Practice Innovation Program, and the Major Project of Humanities and Social Sciences of Anhui Education Department (SK2015ZD07).
Cite this article
Xiao, Y., Li, B. & Gong, Z. Real-time identification of urban rainstorm waterlogging disasters based on Weibo big data. Nat Hazards 94, 833–842 (2018). https://doi.org/10.1007/s11069-018-3427-4
DOI: https://doi.org/10.1007/s11069-018-3427-4