Summarizing Microblogs During Emergency Events: A Comparison of Extractive Summarization Algorithms

Dutta, Soumi; Chandra, Vibhash; Mehra, Kanav; Ghatak, Sujata; Das, Asit Kumar; Ghosh, Saptarshi

doi:10.1007/978-981-13-1498-8_76

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 813)

Abstract

Microblogging sites, notably Twitter, have become important sources of real-time situational information during emergency events. Since hundreds to thousands of microblogs (tweets) are generally posted on Twitter during an emergency event, manually reading every tweet is not feasible. Hence, summarization of microblogs posted during emergency events has become an important problem in recent years. Several summarization algorithms have been proposed in the literature, both for general document summarization and specifically for summarization of microblogs. However, to our knowledge, there has been no systematic analysis of which algorithms are more suitable for summarizing microblogs posted during disasters. In this work, we evaluate and compare the performance of eight extractive summarization algorithms on the task of summarizing microblogs posted during emergency events. Apart from comparing the performance of the algorithms, we also find significant differences among the summaries produced by different algorithms over the same input data.
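As a rough illustration of what comparing summaries over the same input can involve, the sketch below measures how much two extractive summaries overlap and how well each covers a reference summary. It is a minimal sketch under stated assumptions, not the chapter's evaluation procedure: the function names `tweet_overlap` and `unigram_recall` are hypothetical, the unigram recall is only a simplified stand-in for ROUGE-style scoring (the reference list cites the ROUGE package), and the example tweets are invented.

```python
import re
from collections import Counter


def tweet_overlap(summary_a, summary_b):
    """Jaccard overlap between two extractive summaries, treated as sets of tweets."""
    a, b = set(summary_a), set(summary_b)
    if not a and not b:
        return 1.0  # two empty summaries are trivially identical
    return len(a & b) / len(a | b)


def unigram_recall(candidate, reference):
    """Fraction of reference unigrams (with multiplicity) also present in the candidate.

    A simplified stand-in for ROUGE-1 recall; not the official ROUGE package.
    """
    def tokens(summary):
        return Counter(re.findall(r"[a-z0-9]+", " ".join(summary).lower()))

    cand, ref = tokens(candidate), tokens(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, cand[tok]) for tok, count in ref.items())
    return matched / total


# Invented example data: two algorithm outputs and one reference summary.
summary_a = ["bridge collapsed near river", "helpline number announced"]
summary_b = ["helpline number announced", "schools closed tomorrow"]
reference = ["bridge collapsed", "helpline announced"]

print(tweet_overlap(summary_a, summary_b))
print(unigram_recall(summary_a, reference))
print(unigram_recall(summary_b, reference))
```

Because extractive summaries are subsets of the input tweets, a set-based overlap is a natural first measure of how differently two algorithms behave on the same data; the recall score then indicates which of the two covers more of a reference summary.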

Notes

1. Availability of implementations: Frequency Summarizer (http://glowingpython.blogspot.in/2014/09/text-summarization-with-nltk.html), MEAD (http://www.summarization.com/mead/) and SumBasic (https://github.com/EthanMacdonald/SumBasic) are available online. LexRank, LSA and Luhn are available as part of the Python Sumy package (https://pypi.python.org/pypi/sumy). COWTS (proposed in our prior work [5]) and ClusterRank were implemented by us.
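To give a sense of what a frequency-based extractive summarizer does, the following is a minimal standard-library sketch. It is not the code from any of the implementations listed above: the function names, the tiny stopword list, the average-word-frequency scoring rule, and the example tweets are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"a", "an", "the", "in", "on", "at", "of", "to", "is", "are", "and", "for"}


def tokenize(text):
    """Lowercase a tweet and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def frequency_summary(tweets, k):
    """Return the k tweets with the highest average word frequency, in input order."""
    tokenized = [tokenize(t) for t in tweets]
    freq = Counter(tok for toks in tokenized for tok in toks)

    def score(toks):
        # Average corpus frequency of the tweet's words; empty tweets score 0.
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(tweets)), key=lambda i: score(tokenized[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore original posting order
    return [tweets[i] for i in chosen]


# Invented example tweets.
tweets = [
    "Earthquake hits Nepal, buildings collapsed in Kathmandu",
    "Rescue teams reach Kathmandu after earthquake",
    "Good morning everyone",
    "Nepal earthquake: helpline numbers for Kathmandu",
]
summary = frequency_summary(tweets, 2)
for tweet in summary:
    print(tweet)
```

Averaging rather than summing word frequencies keeps longer tweets from winning purely on length. This sketch also makes no attempt to avoid selecting near-duplicate tweets, which a practical summarizer for microblogs would need to handle.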

References

1. Imran, M., Castillo, C., Diaz, F., Vieweg, S.: Processing social media messages in mass emergency: a survey. ACM Comput. Surv. 47(4), 67:1–67:38 (2015)
2. Das, D., Martins, A.F.: A survey on automatic text summarization. Lit. Surv. Lang. Stat. II Course CMU 4, 192–195 (2007)
3. Gupta, V., Lehal, G.S.: A survey of text summarization extractive techniques. IEEE J. Emerg. Technol. Web Intell. 2(3), 258–268 (2010)
4. Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 159–165 (1958)
5. Rudra, K., Ghosh, S., Goyal, P., Ganguly, N., Ghosh, S.: Extracting situational information from microblogs during disaster events: a classification-summarization approach. In: Proceedings of ACM CIKM (2015)
6. Olariu, A.: Efficient online summarization of microblogging streams. In: Proceedings of EACL (short paper), pp. 236–240 (2014)
7. Shou, L., Wang, Z., Chen, K., Chen, G.: Sumblr: continuous summarization of evolving tweet streams. In: Proceedings of ACM SIGIR (2013)
8. Wang, Z., Shou, L., Chen, K., Chen, G., Mehrotra, S.: On summarization and timeline generation for evolutionary tweet streams. IEEE Trans. Knowl. Data Eng. 27, 1301–1314 (2015)
9. Zubiaga, A., Spina, D., Amigo, E., Gonzalo, J.: Towards real-time summarization of scheduled events from twitter streams. In: Hypertext (Poster) (2012)
10. Erkan, G., Radev, D.R.: LexRank: Graph-Based Lexical Centrality as Salience in Text Summarization, pp. 457–479 (2004)
11. Dutta, S., Ghatak, S., Roy, M., Ghosh, S., Das, A.K.: A graph based clustering technique for tweet summarization. In: 2015 4th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions), pp. 1–6. IEEE (2015)
12. Xu, W., Grishman, R., Meyers, A., Ritter, A.: A preliminary study of tweet summarization using information extraction. In: Proceedings of NAACL 2013, 20 (2013)
13. Chakrabarti, D., Punera, K.: Event summarization using tweets. In: Proceedings of AAAI ICWSM, pp. 340–348 (2011)
14. Gillani, M., Ilyas, M.U., Saleh, S., Alowibdi, J.S., Aljohani, N., Alotaibi, F.S.: Post summarization of microblogs of sporting events. In: Proceedings of International Conference on World Wide Web (WWW) Companion, pp. 59–68 (2017)
15. Khan, M.A.H., Bollegala, D., Liu, G., Sezaki, K.: Multi-tweet summarization of real-time events. In: Socialcom (2013)
16. Nichols, J., Mahmud, J., Drews, C.: Summarizing sporting events using twitter. In: Proceedings of ACM International Conference on Intelligent User Interfaces (IUI), pp. 189–198 (2012)
17. Takamura, H., Yokono, H., Okumura, M.: Summarizing a document stream. In: Proceedings of ECIR (2011)
18. Osborne, M., Moran, S., McCreadie, R., Lunen, A.V., Sykora, M., Cano, E., Ireson, N., Macdonald, C., Ounis, I., He, Y., Jackson, T., Ciravegna, F., O'Brien, A.: Real-time detection, tracking, and monitoring of automatically discovered events in social media. In: Proceedings of ACL (2014)
19. Kedzie, C., McKeown, K., Diaz, F.: Predicting salient updates for disaster summarization. In: Proceedings of ACL (2015)
20. Nguyen, M.T., Kitamoto, A., Nguyen, T.T.: Tsum4act: a framework for retrieving and summarizing actionable tweets during a disaster for reaction. In: Proceedings of PAKDD (2015)
21. Inouye, D.I., Kalita, J.K.: Comparing twitter summarization algorithms for multiple post summaries. In: Proceedings of IEEE SocialCom/PASSAT, pp. 298–306 (2011)
22. Mackie, S., McCreadie, R., Macdonald, C., Ounis, I.: Comparing algorithms for microblog summarisation. In: Proceedings of CLEF (2014)
23. Rosa, K.D., Shah, R., Lin, B., Gershman, A., Frederking, R.: Topical Clustering of Tweets
24. Garg, N., Favre, B., Riedhammer, K., Hakkani-Tür, D.: Clusterrank: a graph based method for meeting summarization. In: INTERSPEECH, pp. 1499–1502. ISCA (2009)
25. Erkan, G., Radev, D.R.: Lexrank: Graph-based lexical centrality as salience in text summarization. J. Artif. Int. Res. 22(1), 457–479 (2004)
26. Gong, Y., Liu, X.: Generic text summarization using relevance measure and latent semantic analysis. In: SIGIR, pp. 19–25 (2001)
27. Ozsoy, M.G., Alpaslan, F.N., Cicekli, I.: Text summarization using latent semantic analysis. J. Inf. Sci. 37(4), 405–417 (2011). http://dx.doi.org/10.1177/0165551511408848
28. Radev, D.R., Allison, T., Blair-Goldensohn, S., Blitzer, J., Çelebi, A., Dimitrov, S., Drábek, E., Hakim, A., Lam, W., Liu, D., Otterbacher, J., Qi, H., Saggion, H., Teufel, S., Topper, M., Winkel, A., Zhang, Z.: MEAD - a platform for multidocument multilingual text summarization. In: LREC. European Language Resources Association (2004)
29. Radev, D.R., Hovy, E., McKeown, K.: Introduction to the special issue on summarization. Comput. Linguist. 28(4), 399–408 (2002)
30. Nenkova, A., Vanderwende, L.: The impact of frequency on summarization. Technical report, Microsoft Research (2005)
31. Hyderabad blasts - Wikipedia (2013). http://en.wikipedia.org/wiki/2013_Hyderabad_blasts
32. Sandy Hook Elementary School shooting - Wikipedia (2012). http://en.wikipedia.org/wiki/Sandy_Hook_Elementary_School_shooting
33. North India floods - Wikipedia (2013). http://en.wikipedia.org/wiki/2013_North_India_floods
34. Typhoon Hagupit - Wikipedia (2014). http://en.wikipedia.org/wiki/Typhoon_Hagupit
35. 2015 Nepal earthquake - Wikipedia (2015). http://en.wikipedia.org/wiki/2015_Nepal_earthquake
36. REST API Resources, Twitter Developers. https://dev.twitter.com/docs/api
37. Tao, K., Abel, F., Hauff, C., Houben, G.J., Gadiraju, U.: Groundhog day: near-duplicate detection on twitter. In: Proceedings of Conference on World Wide Web (WWW) (2013)
38. Lin, C.Y.: ROUGE: A package for automatic evaluation of summaries. In: Proceedings of Workshop on Text Summarization Branches Out, ACL, pp. 74–81 (2004)

Author information

Authors and Affiliations

Indian Institute of Engineering Science and Technology Shibpur, Shibpur, India
Soumi Dutta, Vibhash Chandra, Kanav Mehra, Asit Kumar Das & Saptarshi Ghosh
Institute of Engineering and Management, Kolkata, 700091, India
Soumi Dutta & Sujata Ghatak
Indian Institute of Technology Kharagpur, Kharagpur, 721302, India
Saptarshi Ghosh

Corresponding author

Correspondence to Soumi Dutta.

Editor information

Editors and Affiliations

Machine Intelligence Research Labs, Auburn, WA, USA
Ajith Abraham
Department of Computer and System Sciences, Visva-Bharati University, Santiniketan, West Bengal, India
Paramartha Dutta
Department of Computer Science and Engineering, University of Kalyani, Kalyani, India
Jyotsna Kumar Mandal
Institute of Engineering and Management, Kolkata, West Bengal, India
Abhishek Bhattacharya
Institute of Engineering and Management, Kolkata, West Bengal, India
Soumi Dutta

Cite this paper

Dutta, S., Chandra, V., Mehra, K., Ghatak, S., Das, A.K., Ghosh, S. (2019). Summarizing Microblogs During Emergency Events: A Comparison of Extractive Summarization Algorithms. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 813. Springer, Singapore. https://doi.org/10.1007/978-981-13-1498-8_76

DOI: https://doi.org/10.1007/978-981-13-1498-8_76
Published: 02 September 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1497-1
Online ISBN: 978-981-13-1498-8
eBook Packages: Intelligent Technologies and Robotics (R0)
