Multi-label Classification of Twitter Data Using Modified ML-KNN

  • Conference paper
Advances in Data and Information Sciences

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 39)

Abstract

Social media has become a very rich source of information. Labeling unstructured social media text is a critical task because a single post can belong to multiple labels at once. Without appropriate labels, raw data is difficult to interpret or use, so assigning appropriate labels is essential. In this work, we propose a modified multi-label k-nearest neighbor classifier (Modified ML-KNN) that assigns multiple labels to tweets and, when configured with a suitable distance measure and number of nearest neighbors, outperforms conventional ML-KNN. To validate the proposed approach, we use two Twitter datasets: a disease-related tweet set that we prepared using five disease keywords, and the benchmark Seattle dataset of incident-related tweets. On both datasets, Modified ML-KNN improves on the performance of conventional ML-KNN by at least 5%.
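The abstract does not spell out what the modification consists of, so as background here is a minimal sketch of *conventional* ML-KNN (Zhang and Zhou's multi-label k-nearest-neighbor rule) with the two settings the abstract mentions, the distance measure and the number of neighbors k, exposed as parameters. For each label, ML-KNN counts how many of an example's k nearest neighbors carry that label, then applies a maximum a posteriori rule using a smoothed prior and smoothed likelihoods estimated from the training set. This is not the authors' Modified ML-KNN. The class name `MLKNN`, the helpers `euclidean` and `cosine_distance`, the smoothing constant `s = 1`, and the toy data are illustrative assumptions, and tweets are assumed to have already been converted into numeric feature vectors (for example, bag-of-words counts).

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; an all-zero vector is treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


class MLKNN:
    """Conventional ML-KNN with a pluggable distance (hypothetical sketch, not the paper's variant)."""

    def __init__(self, k: int = 3, s: float = 1.0, distance: Distance = euclidean) -> None:
        self.k = k                # number of nearest neighbors
        self.s = s                # Laplace smoothing constant
        self.distance = distance  # any callable (a, b) -> float

    def _neighbors(self, x: Vector, exclude: int = -1) -> List[int]:
        # Indices of the k training points closest to x, optionally skipping one index
        # (used during fit so a training point is not its own neighbor).
        candidates = [i for i in range(len(self.X)) if i != exclude]
        candidates.sort(key=lambda i: self.distance(x, self.X[i]))
        return candidates[: self.k]

    def fit(self, X: List[Vector], Y: List[List[int]]) -> "MLKNN":
        self.X, self.Y = X, Y
        m, q, k, s = len(X), len(Y[0]), self.k, self.s
        # Smoothed prior probability that each label is present.
        self.prior = [(s + sum(y[l] for y in Y)) / (2 * s + m) for l in range(q)]
        # c1[l][j]: training points WITH label l that have exactly j neighbors carrying l.
        # c0[l][j]: the same count for training points WITHOUT label l.
        c1 = [[0] * (k + 1) for _ in range(q)]
        c0 = [[0] * (k + 1) for _ in range(q)]
        for i in range(m):
            nbrs = self._neighbors(X[i], exclude=i)
            for l in range(q):
                j = sum(Y[n][l] for n in nbrs)
                if Y[i][l]:
                    c1[l][j] += 1
                else:
                    c0[l][j] += 1
        # Smoothed likelihoods P(j neighbors carry l | l present) and P(... | l absent).
        self.like1 = [[(s + c1[l][j]) / (s * (k + 1) + sum(c1[l])) for j in range(k + 1)]
                      for l in range(q)]
        self.like0 = [[(s + c0[l][j]) / (s * (k + 1) + sum(c0[l])) for j in range(k + 1)]
                      for l in range(q)]
        return self

    def predict_proba(self, x: Vector) -> List[float]:
        # Posterior probability of each label given how many neighbors carry it.
        nbrs = self._neighbors(x)
        probs = []
        for l in range(len(self.prior)):
            j = sum(self.Y[n][l] for n in nbrs)
            p1 = self.prior[l] * self.like1[l][j]
            p0 = (1 - self.prior[l]) * self.like0[l][j]
            probs.append(p1 / (p1 + p0))
        return probs

    def predict(self, x: Vector) -> List[int]:
        # MAP rule: assign a label when its posterior exceeds 0.5.
        return [int(p > 0.5) for p in self.predict_proba(x)]


if __name__ == "__main__":
    # Hypothetical toy data: 2-D feature vectors with two labels per example.
    X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    Y = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
    model = MLKNN(k=2).fit(X, Y)
    print(model.predict([0.5, 0.5]))  # [1, 0]
    print(model.predict([5.5, 5.5]))  # [0, 1]
```

Passing `distance=cosine_distance` or changing `k` is the kind of configuration the abstract refers to; which distance measure and neighborhood size the authors found best is not stated here, and their modification to the algorithm itself is not reproduced by this sketch.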

Author information

Corresponding author

Correspondence to Saurabh Kumar Srivastava.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Srivastava, S.K., Singh, S.K. (2019). Multi-label Classification of Twitter Data Using Modified ML-KNN. In: Kolhe, M., Trivedi, M., Tiwari, S., Singh, V. (eds) Advances in Data and Information Sciences. Lecture Notes in Networks and Systems, vol 39. Springer, Singapore. https://doi.org/10.1007/978-981-13-0277-0_3

  • DOI: https://doi.org/10.1007/978-981-13-0277-0_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-0276-3

  • Online ISBN: 978-981-13-0277-0

  • eBook Packages: Engineering, Engineering (R0)
