Exploring the Meaning behind Twitter Hashtags through Clustering

  • Conference paper
Business Information Systems Workshops (BIS 2012)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 127)

Abstract

Social networks generate large amounts of user-produced data, and users are not limited in the content of the information they exchange. The data generated can be a good indicator of trends and topic preferences among users. In this paper we focus on analyzing hashtags and representing each one by the corpus in which it appears. We cluster a large set of hashtags using K-means on MapReduce in order to process the data in a distributed manner. Our intention is to retrieve connections that might exist between different hashtags and their textual representations, and to grasp their semantics through the main topics they occur with.
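
The idea in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation, which this preview does not show. It assumes each hashtag is represented by a bag of the words from the tweets it appears in, and it imitates the map and reduce phases of K-means with plain functions. The function names, the whitespace tokenizer, the use of cosine similarity, and the sample tweets are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math


def hashtag_documents(tweets):
    """Build one bag-of-words 'virtual document' per hashtag (assumed representation)."""
    docs = defaultdict(Counter)
    for text in tweets:
        tokens = text.lower().split()
        tags = [t for t in tokens if t.startswith("#") and len(t) > 1]
        words = [t for t in tokens if not t.startswith("#")]
        for tag in tags:
            docs[tag].update(words)
    return dict(docs)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def map_assign(item, centroids):
    """Map step: emit (index of the most similar centroid, (hashtag, vector))."""
    tag, vec = item
    best = max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))
    return best, (tag, vec)


def reduce_centroid(vectors):
    """Reduce step: average all vectors assigned to one centroid."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    n = len(vectors)
    return {term: count / n for term, count in total.items()}


def kmeans(docs, centroids, iterations=5):
    """Run a fixed number of map/reduce rounds; returns (assignment, centroids)."""
    assignment = {}
    for _ in range(iterations):
        groups = defaultdict(list)
        for item in sorted(docs.items()):
            idx, (tag, vec) = map_assign(item, centroids)
            groups[idx].append(vec)
            assignment[tag] = idx
        # Keep the previous centroid if no hashtag was assigned to it.
        centroids = [
            reduce_centroid(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))
        ]
    return assignment, centroids


# Hypothetical sample data, not taken from the paper.
tweets = [
    "goal scored in the match #football",
    "great match and a late goal #soccer",
    "new phone camera review #tech",
    "phone battery review #gadgets",
]
docs = hashtag_documents(tweets)
assignment, centroids = kmeans(docs, [docs["#football"], docs["#tech"]])
print(assignment)
```

On this toy input, #football and #soccer share the words "goal" and "match" and fall into one cluster, while #tech and #gadgets share "phone" and "review" and fall into the other. In a distributed setting the map and reduce functions would run as separate tasks over partitions of the data, with the centroids broadcast to every mapper at each iteration.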

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Muntean, C.I., Morar, G.A., Moldovan, D. (2012). Exploring the Meaning behind Twitter Hashtags through Clustering. In: Abramowicz, W., Domingue, J., Węcel, K. (eds) Business Information Systems Workshops. BIS 2012. Lecture Notes in Business Information Processing, vol 127. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34228-8_22

  • DOI: https://doi.org/10.1007/978-3-642-34228-8_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34227-1

  • Online ISBN: 978-3-642-34228-8

  • eBook Packages: Computer Science, Computer Science (R0)
