Exploring the Meaning behind Twitter Hashtags through Clustering

Muntean, Cristina Ioana; Morar, Gabriela Andreea; Moldovan, Darie

doi:10.1007/978-3-642-34228-8_22

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 127)

Included in the following conference series:

International Conference on Business Information Systems

Abstract

Social networks generate large amounts of data produced by users, who are not restricted in the content of the information they exchange. This data can be a good indicator of trends and topic preferences among users. In this paper we focus on analyzing and representing hashtags through the corpus in which they appear. We cluster a large set of hashtags using K-means on MapReduce in order to process the data in a distributed manner. Our aim is to uncover connections that may exist between different hashtags and their textual representations, and to grasp their semantics through the main topics with which they occur.
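The abstract combines two steps: representing each hashtag by the text it occurs with, and clustering those representations with K-means expressed as MapReduce. The following is a minimal single-process Python sketch under stated assumptions, not the authors' Hadoop implementation. It assumes each hashtag's pseudo-document is the bag of words of all tweets containing it, weights terms with a smoothed TF-IDF (log(1 + N/df)), and splits each K-means iteration into a map step (assign a vector to its nearest centroid by squared Euclidean distance) and a reduce step (average the vectors assigned to each centroid). The toy tweets, all function names, k=2 and the first-k seeding are hypothetical; stop-word removal and stemming are omitted.

```python
import math
import re
from collections import Counter, defaultdict

def hashtag_documents(tweets):
    """Build one pseudo-document per hashtag from the words of every tweet containing it."""
    docs = defaultdict(Counter)
    for text in tweets:
        tokens = re.findall(r"#?\w+", text.lower())
        tags = {t for t in tokens if t.startswith("#")}
        words = [t for t in tokens if not t.startswith("#")]
        for tag in tags:
            docs[tag].update(words)
    return docs

def tfidf_vectors(docs):
    """Weight term counts with a smoothed TF-IDF and L2-normalise each hashtag vector."""
    n = len(docs)
    df = Counter(term for counts in docs.values() for term in counts)
    vectors = {}
    for tag, counts in docs.items():
        vec = {t: c * math.log(1 + n / df[t]) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[tag] = {t: w / norm for t, w in vec.items()}
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors stored as dicts."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in a.keys() | b.keys())

def map_assign(tag, vec, centroids):
    """Map step: emit (index of the nearest centroid, (tag, vector))."""
    nearest = min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))
    return nearest, (tag, vec)

def reduce_centroid(members):
    """Reduce step: average the vectors assigned to one centroid."""
    total = defaultdict(float)
    for _, vec in members:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(members) for t, w in total.items()}

def kmeans(vectors, k, iterations=10):
    """Plain K-means, with each iteration written as a map phase and a reduce phase."""
    tags = sorted(vectors)
    k = min(k, len(tags))
    centroids = [vectors[t] for t in tags[:k]]  # hypothetical seeding: first k hashtags
    groups = defaultdict(list)
    for _ in range(iterations):
        groups = defaultdict(list)  # stands in for the shuffle: map output grouped by key
        for tag in tags:
            cid, value = map_assign(tag, vectors[tag], centroids)
            groups[cid].append(value)
        # Keep the previous centroid if a cluster ends up empty.
        new = [reduce_centroid(groups[i]) if groups[i] else centroids[i] for i in range(k)]
        if new == centroids:  # assignments stable, centroids unchanged
            break
        centroids = new
    return {i: sorted(tag for tag, _ in groups[i]) for i in range(k)}

# Hypothetical toy data, for illustration only.
tweets = [
    "#python code tips for data science",
    "learning #python code and data pipelines",
    "#soccer match goal tonight",
    "great goal in the #football match",
    "#soccer #football fans celebrate the goal",
]
clusters = kmeans(tfidf_vectors(hashtag_documents(tweets)), k=2)
print(clusters)  # {0: ['#football', '#soccer'], 1: ['#python']}
```

On a real cluster the map and reduce functions would run as separate distributed tasks, with the current centroids made available to every mapper; here a dictionary plays the role of the shuffle that groups map output by centroid index.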



Author information

Authors and Affiliations

Faculty of Economics and Business Administration, Babes-Bolyai University, Cluj-Napoca, Romania
Cristina Ioana Muntean, Gabriela Andreea Morar & Darie Moldovan

Editor information

Editors and Affiliations

Poznań University of Economics, Poznań, Poland
Witold Abramowicz
Knowledge Media Institute, The Open University, Walton Hall, MK7 6AA, Milton Keynes, UK
John Domingue
Poznań University of Economics, Al. Niepodległości 10, 60-967, Poznań, Poland
Krzysztof Węcel

About this paper

Cite this paper

Muntean, C.I., Morar, G.A., Moldovan, D. (2012). Exploring the Meaning behind Twitter Hashtags through Clustering. In: Abramowicz, W., Domingue, J., Węcel, K. (eds) Business Information Systems Workshops. BIS 2012. Lecture Notes in Business Information Processing, vol 127. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34228-8_22

DOI: https://doi.org/10.1007/978-3-642-34228-8_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34227-1
Online ISBN: 978-3-642-34228-8
eBook Packages: Computer Science, Computer Science (R0)
