Improving Categorisation in Social Media Using Hyperlinks to Structured Data Sources

  • Sheila Kinsella
  • Mengjiao Wang
  • John G. Breslin
  • Conor Hayes
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6644)


Social media presents unique challenges for topic classification, including the brevity of posts, the informal nature of conversations, and the frequent reliance on external hyperlinks to give context to a conversation. In this paper we investigate the usefulness of these external hyperlinks for categorising the topic of individual posts. We focus our analysis on objects that have related metadata available on the Web, either via APIs or as Linked Data. Our experiments show that the inclusion of metadata from hyperlinked objects in addition to the original post content significantly improved classifier performance on two disparate datasets. We found that including selected metadata from APIs and Linked Data gave better results than including text from HTML pages. We investigate how this improvement varies across different topics. We also make use of the structure of the data to compare the usefulness of different types of external metadata for topic classification in a social media dataset.
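
As a rough illustration of the enrichment idea described above, the sketch below adds text from metadata fields of hyperlinked objects to each post's tokens before classification. It is a minimal sketch under stated assumptions, not the authors' implementation. The field names (`title`, `genre`), the helper names `tokenize`, `build_document` and `NaiveBayesClassifier`, and the choice of a from-scratch multinomial Naive Bayes classifier are all hypothetical. The metadata is assumed to have already been retrieved from an API or Linked Data source. The optional `fields` argument loosely mimics comparing different types of external metadata by including only selected fields.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_document(post_text, linked_metadata=None, fields=None):
    """Return the post's tokens, enriched with tokens from the metadata of
    hyperlinked objects. `linked_metadata` is a hypothetical dict mapping a
    field name (e.g. "title", "genre") to its text value; if `fields` is
    given, only those fields are included."""
    tokens = tokenize(post_text)
    for name, value in (linked_metadata or {}).items():
        if fields is None or name in fields:
            tokens.extend(tokenize(value))
    return tokens


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for w in tokens:
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log(
                        (self.word_counts[label][w] + 1) / (self.total_words[label] + v)
                    )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Toy, invented training data: (post text, linked metadata, topic).
    train = [
        ("loved this song", {"genre": "rock music"}, "music"),
        ("what a game", {"title": "football league highlights"}, "sport"),
    ]
    docs = [build_document(text, meta) for text, meta, _ in train]
    labels = [label for _, _, label in train]
    clf = NaiveBayesClassifier().fit(docs, labels)
    # The post text alone carries no evidence; the linked object's metadata does.
    print(clf.predict(build_document("see link", {"title": "league highlights"})))  # prints "sport"
```

In this toy example the words "see" and "link" never occur in training, so without the linked metadata the classifier would have nothing but the class priors to go on; the metadata tokens "league" and "highlights" tip the decision. A real experiment would use a proper train/test split and a much larger labelled corpus.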


Keywords: social media · hyperlinks · text classification · Linked Data · metadata


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Sheila Kinsella (1)
  • Mengjiao Wang (1)
  • John G. Breslin (1, 2)
  • Conor Hayes (1)

  1. Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
  2. School of Engineering and Informatics, National University of Ireland, Galway, Ireland
