Utilizing Microblogs for Web Page Relevant Term Acquisition

  • Tomáš Uherčík
  • Marián Šimko
  • Mária Bieliková
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7741)


Advanced processing of information on the Web requires machine-processable semantic descriptions (metadata) of web content. Creating metadata manually, even in a lightweight form such as relevant terms for a web page, is demanding and almost impossible for humans, especially in an open information space such as the Web. New approaches to automating the process are devised continuously. In the age of the Social Web, an important new source of data to mine has emerged: social annotations of web content. In this paper we focus on microblogs in particular. We present a method for extracting relevant domain terms for web resources, based on processing data from Twitter, the biggest microblogging service to date. The method leverages social characteristics of the Twitter network to account for the differing relevance of Twitter posts assigned to web resources. We evaluated the method in a user experiment, observing its performance for different types of web content.
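
The abstract does not spell out the scoring scheme, so the following is only a minimal sketch of the general idea: terms from tweets that link to a web page are aggregated, and each tweet contributes with a weight derived from a social characteristic of its author. The function names, the tiny stopword list, and the follower-based weight `1 + log10(1 + followers)` are all assumptions made for illustration and are not taken from the paper.

```python
import math
import re
from collections import defaultdict

# Hypothetical stopword list, kept tiny for illustration only.
STOPWORDS = {"a", "an", "the", "of", "for", "on", "in", "to", "and", "is", "rt"}


def tokenize(text):
    """Lowercase a tweet, drop URLs and @mentions, return non-stopword tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [t for t in re.findall(r"[a-z][a-z0-9\-]+", text) if t not in STOPWORDS]


def tweet_weight(followers):
    """Assumed relevance weight of a tweet: grows slowly with the author's followers."""
    return 1.0 + math.log10(1 + followers)


def relevant_terms(tweets, top_n=5):
    """Rank candidate terms for one web page.

    `tweets` is an iterable of (text, author_follower_count) pairs for
    tweets that link to the page. Returns (term, score) pairs, best first.
    """
    scores = defaultdict(float)
    for text, followers in tweets:
        weight = tweet_weight(followers)
        # Count each term at most once per tweet, scaled by the tweet's weight.
        for term in set(tokenize(text)):
            scores[term] += weight
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    sample = [
        ("Great intro to the semantic web http://example.org/page", 9),
        ("semantic metadata explained @bob", 99),
        ("rt web stuff", 0),
    ]
    for term, score in relevant_terms(sample):
        print(f"{term}\t{score:.2f}")
```

Any other signal of tweet relevance (for example, a rank computed over the follower graph) could be substituted for `tweet_weight` without changing the aggregation step.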


Keywords: automatic term recognition, keyword extraction, user-generated content, social annotations, microblog, Twitter





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Tomáš Uherčík (1)
  • Marián Šimko (1)
  • Mária Bieliková (1)
  1. Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava, Bratislava, Slovakia
