Encyclopedia of Social Network Analysis and Mining

Living Edition
Editors: Reda Alhajj, Jon Rokne

Automatic Document Topic Identification Using Social Knowledge Network

  • Mostafa M. Hassan
  • Fakhreddine Karray
  • Mohamed S. Kamel
Living reference work entry
DOI: https://doi.org/10.1007/978-1-4614-7163-9_352-1




ADTI: Stands for automatic document topic identification


Ontology: “A model for describing the world, that consists of a set of types (concepts), properties, and relationship types” (Garshol 2004)


SKN: Stands for social knowledge network


WHO: Stands for Wikipedia Hierarchical Ontology


TF-IDF: Stands for term frequency-inverse document frequency, a term weighting method commonly used in text mining and information retrieval


An online social networking website


RDF: Stands for Resource Description Framework, a method of representing information to facilitate data interchange on the Web


ASR: Stands for automatic speech recognition


NMI: Stands for normalized mutual information, a well-known measure of document clustering performance


NMF: Stands for nonnegative matrix factorization, a family of algorithms that...
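
As an illustration of two of the measures defined above, the following Python sketch computes term frequency-inverse document frequency weights and normalized mutual information. It is a minimal sketch, not the authors' implementation: the function names `tf_idf` and `nmi`, the weighting of raw term count times log(N/df), and the normalization of mutual information by the geometric mean of the two entropies are assumptions chosen for brevity. Several other variants of both measures are in common use.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: raw term count times log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same items.

    Assumed normalization: I(A;B) / sqrt(H(A) * H(B)).
    Returns 0.0 when either labeling has zero entropy (an assumed convention).
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    # I(A;B) = sum over (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b))).
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    denom = math.sqrt(entropy(count_a) * entropy(count_b))
    return mutual_info / denom if denom > 0 else 0.0
```

For example, `nmi([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1 (up to floating-point rounding), because the two labelings describe the same partition under different label names. In `tf_idf`, a term that occurs in every document receives weight 0 under this variant.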



  1. Auer S, Lehmann J (2007) What have Innsbruck and Leipzig in common? Extracting semantics from wiki content. In: Franconi E, Kifer M, May W (eds) The semantic web: research and applications. Springer, Berlin/New York, pp 503–517
  2. Coursey K, Mihalcea R (2009) Topic identification using Wikipedia graph centrality. In: Proceedings of human language technologies: the 2009 annual conference of the North American chapter of the Association for Computational Linguistics, companion volume: short papers, Association for Computational Linguistics, Boulder, pp 117–120
  3. Coursey K, Mihalcea R, Moen W (2009) Using encyclopedic knowledge for automatic topic identification. In: Proceedings of the thirteenth conference on computational natural language learning, Association for Computational Linguistics, Boulder, pp 210–218
  4. European Travel Commission (2013) Social networking and UGC. http://www.newmediatrendwatch.com/world-overview/137-social-networking-and-ugc, June 2013. Online; Accessed 25 Oct 2013
  5. Garshol L (2004) Metadata? Thesauri? Taxonomies? Topic maps! Making sense of it all. J Inf Sci 30(4):378
  6. Giles J (2005) Internet encyclopaedias go head to head. Nature 438(7070):900–901
  7. Hassan M (2013) Automatic document topic identification using hierarchical ontology extracted from human background knowledge. PhD dissertation, University of Waterloo
  8. Huynh D, Cao T, Pham P, Hoang T (2009) Using hyperlink texts to improve quality of identifying document topics based on Wikipedia. In: International conference on knowledge and systems engineering, 2009 (KSE’09), IEEE, Hanoi, pp 249–254
  9. Janik M, Kochut K (2008a) Training-less ontology-based text categorization. In: Workshop on exploiting semantic annotations in information retrieval (ESAIR 2008) at the 30th European conference on information retrieval, ECIR
  10. Janik M, Kochut K (2008b) Wikipedia in action: ontological knowledge in text categorization. In: IEEE international conference on semantic computing, 2008, IEEE, Santa Clara, pp 268–275
  11. Korfiatis NT, Poulos M, Bokos G (2006) Evaluating authoritative sources using social networks: an insight from Wikipedia. Online Inf Rev 30(3):252–262
  12. Kuhn HW (2005) The Hungarian method for the assignment problem. Nav Res Logist 52(1):7–21
  13. Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137
  14. Medelyan O (2009) Human-competitive automatic topic indexing. PhD dissertation, The University of Waikato
  15. Medelyan O, Witten I, Milne D (2008) Topic indexing with Wikipedia. In: Proceedings of AAAI workshop on Wikipedia and artificial intelligence: an evolving synergy, AAAI, Chicago, pp 19–24
  16. Ng A, Jordan M, Weiss Y et al (2002) On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 2:849–856
  17. Popescul A, Ungar LH (2000) Automatic labeling of document clusters. http://citeseer.ist.psu.edu/viewdoc/download? doi:
  18. Schönhofen P (2009) Identifying document topics using the Wikipedia category network. Web Intell Agent Syst 7(2):195–207
  19. Xu W, Gong Y (2004) Document clustering by concept factorization. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval, ACM, Sheffield, pp 202–209
  20. Xu W, Liu X, Gong Y (2003) Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval, ACM, Toronto, pp 267–273
  21. Zhao Y, Karypis G, Fayyad U (2005) Hierarchical clustering algorithms for document datasets. Data Min Knowl Discov 10(2):141–168

Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  • Mostafa M. Hassan (1)
  • Fakhreddine Karray (2)
  • Mohamed S. Kamel (2)
  1. Sandvine Inc., Waterloo, Canada
  2. Department of Electrical and Computer Engineering, Centre for Pattern Analysis and Machine Intelligence (CPAMI), University of Waterloo, Waterloo, Canada

Section editors and affiliations

  • Fakhreddine Karray (1)
  1. Department of Electrical and Computer Engineering, Centre for Pattern Analysis and Machine Intelligence (CPAMI), University of Waterloo, Waterloo, Canada