Exploring Wikipedia and DMoz as Knowledge Bases for Engineering a User Interests Hierarchy for Social Network Applications

  • Mandar Haridas
  • Doina Caragea
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5871)


The growth of social networks in recent years has created opportunities for interesting data mining problems, such as interest or friendship recommendation. A global ontology over the interests specified by the users of a social network is essential for accurate recommendations. We propose, evaluate, and compare three approaches to engineering a hierarchical ontology over user interests. The approaches use two popular knowledge bases, Wikipedia and Directory Mozilla (DMoz), to extract interest definitions and/or relationships between interests. More precisely, the first approach uses Wikipedia to find interest definitions, latent semantic analysis to measure the similarity between interests based on their definitions, and an agglomerative clustering algorithm to group similar interests into higher-level concepts. The second approach uses the Wikipedia Category Graph to extract relationships between interests, while the third approach uses Directory Mozilla to extract relationships between interests. Our results show that the third approach, although the simplest, is the most effective for building a hierarchy over user interests.
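To make the third approach more concrete, here is a minimal sketch of how a hierarchy over interests could be assembled from directory category paths. The abstract does not describe the authors' extraction procedure, so everything here is an assumption for illustration: the mapping `INTEREST_PATHS`, the example paths, and the helper names `build_hierarchy`, `common_ancestor_depth`, and `print_tree` are all hypothetical. The sketch assumes each interest has already been matched to one slash-separated, DMoz-style category path.

```python
from collections import defaultdict

# Hypothetical mapping from user interests to DMoz-style category paths.
# These paths are illustrative assumptions, not data from the paper.
INTEREST_PATHS = {
    "jazz": "Top/Arts/Music/Styles/Jazz",
    "blues": "Top/Arts/Music/Styles/Blues",
    "painting": "Top/Arts/Visual_Arts/Painting",
    "python": "Top/Computers/Programming/Languages/Python",
}


def build_hierarchy(interest_paths):
    """Build a nested tree of categories; each node lists the interests
    whose path ends exactly at that category."""
    def new_node():
        return {"children": defaultdict(new_node), "interests": []}

    root = new_node()
    for interest, path in interest_paths.items():
        node = root
        for category in path.split("/"):
            node = node["children"][category]  # created on first access
        node["interests"].append(interest)
    return root


def common_ancestor_depth(path_a, path_b):
    """Count the leading categories shared by two paths. A larger value
    means the two interests sit closer together in the hierarchy."""
    depth = 0
    for a, b in zip(path_a.split("/"), path_b.split("/")):
        if a != b:
            break
        depth += 1
    return depth


def print_tree(node, indent=0):
    """Print the hierarchy with two spaces of indentation per level."""
    for name, child in sorted(node["children"].items()):
        suffix = f"  <- {child['interests']}" if child["interests"] else ""
        print("  " * indent + name + suffix)
        print_tree(child, indent + 1)


if __name__ == "__main__":
    print_tree(build_hierarchy(INTEREST_PATHS))
```

In this toy setup, "jazz" and "blues" share four leading categories while "jazz" and "python" share only the root, which is the kind of relationship a directory-based hierarchy makes directly available without computing similarities from text.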


Latent Semantic Analysis, User Interest, Latent Semantic Indexing, Social Network Application, Agglomerative Cluster Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Mandar Haridas (1)
  • Doina Caragea (1)
  1. Kansas State University, Manhattan
