Similarity Matching of Computer Science Unit Outlines in Higher Education

  • Gaurav Langan
  • James Montgomery
  • Saurabh Garg
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9992)


With the globalisation of education, students may undertake higher education courses anywhere in the world. Yet there is variation between different universities’ offerings. Although web search engines can help locate potentially similar courses or subjects offered by different universities, judging the degree of similarity between them is currently a manual process in which a student or staff member has to read through the subject/unit descriptions within a course to understand the topics taught. In this paper, we study the application of text mining to evaluate the similarity or overlap between different units and propose a system that can help students and staff to make these judgements. Unit or course descriptions are generally short, containing 100–200 words, and vary widely in the way they are written. Experimental results using data from Australian and international universities demonstrate the accuracy of the proposed system in calculating the similarity between different computing units.


Keywords: Similarity Score, Semantic Similarity, Semantic Network, Keyword Extraction, Educational Data Mining



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. School of Engineering & ICT, University of Tasmania, Hobart, Australia
