Data Mining in E-Learning

  • Khaled Hammouda
  • Mohamed Kamel
Part of the Advanced Information and Knowledge Processing book series (AI&KP)


This chapter presents an innovative approach for performing data mining on documents, which serves as a basis for knowledge extraction in e-learning environments. The approach is based on a radical model of text data that considers phrasal features paramount in documents, and employs graph theory to facilitate phrase representation and efficient matching. In the process of text mining, a grouping (clustering) approach is also employed to identify groups of documents such that each group represents a different topic in the underlying document collection. Document groups are tagged with topic labels through unsupervised key-phrase extraction from the document clusters. The approach helps solve some of the difficult problems in e-learning, where the volume of data can overwhelm the learner, such as automatically organizing documents and articles by topic and providing summaries for documents and groups of documents.
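The pipeline described above (phrase-based features, document grouping, topic labeling) can be illustrated with a minimal sketch. It is not the chapter's method: it substitutes plain sets of word n-grams for the graph-based phrase model, and the similarity measure, the threshold of 0.1, and all function names (`phrases`, `phrase_similarity`, `cluster`, `label`) are assumptions made for illustration only.

```python
from itertools import combinations


def phrases(text, max_len=3):
    """Collect word n-grams (1..max_len words) that occur within a sentence."""
    found = set()
    for sentence in text.lower().split("."):
        words = sentence.split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                found.add(tuple(words[i:i + n]))
    return found


def phrase_similarity(a, b):
    """Assumed measure: Jaccard overlap in which longer phrases weigh more."""
    shared = sum(len(p) for p in a & b)
    total = sum(len(p) for p in a | b)
    return shared / total if total else 0.0


def cluster(feats, threshold=0.1):
    """Single-link agglomerative grouping: merge two groups whenever any
    cross-group pair of documents reaches the similarity threshold."""
    groups = [{i} for i in range(len(feats))]
    merged = True
    while merged:
        merged = False
        for g1, g2 in combinations(groups, 2):
            if any(phrase_similarity(feats[i], feats[j]) >= threshold
                   for i in g1 for j in g2):
                groups.remove(g1)
                groups.remove(g2)
                groups.append(g1 | g2)
                merged = True
                break  # the group list changed, so restart the scan
    return groups


def label(group, feats):
    """Label a group with the longest phrase shared by all of its documents
    (ties broken by tuple ordering); empty string if nothing is shared."""
    shared = set.intersection(*(feats[i] for i in group))
    best = max(shared, key=lambda p: (len(p), p), default=())
    return " ".join(best)


docs = [
    "Suffix trees index text. Suffix trees support phrase matching.",
    "Phrase matching with suffix trees is fast.",
    "Online courses help remote learners.",
    "Remote learners enjoy online courses.",
]
feats = [phrases(d) for d in docs]
groups = cluster(feats)
labels = {tuple(sorted(g)): label(g, feats) for g in groups}
print(labels)
```

Weighting shared phrases by length is one simple way to favor multi-word matches over single-word matches, which is the intuition behind treating phrases as the primary features; a production system would need a far more efficient matching structure than pairwise set intersection.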


Keywords: Text Mining, Hierarchical Agglomerative Cluster, Document Cluster, Inverted List, Reference Topic
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag London Limited 2007

Authors and Affiliations

  • Khaled Hammouda (1)
  • Mohamed Kamel (2)
  1. Systems Design Engineering, University of Waterloo, Waterloo, Ontario, Canada
  2. Electrical & Computer Engineering, University of Waterloo, Waterloo, Canada
