Knowledge and Information Systems, Volume 36, Issue 1, pp 211–250

DRAL: a tool for discovering relevant e-activities for learners

  • Amelia Zafra
  • Cristóbal Romero
  • Sebastián Ventura
Regular Paper


Web-based educational systems routinely collect vast quantities of data on students’ e-activities, generating log files that give researchers unique opportunities to apply data mining techniques and discover information that can improve the learning process. This paper proposes a user-friendly, intuitive tool called DRAL that detects the e-activities most relevant for a student to pass a course, based on features extracted from the logged data of a web-based educational system. The method represents the available information more flexibly through multiple instance learning, which prevents the appearance of a large number of missing values. It is based on a multi-objective grammar-guided genetic programming algorithm that produces simple, clear classification rules; these rules identify the number, type and timing of the e-activities that give a student a high probability of passing a course. To validate the approach, the proposal is compared with the most traditional multiple instance learning methods proposed over the years. Experimental results show that the proposed approach improves on the accuracy of previous models by finding a balance between specificity and sensitivity.
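The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the DRAL implementation: the `Activity` fields, the `example_rule` threshold and the toy students below are all hypothetical and made up for illustration. Each student is a bag holding one instance per e-activity actually performed, so activities a student never did simply do not appear instead of becoming missing values. A simple rule classifies the bag, and sensitivity and specificity measure how well the rule recognises passing and failing students respectively.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Activity:
    """One logged e-activity (hypothetical fields, not the paper's schema)."""
    kind: str      # e.g. "quiz", "forum", "assignment"
    count: int     # how many times the student did it
    minutes: int   # total time spent on it


Bag = List[Activity]  # one bag per student; its size varies per student


def example_rule(bag: Bag) -> bool:
    """Hypothetical rule: predict 'pass' if any instance is a quiz done >= 3 times."""
    return any(a.kind == "quiz" and a.count >= 3 for a in bag)


def sensitivity_specificity(
    bags: List[Bag], labels: List[bool], classifier: Callable[[Bag], bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a bag-level classifier."""
    tp = fn = tn = fp = 0
    for bag, passed in zip(bags, labels):
        predicted = classifier(bag)
        if passed and predicted:
            tp += 1
        elif passed:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Made-up toy data: four students, each with a different number of activities.
bags = [
    [Activity("quiz", 4, 30), Activity("forum", 2, 10)],
    [Activity("assignment", 1, 20)],
    [Activity("forum", 5, 40)],
    [Activity("quiz", 3, 15)],
]
labels = [True, False, True, False]  # True means the student passed

sens, spec = sensitivity_specificity(bags, labels, example_rule)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

A rule that predicted "pass" for every student would reach a sensitivity of 1.0 but a specificity of 0.0, which is why the two measures are treated as separate objectives to balance rather than folded into accuracy alone.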


Keywords: Web usage mining · Educational data mining · Multiple instance learning · Multiobjective evolutionary algorithm · Genetic programming



The authors gratefully acknowledge the financial subsidy provided by the Spanish Department of Research under TIN2008-06681-C06-03 and P08-TIC-3720 Projects and FEDER fund.



Copyright information

© Springer-Verlag London Limited 2012

Authors and Affiliations

  • Amelia Zafra (1), corresponding author
  • Cristóbal Romero (1)
  • Sebastián Ventura (1)
  1. Department of Computer Science and Numerical Analysis, University of Cordoba, Cordoba, Spain
