A Survey on Pre-Processing Educational Data

Part of the book series: Studies in Computational Intelligence ((SCI,volume 524))

Abstract

Data pre-processing is the first step in any data mining process, and it is one of the most important yet least studied tasks in educational data mining research. Pre-processing transforms the available raw educational data into a format suitable for use by a data mining algorithm to solve a specific educational problem. However, most authors rarely describe this important step, and only a few works focus on the pre-processing of data. To address the lack of specific references on this topic, this paper surveys the task of preparing educational data. First, it describes different types of educational environments and the data they provide. Then, it presents the main tasks and issues in the pre-processing of educational data, with Moodle data used in most of the examples. Next, it describes some general and specific pre-processing tools. Finally, it outlines some conclusions and future lines of research.

Notes

  1. The full name column covers the identification of subjects.

  2. The student column covers the identification of subjects.

Abbreviations

AIHS:

Adaptive and intelligent hypermedia system

ARFF:

Attribute-Relation File Format

CBE:

Computer-based education

CSV:

Comma-separated values

DM:

Data mining

EDM:

Educational data mining

HTML:

HyperText Markup Language

ID:

Identifier

IP:

Internet Protocol

ITS:

Intelligent tutoring system

KDD:

Knowledge discovery in databases

LMS:

Learning management system

MCQ:

Multiple choice question

MIS:

Management information system

MOOC:

Massive Open Online Course

OLAP:

Online Analytical Processing

SQL:

Structured Query Language

WUM:

Web Usage Mining

WWW:

World Wide Web

XML:

Extensible Markup Language

Acknowledgments

This research is supported by projects of the Regional Government of Andalucía and the Ministry of Science and Technology, P08-TIC-3720 and TIN-2011-22408, respectively, and FEDER funds.

Author information

Corresponding author

Correspondence to Cristóbal Romero.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Romero, C., Romero, J.R., Ventura, S. (2014). A Survey on Pre-Processing Educational Data. In: Peña-Ayala, A. (eds) Educational Data Mining. Studies in Computational Intelligence, vol 524. Springer, Cham. https://doi.org/10.1007/978-3-319-02738-8_2

  • DOI: https://doi.org/10.1007/978-3-319-02738-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02737-1

  • Online ISBN: 978-3-319-02738-8

  • eBook Packages: Engineering, Engineering (R0)
