Studying Weariness Prediction Using SMOTE and Random Forests

  • Yu Weng
  • Fengming Deng
  • Guosheng Yang
  • Liandong Chen
  • Jie Yuan
  • Xinkai Gui
  • Jue Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11344)

Abstract

This article addresses the low accuracy of student weariness prediction in education and the poor performance of traditional prediction models. It builds a prediction model combining the SMOTE (Synthetic Minority Oversampling Technique) algorithm with random forests: the SMOTE oversampling method is used to balance the data set, and the random forest algorithm then trains the classifier. A comparison of common single classifiers with ensemble learning classifiers shows that the SMOTE-plus-random-forest method performs notably better, and the reasons for the increase in the AUC value after applying SMOTE are analyzed. The experiments use a synthetic student dataset based on Massive Open Online Courses (MOOCs), whose main features are class duration, whether the mouse moved, whether an assignment was submitted, whether the student participated in discussions, and assignment accuracy. The results show that this method significantly improves the classification performance of the classifiers, so teachers can choose appropriate teaching interventions to improve students' learning outcomes.
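The pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' code: the dataset is synthetic stand-in data (generated with `make_classification` in place of the MOOC features), and SMOTE is implemented by hand via nearest-neighbor interpolation so the sketch needs only NumPy and scikit-learn. Note that oversampling is applied to the training split only, so the test AUC is measured on untouched data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples: pick a minority point,
    pick one of its k nearest minority neighbours, and interpolate a
    random fraction of the way between them (the core SMOTE idea)."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)  # idx[:, 0] is the point itself
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(1, k + 1)]  # a random true neighbour
        lam = rng.random()
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)


# Imbalanced stand-in for the five MOOC behaviour features
# (class duration, mouse movement, assignment submission,
# discussion participation, assignment accuracy).
X, y = make_classification(n_samples=2000, n_features=5, n_informative=4,
                           n_redundant=0, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0)

# Balance the training set by synthesising minority (weary) samples.
X_min = X_train[y_train == 1]
n_new = int((y_train == 0).sum() - (y_train == 1).sum())
X_syn = smote(X_min, n_new)
X_bal = np.vstack([X_train, X_syn])
y_bal = np.concatenate([y_train, np.ones(n_new, dtype=int)])

# Train a random forest on the balanced data and score by AUC,
# the metric the paper uses for comparison.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_bal, y_bal)
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"test AUC: {auc:.3f}")
```

On this toy data the balanced forest scores well above chance; on the real, far more imbalanced student data, the paper's point is that the AUC gain from SMOTE is what separates this pipeline from a classifier trained on the raw distribution.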

Keywords

Education · SMOTE · Random forest

Acknowledgments

This work was partly supported by the National Key R&D Program of China (No. 2017YFB0203102) and the State Key Program of the National Natural Science Foundation of China (No. 91530324).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Yu Weng (1, 2)
  • Fengming Deng (1, 2)
  • Guosheng Yang (1, 2)
  • Liandong Chen (3)
  • Jie Yuan (1, 2)
  • Xinkai Gui (1, 2)
  • Jue Wang (4)
  1. School of Information Engineering, Minzu University of China, Beijing, China
  2. National & Local Joint Engineering Lab for Big Data Analysis and Computing Technology, Beijing, China
  3. State Grid Hebei Electric Power Company, Shijiazhuang, China
  4. Computer Network Information Center, Chinese Academy of Sciences, Beijing, China