Speeding Up Statistical Tests to Detect Recurring Concept Drifts

  • Paulo Mauricio Gonçalves Júnior
  • Roberto Souto Maior de Barros
Conference paper
Part of the Studies in Computational Intelligence book series (SCI, volume 493)

Abstract

RCD is a framework for dealing with recurring concept drifts. It uses multivariate non-parametric statistical tests to identify and reuse previously stored classifiers that were trained on examples similar to the current data. The original proposal performed the statistical tests sequentially. This paper improves RCD to perform the statistical tests in parallel using a thread pool and shows how parallelism affects performance. Results show that, in environments where many concept drifts occur, parallel execution can considerably reduce the evaluation time compared to the corresponding sequential execution.
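The idea of submitting one statistical test per stored classifier to a thread pool can be sketched as follows. This is a minimal illustration, not the RCD implementation: the function names, the `alpha` and `workers` parameters, and the "highest p-value wins" selection policy are all assumptions, and the simple permutation test on mean vectors is only a stand-in for the multivariate non-parametric tests the paper refers to. Note also that in CPython, threads do not speed up CPU-bound pure-Python code, so the sketch shows the structure rather than the speed-up.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def permutation_p_value(sample_a, sample_b, n_permutations=200, seed=0):
    """Stand-in two-sample test (hypothetical, not the paper's test).

    Permutation test on the squared distance between the mean vectors
    of two samples; each sample is a list of equal-length numeric rows.
    """
    rng = random.Random(seed)  # private generator, so calls are thread-safe

    def mean_gap(xs, ys):
        dims = len(xs[0])
        mx = [sum(row[d] for row in xs) / len(xs) for d in range(dims)]
        my = [sum(row[d] for row in ys) / len(ys) for d in range(dims)]
        return sum((a - b) ** 2 for a, b in zip(mx, my))

    observed = mean_gap(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean_gap(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


def find_matching_classifier(current_buffer, stored, alpha=0.05, workers=4):
    """Compare the current buffer with every stored buffer in parallel.

    `stored` is a list of (classifier_id, buffer) pairs. Returns the id
    whose buffer yields the highest p-value above `alpha`, or None when
    no stored buffer looks similar (assumed selection policy).
    """
    if not stored:
        return None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One test per stored classifier; map preserves the input order.
        p_values = list(
            pool.map(lambda item: permutation_p_value(current_buffer, item[1]),
                     stored)
        )
    best = max(range(len(stored)), key=lambda i: p_values[i])
    return stored[best][0] if p_values[best] > alpha else None
```

Calling `find_matching_classifier(buffer, [("c1", buf1), ("c2", buf2)])` runs both comparisons concurrently and returns the identifier of the best match, or `None` if every test rejects similarity at the chosen `alpha`.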

Keywords

Data streams; recurring concept drifts; multivariate non-parametric statistical tests; parallelism

Copyright information

© Springer International Publishing Switzerland 2013

Authors and Affiliations

  • Paulo Mauricio Gonçalves Júnior (1)
  • Roberto Souto Maior de Barros (2)
  1. Instituto Federal de Educação, Ciência e Tecnologia de Pernambuco, Cidade Universitária, Recife, Brazil
  2. Centro de Informática, Universidade Federal de Pernambuco, Recife, Brazil
