Journal of Intelligent Information Systems, Volume 39, Issue 2, pp 491–511

A new method of mining data streams using harmony search

Abstract

Incremental learning has been used extensively for data stream classification, but most work in this area has focused on non-evolutionary methods. In this paper, we introduce new incremental learning algorithms based on harmony search. We first propose a new algorithm for classifying batch data, called the harmony-based classifier, and then give its incremental version for classifying data streams, called the incremental harmony-based classifier. Finally, we improve the incremental version to reduce its computational overhead in the absence of drifts and to increase its robustness in the presence of noise. This improved version is called the improved incremental harmony-based classifier. The proposed methods are evaluated on some real-world and synthetic data sets. Experimental results show that the proposed batch classifier outperforms some batch classifiers and that the proposed incremental methods can effectively address the issues usually encountered in data stream environments. The improved incremental harmony-based classifier is significantly faster and more accurate at capturing concept drifts than the non-incremental harmony-based method, and its accuracy is comparable to that of non-evolutionary algorithms. The experimental results also show the robustness of the improved incremental harmony-based classifier.
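For readers unfamiliar with the underlying metaheuristic, the following is a minimal Python sketch of the basic harmony search loop on a continuous minimization problem. It is not the harmony-based classifier itself, whose encoding and fitness function the abstract does not describe. The function name `harmony_search` and the parameter names and default values (`hms`, `hmcr`, `par`, `bandwidth`, `iterations`, `seed`) are assumptions chosen for illustration.

```python
import random


def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3,
                   bandwidth=0.05, iterations=2000, seed=0):
    """Minimize `objective` over the box `bounds` with basic harmony search.

    hms: harmony memory size; hmcr: harmony memory considering rate;
    par: pitch adjusting rate; bandwidth: adjustment step as a fraction
    of each variable's range. All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Harmony memory: hms random candidate solutions and their costs.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    costs = [objective(h) for h in memory]

    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse this variable from a stored harmony.
                value = memory[rng.randrange(hms)][d]
                if rng.random() < par:
                    # Pitch adjustment: small random shift around that value.
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
            else:
                # Random selection: draw a fresh value from the allowed range.
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))  # keep inside the bounds

        # Replace the worst stored harmony if the new one is better.
        cost = objective(new)
        worst = max(range(hms), key=costs.__getitem__)
        if cost < costs[worst]:
            memory[worst], costs[worst] = new, cost

    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best]


if __name__ == "__main__":
    # Toy objective: the sphere function, minimized at the origin.
    solution, value = harmony_search(lambda x: sum(v * v for v in x),
                                     [(-5.0, 5.0)] * 2)
    print(solution, value)
```

To use such a loop for classification, the objective would have to score a candidate classifier, for example by its error on the current batch of examples. That mapping is an assumption here and is not taken from the abstract.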

Keywords

Data stream, Classification, Concept drift, Harmony search

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
