Progress in Artificial Intelligence, Volume 1, Issue 1, pp 71–87

Scaling up data mining algorithms: review and taxonomy

  • Nicolás García-Pedrajas
  • Aida de Haro-García
Review

Abstract

The overwhelming amount of data that are now available in any field of research poses new problems for data mining and knowledge discovery methods. Due to this huge amount of data, most of the current data mining algorithms are inapplicable to many real-world problems. Data mining algorithms become ineffective when the problem size becomes very large. In many cases, the demands of the algorithm in terms of the running time are very large, and mining methods cannot be applied when the problem grows. This aspect is closely related to the time complexity of the method. A second problem is linked with performance; although the method might be applicable, the size of the search space prevents an efficient execution, and the resulting solutions are unsatisfactory. Two approaches have been used to deal with this problem: scaling up data mining algorithms and data reduction. However, because data reduction is a data mining task itself, this technique also suffers from scalability problems. Thus, for many problems, especially when dealing with very large datasets, the only way to deal with the aforementioned problems is to scale up the data mining algorithm. Many efforts have been made to obtain methods that can be used to scale up existing data mining algorithms. In this paper, we review the methods that have been used to address the problem of scalability. We focus on general ideas, rather than specific implementations, that can be used to provide a general view of the current approaches for scaling up data mining methods. A taxonomy of the algorithms is proposed, and many examples of different tasks are presented. Among the different techniques used for data mining, we will pay special attention to evolutionary methods, because these methods have been used very successfully in many data mining tasks.

Keywords

  • Data mining
  • Scaling-up
  • Parallel algorithms
  • Very large datasets

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Nicolás García-Pedrajas (1)
  • Aida de Haro-García (1)
  1. Computational Intelligence and Bioinformatics Research Group, University of Córdoba, Córdoba, Spain