Abstract
This paper introduces the third major release of the KEEL software. KEEL is an open source Java framework (GPLv3 license) that provides a number of modules to perform a wide variety of data mining tasks. It includes tools for data management, the design of multiple kinds of experiments, and statistical analysis, among others. The framework also contains KEEL-dataset, a data repository for multiple learning tasks that features data partitions and algorithms’ results over these problems. In this work, we describe the most recent components added to KEEL 3.0, including new modules for semi-supervised learning, multi-instance learning, imbalanced classification and subgroup discovery. In addition, a new R interface has been incorporated to execute the algorithms included in KEEL. These new features improve the versatility of KEEL, allowing it to deal with more modern data mining problems.
Rights and permissions
This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
About this article
Cite this article
Triguero, I., González, S., Moyano, J.M. et al. KEEL 3.0: An Open Source Software for Multi-Stage Analysis in Data Mining. Int J Comput Intell Syst 10, 1238–1249 (2017). https://doi.org/10.2991/ijcis.10.1.82
DOI: https://doi.org/10.2991/ijcis.10.1.82