KEEL 3.0: An Open Source Software for Multi-Stage Analysis in Data Mining

Triguero, Isaac; González, Sergio; Moyano, Jose M.; García, Salvador; Alcalá-Fdez, Jesús; Luengo, Julián; Fernández, Alberto; del Jesús, Maria José; Sánchez, Luciano; Herrera, Francisco

doi:10.2991/ijcis.10.1.82

Research Article
Open access
Published: 25 September 2017

Volume 10, pages 1238–1249, (2017)

International Journal of Computational Intelligence Systems

Isaac Triguero¹, Sergio González², Jose M. Moyano⁴, Salvador García², Jesús Alcalá-Fdez², Julián Luengo², Alberto Fernández², Maria José del Jesús⁵, Luciano Sánchez³ & Francisco Herrera²

Abstract

This paper introduces the third major release of the KEEL software. KEEL is an open source Java framework (GPLv3 license) that provides a number of modules to perform a wide variety of data mining tasks. It includes tools for data management, the design of multiple kinds of experiments, statistical analyses, etc. The framework also contains KEEL-dataset, a data repository for multiple learning tasks that provides data partitions and the results of algorithms on these problems. In this work, we describe the most recent components added to KEEL 3.0, including new modules for semi-supervised learning, multi-instance learning, imbalanced classification and subgroup discovery. In addition, a new R interface has been incorporated to execute the algorithms included in KEEL. These new features greatly improve the versatility of KEEL in dealing with more modern data mining problems.

Author information

Authors and Affiliations

School of Computer Science, University of Nottingham, Jubilee Campus, NG8 1BB, Nottingham, UK
Isaac Triguero
Department of Computer Science and Artificial Intelligence, University of Granada, 18071, Granada, Spain
Sergio González, Salvador García, Jesús Alcalá-Fdez, Julián Luengo, Alberto Fernández & Francisco Herrera
Department of Computer Science, University of Oviedo, 33204, Gijón, Spain
Luciano Sánchez
Department of Computer Science and Numerical Analysis, University of Cordoba, 14071, Cordoba, Spain
Jose M. Moyano
Department of Computer Science, University of Jaén, Jaén, Spain
Maria José del Jesús

Corresponding author

Correspondence to Isaac Triguero.

Rights and permissions

This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

About this article

Cite this article

Triguero, I., González, S., Moyano, J.M. et al. KEEL 3.0: An Open Source Software for Multi-Stage Analysis in Data Mining. Int J Comput Intell Syst 10, 1238–1249 (2017). https://doi.org/10.2991/ijcis.10.1.82

Received: 06 March 2017
Accepted: 09 September 2017
Published: 25 September 2017
Issue Date: January 2017
DOI: https://doi.org/10.2991/ijcis.10.1.82
