A Transformation Approach Towards Big Data Multilabel Decision Trees

Rivera Rivas, Antonio Jesús; Charte Ojeda, Francisco; Pulgar, Francisco Javier; del Jesus, Maria Jose

doi:10.1007/978-3-319-59153-7_7

A Transformation Approach Towards Big Data Multilabel Decision Trees

Antonio Jesús Rivera Rivas¹⁶,
Francisco Charte Ojeda¹⁶,
Francisco Javier Pulgar¹⁶ &
…
Maria Jose del Jesus¹⁶

Conference paper
First Online: 18 May 2017

1874 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10305))

Abstract

A large amount of the data processed nowadays is multilabel in nature. This means that every pattern usually belongs to several categories at once. Multilabel data are abundant, and most multilabel datasets are quite large. This causes that many multilabel classification methods struggle with their processing. Tackling this task by means of big data methods seems a logical choice. However, this approach has been scarcely explored by now. The present work introduces several big data multilabel classifiers, all of them based on decision trees. After detailing how they have been designed, their predictive performance, as well as the execution time, are analyzed.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

1.
Card, Dens and many other multilabel characterization metrics can be easily obtained with the mldr package [14].
2.
All of them can be found in the RUMDR [23] repository.
3.
Names of metrics have been abbreviated to better fit them as column captions.

References

Kotsiantis, S.: Supervised machine learning: a review of classification techniques. In: Proceedings of Conference on Emerging Artificial Intelligence Applications in Computer Engineering: Real World AI Systems with Applications in eHealth, HCI, Information Retrieval and Pervasive Technologies, pp. 3–24. IOS Press (2007)
Google Scholar
Svetnik, V., Liaw, A., Tong, C., Culberson, J.C., Sheridan, R.P., Feuston, B.P.: Random forest: a classification and regression tool for compound classification and qsar modeling. J. Chem. Inf. Comput. Sci. 43(6), 1947–1958 (2003)
Article Google Scholar
Wieczorkowska, A., Synak, P., Raś, Z.: Multi-label classification of emotions in music. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds.) Intelligent Information Processing and Web Mining. AISC, vol. 35, pp. 307–315. Springer, Heidelberg (2006)
Chapter Google Scholar
Boutell, M., Luo, J., Shen, X., Brown, C.: Learning multi-label scene classification. Pattern Recogn. 37(9), 1757–1771 (2004)
Article Google Scholar
Charte, F., Rivera, A.J., del Jesus, M.J., Herrera, F.: QUINTA: a question tagging assistant to improve the answering ratio in electronic forums. In: Proceedings of IEEE International Conference on Computer as a Tool, EUROCON 2015, pp. 1–6. IEEE (2015)
Google Scholar
Herrera, F., Charte, F., Rivera, A.J., Del Jesus, M.J.: Multilabel Classification: Problem Analysis, Metrics and Techniques. Springer, Heidelberg (2016)
Book Google Scholar
Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)
Google Scholar
Steinberg, D., Colla, P.: CART: Tree-Structured Non-Parametric Data Analysis. Salford Systems, San Diego (1995)
Google Scholar
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco (1993). ISBN 1-55860-238-0
Google Scholar
Snoek, C.G.M., Worring, M., van Gemert, J.C., Geusebroek, J.M., Smeulders, A.W.M.: The challenge problem for automated detection of 101 semantic concepts in multimedia. In: Proceedings of 14th ACM International Conference on Multimedia, MULTIMEDIA 2006, pp. 421–430 (2006)
Google Scholar
Srivastava, A.N., Zane-Ulman, B.: Discovering recurring anomalies in text reports regarding complex space systems. In: Aerospace Conference, pp. 3853–3862. IEEE (2005)
Google Scholar
Elisseeff, A., Weston, J.: A kernel method for multi-labelled classification. In: Advances in Neural Information Processing Systems, vol. 14, pp. 681–687. MIT Press (2001)
Google Scholar
Herrera, F., Charte, F., Rivera, A.J., del Jesus, M.J.: Case studies and metrics. Multilabel Classification, pp. 33–63. Springer, Cham (2016). doi:10.1007/978-3-319-41111-8_3
Google Scholar
Charte, F., Charte, D.: Working with multilabel datasets in R: the mldr package. R. J. 7(2), 149–162 (2015)
Google Scholar
Clare, A., King, R.D.: Knowledge discovery in multi-label phenotype data. In: Raedt, L., Siebes, A. (eds.) PKDD 2001. LNCS (LNAI), vol. 2168, pp. 42–53. Springer, Heidelberg (2001). doi:10.1007/3-540-44794-6_4
Chapter Google Scholar
Zhang, M.: Ml-rbf: RBF neural networks for multi-label learning. Neural Process. Lett. 29, 61–74 (2009)
Article Google Scholar
Zhang, M., Zhou, Z.: ML-KNN: a lazy learning approach to multi-label learning. Pattern Recogn. 40(7), 2038–2048 (2007)
Article MATH Google Scholar
Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. HotCloud 10(10–10), 95 (2010)
Google Scholar
Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10. IEEE (2010)
Google Scholar
Gillick, D., Faria, A., DeNero, J.: Mapreduce: distributed computing for machine learning, Berkley, 18 December 2006
Google Scholar
Meng, X., Bradley, J., Yavuz, B., Sparks, E., Venkataraman, S., Liu, D., Freeman, J., Tsai, D., Amde, M., Owen, S., et al.: Mllib: machine learning in apache spark. J. Mach. Learn. Res. 17(34), 1–7 (2016)
MathSciNet MATH Google Scholar
del Río, S., López, V., Benítez, J.M., Herrera, F.: On the use of mapreduce for imbalanced big data using random forest. Inf. Sci. 285, 112–137 (2014)
Article Google Scholar
Charte, F., Charte, D., Rivera, A., de Jesus, M.J., Herrera, F.: R ultimate multilabel dataset repository. In: Martínez-Álvarez, F., Troncoso, A., Quintián, H., Corchado, E. (eds.) HAIS 2016. LNCS (LNAI), vol. 9648, pp. 487–499. Springer, Cham (2016). doi:10.1007/978-3-319-32034-2_41
Chapter Google Scholar
Crammer, K., Dredze, M., Ganchev, K., Talukdar, P.P., Carroll, S.: Automatic code assignment to medical text. In: Proceedings of Workshop on Biological, Translational, and Clinical Language Processing, BioNLP 2007, pp. 129–136. Association for Computational Linguistics (2007)
Google Scholar
Read, J., Pfahringer, B., Holmes, G., Frank, E.: Classifier chains for multi-label classification. Mach. Learn. 85, 333–359 (2011)
Article MathSciNet Google Scholar

Download references

Acknowledgments

This work is partially supported by the Spanish Ministry of Science and Technology under project TIN2015-68454-R.

Author information

Authors and Affiliations

Department of Computer Science, University of Jaén, Jaén, Spain
Antonio Jesús Rivera Rivas, Francisco Charte Ojeda, Francisco Javier Pulgar & Maria Jose del Jesus

Authors

Antonio Jesús Rivera Rivas
View author publications
You can also search for this author in PubMed Google Scholar
Francisco Charte Ojeda
View author publications
You can also search for this author in PubMed Google Scholar
Francisco Javier Pulgar
View author publications
You can also search for this author in PubMed Google Scholar
Maria Jose del Jesus
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Antonio Jesús Rivera Rivas .

Editor information

Editors and Affiliations

Universidad de Granada , Granada, Spain
Ignacio Rojas
University of Malaga , Malaga, Spain
Gonzalo Joya
Polytechnic University of Catalonia, Vilanova i la Geltrú, Barcelona, Spain
Andreu Catala

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Rivera Rivas, A.J., Charte Ojeda, F., Pulgar, F.J., del Jesus, M.J. (2017). A Transformation Approach Towards Big Data Multilabel Decision Trees. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2017. Lecture Notes in Computer Science(), vol 10305. Springer, Cham. https://doi.org/10.1007/978-3-319-59153-7_7

Download citation

DOI: https://doi.org/10.1007/978-3-319-59153-7_7
Published: 18 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59152-0
Online ISBN: 978-3-319-59153-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics