Abstract
Entities and organizations come into daily contact with several Data Science methodologies, and real-time decision support is seen as a decisive factor for successful decision-making. Because of the complexity, quantity, and diversity of the data that currently exist, a set of Data Science methodologies has emerged to help in implementing solutions. This article sets out to answer the following question: What is the most complete and comprehensive methodology for any Data Science project? In carrying out this work, twenty-four methodologies were found and analyzed in detail. The study was based on a comparative benchmarking of methodologies consisting of three phases of analysis: the first evaluates and compares the phases of all the methodologies collected; the second analyzes, compares, and evaluates the cost, usability, maintenance, scalability, precision, speed, flexibility, reliability, explainability, interpretability, cyclicity, and support for OLAP technology of each methodology; and the third compiles the previous evaluations and returns the methodologies with the best results. After the three analyses, the methodologies that stood out the most were AgileData.io and IBM – Base Methodology for Data Science; however, both obtained a score of 63.03%, which is low relative to the requirements.
Acknowledgement
This work has been supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Machado, L., Portela, F. (2024). Data Science Methodologies – A Benchmarking Study. In: Guarda, T., Portela, F., Diaz-Nafria, J.M. (eds) Advanced Research in Technologies, Information, Innovation and Sustainability. ARTIIS 2023. Communications in Computer and Information Science, vol 1935. Springer, Cham. https://doi.org/10.1007/978-3-031-48858-0_42
DOI: https://doi.org/10.1007/978-3-031-48858-0_42
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48857-3
Online ISBN: 978-3-031-48858-0
eBook Packages: Computer Science, Computer Science (R0)