Data Science Methodologies – A Benchmarking Study

  • Conference paper
Advanced Research in Technologies, Information, Innovation and Sustainability (ARTIIS 2023)

Abstract

Organizations work with several Data Science methodologies every day, yet real-time decision support is seen as a decisive factor in successful decision-making. Given the complexity, volume, and diversity of today's data, a set of Data Science methodologies has emerged to help implement solutions. This article sets out to answer the following question: What is the most complete and comprehensive data science methodology for any Data Science project? In carrying out this work, twenty-four methodologies were found and analyzed in detail. The study was based on a comparative benchmarking of methodologies, consisting of three phases of analysis: a first that evaluates and compares the phases of all the methodologies collected; a second that analyzes, compares, and evaluates each methodology's cost, usability, maintenance, scalability, precision, speed, flexibility, reliability, explainability, interpretability, cyclicity, and support for OLAP technology; and a third in which the previous evaluations are compiled and the methodologies with the best results are returned. After the three analyses, the methodologies that stood out the most were AgileData.io and IBM – Base Methodology for Data Science; however, both obtained a score of 63.03%, which is low relative to the requirements.
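
The three-phase benchmarking described above can be illustrated with a minimal Python sketch. The abstract does not give the scoring formulas, so everything here is an assumption made for illustration: the function names, the use of phase coverage as the first-phase measure, ratings between 0 and 1, the equal weighting, and the example methodologies and values are all hypothetical and are not the authors' data or method. Only the list of criteria is taken from the abstract.

```python
# Criteria named in the abstract for the second phase of the analysis.
CRITERIA = [
    "cost", "usability", "maintenance", "scalability", "precision", "speed",
    "flexibility", "reliability", "explainability", "interpretability",
    "cyclicity", "olap_support",
]


def phase_coverage(method_phases, reference_phases):
    """Phase 1 (assumed form): share of reference phases a methodology covers."""
    reference = set(reference_phases)
    return len(set(method_phases) & reference) / len(reference)


def criteria_score(ratings):
    """Phase 2 (assumed form): mean of the per-criterion ratings, each in [0, 1]."""
    return sum(ratings[c] for c in CRITERIA) / len(CRITERIA)


def overall_score(coverage, criteria, phase_weight=0.5):
    """Phase 3 (assumed form): weighted combination, expressed as a percentage."""
    return 100 * (phase_weight * coverage + (1 - phase_weight) * criteria)


def rank(scores):
    """Return (name, score) pairs sorted from best to worst."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical inputs, for illustration only.
reference_phases = ["understand", "prepare", "model", "deploy"]
methods = {
    "Methodology A": {
        "phases": ["understand", "prepare", "model"],
        "ratings": {c: 0.5 for c in CRITERIA},
    },
    "Methodology B": {
        "phases": ["prepare", "model"],
        "ratings": {c: 0.25 for c in CRITERIA},
    },
}

scores = {
    name: overall_score(
        phase_coverage(m["phases"], reference_phases),
        criteria_score(m["ratings"]),
    )
    for name, m in methods.items()
}
ranking = rank(scores)
print(ranking)
```

Under these assumptions, a methodology that covers every reference phase and receives the top rating on every criterion would score 100%, which gives one way to read a result such as 63.03% as falling well short of the requirements.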

Acknowledgement

This work has been supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020.

Corresponding author

Correspondence to Filipe Portela.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Machado, L., Portela, F. (2024). Data Science Methodologies – A Benchmarking Study. In: Guarda, T., Portela, F., Diaz-Nafria, J.M. (eds) Advanced Research in Technologies, Information, Innovation and Sustainability. ARTIIS 2023. Communications in Computer and Information Science, vol 1935. Springer, Cham. https://doi.org/10.1007/978-3-031-48858-0_42

  • DOI: https://doi.org/10.1007/978-3-031-48858-0_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-48857-3

  • Online ISBN: 978-3-031-48858-0

  • eBook Packages: Computer Science, Computer Science (R0)
