A Selective Comparative Review of CRISP-DM and TDSP Development Methodologies for Big Data Analytics Systems

Chapter in: Development Methodologies for Big Data Analytics Systems

Abstract

Big data analytics systems (BDAS) are modern software systems with descriptive, predictive, or prescriptive purposes developed by current organizations. BDAS are viable due to the convergence of analytics techniques and the availability of massive data sources, both internal and external to the organization. BDAS are developed in a variety of application domains such as marketing, healthcare, finance, manufacturing, logistics, education, and tourism. However, although BDAS are modern software systems, organizations have developed them using trial-and-error practical guidelines or older rigor-oriented heavyweight (a.k.a. plan-driven) methodologies. The competitive business environment currently demands modern (i.e., lightweight or agile) BDAS development methodologies, and in recent years the first such methodologies have been proposed. However, studies contrasting rigor-oriented with lightweight or agile BDAS development methodologies are still scarce in the literature. In this chapter, we address this knowledge gap and report a comparative review of CRISP-DM, the main rigor-oriented BDAS methodology, and the Team Data Science Process (TDSP), a new and relevant proprietary agile one, using a Scrum-XP workflow of practices as the theoretical agile development framework. Our comparative review provides theoretical and practical insights for distinguishing between the two BDAS development approaches, which are useful for researchers and practitioners in the BDAS development domain.

Author information

Corresponding author

Correspondence to Gerardo Salazar-Salazar.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Salazar-Salazar, G., Mora, M., Duran-Limon, H.A., Rodríguez, F.J.Á. (2024). A Selective Comparative Review of CRISP-DM and TDSP Development Methodologies for Big Data Analytics Systems. In: Mora, M., Wang, F., Marx Gomez, J., Duran-Limon, H. (eds) Development Methodologies for Big Data Analytics Systems. Transactions on Computational Science and Computational Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-031-40956-1_6

  • DOI: https://doi.org/10.1007/978-3-031-40956-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-40955-4

  • Online ISBN: 978-3-031-40956-1

  • eBook Packages: Engineering, Engineering (R0)