Abstract
Big data analytics systems (BDAS) are modern software systems that contemporary organizations develop for descriptive, predictive, or prescriptive purposes. BDAS have become viable through the convergence of analytics techniques and the availability of massive data sources, both internal and external to the organization. BDAS are developed in a wide range of application domains, such as marketing, healthcare, finance, manufacturing, logistics, education, and tourism. However, although BDAS are modern software systems, organizations have built them using trial-and-error practical guidelines or older rigor-oriented, heavyweight (i.e., plan-driven) methodologies. Today's competitive business environment demands modern (i.e., lightweight or agile) BDAS development methodologies, and the first such methodologies have been proposed in recent years. Nevertheless, studies contrasting rigor-oriented with lightweight or agile BDAS development methodologies are still scarce in the literature. In this chapter, we address this knowledge gap by reporting a comparative review of CRISP-DM, the main rigor-oriented BDAS methodology, and the Team Data Science Process (TDSP), a relevant new proprietary agile methodology, using a Scrum-XP workflow of practices as the theoretical agile development framework. Our comparative review offers theoretical and practical insights that can help researchers and practitioners in the BDAS development domain distinguish between the two development approaches.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Salazar-Salazar, G., Mora, M., Duran-Limon, H.A., Rodríguez, F.J.Á. (2024). A Selective Comparative Review of CRISP-DM and TDSP Development Methodologies for Big Data Analytics Systems. In: Mora, M., Wang, F., Marx Gomez, J., Duran-Limon, H. (eds) Development Methodologies for Big Data Analytics Systems. Transactions on Computational Science and Computational Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-031-40956-1_6
DOI: https://doi.org/10.1007/978-3-031-40956-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40955-4
Online ISBN: 978-3-031-40956-1
eBook Packages: Engineering, Engineering (R0)