On Developing Data Science

Applied Data Science

Abstract

Understanding phenomena based on the facts—on the data—is a touchstone of data science. The power of evidence-based, inductive reasoning distinguishes data science from science. Hence, this chapter argues that, in its initial stages, data science applications and the data science discipline itself be developed inductively and deductively in a virtuous cycle.

The twentieth-century Virtuous Cycle (aka the virtuous hardware-software cycle or Intel-Microsoft virtuous cycle) that built the personal computer industry (National Research Council, The new global ecosystem in advanced computing: Implications for U.S. competitiveness and national security. The National Academies Press, Washington, DC, 2012) had two virtues: it was grounded in reality and it was self-perpetuating. More powerful hardware enabled more powerful software that required more powerful hardware, enabling yet more powerful software, and so forth. Being grounded in reality, that is, solving genuine problems at scale, was critical to its success, as it will be for data science. While it lasted, it was self-perpetuating because of a constant flow of innovation and because it benefited all participants: producers, consumers, the industry, the economy, and society. It is a wonderful success story for twentieth-century applied science. Given the success of virtuous cycles in developing modern technology, virtuous cycles grounded in reality should be used to develop data science, driven by the wisdom of the sixteenth-century proverb "Necessity is the mother of invention."

This chapter explores this hypothesis using the example of the evolution of database management systems over the last 40 years. For the application of data science to be successful and virtuous, it should be grounded in a cycle that encompasses industry (i.e., real problems), research, development, and delivery. This chapter proposes applying the principles and lessons of the virtuous cycle to the development of data science applications; to the development of the data science discipline itself, for example, a data science method; and to the development of data science education. All three focus on the critical role of collaboration in data science research and management, thereby addressing the development challenges faced by the more than 150 Data Science Research Institutes (DSRIs) worldwide. A companion chapter (Brodie, What is Data Science, in Braschler et al. (Eds.), Applied data science – Lessons learned for the data-driven business, Springer 2019) addresses essential questions that DSRIs should answer in preparation for the developments proposed here: What is data science? What is world-class data science research?

References

  • ACM. (2015). Michael Stonebraker, 2014 Turing Award Citation, Association for Computing Machinery, April 2015. http://amturing.acm.org/award_winners/stonebraker_1172121.cfm

  • AJTR. (2018). American Journal of Translational Research, e-Century Publishing Corporation. http://www.ajtr.org

  • Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks. ProPublica, May 23, 2016. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

  • Braschler, M., Stadelmann, T., & Stockinger, K. (Eds.). (2019). Applied data science – Lessons learned for the data-driven business. Berlin: Springer.

  • Brodie, M. L. (2015). Understanding data science: An emerging discipline for data-intensive discovery. In S. Cutt (Ed.), Getting data right: Tackling the challenges of big data volume and variety. Sebastopol, CA: O’Reilly Media.

  • Brodie, M. L. (2019a). What is data science? In M. Braschler, T. Stadelmann, & K. Stockinger (Eds.), Applied data science – Lessons learned for the data-driven business. Berlin: Springer.

  • Brodie, M. L. (Ed.). (2019b, January). Making databases work: The pragmatic wisdom of Michael Stonebraker. ACM Books series (Vol. 22). San Rafael, CA: Morgan & Claypool.

  • Chipman, I. (2016). How data analytics is going to transform all industries. Stanford Engineering Magazine, February 13, 2016.

  • Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377–387.

  • Davenport, T. H., & Patil, D. J. (2012). Data scientist: The sexiest job of the 21st century. Harvard Business Review, 90(10), 70–76.

  • Demirkan, H., & Dal, B. (2014). The data economy: Why do so many analytics projects fail? Analytics Magazine, July/August 2014.

  • Dohzen, T., Pamuk, M., Seong, S. W., Hammer, J., & Stonebraker, M. (2006). Data integration through transform reuse in the Morpheus project (pp. 736–738). ACM SIGMOD International Conference on Management of Data, Chicago, IL, June 27–29, 2006.

  • Economist. (2017). Who’s afraid of disruption? The business world is obsessed with digital disruption, but it has had little impact on profits, The Economist, September 30, 2017.

  • Economist. (March 2018a). GrAIt expectations, Special Report AI in Business, The Economist, March 31, 2018.

  • Economist. (March 2018b). External providers: Leave it to the experts, Special report AI in business, The Economist, March 31, 2018.

  • Economist. (March 2018c). The future: Two-faced, Special report AI in business, The Economist, March 31, 2018.

  • Economist. (March 2018d). Supply chains: In algorithms we trust, Special report AI in business, The Economist, March 31, 2018.

  • Economist. (March 2018e). America v China: The battle for digital supremacy: America’s technological hegemony is under threat from China, The Economist, March 15, 2018.

  • Economist. (2018f). A study finds nearly half of jobs are vulnerable to automation, The Economist, April 24, 2018.

  • Fang, F. C., & Casadevall, A. (2010). Lost in translation-basic science in the era of translational research. Infection and Immunity, 78(2), 563–566.

  • Forrester. (2015a). Brief: Why data-driven aspirations fail. Forrester Research, Inc., October 7, 2015.

  • Forrester. (2015b). Predictions 2016: The path from data to action for marketers: How marketers will elevate systems of insight. Forrester Research, November 9, 2015.

  • Forrester. (2017). The Forrester Wave™: Data preparation tools, Q1 2017, Forrester, March 13, 2017.

  • Gartner G00310700. (2016). Survey analysis: Big data investments begin tapering in 2016, Gartner, September 19, 2016.

  • Gartner G00316349. (2016). Predicts 2017: Analytics strategy and technology, Gartner, report G00316349, November 30, 2016.

  • Gartner G00301536. (2017). 2017 Magic quadrant for data science platforms, 14 February 2017.

  • Gartner G00315888. (2017). Market guide for data preparation, Gartner, 14 December 2017.

  • Gartner G00326671. (2017). Critical capabilities for data science platforms, Gartner, June 7, 2017.

  • Gartner G00326456. (2018). Magic quadrant for data science and machine-learning platforms, 22 February 2018.

  • Gartner G00326555. (2018). Magic quadrant for analytics and business intelligence platforms, 26 February 2018.

  • Gartner G00335261. (2018). Critical capabilities for data science and machine learning platforms, 4 April 2018.

  • Harari, Y. N. (2016). Homo Deus: A brief history of tomorrow, Random House, 2016.

  • Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8), e124.

  • Lee, K.-F. (2017). The real threat of artificial intelligence. New York Times, June 24, 2017.

  • Lohr, S., & Singer, N. (2016). How data failed us in calling an election. New York Times, November 10, 2016.

  • Marr, B. (2017). How big data is transforming every business, in every industry. Forbes.com, November 21, 2017.

  • Meierhofer, J., Stadelmann, T., & Cieliebak, M. (2019). Data products. In M. Braschler, T. Stadelmann, & K. Stockinger (Eds.), Applied data science – Lessons learned for the data-driven business. Berlin: Springer.

  • Nagarajan, M., et al. (2015). Predicting future scientific discoveries based on a networked analysis of the past literature. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’15) (pp. 2019–2028). New York, NY: ACM.

  • National Research Council. (2012). The new global ecosystem in advanced computing: Implications for U.S. competitiveness and national security. Washington, DC: The National Academies Press.

  • Naumann, F. (2018). Genealogy of relational database management systems. Hasso-Plattner-Institut, Universität Potsdam. https://hpi.de/naumann/projects/rdbms-genealogy.html

  • Nedelkoska, L., & Quintini, G. (2018). Automation, skills use and training. OECD Social, Employment and Migration Working Papers, No. 202. Paris: OECD Publishing. https://doi.org/10.1787/2e2f4eea-en

  • New York Times. (2018). H&M, a Fashion Giant, has a problem: $4.3 Billion in unsold clothes. New York Times, March 27, 2018.

  • O’Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. New York, NY: Crown Publishing Group.

  • Olson, M. (2019). Stonebraker and open source, to appear in (Brodie 2019b)

  • Palmer, A. (2019). How to create & run a Stonebraker Startup – The Real Story, to appear in (Brodie 2019b).

  • Piatetsky, G. (2016). Trump, failure of prediction, and lessons for data scientists, KDnuggets, November 2016.

  • Ramanathan, A. (2016). The data science delusion, Medium.com, November 18, 2016.

  • Russell, S., & Norvig, P. (2010). Artificial intelligence: A modern approach (3rd ed.). Boston, MA: Pearson Education.

  • Spangler, S., et al. (2014). Automated hypothesis generation based on mining scientific literature. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’14) (pp. 1877–1886). New York, NY: ACM.

  • STM. (2018). Science Translational Medicine, a journal of the American Association for the Advancement of Science.

  • Stonebraker, M. (2019a). How to start a company in 5 (not so) easy steps, to appear in (Brodie 2019b).

  • Stonebraker, M. (2019b). Where do good ideas come from and how to exploit them? to appear in (Brodie 2019b).

  • Stonebraker, M., & Kemnitz, G. (1991). The POSTGRES next generation database management system. Communications of the ACM, 34(10), 78–92.

  • Stonebraker, M., Wong, E., Kreps, P., & Held, G. (1976). The design and implementation of INGRES. ACM Transactions on Database Systems, 1(3), 189–222.

  • Stonebraker, M., Abadi, D. J., Batkin, A., Chen, X., Cherniack, M., Ferreira, M., et al. (2005). C-store: A column-oriented DBMS. In Proceedings of the 31st International Conference on Very Large Data Bases, 2005.

  • Stonebraker, M., Castro Fernandez, R., Deng, D., & Brodie, M. L. (2016a). Database decay and what to do about it. Communications of the ACM, 60(1), 10–11.

  • Stonebraker, M., Deng, D., & Brodie, M. L. (2016b). Database decay and how to avoid it. In Proceedings of the IEEE International Conference on Big Data (pp. 1–10), Washington, DC.

  • Stonebraker, M., Deng, D., & Brodie, M. L. (2017). Application-database co-evolution: A new design and development paradigm. In New England Database Day (pp. 1–3).

  • van der Aalst, W. M. P. (2014). Data scientist: The engineer of the future. In K. Mertins, F. Bénaben, R. Poler, & J.-P. Bourrières (Eds.), Enterprise Interoperability VI (pp. 13–26). Cham: Springer International Publishing.

  • Veeramachaneni, K. (2016). Why you’re not getting value from your data science. Harvard Business Review, December 7, 2016.

  • Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82.

Acknowledgments

Thanks to Dr. Thilo Stadelmann, Zurich University of Applied Sciences, Institute for Applied Information Technology in the Swiss Fachhochschule system, for insights into these ideas; and to Dr. He H. (Anne) Ngu, Texas State University, for insights into applying these principles and pragmatics to the development of Texas State University’s Twenty-First Century Applied PhD Program in Computer Science.

Author information

Corresponding author

Correspondence to Michael L. Brodie.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Brodie, M.L. (2019). On Developing Data Science. In: Braschler, M., Stadelmann, T., Stockinger, K. (eds) Applied Data Science. Springer, Cham. https://doi.org/10.1007/978-3-030-11821-1_9

  • DOI: https://doi.org/10.1007/978-3-030-11821-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-11820-4

  • Online ISBN: 978-3-030-11821-1

  • eBook Packages: Computer Science, Computer Science (R0)
