Data Science and Empirical Software Engineering

Contemporary Empirical Methods in Software Engineering

Abstract

Empirical Software Engineering (ESE) has its roots in the 1970s and has since gained growing recognition as the standard approach to scientific inquiry in software engineering. Many quantitative and qualitative research methods have been described and supplied with guidelines and checklists, and several books have been written about good practice in ESE. With the growing amount of data produced during software development, a new paradigm of scientific inquiry has gained much attention: Data Science (DS). The goal of this chapter is to discuss whether DS could replace traditional ESE or, if it does not, how traditional ESE could benefit from adopting DS practices, and vice versa. We first give general background on ESE and DS, then describe in more detail how both paradigms are typically used in software engineering research and what their respective strengths and weaknesses are. Finally, we illustrate with an industry-driven case example how ESE and DS could benefit from each other when used in combination.

Acknowledgements

This research was partly supported by the institutional research grant IUT20-55 of the Estonian Research Council and the Estonian Centre of Excellence in ICT Research (EXCITE).

Author information

Corresponding author

Correspondence to Dietmar Pfahl.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Scott, E., Milani, F., Pfahl, D. (2020). Data Science and Empirical Software Engineering. In: Felderer, M., Travassos, G. (eds) Contemporary Empirical Methods in Software Engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-32489-6_8

  • DOI: https://doi.org/10.1007/978-3-030-32489-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-32488-9

  • Online ISBN: 978-3-030-32489-6

  • eBook Packages: Computer Science, Computer Science (R0)
