Introduction to Data Science

Abstract

Let me start by making an analogy between software engineering and data science. Software engineering may be summarized as the application of engineering principles and methods to the development of software. The aim is to produce a dependable software product. In a similar vein, data science may be described as the application of scientific principles and methods to working with data. The goal is to synthesize reliable and actionable insights from data (sometimes referred to as a data product). To continue with our analogy, the systems/software development life cycle (SDLC) prescribes the major phases of a software development process: project initiation, requirements engineering, design, construction, testing, deployment, and maintenance. The data science process also encompasses multiple phases: project initiation, data acquisition, data preparation, data analysis, reporting, and execution of actions (another “phase” is data exploration, which is more of an all-embracing activity than a stand-alone phase). As in software development, these phases are closely interwoven, and the process is inherently iterative and incremental. An overarching activity that is indispensable in both software engineering and data science (and in any other iterative and incremental endeavor) is retrospection, which involves reviewing a project or process to determine what was successful and what could be improved. Another similarity to software engineering is that data science also relies on a multidimensional team or team of teams. A typical project requires domain experts, software engineers specializing in various technologies, and mathematicians (a single person may take different roles at various times). Yet another common denominator with software engineering is a penchant for automation (via programmability of most activities) to increase productivity, reproducibility, and quality.
The aim of this chapter is to explain the key concepts regarding data science and put them into proper context.
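
As a minimal sketch of how the middle phases named above (data acquisition, data preparation, data analysis, and reporting) might be automated as small, composable steps, consider the following Python example. It is an illustration only and is not taken from the chapter: the function names, the RAW_READINGS sample values, and the choice of summary statistics are all assumptions made for the sake of the example.

```python
from statistics import mean

# Hypothetical raw readings (an assumption for illustration); None marks a missing value.
RAW_READINGS = [21.5, None, 22.0, 23.5, None, 21.0]


def acquire():
    """Data acquisition: return a copy of the raw records (stand-in for a file or API read)."""
    return list(RAW_READINGS)


def prepare(records):
    """Data preparation: drop missing values."""
    return [r for r in records if r is not None]


def analyze(clean):
    """Data analysis: compute simple summary statistics."""
    return {"count": len(clean), "mean": mean(clean), "max": max(clean)}


def report(summary):
    """Reporting: render the summary as a one-line message."""
    return (
        f"{summary['count']} readings, "
        f"mean {summary['mean']:.2f}, max {summary['max']:.1f}"
    )


def run_pipeline():
    """Chain the phases so the whole run can be repeated with a single call."""
    return report(analyze(prepare(acquire())))


if __name__ == "__main__":
    print(run_pipeline())
```

Keeping each phase in its own function means a single phase can be revisited and rerun during a later iteration without touching the others, which is one simple way to support the reproducibility mentioned above.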

Notes

  1. A retrospective gives you and your team an opportunity to take notes, make comparisons, hold meetings, and improve the current process overall. In Agile projects, each iteration closes with a retrospective, where the team contemplates what worked well and what went wrong and takes pragmatic steps to enhance the current way of working. A data science team may use a similar strategy to smooth out process-related difficulties.

  2. The original version refers to hacking skills instead of software engineering. I think this isn’t appropriate anymore. As Python data science software solutions reach an enterprise level, they must be professionally developed to be maintainable and evolvable in a cost-effective manner.

  3. Consult reference [8] for an overview of how Docker may help package up artifacts inside containers. This is a complementary approach to using Anaconda’s environments and virtualization.

Copyright information

© 2019 Ervin Varga

About this chapter

Cite this chapter

Varga, E. (2019). Introduction to Data Science. In: Practical Data Science with Python 3. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4859-1_1
