© 2017

The Data Science Design Manual


Part of the Texts in Computer Science book series (TCS)

Table of contents

  1. Front Matter
    Pages i-xvii
  2. Steven S. Skiena
    Pages 1-25
  3. Steven S. Skiena
    Pages 27-56
  4. Steven S. Skiena
    Pages 57-93
  5. Steven S. Skiena
    Pages 95-120
  6. Steven S. Skiena
    Pages 121-154
  7. Steven S. Skiena
    Pages 155-200
  8. Steven S. Skiena
    Pages 201-236
  9. Steven S. Skiena
    Pages 237-265
  10. Steven S. Skiena
    Pages 267-302
  11. Steven S. Skiena
    Pages 303-349
  12. Steven S. Skiena
    Pages 351-390
  13. Steven S. Skiena
    Pages 391-421
  14. Steven S. Skiena
    Pages 423-425
  15. Back Matter
    Pages 427-445

About this book


This engaging and clearly written textbook/reference provides a must-have introduction to the rapidly emerging interdisciplinary field of data science. It focuses on the principles fundamental to becoming a good data scientist and the key skills needed to build systems for collecting, analyzing, and interpreting data.

The Data Science Design Manual is a source of practical insights that highlights what really matters in analyzing data, and provides an intuitive understanding of how these core concepts can be used. The book does not emphasize any particular programming language or suite of data-analysis tools, focusing instead on high-level discussion of important design principles.

This easy-to-read text ideally serves the needs of undergraduate and early graduate students embarking on an “Introduction to Data Science” course. It reveals how this discipline sits at the intersection of statistics, computer science, and machine learning, with a distinct heft and character of its own. Practitioners in these and related fields will find this book perfect for self-study as well.

Additional learning tools:

  • Contains “War Stories,” offering perspectives on how data science applies in the real world
  • Includes “Homework Problems,” providing a wide range of exercises and projects for self-study
  • Provides a complete set of lecture slides and online video lectures
  • Provides “Take-Home Lessons,” emphasizing the big-picture concepts to learn from each chapter
  • Recommends exciting “Kaggle Challenges” from the online platform Kaggle
  • Highlights “False Starts,” revealing the subtle reasons why certain approaches fail
  • Offers examples taken from the data science television show “The Quant Shop”


Data Science, Data Analytics, Pattern Recognition, Analytical Statistics, Data Visualisation, Machine Learning

Authors and affiliations

  1. Computer Science Department, Stony Brook University, Stony Brook, USA

About the authors

Dr. Steven S. Skiena is Distinguished Teaching Professor of Computer Science at Stony Brook University, with research interests in data science, natural language processing, and algorithms. He was awarded the IEEE Computer Science and Engineering Undergraduate Teaching Award “for outstanding contributions to undergraduate education ... and for influential textbooks and software.” Dr. Skiena is the author of six books, including the popular Springer titles The Algorithm Design Manual and Programming Challenges: The Programming Contest Training Manual.

Bibliographic information


“The book is more than a typical manual. In fact, the author himself designates it as a textbook for an introductory course on data science. The chapters are richly equipped with exercises. The topics are always explained starting with a proper motivation and continuing with practical examples. This is perhaps the most outstanding feature of the book. It can serve as a regular textbook for an academic course. In fact, I should like to recommend it exactly for this purpose. On the other hand, it provides a wealth of material for people from industry, such as software engineers, and can serve as a manual for them to accomplish data science tasks. It should be noted that the book is not just a text, but a much more complex product, including a full set of lecture slides available online as well as a solutions wiki.” (P. Navrat, Computing Reviews, February 23, 2018)