  • Textbook
  • Open Access
  • © 2020

Leveraging Data Science for Global Health

  • Is the first and currently the only book on digital disease surveillance through the application of machine learning to non-traditional data sources

  • Focuses on combating disease and promoting health, especially in resource-constrained settings

  • Includes and expands on the latest non-traditional data sources such as Google Trends, Google Street View, the news media, and social media

  • Is an open access book

Buying options

Softcover Book USD 49.99
Price excludes VAT (USA)
Hardcover Book USD 59.99
Price excludes VAT (USA)

Table of contents (29 chapters)

  1. Front Matter

    Pages i-xii
  2. Building a Data Science Ecosystem for Healthcare

    1. Front Matter

      Page 1
    2. Building Electronic Health Record Databases for Research

      • Lucas Bulgarelli, Antonio Núñez-Reiz, Rodrigo Octavio Deliberato
      Pages 55-64 (Open Access)
    3. Funding Global Health Projects

      • Katharine Morley, Michael Morley, Andrea Beratarrechea
      Pages 65-75 (Open Access)
    4. From Causal Loop Diagrams to System Dynamics Models in a Data-Rich Ecosystem

      • Gary Lin, Michele Palopoli, Viva Dadwal
      Pages 77-98 (Open Access)
    5. Workshop on Blockchain Use Cases in Digital Health

      • Philip Christian C. Zuniga, Rose Ann C. Zuniga, Marie Jo-anne Mendoza, Ada Angeli Cariaga, Raymond Francis Sarmiento, Alvin B. Marcelo
      Pages 99-107 (Open Access)
  3. Health Data Science Workshops

    1. Front Matter

      Page 109
    2. Applied Statistical Learning in Python

      • Calvin J. Chiew
      Pages 111-128 (Open Access)
    3. Machine Learning for Patient Stratification and Classification Part 3: Supervised Learning

      • Cátia M. Salgado, Susana M. Vieira
      Pages 169-198 (Open Access)
    4. Machine Learning for Clinical Predictive Analytics

      • Wei-Hung Weng
      Pages 199-217 (Open Access)
    5. Robust Predictive Models in Clinical Data—Random Forest and Support Vector Machines

      • Siqi Liu, Hao Du, Mengling Feng
      Pages 219-228 (Open Access)
    6. Introduction to Clinical Natural Language Processing with Python

      • Leo Anthony Celi, Christina Chen, Daniel Gruhl, Chaitanya Shivade, Joy Tzung-Yu Wu
      Pages 229-250 (Open Access)
    7. Introduction to Digital Phenotyping for Global Health

      • Olivia Mae Waring, Maimuna S. Majumder
      Pages 251-261 (Open Access)
    8. Biomedical Signal Processing: An ECG Application

      • Chen Xie
      Pages 285-303 (Open Access)

About this book

This open access book explores ways to leverage information technology and machine learning to combat disease and promote health, especially in resource-constrained settings. It focuses on digital disease surveillance through the application of machine learning to non-traditional data sources. Developing countries are uniquely prone to large-scale emerging infectious disease outbreaks due to disruption of ecosystems, civil unrest, and poor healthcare infrastructure, and without comprehensive surveillance, delays in outbreak identification, resource deployment, and case management can be catastrophic. Students will learn how non-traditional digital disease data sources, including news media, social media, Google Trends, and Google Street View, can be combined with context-informed analytics to fill critical knowledge gaps and inform on-the-ground decision-making when formal surveillance systems are insufficient.


  • Open Access
  • Big Data
  • Machine Learning
  • Artificial Intelligence
  • Health Informatics
  • Digital Disease Surveillance
  • Health Mapping
  • Health Records for Non-Communicable Diseases
  • HealthMap
  • Tools for Clinical Trials


“This book seems to empower the reader to gradually embark on the development of medical applications incorporating data science. … This book is well structured, written with a good level of linguistic guts, and could be recommended to data science students rather than researchers or health professionals.” (Thierry Edoh, Computing Reviews, March 24, 2022)

Editors and Affiliations

  • Massachusetts Institute of Technology, Cambridge, USA

    Leo Anthony Celi

  • Boston Children’s Hospital, Harvard Medical School, Boston, USA

    Maimuna S. Majumder

  • University of Puerto Rico Río Piedras, San Juan, USA

    Patricia Ordóñez

  • ScienteLab, Department of Global Health, University of Washington, Seattle, USA

    Juan Sebastian Osorio

  • Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, USA

    Kenneth E. Paik

  • Imperial College London, London, UK

    Melek Somai

About the editors

Leo Anthony Celi, M.D., M.S., M.P.H., has practiced medicine on three continents, giving him a broad perspective on healthcare delivery. As clinical research director and principal research scientist at the MIT Laboratory for Computational Physiology (LCP) and as an attending physician at the Beth Israel Deaconess Medical Center (BIDMC), he brings together clinicians and data scientists to support research using data routinely collected in the process of care. Leo also founded and co-directs Sana, a cross-disciplinary organization based at the Institute for Medical Engineering and Science at MIT, whose objective is to leverage information technology to improve health outcomes in low- and middle-income countries. He is one of the course directors for two MIT courses: global health informatics to improve quality of care, and collaborative data science in medicine. He is an editor of the textbook for each course, both released under an open access license. Leo has spoken in 25 countries about the value of data in improving health outcomes.

Bibliographic Information

  • Book Title: Leveraging Data Science for Global Health

  • Editors: Leo Anthony Celi, Maimuna S. Majumder, Patricia Ordóñez, Juan Sebastian Osorio, Kenneth E. Paik, Melek Somai

  • DOI:

  • Publisher: Springer Cham

  • eBook Packages: Computer Science, Computer Science (R0)

  • Copyright Information: The Editor(s) (if applicable) and The Author(s) 2020

  • License: CC BY

  • Hardcover ISBN: 978-3-030-47993-0 (Published: 01 August 2020)

  • Softcover ISBN: 978-3-030-47996-1 (Published: 18 September 2020)

  • eBook ISBN: 978-3-030-47994-7 (Published: 31 July 2020)

  • Edition Number: 1

  • Number of Pages: XII, 475

  • Number of Illustrations: 21 b/w illustrations, 175 illustrations in colour

  • Topics: Health Informatics, Health Economics
