© 2008

Statistical Methods for Environmental Epidemiology with R

A Case Study in Air Pollution and Health


Part of the Use R book series (USE R)

About this book


Advances in statistical methodology and computing have played an important role in allowing researchers to more accurately assess the health effects of ambient air pollution. The methods and software developed in this area are applicable to a wide array of problems in environmental epidemiology. This book provides an overview of the methods used for investigating the health effects of air pollution and gives examples and case studies in R which demonstrate the application of those methods to real data. The book will be useful to statisticians, epidemiologists, and graduate students working in the area of air pollution and health and others analyzing similar data.

The authors describe the different existing approaches to statistical modeling and cover basic aspects of analyzing and understanding air pollution and health data. The case studies in each chapter demonstrate how to use R to apply and interpret different statistical models and to explore the effects of potential confounding factors. A working knowledge of R and regression modeling is assumed. In-depth knowledge of R programming is not required to understand and run the examples.

Researchers in this area will find the book useful as a "live" reference. Software for all of the analyses in the book is downloadable from the web and is available under a Free Software license. The reader is free to run the examples in the book and modify the code to suit their needs. In addition to providing the software for developing the statistical models, the authors provide the entire database from the National Morbidity, Mortality, and Air Pollution Study (NMMAPS) in a convenient R package. With the database, readers can run the examples and experiment with their own methods and ideas.

Roger D. Peng is an Assistant Professor in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. He is a prominent researcher in the areas of air pollution and health risk assessment and statistical methods for spatial and temporal data. Dr. Peng is the author of numerous R packages and is a frequent contributor to the R mailing lists.

Francesca Dominici is a Professor in the Department of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. She has published extensively on hierarchical and semiparametric modeling and has been the leader of major national studies of the health effects of air pollution. She has also participated in numerous panels conducted by the National Academy of Sciences assessing the health effects of environmental exposures and has consulted for the US Environmental Protection Agency's Clean Air Act Advisory Board.


NMMAPS, air pollution, reproducible research, semiparametric models, time series

From the Reviews:

"This volume is another in the Springer series, Use R!.… It differs somewhat with respect to others in the series in at least two ways. One of course is that it focuses on environmental epidemiology but more importantly on reproducible research. The authors strongly emphasize reproducible research and all of the example analyses in the book are made available by the use of the R package cacher, written by the first author.… It is organized around case studies using two public databases, NMMAPS and MCAPS, both of which are available in R packages.… As noted ‘reproducibility’ is a principal theme in this volume. This idea has received considerable attention by other authors, although the use of the cacher package seems to be new." (Donald E. Myers, Technometrics, August 2009, VOL. 51, NO. 3)

"What makes this book interesting to me is not the precise final form of the regression model, the technical expertise shown in fitting the model, or even the posterior distributions of the parameters. I happen to be interested in air pollution, and a large, clean, and well-organized environmental health database makes me feel all warm inside. Add the way data are handled, stored, manipulated in this book, and the way in which the analyses are cached and can be completely retrieved by anybody who is interested. That, I think, is its most important contribution." (Jan de Leeuw, Journal of Statistical Software, 2008)

"This book bridges the theory and implementation of statistical methods for air pollution risk estimation in multisite time-series data, relying primarily on data from the National Morbidity Mortality Air Pollution Study (NMMAPS) for examples. The implementation of methods relies exclusively on the R statistical software. The authors are both recognized experts in the statistical analysis of the health effects of air pollution and Professor Peng is also well known within the R community. …The general approach taken by the authors, to combine leading statistical theory in this area with clearly worked examples in R, fills an important gap in materials available for learning how to conduct such analyses. Students and researchers frequently obtain through books and coursework a general understanding of the theory relevant to a particular problem, but less frequently do they receive practical guidance in how to use available software to put the theory into practice. This book addresses exactly this gap in the context of additive time-series, and Bayesian hierarchical models applied to time series of air pollution and health data. …the writing is clear… . The described software and cached analysis were easy to download and run as described in the book and it was relatively straightforward to work through the analyses exactly as described. …the authors …make a strong argument for how R can be used to support reproducible research … ." (Biometrics, Summer 2009, 65, 996–997)