Computational Statistics, Volume 29, Issue 1–2, pp 141–157

Detecting the impact area of BP deepwater horizon oil discharge: an analysis by time varying coefficient logistic models and boosted trees

Original Paper

Abstract

The Deepwater Horizon oil discharge in the Gulf of Mexico is considered one of the worst environmental disasters to date. The spread of the oil spill had a range of environmental consequences. The National Oceanic and Atmospheric Administration (NOAA), in conjunction with the Environmental Protection Agency (EPA), the US Fish and Wildlife Service, and the American Statistical Association (ASA), has made available several datasets containing information on the oil spill. In this paper, we analyzed four of these datasets to explore how applied statistics and machine learning methods can help us understand the spread of the oil spill. In particular, we analyzed the “gliders, floats, boats” and “birds” data. The former contain measurements of sea water such as salinity and temperature, together with spatial location, depth and time. The latter contain information on the condition of birds, such as living status, oil condition, location and time. A varying-coefficient logistic regression model was fitted to the birds data. The results indicated that the oil was spreading more quickly along the east–west direction. Analyses via boosted trees and logistic regression gave similar results based on the information provided by these data.
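As an illustration of the varying-coefficient idea named in the abstract, the following Python sketch fits a logistic model whose coefficients change with time, logit P(y = 1) = β0(t) + β1(t)·lat + β2(t)·lon. It assumes a binary response (for example, whether an observed bird was oiled), standardized position covariates `lat` and `lon`, time rescaled to [0, 1], and coefficients that are linear in t. All function names, the polynomial basis and the synthetic data are hypothetical and are not the authors' specification. Expanding each β_j(t) in a basis turns the model into an ordinary logistic regression on the products x_j·t^k, which is then fitted by Newton-Raphson (IRLS).

```python
import numpy as np


def design_matrix(lat, lon, t, degree=1):
    """Expand each coefficient beta_j(t) in the basis 1, t, ..., t^degree.

    The varying-coefficient model
        logit P(y=1) = beta_0(t) + beta_1(t) * lat + beta_2(t) * lon
    becomes an ordinary logistic regression on the products x_j * t^k.
    Columns are ordered covariate-major: index j * (degree + 1) + k.
    """
    basis = np.column_stack([t ** k for k in range(degree + 1)])  # (n, degree+1)
    covariates = np.column_stack([np.ones_like(lat), lat, lon])    # (n, 3)
    # Row-wise outer product of covariates and basis, flattened per row.
    return (covariates[:, :, None] * basis[:, None, :]).reshape(len(t), -1)


def fit_logistic(X, y, n_iter=50, tol=1e-10, ridge=1e-8):
    """Fit an ordinary logistic regression by Newton-Raphson (IRLS)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))  # fitted probabilities
        w = p * (1.0 - p)                      # IRLS weights
        grad = X.T @ (y - p)                   # score vector
        # Observed information; a tiny ridge term guards against singularity.
        hess = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        theta += step
        if np.max(np.abs(step)) < tol:
            break
    return theta


def varying_coefficients(theta, t_grid, degree=1):
    """Evaluate (beta_0(t), beta_1(t), beta_2(t)) on a grid of times."""
    coef = theta.reshape(3, degree + 1)  # rows: intercept, lat, lon
    basis = np.column_stack([t_grid ** k for k in range(degree + 1)])
    return basis @ coef.T                # shape (len(t_grid), 3)


if __name__ == "__main__":
    # Synthetic data only; not the NOAA/ASA birds data.
    rng = np.random.default_rng(0)
    n = 20000
    t = rng.uniform(0.0, 1.0, n)  # rescaled observation time
    lat = rng.normal(size=n)      # standardized north-south position
    lon = rng.normal(size=n)      # standardized east-west position
    eta = (-1.0 + t) + 0.5 * t * lat + 2.0 * t * lon
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    theta = fit_logistic(design_matrix(lat, lon, t), y)
    print(varying_coefficients(theta, np.array([0.0, 0.5, 1.0])))
```

A linear basis in t keeps the sketch short; a richer basis such as B-splines in t would let each coefficient curve bend over time while leaving the fitting step unchanged.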

Keywords

Varying-coefficients model, Logistic regression, Boosting trees, Oil impacts


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Tianxi Li (1)
  • Chao Gao (2)
  • Meng Xu (3)
  • Bala Rajaratnam (1, 4)

  1. Department of Statistics, Stanford University, Stanford, USA
  2. Department of Statistics, Yale University, New Haven, USA
  3. Department of Environmental Science, Nankai University, Tianjin, China
  4. Department of Environmental Earth System Science, Stanford University, Stanford, USA
