Environmental and Ecological Statistics, Volume 18, Issue 4, pp 709–733

Geoadditive regression modeling of stream biological condition

  • Matthias Schmid
  • Torsten Hothorn
  • Kelly O. Maloney
  • Donald E. Weller
  • Sergej Potapov


Indices of biotic integrity (IBIs) have become an established tool to quantify the condition of small non-tidal streams and their watersheds. To investigate the effects of watershed characteristics on stream biological condition, we present a new technique for regressing IBIs on watershed-specific explanatory variables. Since IBIs are typically evaluated on an ordinal scale, our method is based on the proportional odds model for ordinal outcomes. To avoid overfitting, we do not use classical maximum likelihood estimation but a component-wise functional gradient boosting approach. Because component-wise gradient boosting has an intrinsic mechanism for variable selection and model choice, determinants of biotic integrity can be identified. In addition, the method offers a relatively simple way to account for spatial correlation in ecological data. An analysis of the Maryland Biological Streams Survey shows that nonlinear effects of predictor variables on stream condition can be quantified and that accurate predictions of biological condition at unsurveyed locations can be obtained.


Keywords: Proportional odds model; Gradient boosting; Geoadditive regression; Stream biological condition; Maryland Biological Streams Survey
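
The variable-selection mechanism mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it assumes a squared-error loss and simple linear base learners, whereas the paper boosts a proportional odds model with smooth and spatial components. The function name `componentwise_l2_boost` and the parameters `n_steps` and `nu` are hypothetical names chosen for illustration. In each step, every predictor is fitted separately to the current residuals, and only the best-fitting one is updated by a small step.

```python
def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Component-wise gradient boosting with squared-error loss (a sketch).

    X: list of rows, each a list of p floats; y: list of floats.
    Returns (intercept, coefficients) on the original predictor scale.
    """
    n, p = len(X), len(X[0])
    # Center each column so the intercept can be handled separately.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    ss = [sum(v * v for v in col) for col in cols]  # column sums of squares
    y_mean = sum(y) / n
    fit = [y_mean] * n
    coef = [0.0] * p
    for _ in range(n_steps):
        # For squared-error loss, the negative gradient is the residual.
        resid = [yi - fi for yi, fi in zip(y, fit)]
        best_j, best_beta, best_gain = None, 0.0, 0.0
        for j in range(p):
            if ss[j] == 0.0:
                continue  # constant column carries no information
            dot = sum(c * r for c, r in zip(cols[j], resid))
            beta = dot / ss[j]        # least-squares fit of column j alone
            gain = dot * dot / ss[j]  # reduction in residual sum of squares
            if gain > best_gain:
                best_j, best_beta, best_gain = j, beta, gain
        if best_j is None:
            break  # no predictor improves the fit any further
        # Update only the selected component, shrunken by the step length nu.
        coef[best_j] += nu * best_beta
        fit = [f + nu * best_beta * c for f, c in zip(fit, cols[best_j])]
    intercept = y_mean - sum(c * m for c, m in zip(coef, means))
    return intercept, coef
```

Stopping after a small number of steps leaves the coefficients of never-selected predictors at exactly zero, which is how early stopping acts as both regularization and variable selection in this simplified setting. In practice the number of steps would be tuned, for example by cross-validation.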



Supplementary material

ESM 1: 10651_2010_158_MOESM1_ESM.pdf (PDF, 3329 kB)



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Matthias Schmid (1)
  • Torsten Hothorn (2)
  • Kelly O. Maloney (3)
  • Donald E. Weller (3)
  • Sergej Potapov (1)

  1. Institut für Medizininformatik, Biometrie und Epidemiologie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
  2. Institut für Statistik, Ludwig-Maximilians-Universität München, Munich, Germany
  3. Smithsonian Environmental Research Center, Edgewater, USA
