The content of historical books as an indicator of past interest in environmental issues


In order to better understand public interest in environmental issues, it is necessary not only to consider present and recent levels of environmental awareness, but also to set a longer-term historical baseline. Large databases derived from scanned historical books, such as Google Ngram, provide a resource that can be used to assess historical levels of interest in environmental issues. Historical trends in the occurrence of nine environmental indicator terms were analysed between 1800 and 2009, and usage of all terms was found to be highest during the last 50 years of this period. However, usage of seven of the nine indicator terms has now peaked and is in decline, and in some cases this decline began around 20 years ago. The observed patterns may indicate reduced interest in the environment, acceptance of environmental issues, or shifting trends in the terminology used by the environmental movement.


Biodiversity conservation and environmental management are issues of broad public concern (Van Liere and Dunlap 1981; Hays 2000), and public opinion is a key factor driving the implementation of policy and legislation and determining its success (Hobolt and Klemmensen 2005; Whiteley 1981; Phillis et al. 2013). The willingness of the public to accept environmental policies, and to contribute time or money to conservation efforts, depends largely on their interest in conservation and environmental issues (Grob 1995; Kollmuss and Agyeman 2002; Barr 2003). It is therefore useful for conservation scientists and policy makers to be able to gauge public interest in environmental issues reliably, and survey and interview methods have been applied in the past to achieve this (McCann et al. 1997; Nisbet and Myers 2007; Wray-Lake et al. 2010). More recently, data mining methods have been used to monitor public opinion, as large data sources such as Twitter, the number of catalogued websites mentioning specific topics, and the volume of Google searches for specific keywords have become publicly available to researchers (Baram-Tsabari and Segev 2011; Pak and Paroubek 2010; Evans and Foster 2011). Conservation scientists have used some of these data sources to quantify public interest in bird and butterfly species (Żmihorski et al. 2012), and to measure temporal trends in public interest in environmental issues (McCallum and Bury 2013). Online data sources provide useful indicators of the background level of public interest or concern and are updated frequently, allowing policy makers to respond rapidly to changes in interest (Ginsberg et al. 2009; Scheitle 2011). However, they do not provide a long-term historical record of public interest; Google search volume data are only available from 2004 onwards (McCallum and Bury 2013). 
Without a longer-term baseline comparable to Google search volume, it is difficult to interpret the conclusion that interest in environmental issues has declined over time (McCallum and Bury 2013), although surveys of young people show similar declines since the late 1970s (Wray-Lake et al. 2010). Public awareness of environmental issues is thought to have increased during the second half of the twentieth century, driving the environmental movement into the mainstream (Van Liere and Dunlap 1981; Clapp 1994; Hays 2000). If public interest in environmental issues has declined in more recent history (Wray-Lake et al. 2010; McCallum and Bury 2013), then it is important to know when this decline began.

A longer term baseline for public interest in environmental issues can be established through analysis of another large dataset available online. Google Books’ Ngram is a database created from the largest digitised library of books in the world (Michel et al. 2011). The Ngram corpus catalogues the content of a subset of the digitised books in the Google Books library, and includes a count of the number of times that each word was mentioned every year. The proportional occurrence of environmental words in the Ngram literature could be used as an index of historical environmental interest which is comparable to the Google search volume index. The sample population that Ngram represents is limited because a broader range of the public uses Google to search the internet than publishes books. However, the supply of book content to the public is driven partly by demand (Hjorth-Andersen 2000), so Ngram data should indicate the topics that interested the public through time (Acerbi et al. 2013; Phillis et al. 2013). The database has previously been used to investigate patterns of interest in religion, philosophy, and food (Michel et al. 2011), and has received some limited attention in conservation and environmental research. The proportional occurrence of the terms “climate change” and “protected areas” has been used to indicate documentation of protected area creation (Ervin 2011), and a model describing the transfer of keywords between scientists and the public has used climate science as an example discipline (Bentley et al. 2012). Trends in the usage of natural catastrophe terms have been analysed from the perspective of cultural anthropology (Marriner and Morhange 2012). The increase in usage of catastrophic terms in the last quarter of the twentieth century suggests increased public interest in some environmental issues (Marriner and Morhange 2012), although this study focused mainly on event catastrophes rather than longer term ecological issues. 
Additionally, the occurrence of three specific environmental terms in a related data source, the Google News Archives database, has been used as an indicator of public interest in environmental issues (Phillis et al. 2013). In this study the relationships between environmental stressors, scientific publications, public interest and policy change were analysed to investigate how these factors interact to allow conservation success (Phillis et al. 2013). No previous studies have used historical word usage in published books as a broad indicator of public interest in environmental issues.

To investigate historical public interest in conservation and environmental management issues, the proportional occurrence of nine environmental indicator keywords was analysed in the Ngram database. Long-term trends in the usage of these keywords were analysed over the period 1800–2009, and usage between 1960 and 2009 was analysed separately to test whether interest has peaked and subsequently declined in more recent history.


The 2012 version of the British English Ngram database for 1-grams (frequencies of single words rather than combinations of words) was downloaded (Google Ngram 2013), and the occurrence of each environmental keyword in the database was recorded by year. Only the years 1800–2009 were analysed, so that a reasonable sample of books was available for each year (Michel et al. 2011). The composition of the Ngram corpus is known not to be directly comparable before and after the year 2000 (Michel et al. 2011), but because the most recent years (2000–2009) are of key interest for this study, analyses were conducted over the full period. Additional analyses excluding the years 2001–2009 were conducted to confirm that the conclusions were comparable. Six of the keywords chosen for analysis were the general terms used by McCallum and Bury (2013): “conservation”, “biodiversity”, “environment”, “ecology”, “wildlife”, and “fisheries”. Three slightly more specific keywords from their selection were also used: “pollution”, “extinction”, and “sustainability”. The number of words recorded in the Ngram database each year is several orders of magnitude smaller than the number of searches that Google receives each year, so it was not possible to analyse more specific multi-word combinations. Nevertheless, the nine keywords used in this analysis remain suitable indicators of broad environmental interest.
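The yearly proportional occurrence analysed here is simply the keyword count divided by the total number of 1-grams recorded in that year. The sketch below illustrates that step only; it is not the study's own R code. It assumes the 1-gram rows have already been parsed into hypothetical `(ngram, year, match_count)` tuples and that per-year totals are available as a dictionary; the function name `proportional_occurrence` and the toy numbers are invented for illustration. Matching is exact and case-sensitive, which is an assumption: the text does not say how capitalised forms such as "Environment" were handled.

```python
from collections import defaultdict


def proportional_occurrence(rows, totals, keywords, start=1800, end=2009, exclude=()):
    """Return {keyword: {year: match_count / total_count}} for start <= year <= end.

    rows     -- iterable of (ngram, year, match_count) tuples, assumed to be
                already parsed from the 1-gram files (hypothetical input format)
    totals   -- {year: total number of 1-grams recorded in that year}
    exclude  -- years to leave out, e.g. range(2001, 2010) for a sensitivity
                check that drops 2001-2009
    """
    wanted, skipped = set(keywords), set(exclude)
    counts = defaultdict(lambda: defaultdict(int))
    for ngram, year, match_count in rows:
        # Exact, case-sensitive match: "Environment" is not counted as "environment".
        if ngram in wanted:
            counts[ngram][year] += match_count

    years = [y for y in sorted(totals)
             if start <= y <= end and y not in skipped and totals[y] > 0]
    # Years with no occurrences are kept as 0.0 instead of being dropped.
    return {kw: {y: counts[kw][y] / totals[y] for y in years} for kw in keywords}


# Toy example with invented numbers (not Ngram data).
rows = [("environment", 1990, 50), ("Environment", 1990, 5),
        ("environment", 1991, 60), ("biodiversity", 1991, 3),
        ("environment", 2005, 80)]
totals = {1990: 1_000_000, 1991: 1_200_000, 2005: 2_000_000}
p = proportional_occurrence(rows, totals, ["environment", "biodiversity"])
print(p["environment"])   # proportions for 1990, 1991 and 2005
print(p["biodiversity"])  # 0.0 in 1990, when the word is absent
```

Keeping zero years matters for the models: a term such as “biodiversity”, first recorded in 1965, contributes genuine zeros to the early part of the series rather than missing values.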

Proportional occurrence of each environmental keyword was modelled with a generalised linear model (GLM) using a quasibinomial error structure to account for overdispersion (Crawley 2007). Separate models were produced for the periods 1800–2009 and 1960–2009, so that any difference between long-term trends and more recent patterns could be investigated. For the 1800–2009 period, proportional occurrence of each keyword was modelled as a function of sample year only. For the 1960–2009 period, proportional occurrence was modelled using linear and quadratic terms for sample year (a second-order polynomial) to determine whether there was a significant peak in usage during this period. The polynomial and simple linear models were compared using ANOVA to test for a statistically significant unimodal response. For the polynomial models, the peak year of word frequency was estimated from the coefficients as −a/(2b), where a is the coefficient of year and b is the coefficient of the quadratic term (Zar 2010). All Ngram database processing and statistical analyses were conducted in R version 2.15.2 (R Core Development Team 2012).
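The linear-versus-quadratic comparison and the peak-year calculation can be sketched without external libraries, again in Python rather than the R used in the study. This is a minimal illustration under stated assumptions: the yearly proportions are modelled with unit prior weights (as in an R call of the form `glm(prop ~ year, family = quasibinomial)`; whether the original models were weighted by yearly word totals is not stated), the binomial GLM is fitted by iteratively reweighted least squares (IRLS), and the quasibinomial dispersion is estimated from the Pearson statistic of the larger model, which is how R's `anova(..., test = "F")` scales nested quasi-likelihood fits. Year is centred before fitting, so the peak in calendar years is the centre minus a/(2b). All function names and the synthetic data are hypothetical.

```python
import math


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def _xlog(a, b):
    """a * log(a / b), with the convention 0 * log(0) = 0."""
    return a * math.log(a / b) if a > 0 else 0.0


def fit_logit_glm(X, y, maxit=100, tol=1e-10):
    """Binomial GLM with logit link, fitted by IRLS with unit prior weights.

    Returns (coefficients, deviance, Pearson chi-squared, residual df).
    """
    n, p = len(y), len(X[0])
    ybar = sum(y) / n
    mu = [min(max((yi + ybar) / 2, 1e-12), 1 - 1e-12) for yi in y]   # starting values
    eta = [math.log(m / (1 - m)) for m in mu]
    beta = [0.0] * p
    for _ in range(maxit):
        W = [m * (1 - m) for m in mu]                                 # working weights
        z = [e + (yi - m) / w for e, yi, m, w in zip(eta, y, mu, W)]  # working response
        XtWX = [[sum(W[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        XtWz = [sum(W[i] * X[i][j] * z[i] for i in range(n)) for j in range(p)]
        new = _solve(XtWX, XtWz)
        eta = [sum(xj * bj for xj, bj in zip(row, new)) for row in X]
        mu = [1 / (1 + math.exp(-e)) for e in eta]
        step = max(abs(u - v) for u, v in zip(new, beta))
        beta = new
        if step < tol * (1 + max(abs(v) for v in beta)):
            break
    dev = 2 * sum(_xlog(yi, m) + _xlog(1 - yi, 1 - m) for yi, m in zip(y, mu))
    x2 = sum((yi - m) ** 2 / (m * (1 - m)) for yi, m in zip(y, mu))
    return beta, dev, x2, n - p


def unimodal_test(years, props):
    """Quasi-F test of quadratic vs linear logit trend in year, plus the peak year."""
    centre = sum(years) / len(years)
    t = [yr - centre for yr in years]                  # centred year for stability
    _, dev_lin, _, df_lin = fit_logit_glm([[1.0, ti] for ti in t], props)
    (_, a, b), dev_q, x2_q, df_q = fit_logit_glm([[1.0, ti, ti * ti] for ti in t], props)
    phi = x2_q / df_q                                  # dispersion from the larger model
    F = (dev_lin - dev_q) / (df_lin - df_q) / phi      # compare with F(1, df_q)
    peak = centre - a / (2 * b) if b < 0 else None     # -a/(2b), shifted back to years
    return F, peak


# Synthetic check (invented data): a logit-quadratic peaking in 1995,
# perturbed by a deterministic +/-5 % wobble so the dispersion is non-zero.
years = list(range(1960, 2010))
props = [(1 + 0.05 * math.sin(yr)) / (1 + math.exp(9 + 0.004 * (yr - 1995) ** 2))
         for yr in years]
F, peak = unimodal_test(years, props)
print(round(peak, 1), F > 100)
```

The F statistic would be referred to an F distribution with 1 and n − 3 degrees of freedom (for example with `scipy.stats.f.sf`), omitted here to keep the sketch dependency-free. One caveat on the peak formula: if the quadratic is specified in R with `poly(year, 2)`, which uses orthogonal polynomials by default, the fitted coefficients are not the raw a and b, and −a/(2b) applies only to raw terms such as `year + I(year^2)`.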


The environmental keyword with the greatest total usage over the period 1800–2009 was “environment”, followed by “conservation” and “pollution” (Table 1). The keyword with the least usage was “biodiversity” (Table 1). Several keywords were not mentioned in the earliest years of the period; the first record for “wildlife” was made in 1806, “ecology” in 1816, “sustainability” in 1835, and “biodiversity” in 1965. All nine of the environmental keywords used in this study show statistically significant increases in frequency of usage between 1800 and 2009 (Fig. 1; Table 1). There is some variation in the shape of the trends, with the frequency of most keywords increasing rapidly over the last 40 or 50 years. However, the terms “fisheries” and “extinction” show more gradual increases in word occurrence throughout the entire period (Fig. 1).

Table 1 Total usage of nine environmental keywords, temporal trends and significance over the time periods 1800–2009 and 1960–2009, and peak date of keyword occurrence
Fig. 1 Nine keywords related to biological conservation or environmental management show increasing frequencies of occurrence through time in British English language books catalogued by Google Ngram, during the period 1800–2009

Seven of the nine environmental keywords show statistically significant unimodal patterns in frequency of occurrence between 1960 and 2009 (Fig. 2; Table 1). “Biodiversity” shows a statistically significant increasing frequency of occurrence, and “extinction” shows no statistically significant pattern in frequency of usage over this period. The modelled peaks in keyword frequency for the unimodal responses all occurred between 1992 and 2006 (Table 1).
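The peak years above come from the vertex of the fitted quadratic. Because the quasibinomial GLM uses a logit link, it is worth checking that the vertex of the linear predictor is also the peak of the modelled proportion. In the derivation below, t is sample year and a and b are the coefficients defined in the statistical methods; the intercept c is introduced only for the derivation, and raw (not orthogonal) polynomial terms are assumed.

```latex
\begin{aligned}
\operatorname{logit}(p_t) &= c + a\,t + b\,t^{2},
  \qquad p_t = \frac{1}{1 + e^{-(c + a t + b t^{2})}},\\[4pt]
\frac{dp_t}{dt} &= p_t\,(1 - p_t)\,(a + 2bt),\\[4pt]
\frac{dp_t}{dt} = 0 &\iff a + 2bt = 0 \iff t^{*} = -\frac{a}{2b}
  \quad\text{since } 0 < p_t < 1 .
\end{aligned}
```

Since the logistic function is strictly increasing, p_t is largest where the linear predictor is largest, so t* is a maximum only when b < 0; with b > 0 the same formula would locate a minimum. If year is centred at its mean before fitting, the peak in calendar years is that mean minus a/(2b).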

Fig. 2 Seven keywords related to biological conservation or environmental management show unimodal frequencies of occurrence through time in British English language books catalogued by Google Ngram, during the period 1960–2009. One keyword shows an increasing frequency of occurrence over this period, and one shows no statistically significant trend


The increasing occurrence of keywords relating to environmental issues in the British English Ngram corpus between 1800 and 2009 suggests that, overall, public interest in these issues increased over this period. This is consistent with the history of the environmental movement, which is thought to have grown up in the post-war period (Sheail 1984) and to have begun gaining momentum in the 1960s (Nerlich 2003). There is evidence that increasing public interest has influenced policy, for example in the rapid rate of designation of protected areas between the 1960s and 1990s (Ervin 2011; Radeloff et al. 2012). Environmental keyword usage thus provides an indicator of interest in environmental issues that extends further back into history than Google Trends or any known survey schemes, although the Ngram corpus has some limitations. The relationship between book content and public opinion is not simple, as the majority of the public do not write books (Acerbi et al. 2013). The reading public has changed as literacy has increased, particularly during the first 100 years of the study period, and earlier texts may disproportionately represent religious, academic and educational interests (Altick 1957). An additional issue is that some words have uses outside ecology (Żmihorski et al. 2012). For example, the word “conservation” may refer to archaeological or architectural conservation, and a manual search of the Google Books database reveals that older records for “extinction” frequently refer to non-ecological extinctions (Google Books 2013). However, in this case the observed patterns are consistent in direction between environment-specific keywords and those with multiple uses. Despite its limitations, the Google Ngram database is thought to represent public interest in other disciplines reliably (Michel et al. 2011; Acerbi et al. 2013), and as a noninvasive method it has advantages over traditional surveys, which can be limited by their sample populations (Couper 2000) and may be affected by nonresponse or respondent bias (Phillips and Segal 1969; Groves and Peytcheva 2008).

Environmental campaigners, policy makers and managers may be concerned that usage of seven environmental keywords in a subset of published literature appears to have already peaked, particularly as in some cases the peak occurred around 20 years ago (Table 1). Such peaks could reflect information dilution in more recent years, as new book topics have emerged at a rapidly increasing rate. However, a declining trend in environmental awareness is consistent with the findings of previous studies using internet search data (McCallum and Bury 2013) and surveys of young people (Wray-Lake et al. 2010). If interest in environmental issues is in decline, a range of cultural and social factors may be responsible. It has been suggested that economic crises may reduce interest in the environment (Kahn and Kotchen 2010); however, all keyword peaks occurred before the 2007–2008 global financial crisis (Bordo 2008). It is possible that the public have experienced a form of compassion fatigue (Tester 2001; Moeller 1999) and have become desensitised to the environmental issues summarised by the keywords used in this study. The timing of the keyword peaks suggests that this may be the case: the earliest peak was for “pollution”, a major driver of the early environmental movement following noticeable declines in air and water quality (Hays 2000) and fears of pesticide misuse (Phillis et al. 2013). However, the social and cultural issues of most interest to the public change over time (Burns 2008; Michel et al. 2011), and the keyword with the most recent peak is “sustainability”, a newer term applied to modern, holistic approaches to environmental management (United Nations 1987). If interest in environmental issues follows fashion cycles, this is a concern, because older issues may be neglected by policy makers without public support (Hobolt and Klemmensen 2005). 
However, neglect of environmental issues in web searches and written literature may not present a problem for the environmental movement if the issues that they represent have been resolved (Burns 2008; Phillis et al. 2013), or if the public have accepted them as issues of importance and no longer desire to research them.

Although the linear model was selected over the polynomial for the term “biodiversity”, it is not clear whether this keyword followed an increasing or a unimodal trajectory over the period 1960–2009, because of a large amount of variation over the last 10 years (Fig. 2). However, it can be concluded that this term has either not yet peaked, or peaked last of the selected keywords. This suggests that new generations of environmental keywords can replace older terms; in this case “biodiversity” is commonly used as a more holistic replacement for “wildlife”. If terminology changes through time but the environmental issues of interest remain the same, then public interest in environmental problems may not actually be declining. In fact, continual rebranding through new terminology may increase public environmental awareness.


Historical records of word occurrence frequencies in published books can provide a long-term indicator of public interest in environmental issues. As measured by these indices, awareness of environmental issues is greater now than it was for much of the nineteenth and twentieth centuries, although there is evidence that interest may now have peaked and begun to decline. Further study should establish the existence of emerging trends in the words used to describe biodiversity conservation and environmental management. It should also investigate whether emerging concepts are continuously replacing existing terminology, or whether the extent of public interest in the environment is reducing over time.


  1. Acerbi A, Lampos V, Garnett P, Bentley RA (2013) The expression of emotions in 20th century books. PLoS One. doi:10.1371/journal.pone.0059030

  2. Altick RD (1957) The English common reader: a social history of the mass reading public, 1800–1900. University of Chicago Press, Chicago

  3. Baram-Tsabari A, Segev E (2011) Exploring new web-based tools to identify public interest in science. Public Underst Sci 20:130–143. doi:10.1177/0963662509346496

  4. Barr S (2003) Strategies for sustainability: citizens and responsible environmental behaviour. Area 35:227–240. doi:10.1111/1475-4762.00172

  5. Bentley RA, Garnett P, O’Brien MJ, Brock WA (2012) Word diffusion and climate science. PLoS One. doi:10.1371/journal.pone.0047966

  6. Bordo MD (2008) An historical perspective on the crisis of 2007–2008. Working paper No. w14569. National Bureau of Economic Research. Accessed 10 May 2013

  7. Burns S (2008) Environmental policy and politics: trends in public debate. Nat Resour Environ 23:8–12

  8. Clapp BW (1994) An environmental history of Britain since the industrial revolution. Longman Group, Harlow

  9. Couper MP (2000) Web surveys: a review of issues and approaches. Public Opin Q 64:464–494

  10. Crawley MJ (2007) The R book. John Wiley and Sons, Chichester

  11. Ervin J (2011) Integrating protected areas into climate planning. Biodiversity 12:2–10. doi:10.1080/14888386.2011.564850

  12. Evans JA, Foster JG (2011) Metaknowledge. Science 331:721–725

  13. Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L (2009) Detecting influenza epidemics using search engine query data. Nature 457:1012–1014. doi:10.1038/nature07634

  14. Google Books (2013) Google Books search by date and keywords.,cdr:1,cd_min:1800,cd_max:1824&lr=lang_en. Accessed 10 May 2013

  15. Google Ngram (2013) Google Ngram data download webpage. Accessed 29 November 2012

  16. Grob A (1995) A structural model of environmental attitudes and behaviour. J Environ Psychol 15:209–220

  17. Groves RM, Peytcheva E (2008) The impact of nonresponse rates on nonresponse bias: a meta-analysis. Public Opin Q 72:167–189

  18. Hays SP (2000) A history of environmental politics since 1945. University of Pittsburgh Press, Pittsburgh

  19. Hjorth-Andersen C (2000) A model of the Danish book market. J Cult Econ 24:27–43

  20. Hobolt SB, Klemmensen R (2005) Responsive government? Public opinion and government policy preferences in Britain and Denmark. Polit Stud 53:379–402. doi:10.1111/j.1467-9248.2005.00534.x

  21. Kahn ME, Kotchen MJ (2010) Environmental concern and the business cycle: the chilling effect of recession. Working paper No. w16241. National Bureau of Economic Research. Accessed 10 May 2013

  22. Kollmuss A, Agyeman J (2002) Mind the gap: why do people act environmentally and what are the barriers to pro-environmental behaviour? Environ Educ Res 8:239–260. doi:10.1080/13504620220145401

  23. Marriner N, Morhange C (2012) Data mining the intellectual revival of “Catastrophic” Mother Nature. Found Sci. doi:10.1007/s10699-012-9299-2

  24. McCallum ML, Bury GW (2013) Google search patterns suggest declining interest in the environment. Biodivers Conserv. doi:10.1007/s10531-013-0476-6

  25. McCann E, Sullivan S, Erickson D, de Young R (1997) Environmental awareness, economic orientation, and farming practices: a comparison of organic and conventional farmers. Environ Manag 21:747–758

  26. Michel J-B, Shen YK, Aiden AP, Veres A, Gray MK, The Google Books Team, Pickett JP, Hoiberg D, Clancy D, Norvig P, Orwant J, Pinker S, Nowak MA, Aiden EL (2011) Quantitative analysis of culture using millions of digitized books. Science 331:176–182. doi:10.1126/science.1199644

  27. Moeller S (1999) Compassion fatigue: how the media sell disease, famine, war and death. Routledge, New York

  28. Nerlich B (2003) Tracking the fate of the metaphor silent spring in British environmental discourse: towards an evolutionary ecology of metaphor. 4:115–140

  29. Nisbet MC, Myers T (2007) The polls—trends: twenty years of public opinion about global warming. Public Opin Q 71:444–470. doi:10.1093/poq/nfm031

  30. Pak A, Paroubek P (2010) Twitter as a corpus for sentiment analysis and opinion mining. In: Calzolari N, Choukri K, Maegaard B, Mariani J, Odijk J, Piperidis S, Rosner M, Tapias D (eds) Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC’10), European Language Resources Association (ELRA), Valletta, Malta (May 2010), pp 1320–1326

  31. Phillips DL, Segal BE (1969) Sexual status and psychiatric symptoms. Am Sociol Rev 34:58–72

  32. Phillis CC, O’Regan SM, Green SJ, Bruce JEB, Anderson SC, Linton JN, Earth2 Ocean Research Derby, Favaro B (2013) Multiple pathways to conservation success. Conserv Lett 6:98–106

  33. R Core Development Team (2012) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. Accessed 13 December 2012

  34. Radeloff VC, Beaudry F, Brooks TM, Butsic V, Dubinin M, Kuemmerle T, Pidgeon AM (2012) Hot moments for biodiversity conservation. Conserv Lett. doi:10.1111/j.1755-263X.2012.00290.x

  35. Scheitle CP (2011) Google’s insights for search: a note evaluating the use of search engine data in social research. Soc Sci Q 92:285–295

  36. Sheail J (1984) Nature reserves, national parks, and post-war reconstruction, in Britain. Environ Conserv 11:29–34

  37. Tester K (2001) Compassion, morality and the media. Open University Press, Buckingham

  38. United Nations (1987) Our common future. Report of the World Commission on Environment and Development. United Nations document storage website. Accessed 10 May 2013

  39. Van Liere KD, Dunlap RE (1981) Environmental concern: does it make a difference how it’s measured? Environ Behav 13:651–676

  40. Whiteley P (1981) Public opinion and the demand for social welfare in Britain. J Soc Policy 10:453–475. doi:10.1017/S0047279400001537

  41. Wray-Lake L, Flanagan CA, Osgood DW (2010) Examining trends in adolescent environmental attitudes, beliefs, and behaviors across three decades. Environ Behav 42:61–85. doi:10.1177/0013916509335163

  42. Zar JH (2010) Biostatistical analysis, 5th edn. Pearson Prentice-Hall, Upper Saddle River

  43. Żmihorski M, Dziarska-Pałac J, Sparks TH, Tryjanowski P (2012) Ecological correlates of the popularity of birds and butterflies in Internet information resources. Oikos. doi:10.1111/j.1600-0706.2012.20486.x



I thank Lorraine Maltby and Philip Warren for informative discussions during the writing of this manuscript, and Hannah Worrall for comments on an early draft.

Author information



Corresponding author

Correspondence to Daniel Rex Richards.

About this article

Cite this article

Richards, D.R. The content of historical books as an indicator of past interest in environmental issues. Biodivers Conserv 22, 2795–2803 (2013).


  • Environmental awareness
  • Google Books
  • Google Ngram
  • Public opinion
  • Culturomics