Introduction

Recording and evaluating the current state of the environment and estimating its future status are common objectives in environmental science and management. However, within the environmental sciences, many terms are used to describe the approaches taken to achieve these goals. Curiosity-driven, mandated, and question-driven monitoring are three examples (Lindenmayer and Likens 2009), but others, such as long-term research and various types of monitoring (long-term, surveillance, status assessment, non-targeted, regulatory, implementation, effectiveness, and ecological effects monitoring), are also common (Wintle et al. 2010; Nichols and Williams 2006; Hutto and Belote 2013; Stem et al. 2005).

Among these various terms, two common themes emerge, and two words are often used to describe them: research and monitoring (Buxton et al. 2020). Commonly, these terms appear together in the environmental science and management literature (e.g., Buxton et al. 2020; Dörnhöfer and Oppelt 2016; Magurran et al. 2010; Lindenmayer and Likens 2009; Marsh and Trenham 2008; Parr et al. 2003) and are often used deliberately to refer to specific activities and goals. For example, the words “research” and “monitoring” may be used to clearly invoke and distinguish between specific scopes of work, approaches, or objectives of a given program or study (e.g., Marsh and Trenham 2008). However, each may also form part of a larger program and may interact with the other. For example, researchers may occasionally be engaged in monitoring and vice versa, and each may support the other (McDonald-Madden et al. 2011; Yoccoz et al. 2001; Anderson et al. 2012; Harris et al. 2017; Qian and Miltner 2018).

Although there are instances where one term or the other is used specifically and clearly, there are other situations where the distinction is not well established. For example, how research and monitoring may interact is not always standardized (Marsh and Trenham 2008 and references therein), and research can be either explicitly included within a monitoring framework (Arciszewski et al. 2011) or separate from it (Kelly et al. 2009; 2010; RAMP 2009). Additionally, both terms can easily be used to describe the same practices. Both research and monitoring often refer to measuring or testing the impacts of human activities and estimating risks to individuals and populations of humans, plants, or animals. For example, both research and monitoring are used to address the status of ecosystem health (e.g., O’Brien et al. 2016; Dobbs et al. 2011), typically through the measurement of state variables (Yoccoz et al. 2001), such as the concentration of a contaminant (Hellawell 1991) or the abundance of a mammal population (Hammond et al. 2021). Research and monitoring also overlap in other ways. Both are typically done by the same inquisitive individuals, and the activities share a high ethical standard (e.g., Mebane et al. 2019).

The conceptual and practical overlap of research and monitoring often leads to their conflation, and this is further exacerbated by additional factors. Particular words may become jargon in particular programs (Stem et al. 2005), but informal mechanisms, such as convenience, tradition, or culture, may also affect the diction of a program. As a broad example, some may consider long-term monitoring a sub-type of research (Lindenmayer and Likens 2009), whereas others may consider similar programs “surveillance” (e.g., Arciszewski et al. 2017; Summers and Tonnessen 1998; Messer et al. 1991).

Whether the terms are being used deliberately or casually, and to what they refer, is not always apparent when programs or their results are described in the environmental science literature, potentially causing further confusion among participants (Marsh and Trenham 2008). The confusion, especially if it is not recognized, can also have real consequences. Importantly, differences in the intended meaning of the words can substantially influence the development, direction, and progression of a given program. This can be especially true when programs are simultaneously addressing multiple and nuanced concerns of regulators, policymakers, industry, local users of landscapes, and often the general public, all with varying expectations of and familiarity with environmental programs. In these types of programs, which are likely to become more common in the future (Haddaway et al. 2017), focusing on technically nuanced research questions, for example, may be initially prudent but may not provide stakeholders with the information they want, need, or expect (Lindenmayer and Likens 2009). In contrast, a monitoring program focused solely on collecting data may also not meet its objectives (Lindenmayer and Likens 2009). Any discrepancies between the expected and actual outcomes may perpetuate the alienation of stakeholders (Beausoleil et al. 2022), but there may also be other ecologically or socially relevant consequences. Relevant impacts may not be detected, or unnecessary or low-impact/high-expense interventions may be promoted over other priorities. Continued divergence of expected and actual outcomes will likely increase discomfort with the risks the program was designed to address and may increase environmental harms, undermine the social license of the program and the reputations of sponsoring organizations, and/or divert attention away from more serious threats. All of these challenges increase when more complexity, including multiple habitat types, is added to the monitoring scope (Cronmiller and Noble 2018).

Discrepancies in how particular words are used are not new challenges in science and scholarship; the contrasts between science and engineering and between applied and pure science have been explored in the past (e.g., Petroski 2011; Lucier 2012). As participants in environmental monitoring, we are keenly aware of the problems associated with the conflation of research and monitoring within large-scale, multi-stakeholder programs whose participants come from various backgrounds. In particular, we are familiar with the effect that an absence of a shared understanding of foundational concepts like research and monitoring among participants can have on a program. Despite the need for coherence in such widely used terminology, there are few resources explicitly comparing and contrasting these two closely related concepts and activities, either within a given program or across an entire discipline. While a given program can have many problems and errors (and many of them, such as unknowingly sampling at an ineffective frequency or missed measurements, may only be known in retrospect), many others, such as data incompatibility or discrepancies in terminology among participants, are foreseeable and, therefore, completely avoidable. The main purpose of this essay is to, at minimum, acknowledge the existence of one of these discrepancies: the differences between research and monitoring. At most, the purpose of this work is to identify both the unifying and discriminating characteristics of research and monitoring that are useful across multiple programs, with the proximate goal of fostering the alignment of ideas, objectives, and approaches within a program and the ultimate strategic goal of enabling more effective environmental decision making.

Defining research and monitoring

As already described, in some instances the differences between research and monitoring are clear, but in others, they are less so. To begin the deeper discussion, we have devised narrow and practical definitions of research and monitoring to separate the two where they interact. First, we consider research a broad activity encompassing observation, experimentation, and all forms of scholarship used to learn (Stem et al. 2005). Environmental research can include formal observational studies (e.g., Yuan et al. 2016; Dunnett et al. 1998), manipulative experiments (e.g., Michelsen et al. 2012; Chapin and Shaver 1996), and reviews of information (Roberts et al. 2022). Professional scientific research often involves pursuing novel (and often multiplying) questions that progress toward a potentially vague, expanding (or shrinking), esoteric, or otherwise undefined or generic goal, often with unknown answers, implications, and future paths. In other words, research has an inherent instability and operates at the bounds of existing knowledge, and failure is an option (although most failures are not widely broadcast, unless they are funny). The types of activities we consider environmental research include, for example, investigations of ecological relationships or chemical processes in the environment, the development of methods for chemical measurements (Barrow et al. 2015), studies to determine optimal sampling times (Barrett et al. 2015), and ecosystem-level manipulations (Kidd et al. 2007).

In contrast to research, we consider environmental monitoring a more focused, stable, and often regulatory activity. Environmental monitoring is the pursuit of a defined outcome and is characterized by the routinized measurement and occasional (or regular) analysis of data (e.g., Summers and Tonnessen 1998; Messer et al. 1991), such as assessing the exceedance of an environmental quality guideline. Monitoring includes other defining characteristics: it usually has direct and legal links to regulatory instruments used to limit harms and is used to instigate management decisions, document the current status of state variables in the environment, and assess changes over time (McDonald-Madden et al. 2011; Yoccoz et al. 2001), as well as to meet many other specific objectives (Marsh and Trenham 2008; Hellawell 1991).

What criteria may distinguish research and monitoring?

Separating research and monitoring can often be difficult but is also necessary to satisfy environmental management objectives and to successfully operate a given program. The discussion above alludes to some initial criteria highlighting potential distinctions but also clear areas of overlap. Very broadly, while many of the described activities could easily be construed as research, common features of the approaches used in monitoring include some level of confidence in the tools being used (including the known relevance of the answers) and the pursuit of practical and tractable questions (Anderson et al. 2012; Stephenson 2019), although monitoring programs must still often contend with uncertainty of estimates (Witmer 2005). Additionally, within research, all questions initially have at least some relevance, whereas, within a monitoring framework, some questions may be (at least initially) ignorable.

As more criteria are added to the definitions presented above, the potential for conflation becomes apparent. For example, research can also pursue practical and tractable questions. This overlap tells us that further criteria are required. And, we suggest, there is a shift in the types of criteria used to differentiate research from monitoring. Conceptual criteria may be used to differentiate between the ideas of research and monitoring (Stem et al. 2005), while concrete criteria can be defined to differentiate between their implementation (Table 1).

Table 1 Proposed conceptual and concrete criteria to distinguish environmental research and environmental monitoring; μc = mean of control; μt = mean of treatment

Questions and audience

Foremost, determining “who asks the questions?” and “who is the primary audience?” may distinguish research from monitoring (Table 1). In a research program, principal investigators (PIs) are typically responsible for developing specific questions. In monitoring efforts, this responsibility tends to fall to stakeholders, managers, regulators, or other government or civic agencies or groups, such as non-governmental organizations. This is not to say that PI questions cannot be the same as or embedded in stakeholder (e.g., management) questions, that the two are unrelated, or that stakeholders do not also ask research questions. Instead (as alluded to above), monitoring and research questions are closely associated and interactive and can be easily conflated. And, if they are conflated, the program may fail to meet its objectives; however, failure is not certain, and any risk depends on the configuration of the program and its ability to identify, accommodate, or otherwise rectify such discrepancies.

While not the only possible configuration, a typical association among the types of questions in a monitoring program may serve as an illustrative example. Broadly, stakeholders are often interested in knowing the state of “ecological health” (e.g., Rapport et al. 1998; Bunn et al. 2010) or in the safety of any harvested foods: “can we eat the fish” is a common example (Beausoleil et al. 2022). Providing an answer to this question requires translating it into what gets measured, but also where, when, and how. However, how this translation happens is not always universally accepted. While translating some objectives from conceptual management questions into monitoring practices may be relatively straightforward, such as comparing contaminant concentrations to consumption guidelines (although even guidelines can be problematic; Bilotta and Brazier 2008), more complex scenarios are also common. For example, nebulous or holistic concepts like ecological health include many different definitions, assessment approaches, anthropomorphisms, and targets, making them challenging to operationalize universally (O’Brien et al. 2016; Wicklum and Davies 1995; Scrimgeour and Wicklum 1996). In both the straightforward and complex translations, monitoring is often used to ask if “the system has changed beyond some predetermined limits of acceptable change,” if “the system has changed according to some predetermined management objectives and is within the acceptable limits,” and/or if “the perturbation of concern has had no impact on the system, and all observed changes to the system can be attributed to other causes” (Legg and Nagy 2006). While these translations may be unsatisfactory, variations on these themes are implemented using indicators of the state of ecosystem health, such as the status of specific taxa and their physical distributions, biological communities, or the physicochemical environment and its integrity (e.g., Karr 1981). While some of these questions may also be addressed in research programs (and may become more common as data accessibility widens; Lindenmayer et al. 2015), they are typically in the purview of monitoring. But achieving this practical and technical proficiency is often supported entirely by research (again highlighting the tight coupling of monitoring with research).
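As a purely illustrative sketch (the symbols here are ours and are not drawn from any particular guideline or program), the simplest of these translations, comparing a contaminant concentration to a consumption guideline, can be written as a decision rule,

$$\text{flag for management attention if } \bar{c} > G, \qquad \text{otherwise continue routine monitoring},$$

where $\bar{c}$ is the mean measured concentration of the contaminant in the harvested tissue and $G$ is the applicable consumption guideline. Most of the translation effort lies not in the comparison itself but in defining what $\bar{c}$ represents: the species, tissues, locations, times, and methods of measurement.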

Future work

Further criteria may also be necessary to separate research and monitoring. For example, subsequent work and future directions may also serve as distinguishing features. In research, the next steps may be initially (and specifically) unknown, but in monitoring, the next steps are usually known and often prescribed (Table 1). Research studies may reveal previously unknown areas of inquiry or may provide new monitoring tools. In contrast, the outcome of monitoring is generally a management decision, such as altering the pace or spatial scope of monitoring, issuing fines, or instigating a research program (Environment Canada 2010).

Additionally, research is more likely to be additive (and/or multiplicative or even exponential) and to progress away from an origin, while monitoring is more likely to be recursive (suggesting monitoring may often be more operationally and administratively complex than research). Research studies are often conducted by small teams on a focused topic in a few locations over a few years (Roberts et al. 2018), whereas some monitoring programs can include multiple stakeholders, industry representatives, government scientists, and program administrators. Monitoring studies are also often conducted over large spatial scales and may include many indicators and many years of stable data (Lindenmayer and Likens 2009). Similarly, research products are often seen as a monologue, whereas the products and process of monitoring are a dialogue (Parr et al. 2003) that includes stakeholders (Conrad and Hilchey 2011).

Roles of participants

Additional criteria originating from the roles of the various participants involved in a study may also be used to differentiate the work. For example, judging the relevance of the results and conclusions and deciding when the work is complete can be used to separate research from monitoring. In research, PIs assess the relevance and completeness of the work along with (usually) anonymous peer reviewers and journal editors. In contrast, stakeholders, managers, and policy advisors generally govern the scope, direction, relevance, and completeness of monitoring efforts while considering the input of technical staff. Other factors, such as the social milieu, zeitgeist, and other social pressures, may contribute more to monitoring than to research, although research is not immune to these influences.

Defining the relevance of observed changes

Embedded in the discussion above, including satisfying stakeholder demands and deciding the completeness of the work, is another feature separating research and monitoring: defining acceptable and unacceptable changes or differences. Just as questions in monitoring are often defined by a consortium of interested parties (e.g., stakeholder groups, government, industry), the acceptable environmental condition may be the product of consensus-driven processes and/or the adoption of a threshold derived from research (Bilotta and Brazier 2008). By contrast, researchers are more often concerned with like-to-like comparisons and may not be constrained by the ecological, political, or social relevance of their work; often, a statistically improbable result is enough. Consequently, research studies are rarely (but might be) concerned with acceptability, whereas monitoring programs must specifically use or establish the thresholds for acceptability.
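Using the notation of Table 1 ($\mu_c$ = mean of control; $\mu_t$ = mean of treatment), and purely as a schematic contrast rather than a prescription, the difference can be sketched as follows. A research comparison often asks whether an observed difference is statistically improbable under a null hypothesis of no difference,

$$H_0: \mu_t = \mu_c \quad \text{versus} \quad H_1: \mu_t \neq \mu_c,$$

whereas a monitoring comparison asks whether the difference lies within a predetermined limit of acceptable change $\delta$ (a hypothetical threshold here, set by consensus or adopted from research),

$$\text{acceptable if } |\mu_t - \mu_c| \le \delta, \qquad \text{unacceptable (triggering a management response) if } |\mu_t - \mu_c| > \delta.$$

The statistical machinery may be identical in both cases; the distinction lies in who sets $\delta$ and what follows from crossing it.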

Other (less reliable) distinguishing features

Other features may be used to distinguish between research and monitoring, but they are likely less globally reliable (Table 1). For example, the pace of reporting from research is typically fast, but it is slow for monitoring. Risk tolerances for failure to deliver on promised results among the audience (and the practitioners) may often be high in research but low in monitoring. A similar contrast applies to the need for formal hypothesis testing (low in monitoring but high in research). Indeed, monitoring can suffer from an absence of clearly defined and formal statistical (or even generic) hypotheses, leading to a data collection approach that has been termed “doing science backward,” characterized by first measuring many parameters and then defining the questions (Lindenmayer and Likens 2009). While examining existing data sets in this way is not absolutely incorrect and can yield valuable and testable hypotheses (Arciszewski et al. 2017), it can be an issue when employed as a singular approach or promoted as yielding conclusive proof.

How a given program may address or react to different types of ignorance may also differ between research and monitoring. The acceptance of unknowns differs markedly when unknowns are defined compared to when they are undefined (known unknowns vs. unknown unknowns; Wintle et al. 2010; Gross 2007). For example, many researchers are unlikely to be satiated as long as known (and unanswered) questions remain soluble (Medawar 2021), while others are drawn to the challenge of transforming unknown unknowns into known unknowns. However, many may also temporarily accept the existence or potential influence of unknown drivers of uncertainty while developing research plans to address the new questions. Contrast this with monitoring, where some known knowledge gaps may be accepted (e.g., indicators that are known to respond but are not directly monitored because of their overlap with existing indicators) if participants are already satisfied. However, monitoring programs are often sensitive to the possibility of completely missing unknown, high-consequence effect pathways, even if those pathways are low probability (e.g., Miall 2013). The aversion created by susceptibility to these unknown unknowns can drive efforts unique to monitoring, such as perpetual surveillance of relatively stable indicators, opportunistic collection of data in anticipation of future needs, or the addition of more indicators or sites, which may strain funding envelopes (Lindenmayer and Likens 2009; Wintle et al. 2010).

A final point related to the acceptability of changes and to which groups decide on the completeness of the work is where the information from research and monitoring typically appears. Monitoring results typically appear in technical reporting, while the results of research more often appear in the peer-reviewed literature. While the distinction is not absolute, these different venues can have important implications for the progression of various programs. For instance, monitoring studies are responsible for reporting all results, including null responses, whereas the peer-reviewed literature has a widely acknowledged tendency (some call it a bias) to publish and promote positive results (Hanson et al. 2018; Button et al. 2016; Rosenthal 1979; Mahoney 1977; Lima and Wrona 2019).

Where does this leave us?

Among the criteria examined above, the patterns suggest that research programs are typically dominated by principal investigators (often one or a few), and monitoring programs are dominated by stakeholders (often many). However, another theme in the above discussion is that the criteria can be insufficient when applied to specific cases, such as environmental programs that periodically shift the emphases of the work and combine the attributes of both research and monitoring, either in series or in parallel (e.g., Hewitt et al. 2008). Overlaps similar to those described above between research and monitoring also occur in other closely related activities or pursuits, such as the overlaps and interactions of science and engineering (e.g., Petroski 2011) or, more generally, pure and applied research (Lucier 2012). Arguably, the distinction between research and monitoring is one modern expression of the historical differences between pure and applied research and highlights how attributes such as outcomes, rather than processes, can be used to separate closely related (and often interacting) activities (Lucier 2012). This also suggests that some working in environmental science and management may spend much of their time on monitoring and others may emphasize research, but most will likely oscillate constantly between these two constructs, suggesting there are few purely research or purely monitoring programs; instead, many are likely a mixture and are better (and deliberately) described as “research and monitoring.” While each individual program must find the proportion of each that best fits its current goals, additional terminology may also be needed to further separate parts of the management suite of activities, such as the act of measurement (e.g., surveys) separated from the evaluation of those data (e.g., monitoring; Parr et al. 2003; Hellawell 1991; Burt 1994). Finally, while many may disagree with our taxonomy, two things are certain: jargonization can be very troublesome when it is unacknowledged (Hassol 2008; Gibbs and Gibbs 2015; Stem et al. 2005; Salafsky et al. 2008), and when it is identified, it needs to be rectified as soon as possible. The more quickly participants can agree on the configuration of the program and its operational parameters, including the differences between and roles for monitoring and research, the more quickly it can progress toward its goals. As the Berra-ism puts it best, “if you don’t know where you’re going, you’ll end up someplace else.”