Introduction

A great deal of enthusiasm over “digital agriculture” and “big data” in agriculture has emerged among industry, venture capital, and an eager farm press. Several large-scale acquisitions suggest the value of farm data to be vast. Monsanto described their purchase of The Climate Corporation for over $900 million in 2013 as their “entry ticket into a $20 billion market opportunity,” while John Deere believed precision agriculture could help increase their company’s value by $25 billion (Plume, 2014). Similar enthusiasm has also permeated the academic literature in recent years (e.g., Coble et al., 2018; Weersink et al., 2018; Woodard et al., 2018).

Despite widespread interest, there is a noticeable deficit of objective research on how commercial farms actually use data and data services. For example, what kinds of data are farmers actually collecting? What software platforms do producers subscribe to and what features do these software solutions offer? Do producers share their data with outside service providers? Most importantly, how do these data and software services integrate into existing production systems to impact decision making and, subsequently, farm outcomes (e.g., yield, efficiency, profit)? There is a clear need to better understand how commercial farms in the United States use the data they collect and the value that data brings to their operations relative to the technical, and sometimes impractical, offerings of data service providers. Researchers and extension educators can better serve agricultural producers, service providers, and developers by addressing these questions.

Previous research has largely focused on adoption of precision agriculture hardware (e.g., Schimmelpfennig, 2016; Zhou et al., 2017; Lowenberg-DeBoer & Erickson, 2019). Hardware adoption is an important component of a digital agriculture system and is a necessary step for implementing advanced data-intensive technologies (Khanna et al., 1999). However, hardware adoption by itself does not directly measure farmers’ use of the data they collect. Some studies have implicitly sought to quantify data use by evaluating the adoption of precision agriculture technology bundles (Lambert et al., 2015; Schimmelpfennig & Ebel, 2016; Griffin et al., 2017; Miller et al., 2017, 2019). That is, combinations of precision agriculture technologies (e.g., yield monitor and variable rate technology [VRT]) likely serve as a proxy for the transition from data collection to data use. While evaluating the adoption of technology bundles lends some insight into the progression from data collection to data use, it still focuses on hardware adoption and only implies data use.

Schimmelpfennig (2016) characterizes this relationship between data collection and data use through the concept of “information flows.” For example, producers must first collect geo-referenced yield data using a yield monitor; that information is then used to create a yield map using farm software or a service provider; the yield map is then used to create prescriptions for VRT input applications, such as seed or fertilizer. Using USDA Agricultural Resource Management Survey (ARMS) data, Schimmelpfennig (2016) shows that in 2010 48% (70%) of U.S. corn farms (acres) owned a yield monitor, but just 25% (44%) made a geo-referenced yield map and only 19% (28%) applied inputs using VRT. It is important to point out that yield monitor data can be used for purposes other than making yield maps and VRT prescriptions. For example, USDA ARMS data indicate that in 2010 U.S. corn farmers used yield monitor data to monitor crop moisture (52% of corn acres), conduct on-farm experiments (20%), and document yields (28%) (USDA Economic Research Service [ERS], 2020). In addition, in some cases yield monitor data may not indicate that VRT input applications are warranted. In any case, it is evident that not all farms collecting data are using it to make actionable decisions. The questions that remain are to what extent the gap between data collection and data use has narrowed and what factors are influencing progress.

It is also important to point out that the 2010 ARMS data referenced here are the most recent ARMS data publicly available for corn via the ARMS Tailored Reports (USDA ERS, 2020). The 2016 ARMS survey is the most recent iteration of the ARMS survey for corn, but data from this survey are not yet publicly available. Lowenberg-DeBoer and Erickson (2019) summarize some of the general precision agriculture adoption rates for the 2016 ARMS corn survey.

The current research seeks to move beyond the study of hardware adoption towards better understanding of the farm data lifecycle. Instead of focusing on the adoption of hardware, this study employs a survey instrument where respondents are explicitly asked what types of data they collect (i.e., data collection), the degree to which the data they collect influences their farm management decisions (i.e., data use), and finally, their perception of the outcomes from their data-informed decisions (i.e., data impact). The objective of this study is to identify where commercial U.S. corn and soybean farms lie in the farm data lifecycle and to determine the factors associated with producers’ progress in the collection, use, and impact of farm data on their operations. The research hypothesis motivating the current study is that data collection is common among commercial farms, but the extent to which farm data is used to make actionable decisions lags behind data collection. It is also hypothesized that farm/farmer demographics, data management and analysis resources, technology use, and data sharing will be associated with progression within the various stages of the farm data lifecycle.

Materials and methods

A data lifecycle provides a high-level framework for representing the stages of data throughout their life (Demestichas & Daskalakis, 2020). Adapting and extending Schimmelpfennig’s (2016) concept of “information flows,” the farm data lifecycle described here is simplified into a three-stage process (Fig. 1):

Fig. 1 Farm data lifecycle

  1. Stage #1: Data collection. Farms choose to collect data or not.

  2. Stage #2: Data use. Farms that collect data decide the extent to which their data will influence their decision making.

  3. Stage #3: Data impact. Farms whose data influences their decisions evaluate the impact of data-informed decisions on farm output, efficiency, profitability, etc.

In practice, there are a variety of farm data types that can be collected in stage #1. For simplicity, this research focuses on three types of data: yield data, grid or zone soil sample data, and aerial or satellite imagery data. It is important to point out that different farm data types may have different lifespans. For example, soil organic matter (SOM) changes slowly. Therefore, SOM data may be collected once and that data may influence decisions for several years. Conversely, satellite/aerial imagery data may be used to detect and manage in-season crop deficiencies, but may be irrelevant to next year’s management decisions. Further, other data types, such as yield monitor data, may be collected annually for several years and layered to enhance its decision value. While the issue of data lifespan and the role of data collection frequency in decision making is not taken up directly in this study, it is an important aspect of the farm data lifecycle to be explored in more detail in future research.

Similar to the various data types in stage #1, a number of decisions could be influenced by data in stage #2 of the farm data lifecycle. Respondents indicating they collected one or more of the queried types of data were asked in stage #2 about the extent to which that data influenced three different decisions: seeding rate decisions, nutrient management decisions, and drainage investment decisions. In practice, there is a continuous scale of the degree to which farm data can influence decision making. However, for simplicity, respondents were given three potential levels of impact: not at all, somewhat, or a lot.

Finally, in stage #3 of the farm data lifecycle the impact of data-informed decisions on farm output was assessed for each of the three possible decisions in stage #2 (seeding rate, nutrient management, drainage investment). For example, if a respondent indicated the data they collected influenced their seeding rate decisions somewhat or a lot, they were asked whether their data-informed seeding rate decisions decreased, did not change, or increased their farm yield. Again, the range of yield impacts associated with data-informed decisions spans a continuous scale, but discrete yield impacts (decrease, no change, or increase) were used to simplify the survey. Notice that the third and final stage of the farm data lifecycle described here does not necessarily imply that the data “die.” Instead, stage #3 initiates the next iteration of the cycle, as data impacts can only be evaluated through additional data collection (stage #1). Death of data relates to data lifespan, which was not examined in this research.

While a variety of measures could be used to evaluate the impact of data-informed decisions, yield was used in this study as it is the value for which respondents were expected to have the easiest recall. An economic measure, such as profit or efficiency, would be preferred. However, few farms have directly estimated the return on investment for precision agriculture technology or data investments (Pope & Sonka, 2020). Therefore, it was expected that asking farmers to recall or mentally account for the profit impact of their data-informed decisions would be laborious and subject to significant measurement error. Conversely, farmers are often conversant regarding yield impacts of their decisions and can easily recall these values. Nonetheless, yield impacts ascribed to data-informed decision making reported here likely represent respondent perceptions of yield impacts, as opposed to actual yield impacts, given the complexity associated with causally identifying the impact of these data-informed decisions on farm outcomes. Even so, farmer perceptions are often what determine adoption (or disadoption) decisions (Pannell et al., 2006) and, despite their limitations, are an important part of examining the current state of the farm data lifecycle.

In addition to questions regarding the farm data lifecycle, respondents were asked several questions regarding data management and sharing practices to gauge the farm’s investment in a data strategy. Demographic questions were included in the survey to collect information about the farm.

A phone survey of U.S. commercial corn and soybean producers conducted from August 5, 2019, to August 30, 2019 was used to collect the data for this study. Institutional Review Board (IRB) approval was obtained for the study from Purdue University (IRB Protocol #1906022382). The survey list frame of commercial corn and soybean producers was purchased from Farm Journal, and the survey was administered by PRISM Marketing Group by phone. All enumerators read from the same survey script to ensure consistency. Given that data collection was done by phone, the survey was designed to be completed by respondents in less than 10 min. Questions were made to be short and easy to understand to encourage producer responses. Participation was voluntary and no compensation was provided for participating. A copy of the survey is available in the supplementary appendix (Appendix A).

The survey was intentionally targeted toward commercial U.S. corn and soybean farms, defined as operations with farmland of 1000 acres or more. This targeted approach was taken given the propensity of large operations to collect these types of data. To ensure operation size diversity within the sample, quotas were imposed for survey sampling procedures. The USDA’s 2017 Census of Agriculture reported 172,793 farms with more than 1000 acres of farmland operated in the United States (USDA NASS, 2020). Note that this size classification is based on farmland acres operated, which is distinct from cropland acres or corn and soybean acres. Given this population, a survey sample size of 383 is necessary to ensure a sample with a confidence level of 95% and a margin of error of 5%. However, over half of farms with more than 1000 acres of farmland operated (87,666 farms) have less than 2000 acres of farmland operated (USDA NASS, 2020). To ensure that the sample was representative of larger-scale farms, and not just those operating less than 2000 acres of farmland, quotas were imposed on the data collection process that required the final sample to include at least 400 respondents farming between 1000 and 1999 acres of farmland and at least 400 respondents farming 2000 acres or more.
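The required sample size quoted above can be reproduced with a short calculation. The paper does not state which formula was used; the sketch below assumes Cochran's formula with a finite population correction, a 95% confidence z-score of 1.96, and the most conservative proportion p = 0.5. The function name and parameters are illustrative.

```python
def required_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's sample size with finite population correction (assumed formula)."""
    # Sample size for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Shrink it to account for the finite population being sampled
    return n0 / (1 + (n0 - 1) / population)

# 172,793 farms with more than 1000 acres (2017 Census of Agriculture)
n = required_sample_size(172_793)
print(round(n))
```

Under these assumptions the result rounds to the 383 reported in the text, so the quota of 800 completed responses comfortably exceeds the minimum.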

The list frame was filtered based on farm size information available in the frame to help reach respondents that met the size criteria. Observations were randomly drawn from the filtered list frame, and farm size quotas were implemented using a screening question at the beginning of the survey, “How many total acres do you operate? Less than 1000 acres, 1000–1999 acres, 2000–4999 acres, or 5000 or more acres.” In total, 7841 farmers who met the size criteria were contacted. Of those, 934 started the survey and 800 completed it, for a response rate of 10% (400 in the 1000–1999 acre category and 400 in the ≥ 2000 acre category).
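As a quick arithmetic check on the figures above, the response rate and the share of started surveys that were completed can be computed directly from the reported counts. The variable names are illustrative; the completion share is derived here and is not a figure reported in the paper.

```python
contacted = 7841   # farmers contacted who met the size criteria
started = 934      # respondents who began the survey
completed = 800    # respondents who finished the survey

response_rate = completed / contacted    # completed surveys per contact
completion_rate = completed / started    # completed surveys per started survey

print(f"Response rate: {response_rate:.1%}")
print(f"Completion rate among starters: {completion_rate:.1%}")
```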

Data are summarized and Fisher’s (1922) exact test, an exact alternative to the χ² test for contingency tables, is used to determine if the responses across subsamples are statistically different. Significant relationships identified using Fisher’s exact test represent associations between variables. However, it is important to point out that these relationships do not imply causality. Rather, the Fisher (1922) approach can reveal useful correlations between a farm’s stage in the data lifecycle and the farm data tools and practices employed by the farm, as well as key farm and operator characteristics.
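To make the test concrete, the following is a minimal sketch of a two-sided Fisher's exact test for a 2×2 table, written with the Python standard library only. It is a simplification: the comparisons in this study may involve more than two categories per variable, and the counts used below are purely hypothetical, not taken from the survey. The function name is illustrative.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # holding all row and column totals fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties)
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )

# Hypothetical counts: rows = uses farm software (yes/no),
# columns = collects data (yes/no)
p_value = fisher_exact_2x2(8, 2, 1, 5)
print(f"p = {p_value:.4f}")
```

A small p-value rejects the null hypothesis that the row and column variables are independent; it says nothing about which variable drives the other.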

Results and discussion

Summary statistics for the full sample are reported in Table 1. Per sampling quotas, half of the farms surveyed operate between 1000 and 1999 acres, and half operate 2000 acres or more (36% are between 2000 and 4999 acres and 14% have 5000 acres or more). It is important to reiterate that the sample is intentionally not representative of all U.S. farms. Instead, the sample is specifically focused on commercial-scale farms (defined as farming 1000 acres or more) given their propensity to be involved in digital agriculture and this study’s objective to learn more about the gap between data collection and data use and factors influencing progress in narrowing the gap between the two. More than 80% of respondents are over the age of 50, and 35% are over the age of 65. For comparison, the average age of U.S. farmers is 59 (USDA NASS, 2020). Nearly half of respondents have a bachelor’s degree or higher.

Table 1 Summary statistics (n = 800)

In terms of technology use, 80% of the farms surveyed have access to high-speed internet, compared to 74% of rural America as a whole (Wilmoth, 2019). Less than half of respondents reported using either Microsoft Excel or a farm data software product to store, manage, or analyze data. Finally, adoption rates of common precision agriculture technologies are relatively high among survey respondents. GPS guidance/autosteer is used on more than 90% of surveyed farms. Variable rate fertilizer is used by 71% of farms surveyed and variable rate seeding by 59%. Drones/UAVs are used by 26% of the farms surveyed. While these adoption rates are higher than in the general population of U.S. farms (e.g., Lowenberg-DeBoer & Erickson, 2019), they are close to those reported by Thompson et al. (2019), who use a similar sampling method focused on commercial-sized farms.

Data collection

Data collection was common, with 82% of the 800 respondents collecting yield data, 77% collecting grid or zone soil sample data, and 47% collecting aerial or satellite imagery data. When aggregating across all three data types, 93% of respondents collected at least one of the queried data types.

Comparing demographic variables and data resources/technology use for non-data collectors and data collectors yields several interesting relationships (Table 2). For example, larger farm size is significantly associated with data collection for each of the queried data types (Fig. 2). This relationship is consistent with previous research indicating larger farms are more likely to adopt precision agriculture hardware (e.g., Daberkow & McBride, 2003; Fernandez-Cornejo et al., 2001; Roberts et al., 2004; Schimmelpfennig, 2016; Schimmelpfennig & Lowenberg-DeBoer, 2020). In addition, higher levels of educational attainment are also significantly associated with data collection (Fig. 3), consistent with previous hardware adoption literature (e.g., Fernandez-Cornejo et al., 2001; Roberts et al., 2004).

Table 2 Comparison of data collectors and non-data collectors (n = 800)
Fig. 2 Proportion of respondents who collected satellite/drone imagery, soil sample, and yield monitor data by farm size

Fig. 3 Proportion of respondents who collected satellite/drone imagery, soil sample, and yield monitor data by educational attainment

Data resources and technology use variables were also statistically related with data collection (Table 2). Again, decisions to adopt many of these resources and/or technologies are likely correlated with the decision to collect data. Therefore, it is important to reiterate that the relationships identified in this study are not causal. Moreover, in many cases the direction of the causal pathway cannot be determined as these decisions are made simultaneously. For example, this result does not indicate that farm data software use leads to data collection as it is just as likely that the decision to collect data leads the farm to invest in a data software package. Instead, the associations identified here simply reject the null hypothesis that software use and data collection are independent practices.

Examining the 800 survey responses more closely, just 7% of respondents did not collect yield monitor, soil sample, or satellite/drone imagery data. These non-data collectors identified cost and uncertainty about how to use the data as the two most common impediments to data collection (Fig. 4). Interestingly, privacy concerns (10%) were the least often cited impediment to data collection among these farms despite well-publicized privacy concerns (e.g., American Farm Bureau Federation, 2015). When asked about their likelihood of collecting data in the future, few non-data collectors indicated a strong propensity to start, suggesting that the move to collecting data is approaching maturity among commercial-scale farms (Fig. 5).

Fig. 4 Reason for not collecting farm data (n = 59)

Fig. 5 Likelihood of non-data collectors to start collecting yield monitor, soil sample, and satellite/drone imagery data in the future (n = 59)

Data use

Respondents collecting one or more of the queried data types (yield monitor, grid or zone soil sample, and/or aerial or satellite imagery data) indicated that the data they collected commonly influence their seeding rate, nutrient management, and drainage investment decisions (Fig. 6). Nutrient management decisions are the decisions most commonly influenced by farm data, with 93% of data collectors indicating that their data influenced their nutrient management decisions at least somewhat, and more than half indicating that data they collected influenced their nutrient management decisions a lot. Seeding rate decisions are also commonly influenced by farm data, with 81% of those who collected data indicating their data influenced their seeding rate decisions at least somewhat. Of the queried decision types, drainage investment decisions are the least likely to be influenced by farm data, although use is still common, with 71% of those who collected data indicating it influenced their drainage investment decisions.

Fig. 6 Extent to which data collected influenced seeding rate, nutrient management, and drainage investment decisions for data collectors (n = 741)

Relationships between farm demographics and technology use variables, and the extent to which farm data influences seeding rate, nutrient management, and drainage investment decisions are explored in Tables 3, 4, and 5, respectively. Notice that Tables 3, 4 and 5 include additional variables beyond what is found in Tables 1 and 2. Tables 3, 4 and 5 include variables that are only relevant to the subsample of data collectors. For example, only farms that actually collect data would choose whether or not to designate an individual on their farm to be primarily responsible for collecting, managing, and analyzing data. Similarly, only farms that collect data would face the choice to create GPS maps, layer data, or share the data they collect. Hence, while these variables are irrelevant to the data collection decision, they may be associated with the extent to which the farm uses the data it collects or the impact of farm data on farm outcomes.

Table 3 Comparison of extent to which farm data influences seeding rate decisions for data collectors (n = 741)
Table 4 Comparison of extent to which farm data influences nutrient management decisions for data collectors (n = 741)
Table 5 Comparison of extent to which farm data influences drainage investment decisions for data collectors (n = 741)

Demographic variables are associated with the extent to which farm data influences seeding rate and nutrient management decisions. Most notably, larger farm sizes, higher educational attainment, and designation of an employee responsible for collecting, managing and analyzing data are positively related to the level of influence of farm data on seeding rate and nutrient management decisions. However, none of the demographic variables were significantly associated with the extent to which farm data influences drainage investment decisions. Nutrient management and seeding rate decisions are different from drainage investment decisions in several ways that may influence this result. Most notably, nutrient management and seeding rate decisions are annual crop management decisions that must be made on all crop acres. Conversely, drainage investment decisions are capital intensive, long-term investment decisions that are typically only evaluated on a small portion of farm acres each year.

Data resources are also associated with the extent to which farm data influences seeding rate, nutrient management, and drainage investment decisions. Farms using Microsoft Excel or farm data software have significantly higher levels of data use. Additionally, creating GPS maps, ability of farm software to layer data and generate recommendations, and following those recommendations all accompany higher levels of data use. This highlights the importance of resources for managing and analyzing farm data. While the landscape of farm data software products is currently vast without a clear-cut market leader, these results suggest that using some form of data software aids farmers in gleaning usable information from their farm’s data.

Technology use is also associated with the extent to which data influences farm decisions. VRT fertilizer applications, VRT seed applications, and drone/UAV use are all positively and significantly associated with higher levels of data use. These relationships are unsurprising given that technologies such as VRT require some level of data use to generate VRT prescriptions.

Finally, data sharing is also associated with the extent to which data influences farm decisions. On one hand this is not surprising given the additional value that can be added to farm data by an outside service provider with expertise in managing and analyzing farm data. On the other hand, potential privacy concerns are often an impediment to data sharing (e.g., American Farm Bureau Federation, 2015; Sykuta, 2016; Ferrell, 2017; Miller et al., 2018). While this research does not address these privacy concerns directly, this relationship suggests that when consulted, third-party farm data service providers are adding value to the farm data lifecycle by helping farmers use their data.

Data impact

Respondents who indicated that farm data influenced their decisions overwhelmingly reported that data-informed decisions increased yield, regardless of the decision (seeding rate, nutrient management, or drainage investment) or the extent to which data influenced the decision (somewhat or a lot) (Fig. 7). It is important to reiterate that yield impacts ascribed to data-informed decision making reported here likely represent respondent perceptions of yield impacts (as opposed to actual yield impacts) given the complexity associated with causally identifying the impact of these data-informed decisions on farm outcomes. Nonetheless, farmer perceptions of these yield impacts are likely to influence future farm decisions (Pannell et al., 2006), and therefore, are an important part of examining the current state of the farm data lifecycle.

Fig. 7 Perceived impacts of data-informed seeding rate, nutrient management, and drainage investment decisions on yield outcomes by the extent to which farm data influenced decisions

It is not particularly surprising that so few respondents indicated that their data-informed decisions decreased yield. First, there is likely survivorship bias associated with this response (Elton et al., 1996). That is, farms that previously perceived yield to decline as a result of data-informed decisions are unlikely to continue using their data to make decisions. Second, there may be social desirability bias in this distribution of yield impact responses (Fisher, 1993). That is, respondents have the tendency to answer questions in a way that they perceive will be viewed favorably, even if it is not true. Hence, it is possible that some respondents indicated yield increases not because it was reflective of reality, but because they felt that it would be viewed poorly if they indicated that their data-informed decisions did not increase yield. In any case, it is clear that farmers who collect and use farm data perceive their data strategy to be associated with more favorable yield outcomes.

Demographic variables were generally not associated with perceived yield impacts from data-informed seeding rate (Table 6), nutrient management (Table 7), or drainage investment (Table 8) decisions. The one exception was the designation of an employee responsible for collecting, managing, and analyzing data, which was positively and significantly associated with the perception that yield increased as a result of data-informed nutrient management and drainage investment decisions.

Table 6 Comparison of data-informed seeding rate decisions on perceived yield outcomes for those whose data influenced their seeding rate decisions (n = 601)
Table 7 Comparison of data-informed nutrient management decisions on perceived yield outcomes for those whose data influenced their nutrient management decisions (n = 688)
Table 8 Comparison of data-informed drainage investment decisions on perceived yield outcomes for those whose data influenced their drainage investment decisions (n = 524)

Use of data management resources was commonly associated with the perceived impact of data-informed decisions on yield outcomes. Microsoft Excel and farm management software were both associated with perceived yield increases. Similarly, creating GPS maps and the ability of farm software to layer data are both significantly associated with favorable yield impacts.

Relationships between technology use and perceived impacts of data-informed decisions on yield outcomes were mixed. Farms using VRT fertilizer and VRT seed applications were generally more likely to report favorable yield outcomes. However, it is interesting to note that while VRT seed applications are correlated with yield increases from data-informed seeding rate decisions, VRT fertilizer applications are not associated with yield increases from data-informed nutrient management decisions.

Finally, sharing data and following the recommendations provided by third-party data service providers are also significantly associated with more favorable perceptions of yield outcomes from data-informed decisions. Again, it is not surprising that outside service providers with expertise in managing and analyzing farm data would be able to add value to the farm data lifecycle, especially for annual crop management decisions. Data sharing is not statistically related to perceptions of yield outcomes resulting from data-informed drainage investment decisions.

Conclusions and implications

The objective of this study is to better understand the current positioning of U.S. commercial-scale corn and soybean farms within the farm data lifecycle. Respondents were explicitly asked about what types of data they collect (i.e., data collection), the degree to which the data they collect influences their farm management decisions (i.e., data use), and finally, how they perceive their data-informed decisions have impacted yield on their farm (i.e., data impact). Results indicate that the majority of commercial-scale U.S. corn and soybean farms collect data, indicate that the data they collect influences their decisions, and perceive yield increases as a result of their data-informed decisions. However, farms vary in intensity of their data usage. When interpreting these results, one must keep in mind that the sample of U.S. commercial corn and soybean farms in this study is intentionally not representative of all U.S. farms, although it is representative of farms producing the majority of these two crops. Further, responses at various levels of the farm data lifecycle may be influenced by selection bias, survivorship bias, and/or social desirability bias. Therefore, results are not generalizable to the broad population of U.S. farms.

Previous research indicated that bundling precision agriculture hardware products provides the best opportunity to maximize the economic returns to a precision agriculture system (Lowenberg-DeBoer, 2003; Lambert et al., 2015; Schimmelpfennig & Ebel, 2016; Miller et al., 2017; Thompson et al., 2019). Results of this analysis suggest that it may be important to look beyond hardware when considering the bundle of resources that will maximize the returns to a precision agriculture system. For example, investments in farm software products that can be used to manage and analyze data, including creating GPS maps, layering data sources, and providing recommendations, are associated with progression within the farm data lifecycle. In addition, investments in human capital, either in on-farm employees with designated data responsibilities or in trusted off-farm data services providers that farms are willing to share data with, are also associated with progression within the farm data lifecycle. Therefore, farms wanting to develop and implement a successful farm data strategy should consider the combination of hardware, software, and human capital resources that best fit their farm.

The data revolution in agriculture is fully in motion among U.S. commercial corn and soybean farms as the majority of these farms collect and use data on their farms. Farms that have not yet invested in data management and data analysis resources may be missing out on potential benefits associated with using their farm’s data to improve on-farm decision making. It is important to point out that the benefits are only observable for those that actively manage their data and cannot be assumed as the counterfactual for those that have not invested in data management. Nonetheless, the potential benefits are real. Previous research has shown varying levels of economic benefits associated with precision agriculture use, with small positive economic benefits being the most common conclusion (Griffin et al., 2004; Schimmelpfennig, 2016, 2018; Schimmelpfennig & Ebel, 2016; McFadden, 2017; DeLay et al., 2020; Dhoubhadel, 2021). Even small benefits associated with collection and use of farm data can have big implications for farm structure, as many of the resources discussed here (hardware, software, and human capital) embody fixed costs. Therefore, they are more likely to be adopted on larger farms, where fixed costs can be spread out over more acres. As a result, precision agriculture is hypothesized to have spurred further increases in farm size in recent years (MacDonald et al., 2018). As farms continue to develop and hone their data strategies and product/service providers continue to improve their offerings, digital agriculture could play an important role in continued consolidation within the U.S. farm sector.