Introduction

The Food and Agriculture Organization (FAO), the United Nations institution that supports global food security, has a clear vision for sustainable food and agriculture: food should be nutritious and accessible for everyone, and natural resources should be managed in a way that maintains ecosystem functions to support current, as well as future human needs. The key principles of sustainability for food and agriculture in the FAO vision include increasing productivity, employment, and value addition in food systems; protecting and enhancing natural resources; and improving livelihoods and fostering inclusive economic growth. In contrast to the complex and multi-functional concept reflected by the FAO, for many people, “sustainable agriculture” and “regenerative farming” imply, at least to some extent, a return to traditional farming methods. On the other hand, the applications of digital agriculture technologies are increasing rapidly, with increased interest from the new generation of farmers to use digital solutions (Kayad et al., 2022).

A series of workshops on data-driven agriculture in the world of sustainable farming was held in 2022 between technology, research, and business stakeholders from Israel and the UK. This brief communication is the result, reflecting long discussions and careful thought. This communication will argue that sustainability in our food and fiber agriculture systems cannot be achieved without using all the knowledge, technology, and resources available, including data-driven agricultural technology and precision agriculture methods. Evidently, data collected by sensors and digested by artificial intelligence (AI) can guide farmers to apply external inputs precisely and rationally, e.g., water, fertilizer, and pesticide for crops, and nutrients and medicine for livestock. Moreover, such data can be used to learn about synergies between the domains of natural systems that are key to simultaneously achieving sustainability and food security. These synergies include interactions between plants, the environment, beneficial insects and fungi, grazing animals, the digested plant and nutrient returns from animals, and the health of soil and crops. This communication will summarize key characteristics of sustainable agriculture, outline the benefits of data-driven agriculture for adopting the principles of sustainable agriculture, outline constraints and challenges to using data-driven agri-tech to achieve sustainability, and identify priority research to address the challenges of creating data-driven sustainable agriculture. Figure 1 illustrates how public funding for research on those high-payoff topics is expected to break through the various barriers, one by one, and facilitate the adoption of data-driven sustainable farming practices. It is hoped that this communication will be of interest to advocates of sustainable agriculture from all perspectives, including agricultural researchers and policymakers.

Fig. 1. Breaking through the barriers to adopting data-driven sustainable agriculture practices requires public investment in research of priority topics. Funding to back up research in critical areas is expected to yield a high payoff.

Key characteristics of sustainable agriculture

The Brundtland report (1987) defined sustainability as 'the ability to meet the needs of the present without compromising the ability of future generations to meet their own needs'. Sustainability, in a more pragmatic sense, can be defined as improving a system's productive performance without depleting the resources upon which its future performance depends (Jones et al., 2011; Turner et al., 1994). The purpose of agricultural production is to sustainably provide food and fiber for human consumption; as such, sustainable agriculture's focus must also consider its role beyond the management of crops or livestock within a field or even a farm.

Because a sustainable agricultural system is complex and multi-functional, managing it requires data and understanding at many levels of a global and complex food and fiber production system. Precision agriculture is one strategy to realize these goals. Sustainable agriculture concepts overlap substantially with the principles of “conservation agriculture” and more recently with “regenerative agriculture.”

Agricultural systems can be considered through the lens of five forms of capital: natural, social, physical, financial, and human (Goodwin, 2003). Sustainable agricultural systems aim to ensure that capital in any form is not eroded (i.e., strong sustainability in that there are no trade-offs between different forms of capital indefinitely) while meeting production, consumption, and distribution objectives within the farm and across society. Long-term economic wealth from farming without trading off system resilience can be achieved by relying on key principles for achieving sustainable agriculture (ARO, 2018):

  1. Reduce external inputs (pesticides, fertilizers, water, and energy).

  2. Recycle all organic wastes (“zero waste”).

  3. Conserve soil and water.

  4. Develop a system that sustains and supports agriculture, organismal biodiversity, and local habitats, and

  5. Improve animal and human/social welfare.

Agricultural systems are those where complex trade-offs exist between different farm resources (e.g., land, labor, physical and financial resources). Moreover, all agricultural systems are inevitably exposed to external factors such as climate, markets, and regulatory environments, which influence their long-term success and increase its uncertainty. There are various metrics for farming system success, but in subsistence farming, household food security is the primary indicator for long-term profitability. The foremost challenge with sustainable farming is integrating internal systems of production with external factors to enhance timely whole-farm decision-making. In addition, we need to consider farmers' behavior and values, e.g., risk aversion and satisficing (Behrendt et al., 2014; Hardaker et al., 2015), or preferences for developing different forms of capital, as these, in combination, determine their preferred choice of action from alternatives. There are additional challenges with monitoring the success, or otherwise, of implementing alternative strategies in achieving the objectives of sustainable farming. This is especially the case with potentially slow-changing variables that are not easily detectable (e.g., soil carbon, soil compaction, soil biodiversity and soil health, plant species composition change in pastures) but potentially have significant impacts on the long-term sustainability of agricultural systems.

Benefits of data-driven agriculture for adopting the principles of sustainable agriculture

What is agricultural data? In this communication, “agricultural data” is any data associated with or useful to farming practice, farm economics, or farm environmental impacts. While there has been a focus in the recent research literature on “Big Data” which exceeds the capacity of traditional data processing methods (e.g., Kayad et al., 2022), this communication encompasses all data used in making agricultural decisions, from small data sets to Big Data. There are a number of ways to collect such data, including remote sensing imaging (e.g., Altzberger, 2013), networks of weather, soil, plant, animal, and farm machinery sensors, also known as the “internet of things” for agriculture (Muangprathub et al., 2019), farm management information systems on many mechanized farms, and finally the documentation of farming practice on individual farms.

A data-driven approach to sustainable agriculture allows one to incorporate all the knowledge, technology, and resources available to decision-makers. It provides the opportunity to deal with what are usually intractable environmental, social and economic problems in a meaningful timeframe. It enables inter-temporal risk management and trade-offs within and between different levels of the food and fiber production system. The principles of data-driven agriculture will facilitate adopting predictive and prescriptive management that considers greater complexity with higher accuracy than heuristic decision-making. Data-driven agriculture has the potential to be part of the solution to achieving sustainable agriculture for food and fiber production systems.

Data-driven methods have great potential to enhance the sustainability of food systems in four main areas. The first is the automation of data collection, including the ability to develop and deploy field and animal sensors, the creation of practical robotic systems, and the improvement of earth observation satellite systems, enabling the collection of high-quality and more accurate data. The second is big data processing by integrating machine learning and deep learning approaches in agriculture. These tools focus on developing learning systems and algorithms to study specific phenomena. Artificial intelligence is a highly interdisciplinary field based on different areas such as computer science, optimization theory, information theory, statistics, cognitive science, and optimum control (Cravero et al., 2022). Artificial intelligence approaches are revolutionizing almost every scientific domain and have created a data industry in a short time, making them significantly impactful for science and society due to their ubiquity and diverse applications. In agriculture, they are applied to recommendation systems, computer vision object recognition, informatics, data mining, and autonomous control. An additional aspect of the value of data is a better understanding of complex phenomena and system behaviors through the use of new technologies. The third is the development of human–computer interfaces, improving the ease of use of insights through voice, text, and images, and making the data and information accessible to farmers for decision support. However, many challenges remain in the application and implementation of data-driven sustainable agriculture due to the complexity of agricultural data (its volume, variety, velocity, and veracity) and the difficulty of creating relevant, tailored information. Several studies have highlighted these challenges of using a data-driven agriculture approach (e.g., Demestichas et al., 2020; Kayad et al., 2022; Zhang et al., 2014).
A crucial question is how and to what degree data-driven agricultural systems can lead to future sustainable agriculture. Despite the considerable amount of literature dealing with the issue today, our understanding of using data-driven agriculture to ensure sustainability is still at an embryonic stage (Lioutas et al., 2019). The fourth area, from a management point of view, is data curation, which can act as the “organizational memory” on a farm by preserving the knowledge implicitly present in past decisions. In our opinion, this role has been mostly overlooked so far. However, this aspect is increasingly important as traditional farmers, i.e., farmers who accumulated knowledge and expertise after the mid-twentieth century “green revolution”, are reaching retirement. As the practice of farm handover to the next generation is no longer the norm, retaining this generational knowledge is a challenge of critical importance that transcends cultural and regional boundaries. Thus, from a global perspective, such data curation would also serve the role of documenting the collective cultural knowledge of farmers and diverse farming systems.

Extensive use of data in agriculture promises to revolutionize not only farming practices but also facilitate a paradigm shift in academic research and knowledge exchange. From a scientific point of view, the ever-increasing abundance of data enables the investigation of increasingly complex relationships and, in particular, the investigation of the interactions between processes that occur at different spatial and temporal scales. Reliable data, i.e., validated and curated data, is a prerequisite for developing the type of models necessary to predict trends and, in particular, to investigate the expected impact of climate change on agricultural production. Data can also act as a bridge between scientists from different and contrasting disciplines (engineering, natural and social sciences) and facilitate collaborations centered around data interpretation. However, there is still a great challenge in deciding and designing data collection practices that address specific questions with the broadest impact, which rely on high information density, data standardization, and data access.

Constraints and challenges to using data-driven agriculture to achieve sustainability

While a data-driven approach in agriculture has the potential to be part of the solution to achieving sustainable agriculture for food and fiber production systems, it suffers from legal barriers, technical challenges, and economic and social constraints. All of these challenges impede the ability to share data to derive widespread benefits from it.

Legal barriers

Agricultural data is collected by and in demand from different sectors. Diverse stakeholders may claim ownership on the one hand and have different needs and interests on the other hand. Further, there are unequal benefits and, thus, adoption barriers to sharing data amongst the different sectors (e.g., Janssen & Charalabidis, 2012).

The principal stakeholders in farm data are the data producers, i.e., farmers themselves. Benefits to farmers from data sharing may include decision support for farming, benchmarking performance against competitors, or early warning for the risk of a pest or disease outbreak, amongst many others. However, these potential benefits may scale differently in different countries or farming systems (e.g., Sekhar & Sekhar, 2017), and there may be a reluctance to share data amongst data producers because of effort or cost of data curation, the effort in terms of time, standardization and cost required for the data sharing itself or perceived (lack of) benefits for doing so.

Agriculture companies are another large stakeholder, with agents across many different sectors developing so-called “data products” at a large scale (Bronson & Knezevic, 2016). Farmers may be concerned about data ownership and the cost of paying for the data they generate. Farmers worldwide often feel that farming data such as inputs, agronomy decisions, proximate sensor measurements, yields, and individual farm accounting clearly belong to an individual farming entity (e.g., Castle et al., 2016; Jakku et al., 2018; Zhang et al., 2021). The agri-business information systems industry seeks to leverage these data to provide automated data capture as a service for farmers and agronomists. The value offered by these farming data tools is efficiency and context for farmers. Legislation is less clear, suggesting a distinction between data production per se and intellectual property ownership for information systems based on data production (e.g., Ellixson & Griffin, 2017; Wiseman et al., 2019). If the ownership of a resource is unclear, buying, selling, sharing, and managing that resource becomes problematic. In the specific case of remote sensing, for example, there are several arrangements for the use and ownership of data. Images from publicly owned satellites are largely released as open information. Those from privately owned satellites belong to the companies. The ownership of data from aerial photography or drones depends on the agreement between the farmer requesting that service and the provider. When governments perform aerial surveying, public policy dictates usage rights. These arrangements may be broadened to other data types.

Alongside data ownership is the aspect of privacy. The boundary between commercial farming data (e.g., growing conditions, input use, yields, equipment functions) and private data is unclear. Family businesses still dominate farming worldwide. Even where farms are legally structured as limited partnerships and corporations for tax reasons, these entities are often family-owned businesses. Consequently, private information is often mixed with biological, physical, and business data. For example, personal financial records are often commingled with business records. The relationships between spouses, parents, children, and other family members are often discernible in field time logs, credit card and checking accounts, and telephone bills. Further, by the nature of the land-based enterprise, a lot of agricultural data, like remote sensing data and location of sensors, has a spatial component in the geolocation of the data collection, which is necessary and can add value but could reveal confidential information about individual farms. Finally, there is a potential stakeholder role for the government and society relating to farm data. Here, there is a balance between data supporting food security at national and international levels (Godfray et al., 2010) and anthropogenic negative impacts on the environment due to farming activity. Governments, therefore, should be increasingly interested in offering positive data-sharing incentives. Advancing the legislation would enable the utilization of these massive data continuously accumulated over time in the public interest.

Economic and social constraints

Even if ownership of agricultural data is clarified, privacy issues resolved, and data integration standardized, economic and social constraints to wider use of agricultural data will remain, including lack of demonstrated value, mistrust of data aggregation organizations, and the cost to adopt new technology. The economic value of information technologies depends on decisions changed by access to that new information. If changed decisions increase profitability, some portion of that increased return is attributable to the information and thus has value. But it is often difficult to track what decision would have been made without the new information to make such a comparison. Demonstrating the value of information technology is often easiest for specific problems, for example, weed, pest, or disease identification systems paired with effective management strategies. With automated information, the problem may be addressed and resolved; without the information, the problem would be addressed late if at all.
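
The value-of-information argument above can be illustrated with a small calculation. The following is a minimal sketch, not a method from the workshops: the function name, the outbreak probability, and the per-hectare returns are all hypothetical, and the sketch assumes the information is perfect (it reveals the true scenario before the decision is made).

```python
def value_of_information(scenarios):
    """Expected gain from choosing the best action in each scenario
    instead of one fixed action for all scenarios.

    scenarios: list of (probability, {action: payoff}) tuples.
    Assumes perfect information about which scenario will occur.
    """
    actions = scenarios[0][1].keys()
    # Without information: commit to the single action with the best expected payoff.
    without_info = max(
        sum(p * payoffs[a] for p, payoffs in scenarios) for a in actions
    )
    # With information: pick the best action separately in each scenario.
    with_info = sum(p * max(payoffs.values()) for p, payoffs in scenarios)
    return with_info - without_info


# Hypothetical returns per hectare, with and without a pest outbreak.
scenarios = [
    (0.3, {"spray": 900.0, "no_spray": 600.0}),   # outbreak occurs
    (0.7, {"spray": 950.0, "no_spray": 1000.0}),  # no outbreak
]
voi = value_of_information(scenarios)
```

With these illustrative figures, the decision changes only in the no-outbreak scenario, and the information is worth roughly 35 per hectare. If the information never changed the chosen action, its value would be zero, which is the point made in the paragraph above.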

Demonstrating the value of system-level information is often more difficult because many more factors are involved. For example, information about yield differences between conventional and no-till systems may be confounded by weather, soil, agronomy, seed genetics, and the specific type of no-till equipment implemented. It may require detailed data from many farms over a long period to provide enough data to make a purely data-driven decision on no-till versus conventional tillage. In the meantime, the farm manager will continue to make decisions based on the usual mix of intuition and logic.

Achieving the full potential of data-driven sustainable agriculture will require pooling data over many farms. For most system-level decisions, aggregated data from many individual farms are required to make data-based decisions. But pooling farm data has proven difficult. Agricultural Big Data media coverage often focuses on the reluctance of farmers to share data, and a few more robust academic studies have confirmed this lack of trust (e.g., Castle et al., 2016; Jakku et al., 2018; Zhang et al., 2021). Farmers often worry that their data will be used by competitors to outbid them in the markets for land and other resources, by agribusinesses to target marketing, and by governments to impose even more onerous regulations. In other sectors of the economy (e.g., medical care), anonymization has facilitated data sharing. Anonymization would be useful for farm management, financial, and intensive livestock production data, but unfortunately, anonymization of farm field data would be very difficult. Soil type, yield maps, and other field spatial information provide a unique field signature that is easily searchable even if spatial coordinates are scrambled. Anonymization by a trusted organization is essential, but financial or other incentives would be needed to motivate data collection. Many farmers indicate that data aggregation by a university, research institute, or cooperative would be most acceptable because of the perceived lack of motive to misuse the data.

Automated data-based decision support systems are commonly viewed as the means to increase the speed and effectiveness of the user’s ability to extract information. However, such systems may not be trusted, which poses a challenge to adoption by end users. Trust influences reliance on technology. Users may either place inordinate dependence on automated decision support to the point they misuse it, or else reject such decision support and disuse it (Lee & See, 2004). Lee and See (2004) suggested that properly conveying system capabilities, training users, and demonstrating how the decision support systems meet user goals, can facilitate trust. However, complex solutions such as deep learning solutions thwart understanding of how the algorithm functions. To counter this, Dorton and Harper (2022) proposed involving end users in developing systems and involving developers in training end users.

In most industrialized societies, farmers do not choose their careers because they want to spend hours in front of computer screens trying to interpret data. In most cases, at least part of their career choice is led by the desire for active, outdoor employment. The history of technology adoption indicates that farmers will increasingly use computer decision-making tools if they result in more profitable decisions and if they are easy to use. Generations who have grown up with information technology may adapt to agri-tech innovations more easily.

Technical barriers to broader use of agricultural data

While the promise and importance of large-scale data captured in agriculture systems are well known in academia and the agri-tech industry, several key challenges remain to realizing this potential. In less than two decades, agriculture has gone from a field that suffered from a lack of data to a data-intensive field. A major concern is data quality. Historically, farmers have lacked the incentive to collect high-quality data or to store it in a standard format. The focus is on physically “getting the work done”, not on data. Consequently, yield monitors and other sensors are often calibrated irregularly. Gaps in data occur when sensors or positioning systems are not functioning and field operations continue manually. Monitoring data about pests, water, and nutrition status are not recorded regularly, nor are pesticide, irrigation, and fertilization applications. Thus, the historical documentation that exists suffers from gaps and lacks standardization.
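
As a simple illustration of the data-gap problem described above, the sketch below flags periods in which a sensor log has no readings. It is only an assumed approach: the function name, the 10-minute logging interval, and the tolerance factor are hypothetical choices, not part of any system discussed in this communication.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval, tolerance=1.5):
    """Return (start, end) pairs where consecutive readings are further
    apart than tolerance * expected_interval."""
    ordered = sorted(timestamps)
    limit = expected_interval * tolerance
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > limit
    ]


# Hypothetical log: a reading every 10 minutes, with three readings missing.
start = datetime(2022, 6, 1, 8, 0)
readings = [start + timedelta(minutes=10 * i) for i in (0, 1, 2, 6, 7)]
gaps = find_gaps(readings, timedelta(minutes=10))
```

Here `gaps` contains a single interval, from 08:20 to 09:00, which could then be annotated (for example, "field operation continued manually") rather than silently interpolated.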

Another major challenge is how to bring all this data together. Farm data usually comes from heterogeneous sources. Some data is machine-generated (e.g., tractor engine work cycles, combine concave settings, planter seed drop). Some data is collected by remote and proximal sensing (e.g., satellite and drone images and atmospheric, soil, and plant sensors). Some data come from traditional farm record keeping (i.e., so-called “process mediated” data), and some data are human-sourced, including the increasing proportion shared on social media. For any given farm decision, all those data sources may be relevant, but combining them in a single data framework is a challenge. There have been recent attempts to envision a “standard system” for farm data (e.g., Bacco et al., 2019; Kamienski et al., 2019; Otto & Jarke, 2019). However, there has been an emphasis on systems for data harvest (e.g., EU Commission, 2021) and a proliferation of potential technological solutions for empowering farmers to access information derived from the agricultural data they generate. There is an opportunity to focus on commonalities in data collection across different agriculture sectors.
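
One way to picture a single data framework for heterogeneous sources is a shared record type that every source is converted into. The sketch below is purely illustrative and does not follow any of the cited standards: the `FarmObservation` fields, the two input layouts, and the converter functions are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class FarmObservation:
    """One measurement or record, whatever its origin (hypothetical schema)."""
    field_id: str
    observed_at: datetime
    source: str      # "machine", "sensor", "record", or "human"
    variable: str
    value: float
    unit: Optional[str] = None


def from_machine_log(row):
    """Convert one row of an assumed machine log layout."""
    return FarmObservation(
        field_id=row["field"],
        observed_at=datetime.fromisoformat(row["time"]),
        source="machine",
        variable=row["channel"],
        value=float(row["reading"]),
        unit=row.get("unit"),
    )


def from_farm_record(entry):
    """Convert one entry of an assumed hand-kept farm record layout."""
    return FarmObservation(
        field_id=entry["field_name"],
        observed_at=datetime.fromisoformat(entry["date"]),
        source="record",
        variable=entry["operation"],
        value=float(entry["amount"]),
        unit=entry.get("unit"),
    )


# Hypothetical inputs from two different sources for the same field.
machine_rows = [
    {"field": "north", "time": "2022-06-01T09:30:00", "channel": "seed_rate",
     "reading": "82000", "unit": "seeds/ha"},
]
record_entries = [
    {"field_name": "north", "date": "2022-06-01T07:00:00",
     "operation": "irrigation", "amount": "12.5", "unit": "mm"},
]

# Once converted, all sources can be ordered and queried together.
observations = sorted(
    [from_machine_log(r) for r in machine_rows]
    + [from_farm_record(e) for e in record_entries],
    key=lambda obs: (obs.field_id, obs.observed_at),
)
```

The design choice is that each source keeps its own converter while downstream analysis sees only one record type; agreeing on the fields of that record is the standardization problem discussed above.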

Artificial intelligence (AI) has recently been perceived as a solution for various data-oriented challenges. AI has grown to be a significant force in many sectors. In healthcare, AI algorithms can analyze large amounts of patient data and medical research to identify potential risk factors. In education, AI-powered chatbots can answer students' questions and provide feedback in real-time. In agriculture, there are many attempts to incorporate AI capabilities to develop decision support systems. Decision support systems based on supervised algorithms require robust and reliable training data sets. However, one must first know what data are critical to the particular data-driven solution and if and how the needed data are collected. In the agriculture domain, data collection may be complicated or unavailable. Thus, means of data collection must be developed. Developers and data underlying decision support tools are susceptible to bias. Developers are more likely to rely on data that is relatively easily acquired by existing systems and ignore or lack awareness of other essential features, i.e., they are prone to ‘searching under the spotlight’. If an automatic system does not currently support the critical features, they need to be collected manually until it is decided to put effort into automating their collection. However, manual data collection is laborious and methodologically heterogeneous by nature, hindering the development of transferable data ready for algorithms that can be applied to different conditions.

Another technical challenge that is often neglected is rural internet service. Internet access is essential for most data-driven agriculture technologies, but rural internet access is patchy in most of the world. Even in countries like the United Kingdom, where 98% of farmsteads have internet access, connectivity blind spots are common where there is only sporadic internet or even cell phone signal in fields and pastures.

High payoff research to address the challenges and resolve barriers

In just a few years, agriculture has moved from being a data-scarce sector to one of data abundance. Agriculture data opens many research opportunities, and discussion at the Israel-UK workshops identified the following high-impact research areas (not in order of priority):

  • Automated animal intake measurement. Accurate, cost-effective individual animal intake measurement would free constraints on feed efficiency research and application in commercial settings. There is a research opportunity in this area because feed companies do not fund it, and public funding has historically been limited.

  • Soil sensors to reduce the cost of soil nutrient information. One of the key reasons variable rate fertilizer adoption has been modest is the cost of manual soil sampling and laboratory testing. On-the-go soil sensing would eliminate that constraint.

  • Robot obstacle avoidance in crop and animal facilities. Automating avoidance of field obstacles will greatly decrease the costs involved with human supervision tasks for crop and livestock robots.

  • Combining remote sensing and crop and soil models for early detection of plant diseases and pests. Calendar-driven whole-field prophylactic pesticide application could be radically reduced with widespread, reliable, site-specific early warning systems for plant diseases and pests.

  • Research on extension methods for data-driven agriculture to improve food security and reduce the ecological footprint of agriculture. In particular, good examples are needed for the benefits of pooling data.

  • New methodologies to exploit on-farm genetic variation and local knowledge. This is the so-called Genetics by Environment by Management (GxExM) puzzle.

  • Business models of gathering and sharing farm data. The full potential of data-driven agriculture will only be achieved with pooled data.

  • Development of a trust-reinforcing regulatory framework for farm data gathering, sharing, and analysis. Such a framework is needed, along with appropriate business models, to achieve the full potential of data-driven agriculture.

  • Development of decision support systems. There is a great need to transform Big Data into meaningful information that can support intelligent, site-specific decisions that lead to more sustainable and profitable agriculture.

Conclusions

Achieving sustainable agriculture is inherently knowledge intensive. Traditional agriculture relied on the limited capacity of the human brain to observe, analyze and remember the multitude of interactions and synergies that can make biological systems sustainable. Data-driven technology gives farmers, agribusiness, and researchers the tools to observe, record, and understand more of those interactions than human brain power allows. Examples of high-payoff data-driven agriculture research include technical topics like measuring livestock feed intake and soil sensors, new methods for collecting and using the data, and management innovations in business models for data sharing and developing a trust-reinforcing regulatory framework. Public funding for research is lacking in several critical areas identified in this paper as expected to generate high payoffs.