1 Introduction

Efforts to address equity and inclusion in agricultural data infrastructures face numerous challenges. People and networks are widely distributed geographically. As a result, some solutions to data problems may arise regionally and independently, yet many people cannot easily engage with distant colleagues to learn about these solutions or collaborate on them. Funding for such projects is generally national rather than international, and travel funding is unequally distributed. Finally, the breadth of activity makes interdisciplinary communication important but difficult to sustain.

This chapter describes the ongoing transition of the Research Data Alliance (RDA) Interest Group on Agricultural Data (IGAD) into a Community of Practice. With practical examples, it explains how IGAD has helped identify and promote awareness of efforts around the world that may currently be restricted to one region but that have the potential to democratize participation in agricultural data management infrastructure initiatives and generally improve capacity for managing and leveraging agricultural data.

1.1 A Brief Introduction to the Research Data Alliance (RDA) and the Interest Group on Agricultural Data (IGAD)

The Research Data Alliance (RDA) is a community-driven initiative launched in 2013 by the European Commission, the United States Government's National Science Foundation and National Institute of Standards and Technology, and the Australian Government's Department of Innovation. It was conceived as a neutral space where members could come together to develop and adopt infrastructure that promotes data sharing and data-driven research (Berman & Crosas, 2020). As of today, the RDA has attracted over 12,000 members from 145 countries. Its vision is that "researchers and innovators openly share and re-use data across technologies, disciplines, and countries to address the grand challenges of society" (Research Data Alliance, 2021).

The work of the RDA is conducted through self-organized Interest Groups (IGs) and Working Groups (WGs) that discuss solutions to real-world problems. Participation in any of the 97 existing groups is open to anyone who agrees to the RDA's principles. Participants are usually experts from academia, the private sector and government, who join these groups to identify and build the infrastructure they need to overcome their research data management challenges.

The Interest Group on Agricultural Data (IGAD)Footnote 1 was formed in 2013 as a forum for sharing experience and giving visibility to research and work with agricultural data. It has since grown to over 260 members, becoming one of the RDA's most prominent Thematic Groups and serving as a platform for the creation of specific Working Groups. In keeping with the RDA's strategy, IGAD has supported the creation of five WGs: the Wheat Data Interoperability, Rice Data Interoperability, Agrisemantics, On-Farm Data Sharing, and Capacity Development for Agricultural Data WGs.

2 Examples of Global Coordination in Previous IGAD Activities

The RDA holds a global plenary meeting every 6 months, in which the IGs and WGs showcase their work, deliverables and outcomes and engage the wider community. IGAD and its associated WGs have played an active role at the RDA Plenaries, using them to reach out and forge new alliances with other groups and to create offshoot groups aimed at specific challenges and solutions. During the plenary sessions, IGAD has hosted a wide array of speakers and discussions, seeking to work alongside major international initiatives in agricultural research data management and interoperability from private and public organizations such as GODAN, CGIAR, the FAO of the UN, INRAe, and Syngenta, among others. Before each RDA Plenary, IGAD has also organized pre-meetings that engage the agricultural data community in taking stock of existing issues and laying the groundwork for concrete future action.

To sustain engagement through the Covid-19 pandemic, IGAD conducted several webinars and virtual events. One of them focused on the theme 'IGAD/RDA: Sharing Experiences and Creating Digital Dialogues'. The four-day event (25–28 May 2020) brought together 350 IGAD members to discuss semantics, crop data interoperability, and experiences and lessons learnt from Asia, Europe, Africa and the Americas, producing many interesting results and interactions. In 2021, IGAD launched the 'Coffee Break' Webinars, a new series of 30-min sessions supporting the exchange of experiences within the agricultural data community, each consisting of a 15-min virtual presentation on a topic of interest followed by 15 min of discussion. Presenters came from all over the world to share their experiences, and the sessions were recorded for those who could not attend live. Virtual meetings allow anyone to participate from anywhere and help inclusion, as there are no travel costs involved. In fact, the events attracted many hundreds of interested people who engaged with the IGAD community for the first time.

Of all the WGs created under the IGAD umbrella, the Agrisemantics and the Wheat Data Interoperability (WDI) Working Groups were particularly successful, with consensus recommendations approved for implementation (Caracciolo et al., 2020; Yeumo et al., 2016). The Agrisemantics Working Group produced a set of recommendations to facilitate the adoption of semantic technologies and methods for data interoperability in agriculture and nutrition. To achieve this, between 2016 and 2019 the group brought together researchers and practitioners to study all aspects of the life cycle of semantic resources: conceptualization, editing, sharing, standardization, services, alignment, and long-term support (Caracciolo et al., 2020). Starting from a landscape study, the group analyzed a number of use cases for the exploitation of agricultural semantic resources. Its outputs were synthesized into 39 'hints' for users and developers of semantic resources and for providers of semantic resource services. A wide range of applications of the Agrisemantics WG recommendations followed: AgroPortal, for example, illustrates the importance of domain-specific repositories and mapping tools, and VocBench offers a web-based platform for creating and maintaining semantic resources according to best practices.

When the WDI Working Group was created in 2014, its goal was to make the best use of existing genetic, genomic, and phenotypic data in fundamental and applied wheat science. Given the ever-growing deluge of data from modern technologies such as DNA (deoxyribonucleic acid) and RNA (ribonucleic acid) sequencing, high-throughput genotyping and phenotyping, high-throughput imaging and satellite monitoring, data interoperability had become a priority for the wheat research community (Yeumo et al., 2016).

The WDI WG was formed by data and information practitioners and scientists from different organizations and countries, with a clear position: rather than creating new standards, it would provide a common framework for describing and representing data with respect to existing open standards. To converge and agree on specific recommendations, the WDI WG began by surveying the practices of the wheat research community. The proposed guidelines were then endorsed by the RDA and adopted early by organizations such as the Australian Centre for Plant Functional Genomics, the French Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement (INRAe), and Rothamsted Research in England. The recommendations are revised frequently to take account of the evolving landscape of data practices and standards.

Replicating the WDI methodology for other crops proved challenging, though. Institutional support and a pre-existing, well-structured and vibrant community turned out to be important prerequisites for the success of the WGs. The Rice Data Interoperability WG, for instance, had to be discontinued because it could not sustain the effort needed to develop recommendations. It is now being replaced by a more general Crop Data Interoperability WG. Soil experts are also committed to partnering with IGAD.

3 Transitioning to a Global RDA Community of Practice

IGAD has helped create awareness about research data management within the food and agricultural community, linking with other communities to facilitate the adoption of RDA recommendations, inviting experts from different fields of expertise to join and enrich the dialogue and the sharing of knowledge, and encouraging researchers to share their experiences.

In recognition of IGAD's role in promoting the RDA within the food and agricultural data community, the interest group was the first to become a Community of Practice (CoP) under the formal structure of the RDA. Although the RDA is not particularly concerned with establishing a single unified concept of a 'community of practice', the notion clearly draws on the original work of Lave and Wenger (1991, p. 98), for whom a community of practice is "a set of relations among persons, activity, and world, over time and in relation with other tangential and overlapping communities of practice". According to these authors, the community of practice provides a proper social context for learning to take place.

In practice, an RDA CoP offers a discipline or domain the opportunity to create an open forum for the discussion, development and maintenance of specific and generic solutions to the data challenges faced by that community. In a forum devoted to data-related trends and challenges, CoP members learn from one another's experiences and collaborate on implementing solutions. A CoP also helps the RDA attract new individual, organizational, and regional members, including researchers and stakeholders from low- and middle-income countries, and establish connections with other international initiatives.

On a logistical level, one of IGAD’s chief roles has been to serve as a platform that leads to the creation of domain-specific Working Groups. As a CoP, this role is strengthened, providing a neutral space for networking and blending ideas related to data management and interoperability. The IGAD CoP can use community building and capacity support as a means of ensuring working groups’ success.

As recently approved by the RDA Technical Advisory Board, the CoP will keep the IGAD acronym, which now stands for 'Improving Global Agricultural Data'. Each year, a specific objective or priority theme will be added as a 'subheader', for example IGADs (Semantics), IGADm (Management), IGADw (Workforce), IGADs (Sovereignty), IGADc (Capacity Building), or IGADi (Infrastructure). From a community perspective, agricultural data practitioners and the organizations they work for will benefit from participating in the IGAD CoP through better alignment with global practice, opportunities to form partnerships on specific projects, a greater ability to reach stakeholders through improved data systems and practices, and mutual learning from the exchange of experiences.

Operationally, the IGAD CoP will be coordinated by at least three professionals from the global agricultural data community, drawn from different geographic regions, who will plan and operate by consensus. A communication plan will be developed to keep the community informed of the various engagement opportunities within the CoP, such as in-person or virtual RDA meetings, monthly webinars, and longer events held at least annually.

The philosophy behind the IGAD CoP is to represent all geographic regions and increase the participation of the global south. Leadership and a process of chair rotation are expected to reflect this. Some of the remaining challenges relate to inclusiveness. For instance, the times at which plenaries and meetings are usually scheduled do not favor participants from the global south. Recording sessions and varying the times of virtual meetings have proved reliable ways of broadening engagement.

Members of the IGAD CoP include practitioners of agricultural data management in academia, government and industry, engaged largely through the regional or disciplinary organizations they have formed to support their efforts. They usually have skills both in their domains and in relevant aspects of data management, whether for research or for agricultural activities; however, because the community aims to build skills and knowledge, there are no specific membership prerequisites. In line with the participative approach of a community of practice, key members of other relevant networks are expected to act as liaisons with their larger communities.

3.1 Farmer Research Data Framework

The IGAD Community of Practice offers a valuable forum for sharing approaches to difficult issues, such as how to protect data generated by farmers while ensuring that valuable research can be conducted to improve agricultural practices for both economic and natural resource stewardship. A recent workshop held in the United States, Big Data Promises and Obstacles: Agricultural Data Ownership and Privacy, was inspired by work in Europe on codes of conduct.

According to Zampati (2021), codes of conduct emerged to fill a legislative void and to set common standards for data-sharing contracts. Farm data are an example of sensitive data: they flow from the farm to many other actors (such as extension workers, agri-tech companies, farmers' associations, and financial service providers), where they are usually aggregated and combined into services that are sent back to the farm.

These topics were also discussed at the 11th RDA Plenary. However, to become part of an actionable global framework accessible to everyone, the ideas will need to be brought back to the IGAD CoP, where participants from regional networks can consider how to re-ground and modify them to suit cultural and legal practices elsewhere.

3.2 CARE Indigenous Data Governance Principles

The FAIR (Findable, Accessible, Interoperable and Re-usable) data principles (Wilkinson et al., 2016) are becoming increasingly important in several disciplines, including agriculture. Devare et al. (this volume) argue that FAIR agricultural data assets should be the norm rather than the exception, to foster a transition towards 'translational agriculture': a new agricultural system that uses powerful technologies to enable more effective data mining and use, making agrifood research and business more agile and responsive to user needs. The FAIR principles were discussed extensively at the IGAD meeting held before RDA Plenary 11 and are now being put into practice by many IGAD participants.

A new, complementary set of data governance principles has recently emerged to balance indigenous rights and interests in data with the desire to honor the FAIR principles of supporting open, machine-readable data (Carroll et al., 2020). These CARE principles (Collective Benefit, Authority to Control, Responsibility, and Ethics) hold great promise for indigenous and local communities with agricultural data, ensuring that the re-use of their data benefits those communities, but they have not yet been widely shared among IGAD participants. They take a user- and usage-centered approach that complements the data-focused FAIR principles. The new community of practice and its network of regional networks should be an effective means of raising awareness and fostering discussion, so that these principles can be adopted or refined more quickly.

3.3 Taxonomic Plant Data Linkage

IGAD would do well to engage a related community of practice, the Biodiversity Information StandardsFootnote 2 community, previously known as the Taxonomic Databases Working Group (TDWG). Although this group has already co-sponsored activities with the Research Data Alliance, there is an untapped opportunity to engage with the IGAD CoP. Biodiversity informaticists are acutely aware that linking plant data across datasets requires reliable identification of the organism from which the data derive. A series of recent Biodiversity Information Standards symposia on agricultural biodiversity have made clear that standards must accommodate a wealth of valuable information about crop wild relatives and landraces in agrobiodiverse regions. In India, for example, biodiversity data standards must be able to accommodate local names and smallholder cultivation practices in order to support the analysis of crop phenotypes, genotypes, and their environmental influences and impacts beyond industrial Western farming operations (Arnaud et al., 2016; Rajagopal et al., 2017).

Another relevant TDWG group, the Species Interaction Data GroupFootnote 3, was established to develop a data standard for the universal exchange of data and information relevant not only to biology but also to agriculture and to ecosystem services such as pollination. Connecting the IGAD and TDWG communities can raise awareness of such standards efforts, and the broad geographic representation in both communities can ensure that diverse use cases and cultural differences are accommodated in these standards.

3.4 IGAD’s Regional Outreach Efforts: The Brazilian Experience

IGAD activities have contributed to the implementation of good data management policies and practices in agricultural research institutions all over the world. Very often, these actions support openness and the adoption of standards for data repositories.

One example is the recent effort by the Brazilian Agricultural Research Corporation (Embrapa) to incorporate the FAIR principles into its research data management processes and practices. Embrapa is a public agricultural research institution whose mission is to "provide research, development and innovation solutions for the sustainability of agriculture and for the benefit of Brazilian society" (Embrapa, 2021). Organized into 43 research centers distributed throughout the country, the company generates a large volume of research data on the various strategic themes of agricultural research.

Aware of the volume, speed, variety and value of the research data produced in its activities, Embrapa has mobilized efforts to govern and manage these assets properly throughout their life cycle and to make them findable, accessible, interoperable and reusable. Among these efforts is the publication of the company's 'Data, Information and Knowledge Governance Policy', which establishes the principles, guidelines, attributions and responsibilities that will strengthen the mechanisms for the generation, organization, treatment, access, preservation, recovery, disclosure, sharing and reuse of Embrapa's information assets.

The policy is based on the premise that well-organized, documented, accessible and verified data are more easily shared and reused, bringing several advantages to the organization. Knowledge exchange within IGAD informed the content of Embrapa's policy, drawing on other research institutes' experiences and guidelines, such as INRAe's Open Access and Open Data Policy (INRAe, 2016). Another important reference for Embrapa's policy is the FAIR principles (Wilkinson et al., 2016), a central pillar of the corporate 'Research Data Management Program'.

Adherence to the FAIR principles is crucial for data services, in which interoperability plays a key role. Embrapa has implemented data services through APIs for different purposes, allowing users from companies, startups and universities, as well as students, among others, to solve real-world and real-time problems in agriculture. The AgroAPI PlatformFootnote 4, for instance, offers the Agritec API, which gathers useful information for crop production management. It includes data and models on: (i) the ideal planting time for dozens of crops, based on agricultural zoning of climatic risk; (ii) the list of most suitable cultivars for 12 different crops (Rice, Cotton, Peanuts, Barley, Beans, Cowpeas, Sunflower, Castor, Maize, Soy, Sorghum and Wheat); and (iii) fertilization and soil correction recommendations based on prior soil analysis, together with productivity forecasts and climatic conditions before and during the harvest, for five crops (Rice, Beans, Maize, Soy and Wheat). These data inform decisions on planting dates with lower risk of loss, the best-suited cultivars, productivity forecasts, and water balance and climatic conditions before and during the harvest.

Another example is the SATVeg API, derived from the Temporal Vegetation Analysis System (SATVeg), a web tool developed by Embrapa Agricultura Digital for generating and viewing temporal profiles of the NDVI (Normalized Difference Vegetation Index) and EVI (Enhanced Vegetation Index) vegetation indices for Brazil and all of South America, in support of territorial management and agricultural and environmental monitoring. The indices are generated from multispectral images provided by the MODIS sensor on board NASA's Terra and Aqua satellites, covering data from 2000 up to the most recent date available in its official repository, with a temporal resolution of 16 days and a spatial resolution of 250 m. SATVeg is being expanded to cover Sentinel products, which will also be offered as a machine-to-machine data service through APIs.
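NDVI and EVI are standard combinations of surface reflectance bands: NDVI = (NIR - Red) / (NIR + Red), and the MODIS EVI is G * (NIR - Red) / (NIR + C1 * Red - C2 * Blue + L), with G = 2.5, C1 = 6, C2 = 7.5 and L = 1. The minimal Python sketch below computes both indices for a single pixel and places them on a 16-day grid to form a short temporal profile of the kind SATVeg displays. It is an illustration under stated assumptions, not SATVeg's processing chain. The reflectance values are hypothetical, the helper names are our own, and the grid of periods starting on day-of-year 1, 17, ..., 353 is assumed to follow the MODIS Terra 16-day compositing calendar.

```python
from datetime import date, timedelta


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    if denom == 0:
        raise ValueError("NIR + Red must be non-zero")
    return (nir - red) / denom


def evi(nir: float, red: float, blue: float,
        g: float = 2.5, c1: float = 6.0, c2: float = 7.5, l: float = 1.0) -> float:
    """Enhanced Vegetation Index using the standard MODIS coefficients."""
    denom = nir + c1 * red - c2 * blue + l
    if denom == 0:
        raise ValueError("EVI denominator must be non-zero")
    return g * (nir - red) / denom


def composite_starts(year: int) -> list[date]:
    """Start dates of the 23 fixed 16-day periods in a year.

    Assumption: periods begin on day-of-year 1, 17, ..., 353, as in the
    MODIS Terra 16-day vegetation index composites.
    """
    first = date(year, 1, 1)
    return [first + timedelta(days=16 * i) for i in range(23)]


if __name__ == "__main__":
    # Hypothetical (NIR, Red, Blue) reflectances (0-1 scale) for one pixel
    # over four consecutive 16-day periods; not real MODIS data.
    samples = [(0.30, 0.10, 0.05), (0.38, 0.07, 0.04),
               (0.45, 0.05, 0.03), (0.35, 0.08, 0.04)]
    for start, (nir, red, blue) in zip(composite_starts(2021), samples):
        print(start.isoformat(),
              round(ndvi(nir, red), 3),
              round(evi(nir, red, blue), 3))
```

Pairing each index value with the start date of its compositing period is what turns per-image values into a time series that can be plotted or served through an API; a production service would also have to handle cloud masking and quality flags, which this sketch ignores.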

Embrapa's experience is serving as a basis for building the GO FAIR Agro Implementation Network, benefiting the whole national agricultural Research, Development & Innovation system. The Brazilian regional GO FAIR office is structured following the international GO FAIR initiativeFootnote 5 and currently comprises seven thematic implementation networks. The regional office produced a letter of principles, agreed by the participating organizations, which sets out how the office operates and its rules of engagement.Footnote 6

The agricultural data implementation network is coordinated by Embrapa and supported by other relevant research institutions in the country. It is in the early stages of a bottom-up community effort, and its construction has been inspired by the experience of IGAD activities, taking into account the different approaches within agricultural data science, community facilitation tools, inclusivity regarding gender and minorities, and regional diversity in a continental country of great importance for food production. A manifesto was drafted by the agricultural data community in Brazil and launched in November 2021, during the XIII Brazilian Conference on Agroinformatics. Its mission is to work in an articulated and collaborative way to encourage the sharing and reuse of data produced in agricultural production systems and in agricultural science research in Brazil, supported by the FAIR principles. It includes objectives related to agricultural data science, cultural change towards FAIR good practices, training activities, and articulation and collaboration with the other GO FAIR Brazil National Thematic Implementation Networks and with the Food Systems International Implementation Network. The network was launched in April 2022 during a virtual event that brought together 130 professionals from the agricultural sciences, information science and information technology domains, representing more than 40 public and private institutions such as universities, research and development institutes, companies and startups.

Communities of practice in agriculture can encompass a multitude of subjects, one of which is the preservation of cultural and biodiversity heritage. Diverse agrifood products traditionally grown by local populations are receiving more attention worldwide, including in Brazil. Agrobiodiversity data standards are needed to properly represent and make sense of such data, and these standards are being improved through collaborative work by several organizations. Collaboration is also the motivation behind the creation of a national GO FAIR implementation network focused on agriculture in Brazil. All of this work will benefit if the IGAD CoP can include new voices from the field.

4 Concluding Remarks

Communities of practice in agriculture need to share information about regional developments in data-intensive activities, such as Internet of Things devices embedded in agricultural machinery or irrigation equipment, and decision support tools that rely on climatic and remote sensing data sources. A community of practice can ensure that these developments are informed by local farmers' traditional knowledge and that they preserve and protect cultural heritage and agrobiodiversity. The FAIR and CARE guiding principles help us move towards linked data and bridge the gaps that will allow many diverse communities to connect and share experiences for a more sustainable food production environment.

Addressing these challenges, the Research Data Alliance (RDA) has been a home for the Interest Group on Agricultural Data (IGAD) since 2013. This chapter has reflected on the lessons learnt by the IGAD community of practice in its attempts to include new voices from around the world. Following Lave and Wenger (1991, p. 100), the focus of the community of practice is to provide its members with "access to a wide range of ongoing activity, old-timers, and other members of the community; and to information, resources, and opportunities for participation".

The convening power of the RDA provides many advantages, such as the ability to sustain multiple threads of interdisciplinary work and worldwide networking. IGAD has supported several important working groups, including the emerging Crop Data Interoperability WG.

IGAD regularly convenes meetings outside the RDA Plenaries to allow practitioners with fewer resources to participate. FAIR data (Findable, Accessible, Interoperable, and Reusable) has been a frequent topic of discussion. In recent years, virtual sessions have broadened the conversations further, enabling global participation. In the US, for example, several workshops have addressed the need for progress on farmer data ownership and privacy; these are informed by work in Europe, but the ideas will need to be re-grounded and adapted to cultural and legal practices elsewhere. For plant data in particular, ideas about landraces and nomenclature from Biodiversity Information Standards (TDWG) could be combined with the work of the CGIAR institutes to provide more seamless access to indigenous knowledge.

In Brazil, several efforts to support data-driven decision-making in the field could serve as models for other IGAD members. For instance, as discussed above, the Brazilian Agricultural Research Corporation (Embrapa) has implemented data services through APIs that provide real-time data on climate, productivity and the most favorable days for planting different crops. Diverse agrifood products traditionally grown by local populations are also receiving more emphasis in Brazil, and agrobiodiversity data standards are being improved through collaborative work by several organizations.

Collaboration is the key motivation behind the creation of a Brazilian GO FAIR Implementation Network focused on agriculture. As the Brazilian example suggests, geographic barriers should not prevent the global agricultural research data community from actively participating in the IGAD CoP.