Introduction

Every citizen science project is unique in terms of its participants, governance model, scientific methodology, measures of quality control, and campaigns conducted, as well as the data and knowledge it generates. It is necessary to determine the current status and trends of citizen science in order to inform relevant decision-makers and to increase the impact of citizen science projects by coordinating their efforts. It is a significant challenge to collate and analyse the fragmented and diverse citizen science data that is generated (e.g. records and observations). The COST Action Working Group 5, tasked with improving data standardisation and interoperability, sought solutions to these challenging tasks. The resulting modelling effort was closely linked to the larger objectives of the international Data and Metadata Working Group of the US Citizen Science Association (CSA),Footnote 1 which includes members of the European Citizen Science Association (ECSA), and the Australian Citizen Science Association (ACSA). In this chapter, we introduce a model of core citizen science concepts, which is one of the major outcomes from the COST Action working group. This conceptual model is implemented using formal and standardised knowledge representation techniques and allows both human interpretation and computer-based processing.

Such a conceptual model fosters the representation of citizen science globally by:

  • Enabling a common understanding of the terminology, for example, for indexing literature, outreach and education, and delimiting the field within the generic domains of IT, scientific projects, and data standards

  • Forming a basis for facilitating the alignment and integration of data produced in citizen science projects by fostering standardisation and interoperability (being able to share information seamlessly across activities)

  • Facilitating the creation of software, database schemas, and data interchange formats for the development of new citizen science applications

  • Supporting potential project participants and other stakeholders to better understand the tasks involved in a particular citizen science project

This chapter first briefly introduces its approach to a conceptual model for citizen science, the stakeholders concerned, and the methodology used. It also defines the concepts, relations, and constraints (axioms) of a volunteer participation conceptual model. It then explores connections between these and a traditional scientific activity conceptual model that includes the project, funding, outcomes, datasets, and domain. Next, this chapter provides a detailed description of the conceptual model providing the basic concepts about participants and their activities. The conceptual model links to existing standards by adopting and unifying suitable top-level concepts that appear in those data models. The chapter finally demonstrates the applicability of the conceptual model based on case studies before turning to a roadmap for future use and research.

Towards a Conceptual Model for Citizen Science

A conceptual model for citizen science needs to cover three main aspects (and their corresponding metadata):

  • Information about citizen science projects

  • The people involved

  • Project outcomes, typically data and publications

When we refer to citizen science as a domain, we follow the definition outlined by Haklay et al. (this volume, Chap. 2).

Project metadata includes general information such as project name, aim, runtime, the topic or field of science addressed, a contact person or contact point, the organisations involved, and funding sources. In addition, metadata includes information which is specific to citizen science, for example, about the participants (their motivations, skills, knowledge level, and training undertaken). This also includes information that might be important to interested citizens, for example, how to participate and the type and difficulty level of volunteer tasks required.

In addition to project-related metadata, a conceptual model for citizen science needs to provide descriptive elements for project outcomes, which are typically data and publications. Data records are usually bundled into datasets following a certain data schema. Typical information about datasets includes name, license, access rights, geographic coverage, access information, submission date, creator, data quality requirements (see Balázs et al., this volume, Chap. 8), information on how data was collected, by whom and with which skills and expertise, and how quality was assessed and verified. Citizen science projects differ from other types of projects in that they employ novel ways of collecting data (e.g. a mobile app specifically designed for a project) and employ data collection protocols that are not common in traditional scientific research projects.

The major difference between a traditional scientific research project and a citizen science project is the participation of non-professionals in scientific activities. Therefore, our formal description of citizen science projects (project metadata) focuses on the representation of the people involved, their motivations and skills, the tasks they perform, how they were recruited, how their privacy is protected, how they collect data, and how the quality of their contributions is assessed.

Stakeholders

The spectrum of stakeholders (as identified by Göbel et al. 2017) who require reliable information about citizen science projects includes:

  1. Participants

  2. Academic and research organisations

  3. Government agencies and departments

  4. Civil society organisations, informal groups, and community members

  5. Formal learning institutions

  6. Businesses or industry

The requirements of the stakeholders listed above vary; for example, a certain level of interoperability is essential for government agencies as well as academic and research organisations. However, in the case of community-driven citizen science projects, the stakeholders are participants or informal groups who do not prioritise interoperability but need data to be provided in a user-friendly format.

Methodology

In this chapter, we define a conceptual model as a representation of a knowledge domain or system, with which people can understand the meaning of its underlying concepts and which can be used by computer software to meaningfully process its related data. There are a variety of conceptual models, ranging from simple mind maps and concept maps (Novak and Cañas 2008) to complex ontologies (Simperl and Luczak-Rösch 2014). Commonly, concepts are described in terms of their definitions and the (labelled) relationships between them. In formal models, concepts are often called classes (e.g. ‘project’), and classes have specific examples, called instances (e.g. ‘OpenStreetMap’). All those elements can be represented visually (for human understanding) and in formal computer language (for data integration). In this chapter, we apply commonly used techniques from ontology engineering and concept map construction.
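The distinction between classes, instances, and labelled relationships can be illustrated with a minimal sketch. The helper types `Concept`, `Instance`, and `Relation`, the relationship label `hasActivity`, and the short definitions are hypothetical and introduced here for illustration only; they are not taken from the model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A class in a conceptual model, e.g. 'Project'."""
    name: str
    definition: str


@dataclass(frozen=True)
class Instance:
    """A specific example of a concept, e.g. 'OpenStreetMap'."""
    name: str
    concept: Concept


@dataclass(frozen=True)
class Relation:
    """A labelled relationship between two concepts."""
    source: Concept
    label: str
    target: Concept


def describe(relation: Relation) -> str:
    """Render a relationship in a human-readable form."""
    return f"{relation.source.name} --{relation.label}--> {relation.target.name}"


# Illustrative definitions and label (assumptions, not the model's wording).
project = Concept("Project", "An organised effort with a stated aim")
activity = Concept("Activity", "Something carried out within a project")
has_activity = Relation(project, "hasActivity", activity)

# 'OpenStreetMap' is an instance of the class 'Project'.
osm = Instance("OpenStreetMap", project)

print(describe(has_activity))
print(f"{osm.name} is an instance of {osm.concept.name}")
```

The same structure can be drawn as a concept map for human readers or serialised in a formal language for data integration.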

The core conceptual model elements and associated metadata presented here draw on previous research and existing vocabularies. In particular, they utilise the Public Participation in Scientific Research (PPSR) Common Conceptual Model (described in Bowser et al. 2017) and the core requirements in the associated conceptual model PPSR-Core.

The conceptual model developed in this chapter is intended to fulfil the needs of different stakeholders, as shown in several case studies. To address this requirement, we refined core elements of the PPSR model based on existing case studies; these informed the identification of additional core concepts.

The conceptual model presented is not the only model that suits the field of citizen science, but it provides a view of the technical aspects of the discipline in order to help stakeholders understand the domain and foster interoperability across applications. It is an evolving model that is becoming established via an international consensus process.

Related Conceptual Models

Conceptual Models of Projects and Participants

A number of models that allow projects to be described in general and scientific projects to be described specifically have been previously developed outside the citizen science community. Those models aim to represent knowledge about a subject domain such as relevant concepts and relationships between those in a very formal way (e.g. in terms of an ontology) or less formally by means of a controlled vocabulary. The following table gives an overview of these models and summarises which facets of projects and their participants they cover. The models listed were carefully considered when designing our conceptual model for the citizen science domain.

We will now summarise the models listed in Table 9.1. FRAPO describes projects and their outputs in terms of publications and datasets. SCoRO models the roles of project participants and their contributions. It allows the linking of individuals’ contributions to project outputs. PROV-O can be used to model projects, their outcomes, and how the outputs are produced and by whom. The Project Description Ontology extends PROV-O and is an attempt to model projects in a domain-agnostic way. FOAF can be used to characterise participants of a citizen science project. The FaBiO model is discussed in the next section.

Table 9.1 State of the art of conceptual models of projects and participants

Conceptual Models of Project Outcomes

FaBiO models published or publishable project outcomes such as scientific publications. The Project Documents Ontology (PDO) describes other project-related documents such as minutes and status reports.

A number of models provide descriptive elements for datasets. This includes the World Wide Web Consortium (W3C) Recommendation Data Catalog Vocabulary – Version 2 (DCAT)Footnote 2 that enables the description of datasets and data services in catalogues. More general specifications, such as Dublin Core,Footnote 3 define elements for the description of arbitrary resources, not just publications.

Several conceptual models have been developed for the formal description of observational data and measurements as common outcomes of scientific projects, for example, in the life sciences and geosciences, but also in citizen science. A number of standards with overlapping semantics have emerged: the Semantic Sensor Network (SSN) Ontology,Footnote 4 a joint standard of the Open Geospatial Consortium (OGC) and W3C that specifies the semantics of sensors and their observations, and its proposed extensions;Footnote 5 the OGC/ISO Observations and Measurements (O&M) conceptual model;Footnote 6 and the W3C Data Cube Vocabulary,Footnote 7 which focuses specifically on the representation of multi-dimensional data. The data model of OGC’s SensorThings APIFootnote 8 is based on the OGC/ISO O&M model and closely resembles it. Although several ongoing community-driven attempts aim to harmonise the description of observational data in order to facilitate data integration, none of the existing data models has been adopted by a scientific community as a whole. However, attempts have been made to link coexisting models by establishing mappings between them; for example, the SSN Ontology offers alignments to the OGC/ISO O&M model. An OGC discussion paper (Simonis and Atkinson 2016) gives a helpful overview of standardised information models with relevance to citizen science data and describes a data model for the exchange of citizen science sampling data based on existing standards.

In parallel, practitioners such as data managers of research data infrastructures have developed their own vocabularies and models that do not rely on existing standards. In the biomedical domain, several domain-specific data models have been developed. Those include the Extensible Observation Ontology (OBOE)Footnote 9 (Madin et al. 2007) and the Biological Collections Ontology.Footnote 10 There are hundreds of domain-specific metadata standards and data models facilitating the description of scientific data in specific scientific domains; for example, BioPortalFootnote 11 currently lists 838 ontologies in the biomedical domain. Finally, the catalogue of the Digital Curation CentreFootnote 12 lists numerous disciplinary metadata standards.

The Proposed Conceptual Model for Citizen Science

As a starting point, we considered the top-level model of the CSA report (Bowser et al. 2017) (see Fig. 9.1), which proposed a grouping of the existing attributes into a set of modules. The titles of the modules were adapted by Working Group 5 (see COST Action CA15212 Working Group 5 2018a). The Project Metadata Model describes the key components of a citizen science project. The Dataset Metadata Model characterises a dataset as an output of a project and describes its geographic coverage, data collection method, and access rights. The Observation Data Model contains a detailed description of the data elements that are used in a dataset, for example, the meaning of specific sensor observations (such as nitrogen/nitrate concentration in a water quality measurement).

Fig. 9.1 The PPSR-Core conceptual model, comprising (1) the Project Metadata Model, (2) the Dataset Metadata Model, and (3) the Observation Data Model, adapted from the Public Participation in Scientific Research (PPSR) Common Conceptual Model (Bowser et al. 2017). The 0:n (and the dashed arrow) means that a Project Metadata Model may have zero or more Dataset Metadata Models. The 1:n (and the solid arrow) means that a Dataset Metadata Model will have one or more Observation Data Models
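The 0:n and 1:n cardinalities between the three PPSR-Core models can be sketched as simple data structures. This is a minimal illustration under stated assumptions: the class and field names below are hypothetical and do not reproduce the attribute lists of PPSR-Core.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ObservationDataModel:
    """Describes the data elements used in a dataset (hypothetical fields)."""
    description: str


@dataclass
class DatasetMetadata:
    """Describes a dataset as an output of a project (hypothetical fields)."""
    name: str
    observation_models: List[ObservationDataModel]

    def __post_init__(self) -> None:
        # 1:n -> a dataset description needs at least one observation data model.
        if not self.observation_models:
            raise ValueError("a dataset needs at least one observation data model")


@dataclass
class ProjectMetadata:
    """Describes a project; it may have zero or more datasets (0:n)."""
    name: str
    datasets: List[DatasetMetadata] = field(default_factory=list)


# A project without datasets is valid (0:n).
project = ProjectMetadata(name="Example project")

# A dataset must carry at least one observation data model (1:n).
water_quality = DatasetMetadata(
    name="Example water quality dataset",
    observation_models=[ObservationDataModel("nitrate concentration measurement")],
)
project.datasets.append(water_quality)
```

Enforcing the 1:n constraint at construction time is one possible design choice; a schema validator applied at data exchange would serve the same purpose.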

Project Description

The development of the PPSR-Core model was driven by the requirements of the implementations available at the time. As a consequence, it is tied to these implementations, and a conceptual model allowing for better project content representation is still not available. In addition, PPSR-Core still includes some domain-specific properties, especially from the biodiversity domain. Since citizen science activities take place in different disciplines and focus on specific aspects that vary across activities, a model that tries to capture everything in the domain can become complicated and difficult to manage.

In order to exploit the citizen science knowledge encoded in PPSR-Core and, at the same time, overcome the above-mentioned drawbacks, we have developed a modular conceptual model for the representation of citizen science knowledge (see COST Action CA15212 Working Group 5 2018b). This model comprises different modules that are all linked to the Project.Core module that captures essential project information (see Fig. 9.2 for an overview of the structure of our conceptual model). The Project.Core module includes many properties imported from PPSR-Core, like project name, website, start and end date, etc., and unifies the other modules. These modules include:

  • The Project.MetadataRecord module, which captures general information about the project, including its provenance

  • The Project.Annotation module, which captures information, like tags, used for annotating project descriptions

  • The Project.Funding module, which captures project funding information

  • The Project.Infrastructure module, which captures information about project infrastructure (hardware, software, services, etc.)

  • The Project.Geography module, which captures geographical information about the project

  • The Dataset module, which captures information about project datasets

  • The Project.Participant module, which captures information about project participants and their activities within a project
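The modular structure listed above can be sketched as a core record to which optional modules are attached. This is a minimal illustration, not the working group's implementation: the class `ProjectCore`, the method `attach`, and the example key `funder` are hypothetical, while the module names and the core properties (name, website, start and end date) follow the text.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional

# Module names as listed in the text.
MODULES = (
    "Project.MetadataRecord",
    "Project.Annotation",
    "Project.Funding",
    "Project.Infrastructure",
    "Project.Geography",
    "Dataset",
    "Project.Participant",
)


@dataclass
class ProjectCore:
    """Essential project information; all other modules link to this record."""
    name: str
    website: Optional[str] = None
    start_date: Optional[date] = None
    end_date: Optional[date] = None
    modules: Dict[str, dict] = field(default_factory=dict)

    def attach(self, module: str, content: dict) -> None:
        """Link a module to the core record, rejecting unknown module names."""
        if module not in MODULES:
            raise KeyError(f"unknown module: {module}")
        self.modules[module] = content


project = ProjectCore(name="Example project", start_date=date(2020, 1, 1))
# The key 'funder' is an illustrative assumption, not a property of the model.
project.attach("Project.Funding", {"funder": "Example funder"})
```

Because every module is optional, a project description can start with the core record alone and grow as more information becomes available.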

Fig. 9.2 An overview of the structure of the main conceptual model, highlighting the different modules linked to Project.Core: Project.MetadataRecord, Project.Geography, Project.Participant, Dataset, Project.Infrastructure, Project.Funding, and Project.Annotation. The arrows indicate the dependency between the modules. The connection between the Dataset and Project.Participant modules indicates that there are relationships between concepts across these modules

Due to the wide scope of the main conceptual model for citizen science, it was developed in phases. In this chapter, our attention is focused on participation and participant activities in citizen science projects. Related initiatives from CSA, ECSA, ACSA, and OGC are accounted for and the model is tested with case studies.

Since the role of the citizens as participants is the main difference between citizen science projects and traditional research projects, in the following section, we will discuss the Project.Participant module in more detail.

Together, Table 9.1 and Figs. 9.2 and 9.3 outline all the concepts and relationships in the conceptual model related to the Project.Core and Project.Participant modules. Here, we describe a selection of the concepts; a full list of descriptions is currently under development and available in the model repository.Footnote 13

Fig. 9.3 Excerpt (part a) of the conceptual model on citizen participation, showing the concepts Project, Data Collection, Data Analysis, Activity, Participation Task, Training, Tool, Skill, Knowledge Item, and Execution Plan. The different boxes represent concepts; the arrows represent relationships

Participation and Activity Description

At the heart of the Project.Participant module lie the relationships between the participants, their activities, their outputs, and the skills, knowledge, and tools required to perform them. A project has one or more activities, and these are performed by participants with a variety of roles and motivations, during a specified time range.

In the model, the Activity concept (see Figs. 9.3 and 9.4) represents activities that belong to a Project. A general activity, such as ‘Collecting data about bird migration’, may contain a number of tasks. A task is an activity with a specific goal and a limited duration (a kind of transaction), such as ‘Taking a picture of a bird and storing it in an image collection’ or ‘Validating a bird identification’. The description of a task includes details of the knowledge, skills, and tools required as well as the training available and its execution plan.
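The relationship between a general activity and its tasks can be sketched as follows. This is a simplified illustration under stated assumptions: the field names are hypothetical, and the example values for tools and knowledge ('camera', 'bird species identification') are invented for illustration. The activity and task descriptions are the examples given in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Activity:
    """An activity that belongs to a project and may contain tasks."""
    description: str
    tasks: List["ParticipationTask"] = field(default_factory=list)


@dataclass
class ParticipationTask(Activity):
    """A task is an activity with a specific goal and a limited duration."""
    required_knowledge: List[str] = field(default_factory=list)
    required_skills: List[str] = field(default_factory=list)
    required_tools: List[str] = field(default_factory=list)
    training: List[str] = field(default_factory=list)
    execution_plan: Optional[str] = None


bird_migration = Activity(
    description="Collecting data about bird migration",
    tasks=[
        ParticipationTask(
            description="Taking a picture of a bird and storing it in an image collection",
            required_tools=["camera"],  # illustrative assumption
        ),
        ParticipationTask(
            description="Validating a bird identification",
            required_knowledge=["bird species identification"],  # illustrative assumption
        ),
    ],
)

for task in bird_migration.tasks:
    print(task.description, "| tools:", task.required_tools)
```

Modelling the task as a subclass of the activity mirrors the statement that a task is itself an activity; a flatter design with two unrelated types would also be possible.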

Fig. 9.4 Excerpt (part b) of the conceptual model on citizen participation, showing the concepts Output, Observation, Activity, Agent, Data Collection Method, Organization, Person, and Sensor

The Agent concept in Fig. 9.4 generalises the idea of participants to groups of people in particular organisations and to machines, such as sensors. An instance of the Agent concept represents a type of agent, for example, ‘registered Zooniverse user’ or ‘mapping agency’.

The Activity description includes its output (e.g. dataset, publication, software) that can be composed of a number of output items. A project may acknowledge the participation of an actor in the production of an output item. In this case, the description of an output item includes a link to the role played by the actor in its production. The description of a Project also includes its participant recruitment technique and its privacy protection policy. The dataset as an entity is handled in our model as a specific type of output. Its details are described in a separate module (see Fig. 9.1), and although they are required for interoperability, they are beyond the scope of this chapter. The same holds true for the semantics of a dataset’s content, which is described in the Data Model (Fig. 9.1). Here we make use of existing standards, such as the underlying data models of the SensorThings API (Footnote 8) and the SSN Ontology (Footnote 4).

The concepts depicted in Fig. 9.4 cover participation and its requirements. The model does not claim to be exhaustive, but rather serves as a backbone. Each of the branches, such as tools and skills, can itself be described by external models. The subclassification is also not exhaustive. Part a (Fig. 9.3) and part b (Fig. 9.4) are connected through the Activity concept.

Application in Case Studies

This section explains how the conceptual model for citizen science can be used in specific case studies, that is, how the different characteristics of a project – its participants, its data, etc. – can be described by using the model. The case studies represent projects with different domains, community sizes, and types of participation in order to demonstrate the breadth of citizen science applications that the model can accommodate. The first sub-section highlights four different projects. Here we demonstrate how they can be described with the help of our model in order to understand project content and metadata. The second sub-section illustrates another use for our model: the application of its concepts and structure for (1) creating project descriptions in a specific inventory and (2) structuring data collection.

Instantiation of Projects

After providing a short introduction to the four selected citizen science projects, we use our conceptual model as a skeleton for each specific project. Where applicable, the concepts (as depicted in Figs. 9.3 and 9.4) have been instantiated for each project; see Tables 9.2 and 9.3. In other words, a concept is assigned a project-specific value where possible and applicable. This means that specific projects, their activities, participants, data outputs, etc., are described with the help of the conceptual model. Using this common model allows the projects to be compared and combined, thus increasing interoperability between the projects and their elements. It should be emphasised that in the tables only a few examples are provided and that each entry in the table corresponds to a concept in the model, which is more than just a flat table. For example, a project can have multiple participation tasks, each using different tools; and a project can produce multiple, different datasets, and so on. We will now introduce our case studies.

Table 9.2 Instantiation of the conceptual model with OSM, Bash the Bug, Mars in Motion, and MICS
Table 9.3 Instantiation of the conceptual model with the JRC Citizen Science Project Inventory and the Participatory Toponym Handling Project
  • OpenStreetMap. OpenStreetMap (OSM) is a well-known crowdsourcing project in which thousands of volunteers maintain an online map of the world. OSM has all the characteristics of participation and data handling we see in many other citizen science projects. In addition, OSM is an essential geographical reference for many citizen science projects.

  • Bash the Bug (Zooniverse). The objective of the Bash the Bug project is to improve tuberculosis diagnosis. The task of the volunteers is to accurately determine which antibiotics are effective for each of the collected tuberculosis samples. This is carried out by analysing pictures of plates showing the effects of several antibiotics on the tested sample.

  • Mars in Motion (Zooniverse). Mars in Motion was created to look for and identify geological changes on the surface of Mars over time by gathering in-depth data on the type of features that are detected. It is part of the i-Mars.eu project, which includes several European partners, and is focused on developing tools and datasets to increase the exploitation of space-based data from the US National Aeronautics and Space Administration (NASA) and the European Space Agency (ESA) Mars mission beyond the scientific community.

  • MICS. The MICS project provides an integrated platform of metrics and instruments to measure both the costs and the benefits of citizen science. These metrics and instruments consider the impacts of citizen science on the following domains: society, governance, the economy, the environment, and science.

Deployment of the Conceptual Model

In addition to the basic metadata provision outlined in the previous section, the conceptual model can be used as a structure for project-related activities. Two case studies are provided here.

JRC Citizen Science Project Inventory

The European Commission Joint Research Centre (JRC) has developed a multidisciplinary data infrastructure (Friis-Christensen et al. 2017) to facilitate open access to its research data, in line with the recent open data trend (Trojan et al. 2019). The JRC Data InfrastructureFootnote 14 has helped establish requirements for dataset metadata. The JRC datasets are published in the JRC Data Catalogue and are described by metadata that follow a modular metadata schema. The schema consists of (1) a core profile which defines the common elements of metadata records, based on the reference standards DCAT-AP (ISA DCAT-AP 2015) and DataCite (2016), and (2) a set of extensions, which defines elements specific to given domains (geospatial, statistical, etc.), based on existing metadata standards.

In addition, the JRC Citizen Science Project Inventory has supported the JRC in describing projects. The JRC Citizen Science Project Inventory was initially developed as one of the outcomes of the study Citizen Science for Environmental Policy: Development of an EU-wide Inventory and Analysis of Selected Practices (Bio Innovation Service 2018; Turbé et al. 2019). This project was executed by the European Commission (DG Environment), with the support of the JRC. The project also included additional contracted partners: the Bio Innovation Service (France), the Fundacion Ibercivis (Spain), and the Natural History Museum (UK). The main objective was to build an evidence base of citizen science activities to support environmental policies in the European Union (EU). Specifically, the goal was to develop an inventory of citizen science projects relevant to environmental policy and assess how these projects contribute to the United Nations Sustainable Development Goals (SDGs). To this end, a desk study and an EU-wide survey were used to identify 503 citizen science projects of relevance to environmental policy. The resulting project inventory has been published in the JRC Data CatalogueFootnote 15 and is updated on a regular basis (it also considers new entries suggested via an online survey).Footnote 16

The Citizen Science Explorer,Footnote 17 a dynamic catalogue provided as part of the JRC GitHub space, has been developed to provide more visibility to the JRC Citizen Science Project Inventory and to showcase the opportunities for knowledge sharing and management. The inventory is available as comma-separated values (CSV),Footnote 18 JSON,Footnote 19 and JSON-LD.Footnote 20 The conceptual model described in this chapter does not allow us to represent all the information available in the inventory, but it does allow us to structure its core entities in a standardised way.

There are other initiatives which can be considered as case studies for identifying stakeholders’ needs. These include activities covered by Earthwatch (e.g. the MICS project, in which the impact of citizen science projects is measured) and COST Actions throughout Europe.

Participatory Toponym Handling Project

One application case where the citizen science conceptual model had a direct influence, and which in turn can be used to shape future developments of the conceptual model, concerns the collection and maintenance of place names (or toponyms) in Indonesia.

This particular case study was motivated by the fact that many national mapping agencies (and agencies responsible for the naming of places in databases and gazetteers) have scarce or insufficient resources. At the same time, many citizens have rich local and traditional knowledge of toponyms. Indonesia, in particular, has many regional and local languages and a varied topography. Including local and traditional knowledge is also relevant from a research point of view, because it can, for example, uncover yet unwritten histories.

The Geospatial Information Agency of Indonesia (Badan Informasi Geospasial, BIGFootnote 21) is responsible for toponyms in Indonesia. BIG conducted two pilot projects in 2015 (Yogyakarta) and 2016 (Lombok) on the involvement of citizens in toponym handling. The Indonesian approach includes many stakeholders, combining both top-down and bottom-up elements: national legislation provides regulations and procedures, while their implementation relies on local actors. However, local governments tasked with the implementation often lack the capacity to provide the required skills and resources.

The pilot projects led to the development of a participatory toponym handling framework (Perdana and Ostermann 2018). More importantly for this chapter, the framework adopted several concepts from an early version of Working Group 5’s citizen science conceptual model. Thus, although the framework has been subsequently improved and significantly expanded through collaborative learning, including focus group discussions with stakeholders and workshops (Perdana and Ostermann 2019), this example shows the utility of an early version of the conceptual model for designing a project involving citizens.

The concrete participatory toponym handling approach that was developed is also expected to influence ongoing legislation processes. Furthermore, it resulted in three experimental toponym collection projects in late 2018 (their outcomes will soon be published).

Using this chapter’s conceptual model, we can describe the participatory toponym handling. The main Activity is the collection of place names, either recording entirely new ones or updating existing ones. The Agents carrying out this activity are citizens, local government officials, experts from the national mapping agency, and academics/researchers. The DataCollectionMethod is field surveys using tablets, supplemented by office-based processing. The created Datasets are initially forms completed by participants (Observations) with multimedia elements (e.g. audio recordings of pronunciation) and ultimately enriched gazetteers. The ParticipationTask is therefore to provide place names and related information. The Motivation is to contribute toponymic data, preserve embedded knowledge on toponyms, and collect toponyms in their surrounding areas.
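The description above can be written down as a simple record keyed by concept name, which makes it possible to check which concepts a project description covers. This is a minimal sketch: the dictionary layout and the helper `missing_concepts` are hypothetical, while the concept names and values are taken from the paragraph above.

```python
from typing import Dict, Iterable, List

# Instantiation of the concepts named in the text for the toponym project.
toponym_project: Dict[str, object] = {
    "Activity": "Collection of place names (new or updated)",
    "Agent": [
        "citizens",
        "local government officials",
        "experts from the national mapping agency",
        "academics/researchers",
    ],
    "DataCollectionMethod": "Field surveys using tablets, supplemented by office-based processing",
    "Dataset": [
        "forms completed by participants (Observations) with multimedia elements",
        "enriched gazetteers",
    ],
    "ParticipationTask": "Provide place names and related information",
    "Motivation": [
        "contribute toponymic data",
        "preserve embedded knowledge on toponyms",
        "collect toponyms in surrounding areas",
    ],
}


def missing_concepts(description: Dict[str, object], required: Iterable[str]) -> List[str]:
    """Return the required concepts that a project description does not fill in."""
    return [concept for concept in required if concept not in description]


required = ["Activity", "Agent", "DataCollectionMethod", "Dataset",
            "ParticipationTask", "Motivation"]
print(missing_concepts(toponym_project, required))
```

Describing several projects with the same keys is what allows them to be compared side by side, as in Tables 9.2 and 9.3.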

Roadmap for Future Research and Use

The benefits of using the conceptual model presented in this chapter are twofold: human understanding of citizen science project characteristics and machine processing of these characteristics. Further technical development and documentation of best practices will be required to support the model in use. Humans wishing to discover, evaluate, and contribute to projects will require intuitive visualisation of the conceptual model and well-designed tools for search and query. Machines that use the model for data alignment will require well-designed APIs, and repositories of standards, schemas, and agreed terms, with reliable access mechanisms.

An example of the context in which this conceptual model could be used is the EU Horizon 2020 Framework Programme project EU-Citizen.Science, which aims ‘to build a central platform for citizen science in Europe, a place to share useful resources about citizen science, including tools and guidelines, best practices and training modules’.Footnote 22 By utilising a metadata schema such as this conceptual model, a greater understanding of data types, their structure, and their relationships can be achieved. Adopting the conceptual model will also ensure that the tools, guidelines, and training developed are as widely applicable and usable as possible.

The following recommendations are designed to foster the uptake of the conceptual model by the citizen science community in order to increase citizen science interoperability:

  • Develop procedures to respond to existing regulatory or legal frameworks related to citizen science, such as the implementation of the INSPIRE Directive (in Europe) and the provision of related best practices and tools.Footnote 23

  • Involve the ECSA, CSA, ACSA, and the Citizen Science Global Partnership (CSGP) in the definition of an agenda for the model’s practical implementation and possibly as hosts for interoperable catalogues of citizen science projects and data. They could also provide guidelines on the use of existing solutions.

  • Include a dedicated section on ECSA, CSA, ACSA, EU-Citizen.Science, and CSGP websites to explain the conceptual model and provide introductory information.

  • Develop extensions related to more diverse outcomes, such as mathematical theorems, hardware, and policy and societal impacts.

  • Develop communication approaches to help practitioners navigate through the various standards and concepts (e.g. a ‘choose your own adventure’ approach; see also the Digital Curation CentreFootnote 24 for additional ideas).

Implementing the proposed recommendations will take some time and also require collaboration across communities. The publication of the conceptual model outlined in this chapter should support this process. In addition, some of the work needed to fulfil the recommendations is already in progress and will ultimately be disseminated through citizen science community channels.