1 Introduction

Systematic literature reviews, and meta-analyses in particular, are a scientific method used in a number of disciplines, ranging from (bio-)medical to social sciences, to summarise vast amounts of research outputs on a specific topic [27]. In a nutshell, a meta-analysis exploits statistical models to quantify, aggregate and compare evidence from a set of (dozens, sometimes hundreds of) experimental studies addressing the same research question, in order to derive generalisable conclusions [11]. In this way, meta-analyses offer a snapshot of a research topic, supporting research transparency, reproducibility and re-usability, an increasingly urgent concern in various research disciplines [19].

Performing meta-analyses is a knowledge-intensive process that can take months, sometimes years, due to methodological and technical barriers. With the volume of research outputs growing exponentially every year, this problem is becoming increasingly difficult [3]. A new meta-analysis can require authors to spend a significant amount of time and effort to find studies that meet their criteria, identify the evidence in them, annotate their contents and statistically aggregate their results, before reaching any significant conclusion. Moreover, meta-analyses can be large in scope, and planning the allocation of time and human resources can be a hard task. This calls for new methods to help researchers summarise scientific evidence in a more automated way and, more generally, to facilitate the publication and sharing of large bodies of scientific findings.

Our motivation stems from the COoperation DAtabank (CODA or DataBank henceforth), a large data repository aiming at analysing the entire history of laboratory and field research on human cooperation using social dilemmas. The goal of the DataBank is to encourage and facilitate the sharing of experiments as well as null findings (which are hardly ever published), and consequently reduce the publication bias that currently affects the area [15]. Over the last 5 years, a small pool of domain experts manually annotated approx. 3,000 studies, covering 60 years of research publications, with experimental settings, measured/manipulated variables of observation, and quantitative results. The goal is to establish an open access database that researchers worldwide can consult to identify studies to include in their systematic literature reviews, as well as to directly conduct their own statistical (meta-)analyses.

In this work, we show how semantic technologies, which provide support for scaling, reuse, and interoperability, can be exploited to tackle the scalability, methodological and technical issues of conducting meta-analyses. Using a social science scenario, we show how the content of research outputs can be represented using semantic descriptions, and how to leverage this structured, domain-specific knowledge to facilitate search, analysis and synthesis of research outputs. Our main contributions are (1) the first structured representation of the field of human cooperation, which researchers from the field can easily reuse and extend; and (2) a Science of Science application to help experts perform meta-analyses semi-automatically, supporting the correct evaluation and interpretation of research conclusions. We discuss the multiple benefits of our approach using a few use-cases that demonstrate how the various phases of the meta-analytic process can be facilitated and, more generally, how this can significantly contribute to research replication and automated hypothesis generation.

2 Background and Related Work

We introduce here the basic notions of scientific meta-analyses, their best practices and current applications, and give an overview of the semantic approaches supporting scientific research.

Principles of Meta-analysis. A meta-analysis is a process used to synthesise knowledge from studies addressing the same research question by using comparable methods. Each study may observe a relation between two (one independent, one dependent) variables, which can be quantified as an effect size. Meta-analytic techniques are then used to estimate an average overall effect size by aggregating the effect sizes observed in single studies [22]. Effect sizes can represent the differences observed between the experimental variations (treatments) of the independent variable and the dependent variable (such as a standardised difference between means d), or could also be the relation between two measured variables (such as a correlation coefficient \(\rho \)). In order to derive the overall estimate, a researcher first frames a problem statement, defining the research question, inclusion criteria, independent and dependent variables of observation etc., and then collects relevant studies (both published and non-published material) across scientific sources. Conducting a meta-analysis then consists of: (1) Coding, i.e. annotating the studies with the relevant characteristics, including independent and dependent variables and effect sizes; (2) Analysis, i.e. estimating the overall effects using fixed and random effects models, determining heterogeneity in the studies, assessing publication bias, conducting moderator analyses through meta-regression, and performing statistical power analysis; (3) Interpretation, i.e. the presentation of the obtained results along with conclusions and graphical support, often including graphs such as forest, funnel and violin/scatter-box plots. These steps make a meta-analysis significantly different from a literature review conducted, for instance, in computer science, as the researcher numerically “pools” results from the studies (i.e. the effect sizes) and arrives at a statistical summary that can be integrated into the narrative of a publication.
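To make the "pooling" step concrete, the following is a minimal sketch of inverse-variance aggregation, assuming each study contributes one effect size and its standard error. It computes a fixed-effect estimate, Cochran's Q, the I² heterogeneity statistic, and a random-effects estimate using the DerSimonian-Laird estimator of the between-study variance. The function name and the example values are illustrative assumptions and are not taken from the DataBank or from the CODA application, which may use different estimators.

```python
import math


def pool_effect_sizes(effects, std_errors):
    """Pool per-study effect sizes with inverse-variance weights.

    Returns a dict with the fixed-effect and DerSimonian-Laird
    random-effects estimates, Cochran's Q, tau^2 and I^2 (in percent).
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching inputs")

    variances = [se ** 2 for se in std_errors]
    weights = [1.0 / v for v in variances]  # fixed-effect weights

    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    fixed_se = math.sqrt(1.0 / sum_w)

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    random = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    random_se = math.sqrt(1.0 / sum_re)

    # I^2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "fixed": fixed,
        "fixed_se": fixed_se,
        "random": random,
        "random_se": random_se,
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


if __name__ == "__main__":
    # Hypothetical effect sizes and standard errors for three studies.
    example_effects = [0.60, 0.35, 0.10]
    example_std_errors = [0.24, 0.15, 0.20]
    for name, value in pool_effect_sizes(example_effects, example_std_errors).items():
        print(f"{name}: {value:.3f}")
```

When the studies are homogeneous, tau² is truncated to zero and the random-effects estimate coincides with the fixed-effect one.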

Methods and Applications. While meta-analyses are now established in the field, they are still seen as a controversial tool, as even small methodological violations can lead to misleading conclusions [20]. Researchers have argued that significant conclusions can only be derived from meta-analyses with a large number of studies, while smaller meta-analyses can only support framing new research hypotheses [14]. In response to this, a number of methodologies across disciplines have been published to assist experts in deriving reliable conclusions, e.g. the Cochrane Handbook by the Campbell organisation [23], the York Centre for Reviews and Dissemination guidelines for health care [29], and the Evidence for Policy and Practice Information and Co-ordinating Centre. A considerable amount of statistical expertise is also needed to avoid deriving incorrect conclusions. A number of statistical tools and libraries are now available to overcome these technical barriers, such as RevMan [8], Comprehensive Meta-Analysis [4], Stata, and R packages such as meta, rmeta and metafor. Finally, with the volume of research outputs growing exponentially, identifying relevant studies and annotating the evidence can require significant effort. As a result, a number of research projects have emerged in recent years with the goal of making large bodies of research findings openly available, offering summaries of scientific evidence and, more generally, automating meta-analyses [1, 5].

Supporting Science with Semantic Technologies. Semantic web technologies have been used to provide an interoperable and machine-interpretable infrastructure for scientific enquiry in the context of Semantic e-Science [7].

Ontology engineering techniques have widely been employed to formally organise domain-specific knowledge and provide interactive, semantically-enhanced data querying, exploration, and visualisation in a number of cross-disciplinary settings [12, 24, 25]. A large number of vocabularies to semantically describe research outputs have been proposed, e.g. document publication [9, 16]; provenance, versioning, attribution and credits through research objects or nanopublications [21]; description of research hypotheses [18] and scientific datasets [6]. Using controlled vocabularies to describe research outputs makes them findable, exchangeable and interpretable across different applications (interoperability).

Another strand of research has focused on capturing knowledge from scientific processes, in order to support the design and management of workflows [26], identify common patterns and motifs in them [13, 17], or recommend activities to tackle the cold start problem of experimental design [10]. These works demonstrated that semantically describing and mining workflow components can improve research reproducibility and help in framing research hypotheses.

A semantic approach could be used to support conducting meta-analyses. Domain vocabularies and descriptions could express the scientific knowledge contained in research outputs to facilitate search, analysis and synthesis. At the same time, replication of published results and support in deriving the correct conclusions could be offered by relying on semantic technologies, which enable scalability, interoperability and reuse. In the following, we show how we tested this hypothesis to support research replication and automated hypothesis generation in a social science scenario.

3 Motivating Scenario and Contribution

The COoperation DAtabank (2015–2020) is a large-scale effort involving a trained team of international researchers working with the goal of representing and publishing an open-access repository of over 60 years of research on human cooperation using social dilemmas. Social dilemmas are social situations that involve a conflict of interests, in which people must choose between doing what is best for themselves or what is best for the collective, either a dyad or a group [30]. In these situations, there is always one choice that results in the best outcome for each individual, regardless of what others choose to do. However, if everyone decides to behave this way, then the entire group receives a worse outcome, relative to when everyone decides to do what is best for the group. Common social dilemma paradigms used to study cooperation include the prisoner’s dilemma, the public goods dilemma, and the resource dilemma. Cooperation in these situations is operationalised as deciding to do what is best for the collective.

In the DataBank, around 3,000 studies from the social and behavioural sciences published in English, Chinese, and Japanese were annotated with more than 60 cooperation-related features. These features can be grouped in three categories, i.e. (a) characteristics of the sample participating in the study (e.g. sample size, average age of sample, percentage of males, country of participants), (b) characteristics of the experimental paradigm (structure of the social dilemma, incentives, repeated trial data, etc.), and (c) quantitative results (e.g., mean levels of cooperation, variance in cooperation, and effect sizes with cooperation). In this scenario, the CODA experts are required to continuously annotate new data (i.e. findings gathered from researchers worldwide) and add them to the dataset. This can be inconvenient, costly and time-consuming, especially as data are not always directly accessible [2]. In the long term, we aim at supporting CODA’s maintainers in capturing and sharing knowledge more efficiently. Starting with the assumption that scholars who consult the repository online act as domain experts, the solution we target is to crowdsource the meta-analyses that users conduct online to automatically enrich, fix and update the dataset. The procedural nature of meta-analyses in fact allows them to be modelled as scientific workflows of sequential activities, which we wish to capture and use to update the dataset, so that data maintainers do not have to input new data themselves. Besides relieving the workload of the dataset maintainers, collecting workflows could benefit data consumers, as the expertise of previous scholars could support new users when performing their own analyses.

In order to achieve this, our first goal is to make the DataBank available to the field, to allow exploring the data and conducting meta-analyses with it. Following similar approaches, we use an ontology engineering approach to represent the DataBank as a structured dataset, describing both the bibliographic information and the domain-specific knowledge from the collected studies, and then build a semantically-enhanced application to perform meta-analyses. Our work has two contributions, namely (i) we provide a detailed semantic description of the domain of human cooperation, so far never published in a structured way, that researchers from the field can easily reuse and extend; and (ii) we build a tool to conduct meta-analyses semi-automatically, reducing the time researchers need to test their new hypotheses. More generally, our work shows how semantic technologies can be used to tackle the limitations of meta-analyses at different levels (search, analysis and synthesis), fostering research reproducibility while facilitating the framing and testing of research hypotheses.

4 Performing Meta-analyses over Knowledge Graphs

In order to allow conducting meta-analyses over a knowledge graph, we follow two simple steps: first, we deal with the generation of the DataBank, by describing the research studies and their content and generating a knowledge graph from this; second, we focus on building the application to conduct meta-analyses on it. In the following, we will use the example of a meta-analysis performed to study the impact (effect) of using communication (independent variable) on cooperation (dependent variable) in a control group.

4.1 DataBank Generation

The first step is gathering the raw data annotated by the CODA team and building a structured knowledge graph from it. The dataset consists of a series of CSV tables, roughly divided by topic, where published studies are annotated according to the features described in Sect. 3, including both generic information (study characteristics such as country or year of publication) and specific characteristics (information relevant to cooperation games, e.g. types of priming or incentives given to the study participants). We therefore divide this task in three steps, i.e. establishing a general schema for papers, DOIs, authors and experiments (domain-independent knowledge), providing a more fine-grained model to describe the cooperation situations (domain-specific knowledge), and populating the knowledge graph accordingly.

Fig. 1. Domain-independent schema for data annotation.

Modelling Domain-Independent Knowledge. Figure 1 presents the domain-independent schema we used, where a publication consists of a cdo:Paper that includes an arbitrary set of cdo:Study, i.e. experiments performed in different settings and with different goals. In the example of Listing 1.1, for instance, resource :ENG00073 represents a paper written by H. Varian in 1999 reporting his experimental study :ENG00073_1 where 48 students from the US played a prisoner’s dilemma game. Additional metadata about the paper, such as publication date, authors etc., are collected directly by dereferencing the paper’s DOI and by including the collected triples as properties of an instance of the cdo:DOI class. We then define these properties as cdo:scholarly_prop. In our example, the DOI makes it possible to gather the paper’s scholarly information, such as its author H. Varian and its year of publication (1999).

Each cdo:Study also has specific properties, which we divided in cdo:sample_prop and cdo:quantitative_prop depending on whether they represent information about the study sample settings or the experimental quantitative/statistical information. For example, cdo:country, cdo:sample_size, cdo:studentsOnly and cdo:game are subproperties of cdo:sample_prop, while cdo:overall_coop (which measures the overall participants’ cooperation rate across different tests in an experiment) is a quantitative property defined as a subproperty of cdo:quantitative_prop.

As said, a cdo:Study reports one or more tests, modelled as cdo:Observation. The significance of the tests is estimated in terms of statistical cdo:quantitative_prop of an observation, e.g. effect size values, standard errors, variance, etc. In our example, study :ENG00073_1 reports one observation called :ENG00073_1., reporting a measured effect size value of \(\sim \)0.60 and a standard error of \(\sim \)0.24.

[Listing 1.1]

Note that the current work only focuses on using a semantic-based approach as a means to simplify meta-analyses. In other words, the vocabulary and data model are not finalised, and while alignments at schema and instance level are already under way, they remain out of this paper’s scope.
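As an illustration of the domain-independent schema described above, the following sketch mirrors the cdo:Paper, cdo:Study and cdo:Observation classes as plain Python data classes and fills them with the example values given in the text. The attribute names, the observation identifier and the absence of a DOI value are assumptions made for this sketch; the actual model is expressed in RDF and is described by the authors as not yet finalised.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Observation:
    """One test reported by a study (mirrors cdo:Observation)."""
    identifier: str
    effect_size: float      # a cdo:quantitative_prop
    standard_error: float   # a cdo:quantitative_prop

    @property
    def variance(self) -> float:
        # Sampling variance is the squared standard error.
        return self.standard_error ** 2


@dataclass
class Study:
    """One experiment reported in a paper (mirrors cdo:Study)."""
    identifier: str
    country: str            # cdo:country
    sample_size: int        # cdo:sample_size
    students_only: bool     # cdo:studentsOnly
    game: str               # cdo:game
    observations: List[Observation] = field(default_factory=list)


@dataclass
class Paper:
    """A publication containing one or more studies (mirrors cdo:Paper)."""
    identifier: str
    doi: Optional[str] = None  # the DOI is not given in the text
    studies: List[Study] = field(default_factory=list)

    def all_observations(self) -> List[Observation]:
        """Flatten the observations of every study in this paper."""
        return [obs for study in self.studies for obs in study.observations]


# Example values from the text; the observation identifier is hypothetical.
observation = Observation("ENG00073_1_obs", effect_size=0.60, standard_error=0.24)
study = Study(
    identifier="ENG00073_1",
    country="USA",
    sample_size=48,
    students_only=True,
    game="prisoner's dilemma",
    observations=[observation],
)
paper = Paper(identifier="ENG00073", studies=[study])

print(len(paper.all_observations()), round(observation.variance, 4))
```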

Modelling Domain-Specific Knowledge. In order to allow experts to understand the factors affecting cooperation, the next step is to describe the content of each study in a fine-grained way. We model observations as comparisons of one or two different cdo:Treatment instances, i.e. the experimental settings that an experimenter modifies with the goal of assessing whether and how the cooperation between participants in a game varies significantly. For example, observation :ENG00073_1. compares a treatment in which participants were not allowed to communicate (lines 24–25) with a second treatment, in which participants were playing with real partners (line 28) and could only exchange promises about future game behaviours (line 29). These experimental settings, modified across different treatments of the same independent variable (IV), are fundamental to performing meta-analyses. An experimenter could be interested in observing the effects of allowing or denying communication between participants in a game, or in the impact of specific characteristics (called moderators) such as the age, gender or personality of the participants, or the type of communication exchanged.

We therefore take all RDF properties whose domain is the class cdo:Treatment, and organise them in a domain-specific taxonomy of information relative to cooperation in social dilemmas. The resulting taxonomy, shown in Fig. 2, was built in a bottom-up fashion, i.e. (i) an initial list of key variables and definitions was drafted by the CODA team given their extensive expertise in the domain; (ii) the list was used to perform an initial annotation of \(\sim \)1k studies across universities, to report potential problems and additions; (iii) the list was further revised by a scientific advisory board of 12 senior domain experts; and (iv) existing papers were revised and new ones were annotated accordingly.

Fig. 2. Property taxonomy for annotation of domain-specific knowledge (simplified).

All properties are by definition subproperties of a generic rdf:Property called cdo:iv_props, and can be either cdo:measured_iv or cdo:manipulated_iv, depending on whether they represent a discrete (e.g. type of communication) or continuous (e.g. amount of money incentive) variable. Additionally, up to four categories of properties can describe the cooperative game in a treatment, namely:

  • participant variables (cdo:participant_iv), i.e. all variables related to the people taking part in a cooperation, including personal background (age, ethnicity, education), stable personality traits (e.g. HEXACO/Social Value Orientation), and dynamic psychological states (e.g., emotions, moods);

  • decision variables (cdo:decision_iv), i.e. all variables related to the decisions that people take during the game, e.g. intrapersonal features (priming, time constraints), or interpersonal features (communication, gossip);

  • game structure variables (cdo:game_structure_iv), consisting of all variables related to the structural aspects of the game, e.g. payment of the participants, protocol for forming teams, composition of the team etc.;

  • institution variables (cdo:institution_iv), involving the rules and norms for participants, such as punishment, reward or taxation during the game.

Returning to the example of Listing 1.1, cdo:communicationBaseline, cdo:realCommunication and cdo:communicationContent are subproperties of cdo:communication_iv, indicating that in this study the experimenter only manipulated communication as an independent variable. Of course, this is a rather simplified example, and treatments on average describe multiple IVs.

Knowledge Graph Population and Storage. Once the two parts of the schema are defined, we create statements with the support of Python’s RDFlib library, additionally dividing them across a number of named graphs (i.e. studies, papers, observations, treatments, vocabulary descriptions) for storage and querying convenience. The generated dataset is hosted as a triplyDB instance, which makes it easy to upload and update datasets and to expose them through APIs such as SPARQL, RESTful, and textual search. While the complete dataset is in the process of being iteratively published, its preliminary online version currently includes 330,655 statements, covering approx. 1.1k studies and 61 specific independent variables (cf. Table 1).
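The population step can be sketched as a mapping from CSV rows to statements grouped by named graph. The sketch below deliberately uses only the Python standard library and represents statements as plain tuples, so it does not reproduce the authors' RDFlib code. The namespace IRIs, the CSV column names and the cdo:study linking property are assumptions introduced for illustration; only cdo:Paper, cdo:Study, cdo:country, cdo:sample_size and cdo:game come from the text.

```python
import csv
import io

# Hypothetical namespaces: the real CODA IRIs are not given in the text.
CDO = "https://example.org/cdo/"        # vocabulary (classes, properties)
RES = "https://example.org/resource/"   # instances (papers, studies)
GRAPH = "https://example.org/graph/"    # named graphs
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical CSV layout holding the example study from the text.
CSV_DATA = """paper_id,study_id,country,sample_size,game
ENG00073,ENG00073_1,USA,48,prisoners_dilemma
"""


def row_to_quads(row):
    """Turn one CSV row into (subject, predicate, object, graph) tuples."""
    paper = RES + row["paper_id"]
    study = RES + row["study_id"]
    papers_graph = GRAPH + "papers"
    studies_graph = GRAPH + "studies"
    return [
        (paper, RDF_TYPE, CDO + "Paper", papers_graph),
        # cdo:study is an assumed name for the paper-to-study link.
        (paper, CDO + "study", study, papers_graph),
        (study, RDF_TYPE, CDO + "Study", studies_graph),
        (study, CDO + "country", row["country"], studies_graph),
        (study, CDO + "sample_size", int(row["sample_size"]), studies_graph),
        (study, CDO + "game", row["game"], studies_graph),
    ]


def build_quads(csv_text):
    """Parse CSV text and collect the statements for every row."""
    quads = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        quads.extend(row_to_quads(row))
    return quads


def group_by_graph(quads):
    """Group statements by named graph, as done for storage and querying."""
    graphs = {}
    for subject, predicate, obj, graph in quads:
        graphs.setdefault(graph, []).append((subject, predicate, obj))
    return graphs


if __name__ == "__main__":
    for graph, triples in group_by_graph(build_quads(CSV_DATA)).items():
        print(graph, len(triples))
```

With an RDF library, each tuple would become a statement added to the corresponding named graph; the tuple form is kept here only so the sketch runs without external dependencies.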

Table 1. DataBank status, as of Dec. 2019. Observations are still being computed.

4.2 Conducting Meta-analyses with the DataBank

In the second phase, we build a web-based interface allowing experts to conduct their meta-analysis over the generated knowledge graph. The application, shown in Fig. 3, is accessible through the main website, and allows users to (i) explore the DataBank; (ii) select relevant studies; and (iii) perform meta-analyses online.

Data Exploration. At first, users are presented with a global overview of the data. Studies and the effect sizes they report can be explored using a number of criteria, e.g. year of publication, sample size, country of publication (as in Fig. 3). At the bottom, a condensed tabular view of the data is also offered, and users can click on studies, papers and observations to explore their properties in depth directly from the triplyDB instance. An additional panel at the top offers the possibility of visualising the taxonomy of independent variables described in the previous section.

Search & Selection. The left panel allows users to select the desired studies before starting their statistical computations. The selection can be performed based on the independent variables that are manipulated during the treatments of a study (cf. Fig. 2). In the example of Fig. 3, the user selected observations from studies manipulating some cdo:communication_iv, and specifically studies manipulating the properties cdo:realCommunication and cdo:communicationContent, resulting in a selection of 21 observations from 6 different studies.

Fig. 3. Main view of the CODA web-app, allowing data visualisation (bottom), search & selection (left), meta-analytic activities (top, and Fig. 4).

Multiple selection is allowed, e.g. one could additionally include cdo:personality_iv or cdo:emotion_iv variables. Studies can be additionally filtered based on sample and quantitative properties, e.g. one could choose to include in the meta-analysis only observations from studies published between 2000 and 2010, or those with at least 100 participants. In order to foster data sharing and reuse, the desired portion of the data can be downloaded in tabular format. In this way, authors are relieved from the coding step, as the provided data are already annotated, and the model can be extended according to the purposes of the meta-analysis.
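The selection logic described above can be approximated by a simple filter over annotated observations. This is a sketch under stated assumptions: the record structure, the function name and all example records are hypothetical, and an observation is taken to match when it manipulates at least one of the selected variables. The application itself performs the selection through SPARQL queries, which are not reproduced here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ObservationRecord:
    """Flattened view of one observation and its study (hypothetical)."""
    study_id: str
    year: int
    sample_size: int
    manipulated_ivs: FrozenSet[str]
    effect_size: float


def select_observations(
    records: Iterable[ObservationRecord],
    required_ivs: Iterable[str] = (),
    year_range: Optional[Tuple[int, int]] = None,
    min_sample_size: int = 0,
) -> List[ObservationRecord]:
    """Keep records matching any selected IV and the optional filters."""
    wanted = frozenset(required_ivs)
    selected = []
    for rec in records:
        # Assumption: matching any one of the selected IVs is enough.
        if wanted and not (wanted & rec.manipulated_ivs):
            continue
        if year_range is not None and not (year_range[0] <= rec.year <= year_range[1]):
            continue
        if rec.sample_size < min_sample_size:
            continue
        selected.append(rec)
    return selected


# Invented records, used only to exercise the filter.
RECORDS = [
    ObservationRecord("A", 1999, 48,
                      frozenset({"cdo:realCommunication", "cdo:communicationContent"}), 0.60),
    ObservationRecord("B", 2005, 120, frozenset({"cdo:realCommunication"}), 0.35),
    ObservationRecord("C", 2008, 200, frozenset({"cdo:emotion_iv"}), 0.10),
    ObservationRecord("D", 2003, 60, frozenset({"cdo:communicationContent"}), 0.25),
]

COMMUNICATION = ["cdo:realCommunication", "cdo:communicationContent"]

if __name__ == "__main__":
    chosen = select_observations(RECORDS, COMMUNICATION, (2000, 2010), 100)
    print([rec.study_id for rec in chosen])
```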

Meta-analytic Activities. Once the desired data are selected, users can perform their meta-analysis using the tabs in the app. Due to space restrictions, Fig. 4 shows a simplified view of the meta-analytic steps, but the reader is invited to consult the online version to replicate our example. Typical meta-analysis steps include:

Fig. 4. Example of a meta-analysis: (a) fitted models to estimate the global effect size; (b) linear regression to assess the relation between the moderator and the effect size; (c) forest plot to determine heterogeneity of effect sizes (X-axis) per study (Y-axis); (d) violin plots to visualise the distribution of the studies in detail; (e) funnel plots to assess symmetry of the results (X-axis) based on their error degree (Y-axis); (f) power analysis to estimate the required sample size in future experiments.

  1. Fitting a meta-analytic model (Fig. 4a). This operation consists of choosing the model type (fixed, random or mixed effects), the method (maximum likelihood or restricted maximum likelihood estimators), and the variables (single, or aggregated) to obtain the estimate of the overall population effect size. Models can also be fitted by specific moderators (e.g. mean age of the sample as in Fig. 4b), corresponding to the IVs and study characteristics in the KG schema.

  2. Exploring the heterogeneity of single studies. Using forest plots (Fig. 4c), a meta-analyst can illustrate the effect sizes of the studies ordered by year of publication, using confidence intervals that reflect the precision with which effects were estimated (the narrower the confidence interval, the greater the precision). Effect sizes can also be plotted to check their distribution and density using violin plots (Fig. 4d), where relevant statistics such as the median and its 95% confidence interval, the quartiles and outliers are also shown.

  3. Checking for publication bias. Using funnel plots (Fig. 4e), the user can plot effect sizes against the sample sizes in a symmetrical inverted funnel centred on the average effect, so that asymmetries in the distribution of the data are revealed. An additional data-augmentation method called “trim&fill” can also be selected in order to estimate the number of studies that are missing from the meta-analysis.

  4. Computing power analysis. This activity (Fig. 4f) allows the user to derive the optimal sample size for a desired effect size (either obtained from the fitted model, or specified by the user) with a given power and p-value. Determining the optimal effect size given a desired sample size and p-value is also possible. With this operation, researchers can calculate the sample size required to obtain high statistical power in future studies.
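As a worked illustration of the power analysis step in the list above, the sketch below uses the standard normal approximation for comparing two independent groups of equal size: the per-group sample size is n = 2((z_{1-α/2} + z_{power}) / d)², where d is the standardised mean difference and α is the significance level (the "p-value" threshold in the text). The function names and default values are assumptions, and the CODA application may rely on a different (e.g. t-based) procedure.

```python
import math
from statistics import NormalDist


def _z_sum(alpha: float, power: float) -> float:
    """Sum of the normal quantiles for a two-sided test and a target power."""
    if not 0 < alpha < 1 or not 0 < power < 1:
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    std_normal = NormalDist()
    return std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)


def required_n_per_group(effect_size: float, alpha: float = 0.05,
                         power: float = 0.80) -> int:
    """Per-group sample size needed to detect a standardised difference d."""
    if effect_size == 0:
        raise ValueError("effect size must be non-zero")
    z = _z_sum(alpha, power)
    return math.ceil(2 * (z / abs(effect_size)) ** 2)


def detectable_effect_size(n_per_group: int, alpha: float = 0.05,
                           power: float = 0.80) -> float:
    """Smallest standardised difference detectable with n per group."""
    if n_per_group < 1:
        raise ValueError("n_per_group must be at least 1")
    return _z_sum(alpha, power) * math.sqrt(2 / n_per_group)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: n per group = {required_n_per_group(d)}")
```

The second function covers the inverse question mentioned in the text, i.e. which effect size can be detected with a given sample size.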

As with the data selection, all meta-analytic results can be downloaded through the interface. This is particularly beneficial to less experienced meta-analysts, as they are relieved from the often tedious and time-consuming task of writing efficient code. Additionally, all statistical computations and activities are presented sequentially in order to support methodological design, and were carefully designed together with meta-analysis experts. Finally, by allowing meta-analyses to be computed online, published and unpublished meta-analyses can also be easily reproduced, benefitting study reproducibility and transparency for the whole research field.

Implementation. The above web-app is implemented using R Shiny, a package for dashboard development straight from R code. The advantage of using Shiny lies mostly in the fact that we can exploit the large variety of statistical techniques (linear/nonlinear modelling, classical statistical tests etc.) and their graphical outputs (funnel, forest and violin plots) to manipulate, interact with and visualise data from the DataBank knowledge graph. Data are selected through SPARQL queries stored as APIs on the triplyDB instance, which further decouples the application from the dataset.

5 Usability Assessment via Use-Cases

Since neither the CODA app nor the dataset in its entirety has been officially released at the time of writing, we focus here on a qualitative assessment with the domain experts using a few use-cases, discussing how our approach supports the various phases of the meta-analysis. Usability testing with users is under preparation, and will be held after the official release.

Offloading Data Maintainers. Here, we are interested in reducing the workload of the experts in maintaining and updating the dataset. We therefore asked the CODA team to quantify the time it takes the editorial board to include a new study in the dataset. Table 2 shows the main activities they need to perform, i.e. searching studies, skimming publications to assess whether they meet the eligibility criteria, coding studies based on the chosen characteristics, and computing the aggregate effect sizes if a work reports separate splits (e.g. a paper reporting 10 different cooperation rates, because 10 rounds of a game were played). We report the answers of three experts E\(_n\) supervising the DataBank, as well as one annotator A of more limited expertise. These are then compared with the data provided by [28], an established reference that analysed the problem of performing meta-analyses in the medical domain.

Table 2. Time for data maintenance. Aggregation (*) is not always performed. The ranges relate to the difficulty of the studies, which can go from easy studies (small analyses from psychology/economics) to complex meta-analyses that require additional computations.

The table illustrates that data maintainers invest most of their effort in searching, skimming and annotating papers, which are reported as the most time-consuming activities. A straightforward conclusion here is that this workload could be significantly reduced by allowing users who consult the DataBank to upload their studies using the same annotation schema. This would allow maintainers to focus only on the light-weight refinement and validation of the uploaded data. Finally, the disproportion in time between activities in the social science and medical domains suggests that a substantial difference lies in the type of studies that experts need to analyse (e.g. lengths, regression analyses).

Improving Study Exploration. Here, we look at estimating how our approach supports meta-analysts in searching data. We asked the experts to write a set of competency questions, which can indicate how well the knowledge graph supports the search phase through requirement fulfilment.

Table 3. Competency questions to assess data exploration.

Table 3 shows that a number of requirements, particularly those related to the selection of studies with multiple characteristics or using effect sizes, could not be answered with the original dataset. This means that a user would mostly have to go through the process of extracting and re-coding the studies of interest before performing a new meta-analysis. Additionally, the DataBank now includes over 86 independent variables (vs. the original dataset with no controlled vocabulary for independent variables), which can be used both for the study selection and the moderator analyses at finer or coarser grain (e.g. users can decide to simply consider all communication variables, or only some specific communication characteristics). This is a great opportunity for the behavioural science community, which can easily investigate different research questions in a more automated and assisted way.

Support in Performing Statistical Analyses. We are interested in understanding whether our approach helps users perform the statistical computations needed to finalise and publish a meta-analysis. Table 4 provides an estimate of the resources necessary to compute meta-analytic models in a normal setting (i.e. when not supported by the CODA application) based on the answers of two experts who recently ran a meta-analysis (ma\(_1\), ma\(_2\)), as well as the information provided by [28]. We used lines of code as a measure for data preparation, model fitting and result plotting to show that users might be relieved from writing a significant part of the code when using a semantically-enriched system, as opposed to a database of meta-analyses.

Of course, resource allocation is highly dependent on the type of meta-analysis performed (i.e. number of studies analysed, complexity of the question framed, number of moderator analyses...), and the same would hold when conducting meta-analyses with the support of our framework. Yet, users would be relieved from the data preparation and programming tasks, which the CODA app offers as interactive activities that can be performed in a customised way. To give a baseline for the current application, a simple sequence of model fitting, heterogeneity analysis and moderator analysis takes on average 5 to 10 minutes.

Table 4. Resource allocation for manually running a meta-analysis.

Fostering Reproducible Research Through Recommendations. Finally, our approach offers quality improvement for the research field in terms of (1) reproducibility, (2) domain description and (3) best practice workflows. First, we offer a dataset of annotated and reproduced experimental studies openly available for consultation and download (both studies and the meta-analyses computed online). Secondly, by relying on a taxonomical representation of the domain, recommendations of variables to explore and analyse can be offered to users as “if you are looking at studies where [specific_var\(_1\)] variables were manipulated, you might want to explore other [parent_var] such as [specific_var\(_2\)], [specific_var\(_3\)] ...”, where the specific_var\(_n\) are siblings under a parent variable (e.g. cdo:communication_iv and cdo:priming_iv are children of cdo:decision_iv). Additionally, by relying on SPARQL queries over the dataset, we can monitor the users’ activities, and offer recommendations of new variables to meta-analyse based on popularity (i.e. suggesting the most popular moderators for specific variables) or anti-patterns (i.e. suggesting less popular and unexplored queries for the meta-analysis). Finally, by describing meta-analyses as scientific workflows that manipulate data with specific parameters, we can collect and offer them as recommended practices to users. In this sense, the expertise of previous users could be leveraged to offer inexperienced practitioners a more automated way of performing meta-analyses, tackling the cold start problem of designing experiments.
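The taxonomy-based and popularity-based recommendations described above can be sketched as follows. The child-to-parent map contains only a small, partly assumed fragment of the taxonomy: the placement of cdo:communication_iv and cdo:priming_iv under cdo:decision_iv follows the text, whereas placing cdo:personality_iv and cdo:emotion_iv directly under cdo:participant_iv is an assumption. The usage log and all function names are hypothetical.

```python
from collections import Counter

# Fragment of the IV taxonomy as a child -> parent map (partly assumed).
PARENT = {
    "cdo:communication_iv": "cdo:decision_iv",
    "cdo:priming_iv": "cdo:decision_iv",
    "cdo:personality_iv": "cdo:participant_iv",
    "cdo:emotion_iv": "cdo:participant_iv",
    "cdo:decision_iv": "cdo:iv_props",
    "cdo:participant_iv": "cdo:iv_props",
}


def siblings(variable):
    """Return the other variables sharing the same parent, sorted by name."""
    parent = PARENT.get(variable)
    if parent is None:
        return []
    return sorted(
        child for child, p in PARENT.items() if p == parent and child != variable
    )


def recommend(variable):
    """Build the sibling-based suggestion message, or None if no sibling."""
    options = siblings(variable)
    if not options:
        return None
    return (
        f"If you are looking at studies where {variable} variables were "
        f"manipulated, you might want to explore other {PARENT[variable]} "
        f"such as {', '.join(options)}."
    )


def rank_by_popularity(candidates, usage_log, least_popular_first=False):
    """Order candidates by how often they appear in a log of past selections.

    least_popular_first=True corresponds to suggesting unexplored variables.
    """
    counts = Counter(usage_log)
    return sorted(
        candidates,
        key=lambda var: (counts[var], var),
        reverse=not least_popular_first,
    )


if __name__ == "__main__":
    print(recommend("cdo:communication_iv"))
    log = ["cdo:emotion_iv", "cdo:emotion_iv", "cdo:personality_iv"]
    print(rank_by_popularity(["cdo:personality_iv", "cdo:emotion_iv"], log))
```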

6 Conclusions

In this work, we have shown how to use semantic technologies to support researchers in conducting meta-analyses and summarising scientific evidence in a more automated way. Meta-analyses are a Science of Science method for research synthesis, which tends to suffer from scalability, methodological and technical issues, also due to the exponentially growing volume of research. Using a social science scenario, we showed how the content of research outputs can be semantically described and used to build an application to help meta-analysts in searching, analysing and synthesising their results in a more automated way. The use-cases we discussed have shown that the approach is beneficial at several levels of the meta-analysis, and has the potential to foster research replication and facilitate the framing and testing of research hypotheses in the field of behavioural science.

Future work will focus on: (i) publishing the DataBank following the FAIR principles, which will require alignment to existing vocabularies (e.g. Data Cube) and linking instances to available datasets (e.g. Microsoft Academic); (ii) improving the web application with more advanced analyses, e.g. dynamics of citations, multivariate analyses, integration of cross-societal moderators from linked open datasets (e.g. GDP or GINI indices from Eurostat); (iii) implementing the collection and documentation of meta-analytic workflows using PROV; (iv) evaluation through user testing, quantifying the time users take to perform meta-analytic tasks with and without the support of the knowledge-based recommendations, workflows and documentation.