A consensus score to combine inferences from multiple centres

Haselimashhadi, Hamed; Babalola, Kolawole; Wilson, Robert; Groza, Tudor; Muñoz-Fuentes, Violeta

doi:10.1007/s00335-023-09993-0

Open access
Published: 08 May 2023

Mammalian Genome, volume 34, pages 379–388 (2023)


Abstract

Experiments in which data are collected by multiple independent resources, including multicentre data, different laboratories within the same centre or with different operators, are challenging in design, data collection and interpretation. Indeed, inconsistent results across the resources are possible. In this paper, we propose a statistical solution for the problem of multi-resource consensus inferences when statistical results from different resources show variation in magnitude, directionality, and significance. Our proposed method allows combining the corrected p-values, effect sizes and the total number of centres into a global consensus score. We apply this method to obtain a consensus score for data collected by the International Mouse Phenotyping Consortium (IMPC) across 11 centres. We show the application of this method to detect sexual dimorphism in haematological data and discuss the suitability of the methodology.


Introduction

Measuring the response to a treatment using data collected from multiple resources, such as multicentre clinical trials or animal experiments, offers two benefits: (1) a lower noise level, because the results are not strongly resource-dependent (Karp et al. 2014), and (2) greater effectiveness, because the results apply to a broader population (Rashid et al. 2012; Karp et al. 2017). In these experiments, a global consensus in the statistical inference across resources is desired. However, even in highly controlled experiments, it is not always possible to control for all sources of variation across all resources. This makes aggregating statistical results from multiple resources challenging, because the results may be vulnerable to biases that lead to inconsistent inferences. The design of the study, sample size, power of the analysis, variation across centres or over time (Haselimashhadi et al. 2020a) and unknown errors are examples of factors that make it difficult to obtain a global statistical conclusion across resources (Chung et al. 2010; Hu et al. 2022; Knatterud et al. 1998). Other confounders are the equipment used to perform the measurements in different resources (e.g., centres or laboratories), the level of experience of the staff and more complex environmental factors that typically arise in animal tests, such as diet, litter, handling, circadian rhythm, housing and husbandry. Therefore, in multi-resource experiments, it is crucial to control for as many variables as possible in order to reach global agreement (Haselimashhadi et al. 2020a; Chung et al. 2010; Chalmers and Clarke 2004; Hogg 1991). Table 1 shows some examples of possible outcomes when an experiment is conducted in 4 centres.

Table 1 Examples of possible outcomes when a global inference from statistical results obtained from multiple centres is desired

In this paper, we present a methodological approach to the problem of multi-resource consensus, with a focus on multicentre experiments. The proposed method calculates a global consensus score for the effect of interest (i.e., the research question, such as the effect of genotype, sexual dimorphism or body weight) in multicentre studies. The method takes into consideration the number of centres in which the test of interest is performed, the direction and magnitude of the effect size and the significance level obtained from individual centres, and combines these values into a global consensus score. We apply our method to data obtained by the International Mouse Phenotyping Consortium (IMPC), a transnational multicentre endeavour that screens the phenotypes of single-gene knockout mouse lines and wild-type mice to understand gene function (Koscielny et al. 2014).

Method

Several approaches are typically used to aggregate inferences from multicentre data. The three major ones are adjusting for centre using fixed effects models, adjusting for centre using random effects models, and analysing each centre separately and then combining the results using meta-analysis (Rashid et al. 2012; Basagaña et al. 2018; Burke et al. 2017; Bowden et al. 2011; Stewart et al. 2012). Other methods use group decision-making processes, such as the DELPHI method (van de Ven and Delbecq 2017; Dalkey and Helmer 1963); a simple majority rule criterion, such as all centres agree versus at least one centre disagrees; or simple statistical or probabilistic criteria, such as more than half (or the mean or median) of the centres or results agreeing, or simple statistical tests such as the t-test or ANOVA (Mlecnik et al. 2020). These latter approaches may suffer from insufficient power and individual bias (such as misjudgements or decisions based on insufficient information), may rest on strong underlying assumptions, and may require a large total number of centres, M, to converge to the true inference (Rashid et al. 2012; Using the Delphi method 2022).
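The two simplest agreement rules mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the significance threshold `alpha = 0.05` and the requirement that agreeing centres share the same effect direction are choices made here, not definitions taken from the cited literature.

```python
from typing import Sequence


def all_centres_agree(q_values: Sequence[float],
                      effects: Sequence[float],
                      alpha: float = 0.05) -> bool:
    """Unanimity rule (assumed form): every centre is significant
    and all effects point in the same direction."""
    if not q_values or len(q_values) != len(effects):
        raise ValueError("need one q-value and one effect per centre")
    all_significant = all(q < alpha for q in q_values)
    same_sign = all(e > 0 for e in effects) or all(e < 0 for e in effects)
    return all_significant and same_sign


def more_than_half_agree(q_values: Sequence[float],
                         effects: Sequence[float],
                         alpha: float = 0.05) -> bool:
    """Majority rule (assumed form): more than half of the centres are
    significant with an effect in the same direction."""
    if not q_values or len(q_values) != len(effects):
        raise ValueError("need one q-value and one effect per centre")
    positive = sum(1 for q, e in zip(q_values, effects) if q < alpha and e > 0)
    negative = sum(1 for q, e in zip(q_values, effects) if q < alpha and e < 0)
    return max(positive, negative) > len(q_values) / 2
```

Both rules reduce each centre to a yes/no vote, so a single non-significant centre is enough to break unanimity regardless of how strong the evidence is elsewhere.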

Here we propose an alternative approach that combines the corrected p-values (q-values), which we obtained by controlling the false discovery rate (FDR) (Controlling the False Discovery Rate 2022; Wright 1992; Hochberg 1988), with the effect sizes from individual centres and compares them with a set of expected values as follows:

$$\text{Consensus score } (s)=\begin{cases}\dfrac{\sum_{i}\left(q_{i}\times \sqrt{\left|\rho_{i}\right|}\right)}{\overline{M}^{2}\times \widehat{q}\times \sqrt{\widehat{\rho}}}\times \max\left(\dfrac{M}{2},\overline{M}\right), & \overline{M}\times P>c\\ 1, & \text{otherwise}\end{cases} \tag{1}$$

where $i = 1, 2,\dots ,M$ indexes the $i$th centre from a total of $M$ centres, $\overline{M }$ is the total number of centres in which the test is performed ($M$ is not necessarily equal to $\overline{M }$ in multicentre multi-test studies where the aim is to compare several measurements across centres while fixing the number of centres), ${q}_{i}$ is the corrected p value (q-value) from the statistical test performed in centre $i$ for the effect of interest (e.g. sex, genotype, body weight effect, etc.), ${\rho }_{i}$ is the estimated standardised effect size from the statistical test performed in centre $i$, such as Cohen’s $d$ effect size (Ellis 2010), and $P=|{\sum }_{i}\mathrm{Sign}\left({\rho }_{i}\right)/\overline{M }|$ is a penalty term to control for the directionality of the results. $\mathrm{Sign}\left(\rho \right)$ is the sign function defined by

$$\mathrm{Sign}\left(\rho \right)=\begin{cases}1, & \rho >0\\ 0, & \rho =0\\ -1, & \rho <0\end{cases}.$$

Finally, $c$, $\widehat{q}$ and $\widehat{\rho }$ are the minimum required number of centres for the analysis, the expected q-value and effect size from the prior information, respectively. We recommend $c=3$, $\widehat{q}=0.05$ and moderate expected effect size $\widehat{\rho }=0.5$ (Karp et al. 2017; Sullivan and Feinn 2012; Sawilowsky 2009) as the preliminary values for high-throughput experiments, such as in the IMPC. We stress that the choice of these parameters should be based on prior information. The choice of the expected q-value or the minimum number of required centres should take into account the context of the study, the sensitivity of the results or expert knowledge in the field; the expected effect size can be set from prior studies, simulations or empirical results, as we show in Fig. 1. This figure shows the distribution of the standardised effect sizes for the IMPC haematological traits and empirical mean ($10\%$ trimmed) from the data and the recommended expected effect size, $\widehat{\rho }$ = 0.5. We further assume that (1) there is no unusual temporal variation in the data (Supplementary Fig. 1), (2) the statistical tests are consistent and sufficiently powerful and adequate for the data under study, (3) the method to adjust the p-values is adequate (e.g. FDR); and (4) the effect sizes are estimated from the normalised data. Here normalising data refers to performing the statistical analysis on the standardised data as below:

$$\text{standardised data for centre } i=\frac{{x}_{i}-{\mu }_{xi}}{{\sigma }_{xi}}$$

where ${x}_{i}$, ${\mu }_{xi}$ and ${\sigma }_{xi}$ are the raw values, mean and standard deviation of the data from centre $i$ respectively. The resulting scores from Eq. 1 range in the $\left(0,+\infty \right)$ interval and the agreement of the multicentre statistical results can be evaluated by using $-\mathrm{log}\left(\mathrm{s}\right)$ so that

$$\begin{cases}\text{Consensus across centres} & \text{if } -\mathrm{log}\left(\mathrm{s}\right)>0\\ \text{Not enough consensus across centres} & \text{if } -\mathrm{log}\left(\mathrm{s}\right)\le 0\end{cases}.$$

The magnitude of $-\mathrm{log}\left(\mathrm{s}\right)$ from Eq. 1 is not bounded. As a result, a larger value in the positive (or negative) direction reflects a stronger agreement (or lack of agreement) among resources. For the special case where $-\mathrm{log}\left(\mathrm{s}\right)=0$, one can conclude that either there is not enough information in the data to calculate the scores or there is not enough agreement across centres. Throughout this paper, we use the term “not enough agreement” rather than “disagreement” to emphasize the difference between strong detection of consensus and not finding enough evidence to establish consensus among centres. Table 2 shows several scenarios as well as the inferences from the scores in Eq. 1. This table shows that the most ambiguous scenarios occur when all centres achieve effect sizes and q-values equal to the expected values (scenario 2) or when the centres achieve a range of opposite (in sign) effects so that $M\times P\le 3$ (scenario 3). Because ${q}_{i}$ and ${\rho }_{i}$ are continuous real values, ${q}_{i},|{\rho }_{i}|\in [0,\infty )$, scenario 3 happens with an extremely low chance that can be safely neglected.
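Eq. 1 and the decision rule on $-\mathrm{log}(s)$ can be sketched as follows. This is a minimal illustration, not the authors' released code: the function and argument names are hypothetical, the natural logarithm is assumed (the sign of $-\mathrm{log}(s)$ does not depend on the base), and the defaults are the recommended preliminary values $c=3$, $\widehat{q}=0.05$ and $\widehat{\rho}=0.5$. The condition $\overline{M}\times P>c$ is evaluated as $|\sum_i \mathrm{Sign}(\rho_i)|>c$, which is algebraically the same quantity and avoids a floating-point division.

```python
import math
from typing import Optional, Sequence


def sign(x: float) -> int:
    """Sign function: 1 for x > 0, 0 for x == 0, -1 for x < 0."""
    return (x > 0) - (x < 0)


def consensus_score(q_values: Sequence[float],
                    effect_sizes: Sequence[float],
                    total_centres: Optional[int] = None,
                    c: int = 3,
                    q_hat: float = 0.05,
                    rho_hat: float = 0.5) -> float:
    """Consensus score s of Eq. 1 for one trait.

    q_values / effect_sizes: one entry per centre that performed the test
    (their count is M-bar). total_centres is M; it defaults to M-bar.
    """
    m_bar = len(q_values)
    if m_bar != len(effect_sizes):
        raise ValueError("need one q-value and one effect size per centre")
    m = m_bar if total_centres is None else total_centres
    # M-bar x P simplifies to |sum of signs|, the net number of agreeing centres.
    net_agreement = abs(sum(sign(r) for r in effect_sizes))
    if net_agreement <= c:
        return 1.0  # "otherwise" branch of Eq. 1
    numerator = sum(q * math.sqrt(abs(r)) for q, r in zip(q_values, effect_sizes))
    denominator = m_bar ** 2 * q_hat * math.sqrt(rho_hat)
    return numerator / denominator * max(m / 2, m_bar)


def neg_log_score(score: float) -> float:
    """-log(s); positive values indicate consensus across centres."""
    return math.inf if score == 0 else -math.log(score)
```

With four centres that all report exactly the expected values ($q_i=0.05$, $\rho_i=0.5$) the sketch returns $s=1$, so $-\mathrm{log}(s)=0$, which is the ambiguous case described above; smaller q-values push $s$ below 1 and $-\mathrm{log}(s)$ above zero.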

Table 2 The demonstration of the scores calculated from Eq. 1 in a set of scenarios with 3 or more centres when the proposed scoring method in Eq. 1 leads to different values and inferences

Results

In this section, we show the application of the proposed scoring method along with two methods from the literature, namely global consensus and meta-analysis, to identify sexual dimorphism in the IMPC haematological data collected from wild-type (WT) mice, with an average age of 16–18 weeks, over a 3-year period from 1st January 2018 to 31st December 2020, with a minimum required threshold of 50 mice per sex. Our choice of data is motivated by the importance of the haematology parameters as indicators of overall health. The data used in this study can be accessed via the IMPC web portal under the URL www.mousephenotype.org (data release 15.1, October 2021).

The IMPC is a global effort aiming to generate and characterise knockout mouse lines for every protein-coding gene in mice (Dickinson et al. 2016; Bradley et al. 2012; Brown and Moore 2012; Hrabě de Angelis et al. 2015). The IMPC data are collected from several independent centres worldwide (Koscielny et al. 2014). Every centre contributes to the data collection by adhering to a set of standardised phenotype assays defined in the International Mouse Phenotyping Resource of Standardised Screens (IMPReSS, www.mousephenotype.org/impress). Although all centres follow the same Standard Operating Procedures (SOPs), there may be unavoidable or necessary variations in the implementation of the experiments (such as mouse age or the time of day when the test is performed), equipment (such as manufacturer, model and kits) as well as the level of expertise and experience of staff (experimenter effect), in addition to variations in inbred mouse strain (Table 3) (Bryant et al. 2008). This may lead to differing results across centres, which makes a global inference from the results challenging.

Table 3 Mouse strains that are used by the IMPC centres for the haematological data collected from 1st January 2018 to 31st December 2020

IMPC haematology

The IMPC haematology procedure encapsulates 22 measurements of blood properties such as counts and concentrations (white blood cell count, red blood cell count, haemoglobin concentration, platelet counts, etc.), as well as additional and derived haematological parameters (haematocrit, mean red blood cell volume, mean red blood cell haemoglobin, mean red blood cell haemoglobin concentration, etc.). Figure 2 (top) shows red blood cell counts, (middle) the haemoglobin concentration and (bottom) the monocyte cell counts collected by IMPC centres. The shifts in the means are most likely due to differences in the equipment used to take the measurements and can be removed by normalising the data. The top plot shows consistently higher red blood cell counts in males than females across centres, whereas there is not a clear pattern for the haemoglobin concentration. For the monocyte counts, males present consistently higher values, except for one centre, which shows the opposite.
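The per-centre normalisation that removes such shifts in the means can be sketched as below. It is an illustration only: the function name and the dictionary layout are hypothetical, and the sample standard deviation is an assumption, since the text does not say whether the sample or population form is used.

```python
from statistics import mean, stdev
from typing import Dict, List


def standardise_by_centre(raw: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Return (x - mean) / sd computed separately within each centre."""
    standardised: Dict[str, List[float]] = {}
    for centre, values in raw.items():
        if len(values) < 2:
            raise ValueError(f"centre {centre!r} needs at least two values")
        mu = mean(values)
        sigma = stdev(values)  # sample standard deviation (assumption)
        if sigma == 0:
            raise ValueError(f"centre {centre!r} has zero variance")
        standardised[centre] = [(x - mu) / sigma for x in values]
    return standardised
```

Two centres whose measurements differ only by a constant offset, for example `[1, 2, 3]` and `[11, 12, 13]`, map to the same standardised values, which is the behaviour the normalisation is meant to provide.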

Consensus score

In line with Karp et al. (2017), the sexual dimorphism effect is tested for all 22 haematology traits, independently for WT mice from individual centres, corresponding to the same mouse strain and metadata group split. We used a linear mixed model (described in Haselimashhadi et al. 2020b; Gałecki and Burzykowski 2013) implemented in the software R (Team RC-VRC 2013) and the package OpenStats (Mashhadi 2023). As in Karp et al. (2017), we included $Sex$ and $Body Weight$ as the fixed effect terms

$$Response=Sex+BodyWeight+e,$$

and Batch (the date when the test is performed on mice) as the random effect term. We then apply the scoring method to obtain a consensus global inference from the multicentre results, following the logic described in the flowchart in Fig. 3. We further compare our method with the global consensus criterion (all centres agree vs at least one centre disagrees) and the random effects meta-analysis approach described in Cooper et al. (2009) (pages 295–315) and Stewart et al. (2012), implemented in the R package metafor (Viechtbauer 2010).
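To make the random effects comparison concrete, the sketch below computes Cochran's Q, the between-centre variance and the pooled effect from per-centre estimates. It is not the analysis used in the paper: it swaps in the closed-form DerSimonian–Laird estimator for illustration, whereas metafor fits the model with REML by default, and the function name and return keys are hypothetical.

```python
import math
from typing import Dict, Sequence


def dersimonian_laird(estimates: Sequence[float],
                      variances: Sequence[float]) -> Dict[str, float]:
    """Random effects pooling with the DerSimonian-Laird tau^2 estimator."""
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two centres, one variance per estimate")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    w = [1.0 / v for v in variances]                    # fixed effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q_stat = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    scale = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q_stat - df) / scale)              # between-centre variance
    w_star = [1.0 / (v + tau2) for v in variances]      # random effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    i2 = max(0.0, (q_stat - df) / q_stat) if q_stat > 0 else 0.0
    return {"Q": q_stat, "df": float(df), "tau2": tau2, "I2": i2,
            "pooled": pooled, "se": math.sqrt(1.0 / sum(w_star))}
```

A large Q relative to its degrees of freedom signals heterogeneity across centres, which is the quantity the meta-analysis comparison relies on; a p-value for Q would come from a chi-square distribution with `df` degrees of freedom and is left out here to keep the sketch dependency-free.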

Table 4 shows the outcome of the scoring method for the 22 haematological parameters measured by the IMPC, together with a comparison against a consensus method based on all centres agreeing on a significant sex effect and against the meta-analysis method. Using the method proposed here, there is consensus among the 11 IMPC centres for 14 traits with $-\mathrm{log}\left(s\right)>0$, with males on average higher than females for 9 traits (red blood cell count, red blood cell distribution width, haematocrit, platelet count, white blood cell count, lymphocyte cell count, neutrophil cell count, monocyte cell count, eosinophil cell count) and females on average higher than males for 5 traits (mean cell volume, mean corpuscular haemoglobin, mean cell haemoglobin concentration, mean platelet volume, and lymphocyte differential count). For the remaining 8 traits, the scoring method either leads to zero or negative values of $-\mathrm{log}\left(s\right)$, reflecting a lack of consensus (6 traits), or does not reach the minimum threshold of three centres providing measurements for the results to be processed (lack of information in the data; 2 traits). The meta-analysis method gives results that are largely consistent with the scoring method. However, it does not find homogeneity of the statistical results across the centres for the monocyte cell count (also shown in Fig. 2, bottom) or the lymphocyte differential count, and it gives borderline p-values for the eosinophil cell count (p value = 0.069) and the neutrophil differential count (p value = 0.048). Visual inspection of the data suggests that the meta-analysis performs better at identifying the lack of agreement in the lymphocyte differential count, whereas the scoring method outperforms it for the monocyte cell count. In contrast with the two methods above, the global consensus method shows agreement across centres for the neutrophil cell count and the Large Unstained Cell (LUC) count, although the latter does not meet the requirement of a minimum of 3 centres.
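The way a per-trait inference follows from the score and the mean effect size can be sketched as below. The labels, the function name and the convention that a positive mean effect means males are higher than females are assumptions made for illustration; the rule for insufficient data is taken here as fewer than `c` contributing centres.

```python
import math


def classify_trait(score: float, mean_effect: float,
                   n_centres: int, c: int = 3) -> str:
    """Map a consensus score and mean effect size to an inference label."""
    if n_centres < c:
        return "insufficient data"
    if score <= 0:
        raise ValueError("score must be positive to take its logarithm")
    if -math.log(score) > 0:
        if mean_effect > 0:
            return "consensus: males higher"    # sign convention is an assumption
        if mean_effect < 0:
            return "consensus: females higher"
    return "not enough consensus"
```

For example, a trait with $-\mathrm{log}(\mathrm{score})=2.28$, the value reported for the monocyte cell count, is classified as reaching consensus, with the direction read from the sign of the mean effect size.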

Table 4 The outcome of applying the scoring method to 22 haematological measurements collected by 11 IMPC centres compared with outcomes by the individual centre (first three columns) and a method based on measuring the heterogeneity of the SD estimates across the centres using random effects meta-analysis (last column). The traits are shown in rows followed by the counts for the centre-based statistical test results, the mean effect size for the 11 centres, the consensus score and the inference, which is based on the -log(score) and the sign of the mean effect size. The scoring method identifies consensus in sexual dimorphism across centres for 14 traits (green and red rows), no agreement for 6 traits (blue rows) and 2 traits which do not meet the minimum requirements for the calculation of the score (yellow rows). In only 2 cases do all centres agree (in bold)

Conclusion and future work

Collecting data from multiple resources such as, in the case of this study, mouse phenotyping centres, benefits from a higher signal-to-noise ratio and a broader representation of a population. However, extra attention is required in the design and implementation of the experiments and statistical analysis to be able to make a global consensus inference from the aggregated results from individual resources (Rashid et al. 2012; Karp et al. 2017; Haselimashhadi et al. 2020a; Chung et al. 2010; Hu et al. 2022; Knatterud et al. 1998; Chalmers and Clarke 2004; Hogg 1991; Basagaña et al. 2018; Burke et al. 2017; Bowden et al. 2011; Stewart et al. 2012; Viechtbauer 2010; Bierer et al. 2017; Devereaux et al. 2016). Due to unavoidable, uncontrolled and unobserved factors, the results from all resources may only partially agree and a metric of consensus is required. In this paper, we propose a novel method which combines several aspects of multicentre experiment results including the corrected p-values, the magnitude and direction of effect sizes and the number of centres into one global consensus score.

We applied this method to identify sexual dimorphism in 22 haematological measurements collected from wild-type mice in 11 globally distributed centres forming part of the International Mouse Phenotyping Consortium (IMPC). We compared the results of this method with those obtained by the meta-analysis and by a binary method based on the agreement of all centres on the detection of sexual dimorphism. While the binary method found 2 traits reaching consensus across all IMPC centres, the method presented here allows us to conclude sexual dimorphism in 14 traits, with males on average higher than females for 9 traits and females on average higher than males for 5 traits. Further, comparing our method with the meta-analysis method shows a high degree of overlap between the two $(\frac{16}{20}=80\%)$ for the haematological traits. Our method shows better performance for monocyte cell count ($-\mathrm{log}(\mathrm{score})=2.28$ versus meta-analysis p-value $= 0.131$) and eosinophil cell count ($-\mathrm{log}(\mathrm{score})=1.08$ versus meta-analysis p-value $=0.069$). However, the lymphocyte differential count is a challenging case for interpretation when the outcome of the scoring method is compared with that of the meta-analysis method ($-\mathrm{log}(\mathrm{score})=0.13$ versus meta-analysis p-value $=0.138$). This study has focused on the IMPC haematology traits, but we believe the approach could be applied more generally and would be suitable to assess other IMPC parameters in the future.

Future studies

In this study, we showed the application of our scoring method to IMPC haematological data. In future studies, we will investigate the performance of the method when applied to other IMPC procedures and derive the statistical properties of the test statistic. This will allow a probability of consensus to be assigned to the scores (in particular when they are close to 1, that is, when -log(score) is close to zero), which will contribute to the reliability of the method.

Data availability

All data used in the study correspond to the IMPC data release 15.1 (October 2021) and can be retrieved from the IMPC data repository under the URL https://www.mousephenotype.org/help/non-programmatic-data-access/. A copy of the data, results and source code is publicly available from www.doi.org/10.5281/zenodo.7704684.

References

Basagaña X, Pedersen M, Barrera-Gómez J, Gehring U, Giorgis-Allemand L, Hoek G et al (2018) Analysis of multicentre epidemiological studies: contrasting fixed or random effects modelling and meta-analysis. Int J Epidemiol 47:1343–1354. https://doi.org/10.1093/IJE/DYY117
Bierer BE, Crosas M, Pierce HH (2017) Data authorship as an incentive to data sharing. N Engl J Med 376:1684–1687. https://doi.org/10.1056/NEJMSB1616595
Bowden J, Tierney JF, Simmonds M, Copas AJ, Higgins JP (2011) Individual patient data meta-analysis of time-to-event outcomes: one-stage versus two-stage approaches for estimating the hazard ratio under a random effects model. Res Synth Methods 2:150–162. https://doi.org/10.1002/JRSM.45
Bradley A, Anastassiadis K, Ayadi A, Battey JF, Bell C, Birling MC et al (2012) The mammalian gene function resource: the international knockout mouse consortium. Mamm Genome 23:580–586. https://doi.org/10.1007/s00335-012-9422-2
Brown SDM, Moore MW (2012) The International mouse phenotyping consortium: past and future perspectives on mouse phenotyping. Mamm Genome 23:632–640. https://doi.org/10.1007/s00335-012-9427-x
Bryant CD, Zhang NN, Sokoloff G, Fanselow MS, Ennes HS, Palmer AA et al (2008) Behavioral differences among C57BL/6 substrains: implications for transgenic and knockout studies. J Neurogenet 22:315. https://doi.org/10.1080/01677060802357388
Burke DL, Ensor J, Riley RD (2017) Meta-analysis using individual participant data: one-stage and two-stage approaches, and why they may differ. Stat Med 36:855–875. https://doi.org/10.1002/SIM.7141
Chalmers I, Clarke M (2004) Commentary: the 1944 patulin trial: the first properly controlled multicentre trial conducted under the aegis of the British Medical Research Council. Int J Epidemiol 33:253–260. https://doi.org/10.1093/IJE/DYH162
Chung KC, Song JW, group W study (2010) A guide on organizing a multicenter clinical trial: the WRIST study group. Plast Reconstr Surg. 126:515. https://doi.org/10.1097/PRS.0B013E3181DF64FA
Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing on JSTOR. [cited 21 Oct 2022]. Available: https://www.jstor.org/stable/2346101
Cooper H, Hedges LV, Valentine JC (2009) The Handbook of Research Synthesis and Meta-Analysis. Russell Sage Foundation
Dalkey N, Helmer O (1963) An experimental application of the DELPHI method to the use of experts. Manag Sci 9:458–467. https://doi.org/10.1287/MNSC.9.3.458
Dickinson ME, Flenniken AM, Ji X, Teboul L, Wong MD, White JK et al (2016) High-throughput discovery of novel developmental phenotypes. Nature 537:508–514. https://doi.org/10.1038/nature19356
Ellis P. The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. 2010. Available: https://books.google.com/books?hl=en&lr=&id=UUcgAwAAQBAJ&oi=fnd&pg=PR13&dq=The+Essential+Guide+to+Effect+Sizes+&ots=-d7gkrhpeO&sig=xjGU7RQ1tikVViYt6QlI7LdtbQg
Gałecki A, Burzykowski T (2013) Linear mixed-effects model. Springer. https://doi.org/10.1007/978-1-4614-3900-4_13
Haselimashhadi H, Mason JC, Munoz-Fuentes V, López-Gómez F, Babalola K, Acar EF et al (2020a) Soft windowing application to improve analysis of high-throughput phenotyping data. Bioinformatics 36:1492–1500. https://doi.org/10.1093/bioinformatics/btz744
Haselimashhadi H, Mason JC, Mallon AM, Smedley D, Meehan TF, Parkinson H (2020b) OpenStats: a robust and scalable software package for reproducible analysis of high-throughput phenotypic data. PLoS ONE. https://doi.org/10.1371/journal.pone.0242933
Hochberg Y (1988) A sharper bonferroni procedure for multiple tests of significance. Biometrika 75:800. https://doi.org/10.2307/2336325
Hogg RJ (1991) Trials and tribulations of multicenter studies. Lessons learned from the experiences of the Southwest Pediatric Nephrology Study Group (SPNSG). Pediatr Nephrol. 5:348–351. https://doi.org/10.1007/BF00867501
Hrabě de Angelis M, Nicholson G, Selloum M, White JK, Morgan H, Ramirez-Solis R et al (2015) Analysis of mammalian gene function through broad-based phenotypic screens across a consortium of mouse clinics. Nat Genet. 47:969–978. https://doi.org/10.1038/ng.3360
Hu M, Shi X, Song PX-K (2022) Collaborative causal inference with a distributed data-sharing management. arxiv preprint arXiv. https://doi.org/10.48550/arxiv.2204.00857
International Consortium of Investigators for Fairness in Trial Data Sharing, Devereaux PJ, Guyatt G, Gerstein H, Connolly S, Yusuf S (2016) Toward fairness in data sharing. N Engl J Med. 375:405–7. https://doi.org/10.1056/NEJMp1605654
Karp NA, Speak AO, White JK, Adams DJ, de Angelis MH, Hérault Y et al (2014) Impact of temporal variation on design and analysis of mouse knockout phenotyping studies. PLoS One. https://doi.org/10.1371/JOURNAL.PONE.0111239
Karp NA, Mason J, Beaudet AL, Benjamini Y, Bower L, Braun RE et al (2017) Prevalence of sexual dimorphism in mammalian phenotypic traits. Nat Commun 8:15475. https://doi.org/10.1038/ncomms15475
Knatterud GL, Rockhold FW, George SL, Barton FB, Davis CE, Fairweather WR et al (1998) Guidelines for quality assurance in multicenter trials: a position paper. Control Clin Trials 19:477–493. https://doi.org/10.1016/S0197-2456(98)00033-6
Koscielny G, Yaikhom G, Iyer V, Meehan TF, Morgan H, Atienza-Herrero J et al (2014) The International Mouse Phenotyping Consortium Web Portal, a unified point of access for knockout mice and related phenotyping data. Nucleic Acids Res. https://doi.org/10.1093/nar/gkt977
Mashhadi HH (2023) OpenStats: A Robust and Scalable Software Package for Reproducible Analysis of High-Throughput genotype-phenotype association. R package version 1.12.0. https://git.io/Jv5w0. https://doi.org/10.18129/B9.bioc.OpenStats
Mlecnik B, Bifulco C, Bindea G, Marliot F, Lugli A, Lee JJ et al (2020) Multicenter international society for immunotherapy of cancer study of the consensus immunoscore for the prediction of survival and response to chemotherapy in stage III colon cancer. J Clin Oncol 38:3638. https://doi.org/10.1200/JCO.19.03205
Rashid MM, McKean JW, Kloke JD (2012) R estimates and associated inferences for mixed models with covariates in a multicenter clinical trial. Stat Biopharm Res 4:37–49. https://doi.org/10.1080/19466315.2011.636293
Sawilowsky SS (2009) New effect size rules of thumb. J Mod Appl Stat Methods 8:597–599. https://doi.org/10.22237/jmasm/1257035100
Stewart GB, Altman DG, Askie LM, Duley L, Simmonds MC, Stewart LA (2012) Statistical analysis of individual participant data meta-analyses: a comparison of methods and recommendations for practice. PLoS One. 7:e46042. https://doi.org/10.1371/JOURNAL.PONE.0046042
Sullivan GM, Feinn R (2012) Using effect size—or why the p value is not enough. J Grad Med Educ 4:279. https://doi.org/10.4300/JGME-D-12-00156.1
Team RC-VRC (2013) R: A language and environment for statistical computing. yumpu.com. [cited 18 Oct 2022]. Available: https://www.yumpu.com/en/document/view/6853895/r-a-language-and-environment-for-statistical-computing
Using the Delphi method | IEEE Conference Publication | IEEE Xplore. [cited 7 Nov 2022]. Available: https://ieeexplore.ieee.org/abstract/document/6017716
van de Ven AH, Delbecq AL (2017) The effectiveness of nominal, Delphi, and interacting group decision making processes. Acad Manag J. https://doi.org/10.5465/255641
Viechtbauer W (2010) Conducting meta-analyses in R with the metafor package. J Stat Softw 36:1–48. https://doi.org/10.18637/JSS.V036.I03
Wright SP (1992) Adjusted p-values for simultaneous inference. Biometrics 48:1005. https://doi.org/10.2307/2532694

Acknowledgements

We thank Helen Parkinson for her feedback on this manuscript. The research reported in this publication was supported by the European Molecular Biology Laboratory (EMBL-EBI) core funding and the National Human Genome Research Institute of the National Institutes of Health under Award Number 2UM1HG006370-11. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Funding

Open Access funding enabled and organized by Projekt DEAL. European Bioinformatics Institute, 2UM1HG006370-11

Author information

Hamed Haselimashhadi and Violeta Muñoz-Fuentes have mainly contributed to this work.

Authors and Affiliations

European Bioinformatics Institute, European Molecular Biology Laboratory, Hinxton, UK
Hamed Haselimashhadi, Kolawole Babalola, Robert Wilson, Tudor Groza & Violeta Muñoz-Fuentes

Contributions

H.H. and V.M. contributed to the development of the concept and writing of the manuscript. H.H., V.M. and K.B. contributed to the validation of the method. All authors contributed to the review of and approved the final version of the manuscript.

Corresponding author

Correspondence to Hamed Haselimashhadi.

Ethics declarations

Competing interests

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 2516 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Haselimashhadi, H., Babalola, K., Wilson, R. et al. A consensus score to combine inferences from multiple centres. Mamm Genome 34, 379–388 (2023). https://doi.org/10.1007/s00335-023-09993-0

Received: 08 November 2022
Accepted: 02 March 2023
Published: 08 May 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s00335-023-09993-0
