QNB: differential RNA methylation analysis for count-based small-sample sequencing data with a quad-negative binomial model
Abstract
Background
As a newly emerged research area, RNA epigenetics has drawn increasing attention because RNA methylation and other modifications participate in a number of crucial biological processes. Thanks to high-throughput sequencing techniques such as MeRIP-Seq, transcriptome-wide RNA methylation profiles are now available in the form of count-based data, with which it is often of interest to study the dynamics at the epitranscriptomic layer. However, the sample size of RNA methylation experiments is usually very small because of their cost; in addition, there are usually a large number of genes whose methylation level cannot be accurately estimated because of their low expression level. Together, these factors make differential RNA methylation analysis a difficult task.
Results
We present QNB, a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data. Previous approaches such as the DRME model rely on a statistical test that covers only the IP samples, with 2 negative binomial distributions. QNB is instead based on 4 independent negative binomial distributions whose variances and means are linked by local regressions, so that the input control samples are also properly taken into account. In addition, unlike the DRME approach, which relies only on the input control sample to estimate the background, QNB uses a more robust estimator of gene expression that combines information from both the input and IP samples, which can substantially improve the testing performance for very lowly expressed genes.
Conclusion
QNB showed improved performance on both simulated and real MeRIP-Seq datasets when compared with competing algorithms. The QNB model is also applicable to other datasets related to RNA modifications, including but not limited to RNA bisulfite sequencing, m^{1}A-Seq, PAR-CLIP and RIP-Seq.
Keywords
Differential methylation analysis; m^{6}A; Negative binomial distribution; RNA methylation; Small-sample size
Abbreviations
DRMS: differential RNA methylation sites
IP: immunoprecipitation
m^{1}A: N1-methyladenosine
m^{6}A: N6-methyladenosine
MeRIP-Seq: methylated RNA immunoprecipitation sequencing
NB: negative binomial distribution
OR: odds ratio
PAR-CLIP: photoactivatable ribonucleoside-enhanced crosslinking and immunoprecipitation
QNB: quad-negative binomial model
RIP-Seq: RNA immunoprecipitation sequencing
RR: risk ratio
Background
By comparing the IP and input control samples, RNA methylation sites can be identified in a peak-calling procedure [12, 13]. Based on these sites, differential RNA methylation analysis can reveal the dynamics of post-transcriptional RNA methylation between two experimental conditions in a case-control study [14, 15].
Differential methylation analysis concerns the difference in methylation level between two conditions, which has been shown to be of crucial biological significance [16]. A number of computational approaches have previously been developed for differential methylation analysis of DNA [17, 18, 19, 20, 21, 22]. Like DNA methylation, RNA methylation is reversible and non-stoichiometric, so it is reasonable to speculate that computational algorithms developed for DNA methylation are equally applicable to RNA methylation data. However, the unique features of RNA methylation and of the MeRIP-Seq technique call for novel computational approaches.
The second prominent feature of MeRIP-Seq data is the limited number of samples (small sample size). Because of the cost and technical difficulty of MeRIP-Seq experiments, a single study currently presents usually no more than 3 biological replicates, which makes it difficult to estimate the site-specific variability of RNA methylation level. Without a reliable estimate of this variability, it is hard to assess whether an observed difference exceeds the within-group biological variability, and differential RNA methylation analysis between two experimental conditions fails. Solving this problem requires approaches that work even when the sample size is small. A number of small-sample inference approaches have been developed for sequencing data, most notably DESeq [23] and edgeR [24]; both rely on a negative binomial distribution model with linked variance and mean, which suggests a feasible route to differential RNA methylation analysis at small sample sizes.
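As an illustration of what a negative binomial model parameterized by its mean and variance means in practice, the minimal Python sketch below (not code from DESeq or edgeR) converts a chosen mean and variance into the (n, p) parameters of NumPy's `negative_binomial` sampler and checks the empirical moments. The helper name `nb_from_mean_var` is hypothetical; the only requirement is overdispersion, i.e. a variance larger than the mean.

```python
import numpy as np


def nb_from_mean_var(mu, var):
    """Convert a mean/variance pair into NumPy's negative binomial (n, p).

    NumPy's negative_binomial(n, p) has mean n(1-p)/p and variance n(1-p)/p**2,
    so p = mu/var and n = mu**2/(var - mu). This needs var > mu (overdispersion).
    """
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    if np.any(var <= mu):
        raise ValueError("negative binomial requires var > mu")
    return mu**2 / (var - mu), mu / var


rng = np.random.default_rng(0)
n, p = nb_from_mean_var(50.0, 200.0)        # variance four times the mean
draws = rng.negative_binomial(n, p, size=200_000)
print(draws.mean(), draws.var())            # approximately 50 and 200
```

In DESeq-style models the variance handed to such a conversion is not a free parameter: it is predicted from the mean by a fitted function, which is what makes estimation possible with very few replicates.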
To address the aforementioned limitations and challenges of MeRIP-Seq RNA methylation sequencing data, we propose here the QNB (quad-negative binomial) model, a small-sample solution for differential RNA methylation analysis. With 4 crosslinked negative binomial distributions modeling the IP and input control samples of MeRIP-Seq under two different experimental conditions, the proposed model can robustly capture the within-group variability of RNA methylation level at small sample sizes and thereby perform more effective differential RNA methylation analysis. The model has been implemented in an R package that is freely available.
Methods
QNB model
Here, \( q_i \) represents the expected abundance of feature i under all conditions in a standard sequencing library. \( s_{t,j} \) and \( s_{c,j} \) represent the size factors of the IP and input control samples of the jth biological replicate and directly reflect their sequencing depths. \( p_{i,\rho(j)} \) stands for the risk of RNA methylation, i.e., the true percentage of methylation of feature i under condition \( \rho(j) \) on the common scale, without rescaling by the size factors \( s_{c,j} \) and \( s_{t,j} \). Additionally, \( e_{i,\rho(j)} \) is introduced as a feature-specific size factor to model differential expression at the RNA level; it indicates the abundance of feature i under a specific experimental condition relative to the standard abundance \( q_i \).
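The model equations to which these symbols belong are not reproduced in this section. To make their roles concrete, the Python sketch below assumes that the expected IP count of feature i in replicate j is \( q_i e_{i,\rho(j)} p_{i,\rho(j)} s_{t,j} \) and the expected input count is \( q_i e_{i,\rho(j)} (1 - p_{i,\rho(j)}) s_{c,j} \). This parameterization is our assumption, chosen to be consistent with \( p_{i,\rho(j)} \) as the methylated fraction on the common scale and with the basal level being estimated from the sum of IP and input reads; the function name `expected_counts` is hypothetical.

```python
import numpy as np


def expected_counts(q, e, p, s_t, s_c):
    """Expected IP and input counts for one feature in one replicate.

    ASSUMED parameterization (not quoted from the paper):
      IP mean    = q * e * p       * s_t
      input mean = q * e * (1 - p) * s_c
    """
    return q * e * p * s_t, q * e * (1.0 - p) * s_c


q, e, p = 100.0, 1.5, 0.2   # standard abundance, condition-specific factor, methylated fraction
s_t, s_c = 1.2, 0.8         # IP and input size factors (sequencing depth)
mu_ip, mu_input = expected_counts(q, e, p, s_t, s_c)

# Dividing by the size factors puts both libraries on the common scale.
ip_common, input_common = mu_ip / s_t, mu_input / s_c
print(ip_common + input_common)   # matches q * e, the expression level of the feature
print(ip_common / input_common)   # matches p / (1 - p), the odds of methylation
```

Under this assumed form, dividing by the size factors removes sequencing depth, the sum of the two common-scale means recovers the expression level, and their ratio is the odds of methylation, which is the quantity behind the odds ratio (OR) in the abbreviation list.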
Please note that, compared with the DRME model [26], QNB implements a more robust estimator of the background expression level of each feature in Eq. (7) by taking advantage of both the IP and input control samples. In the DRME model, the basal level of gene expression is estimated from the input control sample only because, in theory, without antibody-based enrichment the input control sample of MeRIP-Seq data contains both methylated and unmodified molecules and thus reflects the true expression level. However, since reads must usually be enriched in the IP samples for a methylation site to be called, there are usually fewer reads in the input control samples, so the input-only estimator is not robust for very lowly expressed genes. For this reason, the QNB model estimates the basal level from the sum of the input and IP samples, which should substantially improve the testing performance for very lowly expressed genes.
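Eq. (7) itself is not reproduced here, so the following Python sketch uses a simplified stand-in: each library is divided by its size factor, and the IP and input reads are summed and averaged over replicates (`basal_level_combined`); this is compared with an input-only average in the spirit of DRME (`basal_level_input_only`). Both function names and the exact averaging are hypothetical. The toy counts show the failure mode described above: at a lowly expressed site with no input reads, the input-only estimate collapses to zero while the combined estimate does not.

```python
import numpy as np


def basal_level_combined(ip, inp, s_t, s_c):
    """Hypothetical basal-level estimate from IP + input reads of one condition.

    ip, inp: (features x replicates) counts; s_t, s_c: per-replicate size factors.
    Each library is rescaled to the common scale, IP and input are summed,
    and the sum is averaged over replicates.
    """
    return (np.asarray(ip) / s_t + np.asarray(inp) / s_c).mean(axis=1)


def basal_level_input_only(inp, s_c):
    """Hypothetical DRME-style estimate that uses the input control reads only."""
    return (np.asarray(inp) / s_c).mean(axis=1)


s_t = np.array([1.1, 0.9, 1.0])
s_c = np.array([0.8, 1.2, 1.0])
ip = np.array([[120, 95, 110],    # well-expressed methylated site
               [3, 2, 4]])        # lowly expressed site, reads only in IP
inp = np.array([[60, 80, 70],
                [0, 0, 0]])

print(basal_level_input_only(inp, s_c))         # second site estimated as 0.0
print(basal_level_combined(ip, inp, s_t, s_c))  # second site stays positive
```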
We use a two-dimensional local regression on the points \( \left(\widehat{p}_{i,\rho}, \widehat{q}_i, \widehat{w}_{t,i,\rho}\right) \) to obtain a smooth function \( w_{t,i,\rho}\left(\widehat{p}_{i,\rho}, \widehat{q}_i\right) \). Since \( \widehat{w}_{t,i,\rho} \) in Eq. (14) is a sum of squared random variables, the residuals of the model, \( \widehat{w}_{t,i,\rho} - w_{t,i,\rho}\left(\widehat{p}_{i,\rho}, \widehat{q}_i\right) \), are skewed. Following [27] and the practice in DESeq [23], we therefore fit the local regression as a generalized linear model of the gamma family, using the R package locfit [28], to estimate \( w_{t,i,\rho}\left(\widehat{p}_{i,\rho}, \widehat{q}_i\right) \).
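locfit's exact settings (kernel, bandwidth selection, polynomial degree) are not given here, so the NumPy sketch below is a simplified stand-in rather than the package's algorithm: at each query point it fits a kernel-weighted local-linear gamma GLM with a log link by iteratively reweighted least squares (IRLS), regressing \( \widehat{w} \) on the covariates. For the gamma family with a log link the IRLS working weights are constant, so only the kernel weights remain and the working response is \( \eta + (y - \mu)/\mu \). Using \( \log \widehat{q}_i \) as a covariate, a Gaussian kernel and a fixed bandwidth `h` are our assumptions; `local_gamma_fit` is a hypothetical name.

```python
import numpy as np


def local_gamma_fit(x, y, x0, h=0.3, n_iter=25):
    """Local-linear gamma GLM with log link, evaluated at query points x0.

    x:  (n, k) covariates, e.g. columns (p_hat, log q_hat)
    y:  (n,) positive responses, e.g. w_hat
    x0: (m, k) query points
    Returns the locally fitted mean of y at each query point.
    """
    x, y, x0 = (np.asarray(a, dtype=float) for a in (x, y, x0))
    fitted = np.empty(len(x0))
    for idx, point in enumerate(x0):
        diff = x - point                                      # centre at the query point
        kw = np.exp(-0.5 * np.sum((diff / h) ** 2, axis=1))   # Gaussian kernel weights
        X = np.column_stack([np.ones(len(x)), diff])          # local-linear design
        beta = np.zeros(X.shape[1])
        beta[0] = np.log(np.average(y, weights=kw))           # start from a local constant
        for _ in range(n_iter):
            eta = X @ beta
            mu = np.exp(eta)
            z = eta + (y - mu) / mu       # IRLS working response for the log link
            XtW = X.T * kw                # gamma + log link: only kernel weights remain
            beta = np.linalg.solve(XtW @ X, XtW @ z)
        fitted[idx] = np.exp(beta[0])     # intercept is the fit at the query point
    return fitted


# Synthetic check: gamma noise (mean 1) around a known log-linear mean surface.
rng = np.random.default_rng(1)
x = np.column_stack([rng.uniform(0, 1, 4000), rng.uniform(0, 5, 4000)])
true_mean = np.exp(0.5 + 0.8 * x[:, 0] - 0.4 * x[:, 1])
y = true_mean * rng.gamma(shape=2.0, scale=0.5, size=4000)
grid = np.array([[0.5, 2.5], [0.5, 1.5]])
print(local_gamma_fit(x, y, grid))
print(np.exp(0.5 + 0.8 * grid[:, 0] - 0.4 * grid[:, 1]))
```

The gamma family is chosen for the reason given above: a variance estimate built from squared deviations is positive and right-skewed, and a gamma GLM models a spread that grows with the mean instead of assuming symmetric, constant-variance residuals.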
Similarly to the estimation of \( \upsilon_{t,i,\rho(j)} \) and \( w_{t,i,\rho} \) for the IP samples described above, the raw variance parameter \( \upsilon_{c,i,\rho(j)} \) and the variance of reads on the common scale \( w_{c,i,\rho} \) can be estimated for the input control samples.
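Eq. (14) is not reproduced in this section. Because QNB is described as sharing DESeq's decomposition of variance into shot noise and raw variance, the Python sketch below applies DESeq's per-condition estimator [23] to one sample group: the common-scale mean \( \hat{q} \), the common-scale sample variance \( \hat{w} \), the shot-noise term \( \hat{z} = (\hat{q}/m)\sum_j 1/s_j \) and the raw variance \( \hat{w} - \hat{z} \). Treating this as the estimator of \( \upsilon_{t,i,\rho(j)} \) or \( \upsilon_{c,i,\rho(j)} \) is an assumption, and `deseq_style_variance` is a hypothetical name. The same function can be applied to the IP counts and to the input control counts.

```python
import numpy as np


def deseq_style_variance(counts, size_factors):
    """DESeq-style split of within-condition variance for one sample group.

    counts: (features x replicates) counts from one condition (IP or input).
    size_factors: (replicates,) size factors of those libraries.
    Returns (q_hat, w_hat, z_hat, v_hat) on the common scale.
    """
    counts = np.asarray(counts, dtype=float)
    s = np.asarray(size_factors, dtype=float)
    m = counts.shape[1]
    scaled = counts / s                       # divide each replicate by its size factor
    q_hat = scaled.mean(axis=1)               # common-scale mean
    w_hat = scaled.var(axis=1, ddof=1)        # common-scale sample variance
    z_hat = q_hat / m * np.sum(1.0 / s)       # expected shot-noise (Poisson) part
    v_hat = w_hat - z_hat                     # raw variance; may be negative for noisy sites
    return q_hat, w_hat, z_hat, v_hat


print(deseq_style_variance([[10, 12, 14]], [1.0, 1.0, 1.0]))  # q=12, w=4, z=12, v=-8

# For pure Poisson counts the raw variance should average out near zero.
rng = np.random.default_rng(2)
s = np.array([0.5, 1.0, 2.0])
poisson_counts = rng.poisson(50 * s, size=(20_000, 3))
print(deseq_style_variance(poisson_counts, s)[3].mean())
```

Negative raw-variance estimates at individual sites are expected with only a few replicates; smoothing them with the local regression described above is what turns them into usable values.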
Testing & Metrics
Please note that the overdispersion of read counts in the input control samples is also modeled in the QNB test. The QNB test covers all 4 sample groups (IP and input control under each of the two conditions) with 4 crosslinked negative binomial distributions, whereas in the DRME model the input control samples are used only for estimating gene expression, so its statistical test covers only the IP samples with 2 negative binomial distributions. Including the input control samples in the test, rather than using them only as a background, contributes substantially to the performance improvement and makes QNB substantially different from other count-based (negative binomial distribution-based) approaches such as DRME, edgeR, DESeq and ChIPComp.
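The QNB test statistic itself is not reproduced in this section. For orientation only, the Python sketch below computes the two effect sizes named in the abbreviation list, the risk ratio (RR) and the odds ratio (OR) of methylation between conditions, from size-factor-normalized IP and input counts. It assumes that common-scale IP reads are proportional to \( p \) and input reads to \( 1 - p \), and it returns point estimates, not QNB's p-values; the function name and the pseudocount are hypothetical choices.

```python
import numpy as np


def methylation_effect_sizes(ip1, in1, ip2, in2, s_t1, s_c1, s_t2, s_c2, pseudo=0.5):
    """Per-feature log2 risk ratio and log2 odds ratio, condition 2 vs condition 1.

    ipX, inX: (features x replicates) IP and input counts for condition X.
    s_tX, s_cX: size factors of those IP and input libraries.
    """
    def p_hat(ip, inp, s_t, s_c):
        a = (np.asarray(ip) / s_t).mean(axis=1) + pseudo    # common-scale IP reads
        b = (np.asarray(inp) / s_c).mean(axis=1) + pseudo   # common-scale input reads
        return a / (a + b)                                  # estimated methylated fraction

    p1 = p_hat(ip1, in1, s_t1, s_c1)
    p2 = p_hat(ip2, in2, s_t2, s_c2)
    log2_rr = np.log2(p2 / p1)
    log2_or = np.log2((p2 / (1 - p2)) / (p1 / (1 - p1)))
    return log2_rr, log2_or


ones = np.ones(2)
rr, odds = methylation_effect_sizes([[20, 20]], [[80, 80]], [[50, 50]], [[50, 50]],
                                    ones, ones, ones, ones, pseudo=0.0)
print(rr, odds)   # p moves from 0.2 to 0.5: log2 RR = log2(2.5), log2 OR = 2
```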
QNB package
The package provides four modes for estimating the dispersion (raw variance) parameter:
The mode “per-condition” calculates, for each condition with replicates, an empirical dispersion value from the samples of that condition.

The mode “pooled” estimates a single pooled dispersion value using the samples from all conditions with replicates.

The mode “blind” ignores the sample labels and estimates a dispersion value as if all samples were replicates of a single condition, so this mode supports variance estimation even when no biological replicates are available for any condition.

The mode “auto” selects the mode automatically according to the number of samples: the “per-condition” mode is adopted when biological replicates are available, for a more sensitive estimation of the raw variance parameter, while the “blind” mode is used when no biological replicates are available.
QNB package implements the “auto” mode by default.
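The four modes can be sketched as follows. This is an illustrative Python reimplementation of the logic described above, not the QNB package's R interface; `choose_mode` and `empirical_variance` are hypothetical names, and the input is assumed to be counts already divided by their size factors.

```python
import numpy as np


def choose_mode(conditions):
    """'auto' rule: per-condition if any condition has replicates, else blind."""
    _, counts = np.unique(conditions, return_counts=True)
    return "per-condition" if np.any(counts > 1) else "blind"


def empirical_variance(scaled, conditions, mode="auto"):
    """Per-feature empirical variance on the common scale for each condition.

    scaled: (features x samples) counts already divided by size factors.
    conditions: one condition label per sample (column).
    Returns {condition label: per-feature variance array}.
    """
    scaled = np.asarray(scaled, dtype=float)
    conditions = np.asarray(conditions)
    if mode == "auto":
        mode = choose_mode(conditions)
    labels = np.unique(conditions)

    if mode == "per-condition":
        # One estimate per condition, using only that condition's replicates.
        return {c: scaled[:, conditions == c].var(axis=1, ddof=1)
                for c in labels if np.sum(conditions == c) > 1}

    if mode == "pooled":
        # Pool within-condition sums of squares over all replicated conditions.
        ss, dof = np.zeros(scaled.shape[0]), 0
        for c in labels:
            cols = scaled[:, conditions == c]
            if cols.shape[1] > 1:
                ss += ((cols - cols.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
                dof += cols.shape[1] - 1
        if dof == 0:
            raise ValueError("pooled mode needs at least one replicated condition")
        return {c: ss / dof for c in labels}

    if mode == "blind":
        # Ignore labels: treat every sample as a replicate of one condition.
        blind = scaled.var(axis=1, ddof=1)
        return {c: blind for c in labels}

    raise ValueError(f"unknown mode: {mode}")


scaled = np.array([[1.0, 2.0, 3.0, 10.0, 12.0]])
conditions = ["ctrl", "ctrl", "ctrl", "treated", "treated"]
print(choose_mode(conditions))                  # per-condition
print(empirical_variance(scaled, conditions))   # variances 1.0 (ctrl) and 2.0 (treated)
print(choose_mode(["ctrl", "treated"]))         # blind
```

The "blind" mode inflates the variance whenever the conditions truly differ (here 25.3 versus 1.0 and 2.0), which is why "auto" prefers "per-condition" as soon as replicates exist.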
Results
To evaluate the performance of the proposed method, we tested it on simulated and real datasets and compared it with other approaches, including exomePeak [12], MeTDiff [15], DRME [26] and Bltest [29]. We also included in the comparison the DSS method [30], a recent method developed for differential DNA methylation analysis, and the ChIPComp method [31], which was developed for differential binding analysis of ChIP-Seq data.
Test on simulated dataset
The simulated data mimic the read counts of 20,000 methylation sites in 3 IP and 3 input control samples under each of two experimental conditions. Specifically, to simulate the impact of differential expression, we let \( \log(q_i) \) follow a uniform distribution and the percentage of methylation \( p_{i,\rho(j)} \) follow a uniform distribution between 0 and 1. The two size factors \( e_{i,\rho(j)} \) and \( s_{t,j} \) follow log-normal distributions (normal after log transformation), whose variances can be adjusted to mimic the impact of condition-specific differential expression and of different sequencing depths. In addition, \( p_{i,\rho(j)} \) is set equal between the two conditions for 50% of the RNA methylation sites, which correspond to the non-differential sites; the remaining sites, with different values, are the true differential RNA methylation sites. To mimic overdispersion among biological replicates, we set \( \upsilon_{t,i,\rho(j)} = d/(e_{i,\rho(j)} s_{t,j}) \) and \( \upsilon_{c,i,\rho(j)} = d/(e_{i,\rho(j)} s_{c,j}) \), where d is a constant that quantifies the degree of overdispersion; a greater value indicates larger differences among biological replicates from the same condition. To evaluate the methods, 100 random datasets are generated and analyzed with each method, and the area under the receiver operating characteristic curve (AUC) is calculated for each.
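A minimal Python sketch of generating one such dataset follows. Several details are not stated in the text and are our assumptions, marked in the code: the range of the uniform distribution for \( \log(q_i) \), the standard deviations of the log-normal size factors, log-normal input size factors \( s_{c,j} \), the mean parameterization (IP proportional to \( p \), input to \( 1 - p \)), and a DESeq-style variance \( \mu + (e\,s)^2 \upsilon \) into which the stated \( \upsilon = d/(e\,s) \) is plugged. `simulate_dataset` and `nb_draw` are hypothetical names.

```python
import numpy as np


def nb_draw(rng, mu, var):
    """Negative binomial draws with given means and variances (var > mu)."""
    return rng.negative_binomial(mu**2 / (var - mu), mu / var)


def simulate_dataset(n_sites=20_000, n_rep=3, d=1.0, frac_diff=0.5, seed=0):
    """Simulate IP and input counts for two conditions (hypothetical sketch).

    Returns ip and inp with shape (2, n_sites, n_rep), the methylation
    fractions p with shape (2, n_sites), and a mask of differential sites.
    """
    if d <= 0:
        raise ValueError("d must be positive to produce overdispersion")
    rng = np.random.default_rng(seed)
    q = np.exp(rng.uniform(np.log(10), np.log(1000), n_sites))  # ASSUMED range
    p = rng.uniform(0.0, 1.0, size=(2, n_sites))
    is_diff = rng.random(n_sites) < frac_diff
    p[1, ~is_diff] = p[0, ~is_diff]                        # non-differential sites share p
    e = np.exp(rng.normal(0.0, 0.5, size=(2, n_sites)))    # ASSUMED sd
    s_t = np.exp(rng.normal(0.0, 0.3, size=(2, n_rep)))    # ASSUMED sd
    s_c = np.exp(rng.normal(0.0, 0.3, size=(2, n_rep)))    # ASSUMED: input sized like IP

    ip = np.empty((2, n_sites, n_rep), dtype=np.int64)
    inp = np.empty_like(ip)
    for cond in range(2):
        for j in range(n_rep):
            scale_t = e[cond] * s_t[cond, j]
            scale_c = e[cond] * s_c[cond, j]
            # ASSUMED means: IP ~ q e p s_t, input ~ q e (1 - p) s_c (floored above 0)
            mu_t = np.maximum(q * p[cond] * scale_t, 1e-6)
            mu_c = np.maximum(q * (1.0 - p[cond]) * scale_c, 1e-6)
            # Raw variance as stated in the text: upsilon = d / (e s)
            ups_t, ups_c = d / scale_t, d / scale_c
            # ASSUMED DESeq-style variance: mean + (e s)^2 * upsilon
            ip[cond, :, j] = nb_draw(rng, mu_t, mu_t + scale_t**2 * ups_t)
            inp[cond, :, j] = nb_draw(rng, mu_c, mu_c + scale_c**2 * ups_c)
    return ip, inp, p, is_diff


ip, inp, p, is_diff = simulate_dataset(n_sites=2_000, seed=1)
print(ip.shape, inp.shape, is_diff.mean())
```

Repeating the call with 100 seeds and scoring each method's ranking of the sites against `is_diff` by AUC would mirror the evaluation described above.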
Test on human U2OS dataset
The QNB approach was then tested on a real RNA methylation sequencing dataset that profiles the m^{6}A methylome in U2OS cells, untreated and after treatment with the SAH hydrolase inhibitor 3-deazaadenosine (DAA) [32]. The original raw data in SRA format were obtained directly from GEO (GSE48037); they consist of 3 IP and 3 input MeRIP-Seq replicates under the control condition and after DAA treatment (a total of 12 libraries). The short sequencing reads were first aligned to the human genome assembly hg19 with TopHat2 [33]; other splice-aware aligners such as HISAT [34] and STAR [35], or quantification tools such as RSEM [36], Kallisto [37] and Salmon [38], are also applicable at this step. A total of 29,427 RNA N6-methyladenosine (m^{6}A) sites were then called using the exomePeak R/Bioconductor package with the UCSC gene annotation database. In the peak-calling step, the IP samples and the input control samples were each merged across the two experimental conditions (control and DAA treatment) to obtain a consensus set of RNA methylation sites. We then used the Bioconductor packages GenomicFeatures and Rsamtools [39] in R to obtain the read count of every RNA methylation site in the 3 IP and 3 input control samples under each condition. These read counts were used to compare QNB with the competing approaches.
Test on mouse midbrain dataset
We showed above with a sample-swap test that QNB outperforms competing methods on a real RNA methylation sequencing dataset that profiles the epitranscriptomic impact of DAA treatment on human U2OS cells. It is necessary to examine whether this also holds on a different dataset. For this purpose, we repeated the test on a different MeRIP-Seq dataset, which studies the impact of FTO knockdown in mouse midbrain [41].
Discussion
The newly proposed approach is related in many ways to the DESeq and DRME models, including the negative binomial assumption for read count data, the decomposition of variance into shot noise and raw variance, the use of gamma-family local regression to estimate the variance, and the construction of the test. However, QNB extends these two models by including the input control samples as additional components for a more comprehensive statistical evaluation. Compared with the DRME method [26], it also uses a more robust estimator of the background (RNA expression level) by merging information from the IP and input control samples. Importantly, on simulated data and in sample-swap tests on real MeRIP-Seq datasets from human and mouse, QNB outperformed the existing differential RNA methylation approaches, including exomePeak [12], MeTDiff [15], DRME [26] and Bltest [29]. It also outperformed DSS [30], a method developed for differential DNA methylation analysis, and ChIPComp [31], a method developed for ChIP-Seq analysis.
A number of issues may affect the performance of the QNB method in differential RNA methylation analysis. Firstly, biological replicates are still essential for achieving reliable results. As shown in Fig. 4, an increased number of replicates improves the prediction performance of QNB and of the other 6 methods tested. Secondly, because of the existence of very lowly expressed genes, adequate sequencing depth is still necessary for detecting features of low abundance. Thirdly, QNB relies on accurate read counts for the RNA methylation sites (or other features), so precise determination of RNA methylation sites on the transcripts and proper alignment and counting of sequencing reads are indispensable. In MeRIP-Seq data, it can be difficult to distinguish isoform transcripts and thus to perform isoform-specific differential RNA methylation analysis. Fourthly, data quality can still be a major limitation of RNA methylation sequencing experiments because of their technical difficulty and high cost; without proper experimental design and implementation, downstream computational analyses may be futile. Fifthly, how best to estimate the library size factors of MeRIP-Seq samples is still an open question. Conceivably, the size factors of the IP and input control samples may not be directly comparable because of their intrinsic properties and distinct distribution patterns, and the immunoprecipitation efficiency may differ among IP samples. Sixthly, the proposed method assumes that the variability of methylation level is a smooth function of expression level and methylation level; as the number of biological replicates increases, a more straightforward approach might be to model and estimate site-specific variability directly, without this assumption. All the aforementioned issues call for further investigation and improvement.
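As one reference point for the size-factor question, the Python sketch below implements DESeq's median-of-ratios estimator [23] and applies it separately to the IP libraries and to the input libraries, so that size factors are never compared across the two library types. Whether QNB estimates size factors this way is not stated here; the separate estimation, the toy counts and the name `median_of_ratios` are illustrative assumptions.

```python
import numpy as np


def median_of_ratios(counts):
    """DESeq median-of-ratios size factors for a (features x samples) matrix.

    Features with a zero count in any sample are skipped, because their
    geometric mean (and log ratio) is undefined.
    """
    counts = np.asarray(counts, dtype=float)
    log_counts = np.log(counts[np.all(counts > 0, axis=1)])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)   # per-feature geometric mean
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))


# Hypothetical site-level counts: three IP libraries and three input libraries.
ip_counts = np.array([[120, 240, 60], [30, 62, 14], [8, 15, 5], [0, 2, 1]])
input_counts = np.array([[50, 45, 100], [70, 66, 150], [20, 21, 39], [3, 0, 4]])

s_t = median_of_ratios(ip_counts)      # IP size factors, estimated among IP libraries only
s_c = median_of_ratios(input_counts)   # input size factors, estimated among input libraries only
print(s_t, s_c)
```

The median makes the estimate robust to a minority of strongly changing sites, but it still assumes that most features are unchanged between libraries, an assumption that differences in immunoprecipitation efficiency can violate for IP samples.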
Conclusions
RNA methylation has emerged as an important layer of gene regulation, in which biological functions are modulated by reversible post-transcriptional RNA modifications. We proposed here the QNB model, together with an R package, for differential RNA methylation analysis in small-sample scenarios. The method is based on four negative binomial distributions with crosslinked means and variances, which model the IP and input control samples under 2 experimental conditions. Compared with other methods on simulated and real MeRIP-Seq datasets, QNB was more effective for differential RNA methylation analysis of small-sample sequencing data. The QNB model can also be applied to other data types related to RNA modifications, such as RNA bisulfite sequencing, m^{1}A-Seq, PAR-CLIP and RIP-Seq.
Notes
Acknowledgements
We acknowledge computational support from the UTSA Computational Systems Biology Core. We thank the reviewers and editors for helpful comments.
Funding
This work has been supported by National Natural Science Foundation of China [No.61473232, No.31671373, No.61401370 and No.91430111] to SWZ and JM; National Institute on Minority Health and Health Disparities [G12MD007591] to YFH; National Institutes of Health [R01GM113245] to YFH; Jiangsu University Natural Science Program [16KJB180027] to JM; Jiangsu Science and Technology Program [BK20140403] to JM.
Availability of data and materials
QNB can be downloaded from https://cran.rstudio.com/web/packages/QNB/.
Authors’ contributions
LL and JM designed and implemented the software package, and wrote the manuscript. SWZ and YFH conceived the idea and designed the research. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Supplementary material
References
1. Bernstein BE, Meissner A, Lander ES. The mammalian epigenome. Cell. 2007;128(4):669–81.
2. Bock C. Analysing and interpreting DNA methylation data. Nat Rev Genet. 2012;13(10):705–19.
3. Laird PW. Principles and challenges of genome-wide DNA methylation analysis. Nat Rev Genet. 2010;11(3):191–203.
4. Meyer KD, Jaffrey SR. The dynamic epitranscriptome: N6-methyladenosine and gene expression control. Nat Rev Mol Cell Biol. 2014;15(5):313–26.
5. Fu Y, Dominissini D, Rechavi G, He C. Gene expression regulation mediated through reversible m(6)A RNA methylation. Nat Rev Genet. 2014;15(5):293–306.
6. Machnicka MA, Milanowska K, Oglou OO, Purta E, Kurkowska M, Olchowik A, Januszewski W, Kalinowski S, Dunin-Horkawicz S, Rother KM. MODOMICS: a database of RNA modification pathways—2012 update. Nucleic Acids Res. 2012:gks1007.
7. Dominissini D, Moshitch-Moshkovitz S, Schwartz S, Salmon-Divon M, Ungar L, Osenberg S, Cesarkas K, Jacob-Hirsch J, Amariglio N, Kupiec M, et al. Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq. Nature. 2012;485(7397):201–6.
8. Meyer KD, Saletore Y, Zumbo P, Elemento O, Mason CE, Jaffrey SR. Comprehensive analysis of mRNA methylation reveals enrichment in 3′ UTRs and near stop codons. Cell. 2012;149(7):1635–46.
9. Dominissini D, Moshitch-Moshkovitz S, Salmon-Divon M, Amariglio N, Rechavi G. Transcriptome-wide mapping of N(6)-methyladenosine by m(6)A-seq based on immunocapturing and massively parallel sequencing. Nat Protoc. 2013;8(1):176–89.
10. Harcourt EM, Kietrys AM, Kool ET. Chemical and structural effects of base modifications in messenger RNA. Nature. 2017;541(7637):339.
11. Zhao BS, Roundtree IA, He C. Post-transcriptional gene regulation by mRNA modifications. Nat Rev Mol Cell Biol. 2017;18(1):31.
12. Meng J, Cui X, Rao MK, Chen Y, Huang Y. Exome-based analysis for RNA epigenome sequencing data. Bioinformatics. 2013;29(12):1565–7.
13. Cui X, Meng J, Zhang S, Chen Y, Huang Y. A novel algorithm for calling mRNA m6A peaks by modeling biological variances in MeRIP-seq data. Bioinformatics. 2016;32(12):i378–85.
14. Meng J, Lu Z, Liu H, Zhang L, Zhang S, Chen Y, Rao MK, Huang Y. A protocol for RNA methylation differential analysis with MeRIP-Seq data and exomePeak R/Bioconductor package. Methods. 2014;69(3):274–81.
15. Cui X, Zhang L, Meng J, Rao M, Chen Y, Huang Y. MeTDiff: a novel differential RNA methylation analysis for MeRIP-Seq data. IEEE/ACM Trans Comput Biol Bioinform. 2015;PP(99):1–1.
16. Jones PA. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet. 2012;13(7):484–92.
17. Wang X, Gu J, Hilakivi-Clarke L, Clarke R, Xuan J. DM-BLD: differential methylation detection using a hierarchical Bayesian model exploiting local dependency. Bioinformatics. 2016:btw596.
18. Klein HU, Hebestreit K. An evaluation of methods to test predefined genomic regions for differential methylation in bisulfite sequencing data. Brief Bioinform. 2015:bbv095.
19. Stockwell PA, Chatterjee A, Rodger EJ, Morison IM. DMAP: differential methylation analysis package for RRBS and WGBS data. Bioinformatics. 2014:btu126.
20. Saito Y, Tsuji J, Mituyama T. Bisulfighter: accurate detection of methylated cytosines and differentially methylated regions. Nucleic Acids Res. 2014;42(6):e45.
21. Robinson MD, Kahraman A, Law CW, Lindsay H, Nowicka M, Weber LM, Zhou X. Statistical methods for detecting differentially methylated loci and regions. Front Genet. 2014;5.
22. Assenov Y, Müller F, Lutsik P, Walter J, Lengauer T, Bock C. Comprehensive analysis of DNA methylation data with RnBeads. Nat Methods. 2014;11(11):1138–40.
23. Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106.
24. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40.
25. Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11(3):R25.
26. Liu L, Zhang SW, Gao F, Zhang Y, Huang Y, Chen R, Meng J. DRME: count-based differential RNA methylation analysis at small sample size scenario. Anal Biochem. 2016.
27. McCullagh P, Weiss MR, Ross D. Modeling considerations in motor skill acquisition and performance: an integrated approach. Exerc Sport Sci Rev. 1989;17:475–513.
28. Loader C. Locfit: local regression, likelihood and density estimation. R package version 1.5-4. 2007.
29. Zhang L, Meng J, Liu H, Cui X, Zhang SW, Chen Y, Huang Y. Detecting differentially methylated mRNA from MeRIP-Seq with likelihood ratio test. In: 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE; 2014. p. 1368–71.
30. Park Y, Wu H. Differential methylation analysis for BS-seq data under general experimental design. Bioinformatics. 2016;32(10):1446–53.
31. Chen L, Wang C, Qin ZS, Wu H. A novel statistical method for quantitative comparison of multiple ChIP-seq datasets. Bioinformatics. 2015;31(12):1889–96.
32. Fustin JM, Doi M, Yamaguchi Y, Hida H, Nishimura S, Yoshida M, Isagawa T, Morioka MS, Kakeya H, Manabe I, et al. RNA-methylation-dependent RNA processing controls the speed of the circadian clock. Cell. 2013;155(4):793–806.
33. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36.
34. Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements. Nat Methods. 2015;12(4):357–60.
35. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15–21.
36. Dewey CN, Li B. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011;12(1):323.
37. Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol. 2016;34(5):525.
38. Patro R, Duggal G, Kingsford C. Salmon: accurate, versatile and ultrafast quantification from RNA-seq data using lightweight-alignment. 2015.
39. Morgan M. An introduction to Rsamtools. 2011.
40. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, Brown M, Li W. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137.
41. Hess ME, Hess S, Meyer KD, Verhagen LA, Koch L, Bronneke HS, Dietrich MO, Jordan SD, Saletore Y, Elemento O, et al. The fat mass and obesity associated gene (Fto) regulates activity of the dopaminergic midbrain circuitry. Nat Neurosci. 2013;16(8):1042–8.
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.