Abstract
Fixation index (Fst) statistics provide critical insights into the evolutionary processes that shape the structure of genetic variation within and among populations. Fst statistics have been widely applied in population and evolutionary genetics to identify genomic regions targeted by selection. The FSTest 1.3 software was developed to estimate four Fst statistics, those of Hudson, Weir and Cockerham, Nei, and Wright, from high-throughput genotyping or sequencing data. Here, we introduce FSTest 1.3 and compare its performance with two widely used software packages, VCFtools 0.1.16 and PLINK 2.0. Chromosome 1 variant data from 1000 Genomes Phase III for South Asian (n = 211) and African (n = 274) populations were used as an example case. Different Fst estimates were calculated for each single-nucleotide polymorphism (SNP) in a pairwise comparison of the South Asian and African populations, and the results of FSTest 1.3 were confirmed by VCFtools 0.1.16 and PLINK 2.0. Two sliding window approaches, one based on a fixed number of SNPs and the other on a fixed number of base pairs (bp), were conducted using FSTest 1.3 and VCFtools 0.1.16. Our results showed that regions with low-coverage genotypic data could lead to an overestimation of Fst in sliding window analysis using a fixed number of bp. FSTest 1.3 could mitigate this problem by averaging Fst over a fixed number of consecutive SNPs along the chromosome. FSTest 1.3 allows direct analysis of VCF files with a small amount of code and can calculate Fst estimates for more than a million SNPs in a few minutes on a desktop computer. FSTest 1.3 is freely available at https://github.com/similab/FSTest.
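The contrast between the two sliding window approaches can be illustrated with a minimal Python sketch. This is not FSTest's code: the function names `hudson_fst`, `snp_windows` and `bp_windows` are hypothetical, the Hudson estimator is written in one commonly used form with a sample-size correction, and taking the plain mean of per-SNP estimates within a window is an assumption made for illustration. Sample sizes are counts of allele copies, so 211 and 274 diploid individuals give 422 and 548.

```python
from statistics import mean


def hudson_fst(p1, n1, p2, n2):
    """Per-SNP Hudson Fst from allele frequencies p1, p2 and the number
    of sampled allele copies n1, n2 (2 x individuals for diploids)."""
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    # den is 0 when both populations are fixed for the same allele
    return num / den if den > 0 else float("nan")


def snp_windows(positions, values, size, step):
    """Windows that each hold a fixed number (size) of consecutive SNPs."""
    windows = []
    for i in range(0, len(values) - size + 1, step):
        chunk = values[i:i + size]
        windows.append((positions[i], positions[i + size - 1], size, mean(chunk)))
    return windows


def bp_windows(positions, values, size, step):
    """Windows that each span a fixed number (size) of base pairs.
    Windows without any SNP are skipped; the SNP count varies per window."""
    windows = []
    for start in range(0, max(positions) + 1, step):
        chunk = [v for pos, v in zip(positions, values) if start <= pos < start + size]
        if chunk:
            windows.append((start, start + size - 1, len(chunk), mean(chunk)))
    return windows


# Toy data (invented for illustration): four SNPs close together and one
# isolated SNP in a sparsely genotyped region.
positions = [100, 200, 300, 400, 5000]
fst = [0.1, 0.2, 0.3, 0.4, 0.9]

print(hudson_fst(0.8, 422, 0.3, 548))  # one SNP, South Asian vs African sample sizes
for window in snp_windows(positions, fst, size=2, step=2):
    print("SNP window:", window)
for window in bp_windows(positions, fst, size=1000, step=1000):
    print("bp window:", window)
```

In the toy data, the bp window starting at position 5000 contains a single SNP, so its value is that SNP's estimate alone, whereas every SNP-count window averages the same number of SNPs. This mirrors the reported tendency of fixed-bp windows to inflate Fst in sparsely covered regions.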
References
Anand L. and Rodriguez Lopez C. M. 2022 ChromoMap: an R package for interactive visualization of multi-omics data and annotation of chromosomes. BMC Bioinformatics 23, 33.
Biswas S. and Akey J. M. 2006 Genomic insights into positive selection. Trends Genet. 22, 437–446.
Danecek P., Auton A., Abecasis G., Albers C. A., Banks E., DePristo M. A. et al. 2011 The variant call format and VCFtools. Bioinformatics 27, 2156–2158.
Gusnanto A., Taylor C. C., Nafisah I., Wood H. M., Rabbitts P. and Berri S. 2014 Estimating optimal window size for analysis of low-coverage next-generation sequence data. Bioinformatics 30, 1823–1829.
Holsinger K. E. and Weir B. S. 2009 Genetics in geographically structured populations: defining, estimating and interpreting FST. Nat. Rev. Genet. 10, 639–650.
Hudson R. R., Slatkin M. and Maddison W. P. 1992 Estimation of levels of gene flow from DNA sequence data. Genetics 132, 583.
Nei M. 1973 Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA 70, 3321–3323.
Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M. A., Bender D. et al. 2007 PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575.
R Core Team 2021 R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Wei T., Simko V., Levy M., Xie Y., Jin Y. and Zemla J. 2017 Package ‘corrplot.’ Statistician 56, e24.
Weir B. S. and Cockerham C. C. 1984 Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370.
Wright S. 1949 The genetical structure of populations. Ann. Eugen. 15, 323–354.
Yin L., Zhang H., Tang Z., Xu J., Yin D., Zhang Z. et al. 2021 rMVP: a memory-efficient, visualization-enhanced, and parallel-accelerated tool for genome-wide association study. Genom. Proteom. Bioinform. 19, 619–628.
Acknowledgments
We acknowledge the valuable comments provided by the reviewer, which greatly contributed to improving the quality of the paper. We have received no specific funding for this study.
Author information
Contributions
SV and SS developed the original FSTest software. SV wrote the manuscript. All authors read and approved the final manuscript.
Additional information
Corresponding editor: Divya Tej Sowpati
About this article
Cite this article
Vahedi, S.M., Salek Ardestani, S. FSTest: an efficient tool for cross-population fixation index estimation on variant call format files. J Genet 103, 4 (2024). https://doi.org/10.1007/s12041-023-01459-1
DOI: https://doi.org/10.1007/s12041-023-01459-1