FSTest: an efficient tool for cross-population fixation index estimation on variant call format files

  • Research Article
  • Journal of Genetics

Abstract

Fixation index (Fst) statistics provide critical insights into evolutionary processes affecting the structure of genetic variation within and among populations. Fst statistics have been widely applied in population and evolutionary genetics to identify genomic regions targeted by selection pressures. The FSTest 1.3 software was developed to estimate four Fst statistics, those of Hudson, Weir and Cockerham, Nei, and Wright, using high-throughput genotyping or sequencing data. Here, we introduce FSTest 1.3 and compare its performance with two widely used software packages, VCFtools 0.1.16 and PLINK 2.0. Chromosome 1 variant data from 1000 Genomes Phase III, belonging to South Asian (n = 211) and African (n = 274) populations, were used as an example case. The different Fst estimates were calculated for each single-nucleotide polymorphism (SNP) in a pairwise comparison of the South Asian and African populations, and the results of FSTest 1.3 were confirmed by VCFtools 0.1.16 and PLINK 2.0. Two sliding window approaches, one based on a fixed number of SNPs and the other on a fixed number of base pairs (bp), were conducted using FSTest 1.3 and VCFtools 0.1.16. Our results showed that regions with low-coverage genotypic data could lead to an overestimation of Fst in sliding window analysis using a fixed number of bp. FSTest 1.3 could mitigate this problem by averaging Fst over consecutive SNPs along the chromosome. FSTest 1.3 analyses VCF files directly with a small amount of code and can calculate Fst estimates for more than a million SNPs in a few minutes on a desktop computer. FSTest 1.3 is freely available at https://github.com/similab/FSTest.
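
The difference between the two sliding window approaches can be illustrated with a minimal Python sketch. It is not FSTest code. It assumes the per-SNP form of Hudson's estimator expressed in terms of allele frequencies, takes the number of sampled allele copies as twice the number of individuals (diploid samples), and averages per-SNP values within a window. The function names, SNP positions, and allele frequencies are hypothetical toy values chosen only for illustration.

```python
from statistics import mean

def hudson_fst(p1, n1, p2, n2):
    """Per-SNP Hudson Fst from allele frequencies p1, p2 and the number of
    sampled allele copies n1, n2 (assumed form of the estimator)."""
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else None  # undefined when both are fixed for the same allele

def windows_by_snp_count(positions, values, n_snps, step):
    """Windows holding a fixed number of consecutive SNPs."""
    out = []
    for i in range(0, len(values) - n_snps + 1, step):
        chunk = values[i:i + n_snps]
        out.append((positions[i], positions[i + n_snps - 1], len(chunk), mean(chunk)))
    return out

def windows_by_bp(positions, values, width, step):
    """Windows spanning a fixed number of base pairs; empty windows are skipped."""
    out = []
    start = positions[0]
    while start <= positions[-1]:
        chunk = [v for pos, v in zip(positions, values) if start <= pos < start + width]
        if chunk:
            out.append((start, start + width - 1, len(chunk), mean(chunk)))
        start += step
    return out

# Toy data (hypothetical): five closely spaced SNPs and one isolated SNP
# in a sparsely genotyped region.
positions = [100, 200, 300, 400, 500, 5200]
freqs = [(0.10, 0.12), (0.50, 0.45), (0.30, 0.33),
         (0.80, 0.78), (0.25, 0.20), (0.90, 0.10)]
n1, n2 = 2 * 211, 2 * 274  # allele copies, assuming diploid individuals

fst = [hudson_fst(p1, n1, p2, n2) for p1, p2 in freqs]
by_count = windows_by_snp_count(positions, fst, n_snps=3, step=3)
by_bp = windows_by_bp(positions, fst, width=1000, step=1000)

for label, wins in (("fixed SNP count", by_count), ("fixed bp", by_bp)):
    for start, end, n, value in wins:
        print(f"{label}: {start}-{end} n_snps={n} mean_fst={value:.3f}")
```

In this toy case the last fixed-bp window contains a single SNP, so its window value is simply that SNP's Fst, whereas every fixed-SNP-count window averages the same number of SNPs.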

References

  • Anand L. and Rodriguez Lopez C. M. 2022 ChromoMap: an R package for interactive visualization of multi-omics data and annotation of chromosomes. BMC Bioinformatics 23, 33.

  • Biswas S. and Akey J. M. 2006 Genomic insights into positive selection. Trends Genet. 22, 437–446.

  • Danecek P., Auton A., Abecasis G., Albers C. A., Banks E., DePristo M. A. et al. 2011 The variant call format and VCFtools. Bioinformatics 27, 2156–2158.

  • Gusnanto A., Taylor C. C., Nafisah I., Wood H. M., Rabbitts P. and Berri S. 2014 Estimating optimal window size for analysis of low-coverage next-generation sequence data. Bioinformatics 30, 1823–1829.

  • Holsinger K. E. and Weir B. S. 2009 Genetics in geographically structured populations: defining, estimating and interpreting FST. Nat. Rev. Genet. 10, 639–650.

  • Hudson R. R., Slatkin M. and Maddison W. P. 1992 Estimation of levels of gene flow from DNA sequence data. Genetics 132, 583–589.

  • Nei M. 1973 Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA 70, 3321–3323.

  • Purcell S., Neale B., Todd-Brown K., Thomas L., Ferreira M. A., Bender D. et al. 2007 PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575.

  • R Core Team 2021 R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

  • Wei T., Simko V., Levy M., Xie Y., Jin Y. and Zemla J. 2017 Package ‘corrplot.’ Statistician 56, e24.

  • Weir B. S. and Cockerham C. C. 1984 Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370.

  • Wright S. 1949 The genetical structure of populations. Ann. Eugen. 15, 323–354.

  • Yin L., Zhang H., Tang Z., Xu J., Yin D., Zhang Z. et al. 2021 rMVP: a memory-efficient, visualization-enhanced, and parallel-accelerated tool for genome-wide association study. Genom. Proteom. Bioinform. 19, 619–628.

Acknowledgments

We thank the reviewer for valuable comments, which greatly improved the quality of the paper. We received no specific funding for this study.

Author information

Contributions

SV and SS developed the original FSTest software. SV wrote the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Seyed Milad Vahedi.

Additional information

Corresponding editor: Divya Tej Sowpati

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Vahedi, S.M., Salek Ardestani, S. FSTest: an efficient tool for cross-population fixation index estimation on variant call format files. J Genet 103, 4 (2024). https://doi.org/10.1007/s12041-023-01459-1

  • DOI: https://doi.org/10.1007/s12041-023-01459-1
