Automated Collection and Sharing of Adaptive Amino Acid Changes Data
When changes at only a few amino acid sites are the target of selection, adaptive amino acid changes in protein sequences can be identified using maximum-likelihood methods based on models of codon substitution (such as those implemented in codeml). Such methods have been applied many times to a variety of organisms, but the time needed to collect the data and prepare the input files means that usually only tens or a couple of hundred coding regions are analyzed. The recent availability of flexible and easy-to-use computer applications to collect the relevant data (such as BDBM) and to infer positively selected amino acid sites (such as ADOPS) makes the whole process easier and quicker than before. Nevertheless, the lack of a batch option in ADOPS still precluded the analysis of hundreds or thousands of sequence files; such an option is reported here. Given the interest in, and the feasibility of, running such large-scale projects, we also developed a database where ADOPS projects can be stored. We therefore also present B+, which is both a data repository and a convenient interface for inspecting the information contained in ADOPS projects without the need to download and unzip the corresponding ADOPS project file. The ADOPS projects available at B+ can also be downloaded, unzipped, and opened using the ADOPS graphical interface. The availability of such a database ensures the repeatability of results, promotes data reuse with significant savings in the time needed to prepare datasets, and allows effortless further exploration of the data contained in ADOPS projects.
Keywords: ADOPS, Positive selection, B+ database, Open data
This article is a result of the project Norte-01-0145-FEDER-000008 - Porto Neurosciences and Neurologic Disease Research Initiative at I3S, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (FEDER). This work has also been funded by the “Platform of integration of intelligent techniques for analysis of biomedical information” project (TIN2013-47153-C3-3-R) from the Spanish Ministry of Economy and Competitiveness. The SING group thanks CITI (Centro de Investigación, Transferencia e Innovación) from the University of Vigo for hosting its IT infrastructure. H. López-Fernández is supported by a post-doctoral fellowship from Xunta de Galicia.