Amino Acids, Volume 42, Issue 5, pp 1703–1713

Predicting sub-cellular localization of tRNA synthetases from their primary structures

Original Article

Abstract

Following the endosymbiotic event, all genes encoding mitochondrial aminoacyl-tRNA synthetases (AARSs) were lost or transferred from the ancestral mitochondrial genome to the nucleus. In the canonical pattern, the genes for both cytosolic and mitochondrial AARSs coexist in the nuclear genome. In present-day eukaryotic cells, all mitochondrial AARSs are nucleus-encoded, synthesized on cytosolic ribosomes and post-translationally imported from the cytosol into the mitochondria. Discriminating between such similar enzymes by their site of action is challenging because they have almost the same physico-chemical properties. Predicting the sub-cellular location of AARSs is important for understanding mitochondrial protein synthesis. We analyzed and optimized the patterns that distinguish cytosolic from mitochondrial AARSs. Firstly, support vector machine (SVM)-based modules were developed using amino acid and dipeptide compositions; these achieved Matthews correlation coefficients (MCC) of 0.82 and 0.73, respectively. Secondly, we developed SVM modules using the position-specific scoring matrix and achieved a maximum MCC of 0.78. Thirdly, we developed SVM modules using the N-terminal, intermediate and C-terminal residues and the split amino acid composition (SAAC), which achieved MCCs of 0.82, 0.70, 0.39 and 0.86, respectively. Finally, an SVM module was developed using selected attributes of the split amino acid composition (SA-SAAC) and achieved an MCC of 0.92 with an accuracy of 96.00%. All modules were trained and tested on a non-redundant data set and evaluated using fivefold cross-validation. On the independent data sets, the SA-SAAC-based prediction model achieved an MCC of 0.95 with an accuracy of 97.77%. The web server ‘MARSpred’ based on this study is available at http://www.imtech.res.in/raghava/marspred/.
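The sequence features and the evaluation measure named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the percentage scaling and the 25-residue terminal windows are assumptions made for illustration, since the abstract does not state them. The resulting vectors are the kind of input an SVM could be trained on.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Return the percentage of each standard residue (20 values)."""
    seq = seq.upper()
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the percentage of each ordered residue pair (400 values)."""
    seq = seq.upper()
    pairs = [a + b for a in AMINO_ACIDS for b in AMINO_ACIDS]
    counts = dict.fromkeys(pairs, 0)
    total = len(seq) - 1  # number of overlapping pairs in the sequence
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    if total <= 0:
        return [0.0] * len(pairs)
    return [100.0 * counts[p] / total for p in pairs]


def split_composition(seq, n_term=25, c_term=25):
    """Concatenate the compositions of the N-terminal, middle and
    C-terminal parts (60 values). The window sizes are assumed defaults."""
    if len(seq) < n_term + c_term:
        raise ValueError("sequence shorter than the two terminal windows")
    head = seq[:n_term]
    middle = seq[n_term:len(seq) - c_term]
    tail = seq[len(seq) - c_term:]
    return aa_composition(head) + aa_composition(middle) + aa_composition(tail)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Splitting the sequence keeps terminal signals, such as an N-terminal targeting region, from being averaged out by the rest of the protein, which is one plausible reading of why the split composition scores higher than the whole-sequence composition in the results above.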

Keywords

Mitochondrial tRNA synthetase; Support vector machine; Prediction; MARSpred

Notes

Acknowledgments

The authors are thankful to the Council of Scientific and Industrial Research (CSIR) and Department of Biotechnology (DBT), Government of India for financial assistance.

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  1. Bioinformatics Centre, Institute of Microbial Technology (CSIR), Chandigarh, India
