A complexity-based method for predicting protein subcellular location

  • Original Article
  • Amino Acids

Abstract

A complexity-based approach is proposed to predict the subcellular location of proteins. Instead of extracting features from protein sequences, as previous methods do, our approach is based on a complexity decomposition of symbol sequences. First, the distance between each pair of protein sequences is evaluated by the conditional complexity of one sequence given the other. The subcellular location of a protein is then determined using the k-nearest neighbor algorithm. On three widely used data sets, created by Reinhardt and Hubbard, Park and Kanehisa, and Gardy et al., our approach improves prediction accuracy over methods based on the amino acid composition and on Markov models of protein sequences.
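
The two steps described above (a pairwise distance from conditional complexity, then a k-nearest neighbor vote) can be sketched roughly as follows. The abstract does not say which complexity measure or normalization is used, so this sketch assumes the Lempel–Ziv (1976) complexity and a normalized distance in the style of Otu and Sayood (2003), both of which appear in the reference list. The function names `lz_complexity`, `distance`, and `knn_predict`, and the toy sequences and labels, are hypothetical and are not taken from the article.

```python
from collections import Counter


def lz_complexity(s: str) -> int:
    """Count the components of the Lempel-Ziv (1976) exhaustive history of s."""
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from the text that
        # precedes its last symbol (overlapping copies are allowed).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def distance(s: str, q: str) -> float:
    """Assumed distance: symmetrized, normalized conditional complexity."""
    cs, cq = lz_complexity(s), lz_complexity(q)
    # c(sq) - c(s) approximates the complexity of q given s, and vice versa.
    q_given_s = lz_complexity(s + q) - cs
    s_given_q = lz_complexity(q + s) - cq
    # Requires at least one non-empty sequence, otherwise this divides by zero.
    return max(q_given_s, s_given_q) / max(cs, cq)


def knn_predict(query, training, k=3):
    """Majority vote over the k training sequences closest to the query."""
    nearest = sorted(training, key=lambda item: distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Invented toy sequences and labels, for illustration only.
    training = [("KLKLKLKL", "cytoplasmic"), ("DEDEDEDE", "nuclear")]
    print(knn_predict("KLKLLKLK", training, k=1))
```

Because the distance is computed directly on raw sequences, no feature vector is built; the classifier only needs the list of labeled training sequences. This naive parser is quadratic or worse in sequence length, so a real data set would likely need a faster complexity routine.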

References

  • Andrade MA, O’Donoghue SI, Rost B (1998) Adaptation of protein surfaces to subcellular location. J Mol Biol 276:517–525

  • Bernaola-Galván P, Carpena P, Román-Roldán R, Oliver JL (1999) Compositional complexity of DNA sequence models. Comput Phys Commun 121(1):136–138

  • Bezdek JC, Hall LO, Clarke LP (1993) Review of MR image segmentation techniques using pattern recognition. Med Phys 20:1033–1048

  • Boyd D, Schierle C, Beckwith J (1998) How many membrane proteins are there? Protein Sci 7:201–205

  • Cedano J, Aloy P, Pérez-Pons JA, Querol E (1997) Relation between amino acid composition and cellular location of proteins. J Mol Biol 266:594–600

  • Chou KC (2001) Prediction of protein cellular attributes using pseudo amino acid composition. Proteins Struct Funct Genet 43:246–255

  • Chou KC, Cai YD (2002) Using functional domain composition and support vector machines for prediction of protein subcellular location. J Biol Chem 277:45765–45769

  • Chou KC, Cai YD (2003) A new hybrid approach to predict subcellular localization of proteins by incorporating Gene ontology. Biochem Biophys Res Commun 311:743–747

  • Chou KC, Cai YD (2004) Prediction of protein subcellular locations by GO-FunD-PseAA predictor. Biochem Biophys Res Commun 320:1236–1239

  • Chou KC, Elrod DW (1999) Protein subcellular location prediction. Protein Eng 12:107–118

  • Chou KC, Shen HB (2007) Review: recent progresses in protein subcellular location prediction. Anal Biochem 370:1–16

  • Chou KC, Shen HB (2008) Cell-PLoc: a package of web-servers for predicting subcellular localization of proteins in various organisms. Nat Protoc 3:153–162

  • Chou KC, Zhang CT (1995) Review: prediction of protein structural classes. Crit Rev Biochem Mol Biol 30:275–349

  • Diao Y, Ma D, Wen Z, Yin J, Xiang J, Li M (2008) Using pseudo amino acid composition to predict transmembrane regions in protein: cellular automata and Lempel–Ziv complexity. Amino Acids 34(1):111–117

  • Duda RO, Hart PE, Stork DG (2000) Pattern classification, 2nd edn. Wiley, New York

  • Emanuelsson O, Nielsen H, Brunak S, Von Heijne G (2000) Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol 300:1005–1016

  • Feng ZP, Zhang CT (2002) A graphic representation of protein sequence and predicting the subcellular localizations of prokaryotic proteins. Int J Biochem Cell Biol 34:298–307

  • Gao QB, Wang ZZ, Yan C, Du YH (2005) Prediction of protein subcellular location using a combined feature of sequence. FEBS Lett 579:3444–3448

  • Gardy JL, Spencer C, Wang K, Ester M, Tusnady GE, Simon I, Hua S, deFays K, Lambert C, Nakai K, Brinkman FS (2003) PSORT-B: improving protein subcellular localization prediction for Gram-negative bacteria. Nucleic Acids Res 31:3613–3617

  • Guo J, Lin YL, Sun ZR (2005) A novel method for protein subcellular localization: combining residue-couple model and SVM. Proc APBC 2005:117–129

  • Hua SJ, Sun ZR (2001) Support vector machine approach for protein subcellular localization prediction. Bioinformatics 17:721–728

  • Huang Y, Li YD (2004) Prediction of protein subcellular locations using fuzzy k-NN method. Bioinformatics 20:21–28

  • Lempel A, Ziv J (1976) On the complexity of finite sequences. IEEE T Inform Theory 22:75–81

  • Leszczynski K, Cosby S, Bissett R, Provost D, Boyko S, Loose S, Mvilongo E (1999) Application of a fuzzy pattern classifier to decision making in portal verification of radiotherapy. Phys Med Biol 44:253–269

  • Lu Z, Szafron D, Greiner R, Lu P, Wishart DS, Poulin B, Anvik J, Macdonell C, Eisner R (2004) Predicting subcellular localization of proteins using machine-learned classifiers. Bioinformatics 20:547–556

  • Mardia KV, Kent JT, Bibby JM (1979) Multivariate analysis. Academic Press, London

  • Matthews BW (1975) Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 405:442–451

  • Murphy RF, Boland MV, Velliste M (2000) Towards a systematics for protein subcellular location: quantitative description of protein localization patterns and automated analysis of fluorescence microscope images. Proc Int Conf Intell Syst Mol Biol 8:251–259

  • Nakai K (2000) Protein sorting signals and prediction of subcellular localization. Adv Protein Chem 54:277–344

  • Nakai K, Kanehisa M (1991) Expert system for predicting protein localization sites in Gram-negative bacteria. Proteins 11:95–110

  • Nakashima H, Nishikawa K (1994) Discrimination of intracellular and extracellular proteins using amino acid composition and residue-pair frequencies. J Mol Biol 238:54–61

  • Nielsen H, Engelbrecht J, Brunak S, Von Heijne G (1997) A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Int J Neural Sys 8:581–599

  • Nielsen H, Brunak S, Von Heijne G (1999) Machine learning approaches for the prediction of signal peptides and other protein sorting signals. Protein Eng 12:3–9

  • Orlov YL, Potapov VN (2004) Complexity: an internet resource for analysis of DNA sequence complexity. Nucleic Acids Res 32:628–633

  • Otu HH, Sayood K (2003) A new sequence distance measure for phylogenetic tree construction. Bioinformatics 19:2122–2130

  • Park KJ, Kanehisa M (2003) Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs. Bioinformatics 19:1656–1663

  • Reinhardt A, Hubbard T (1998) Using neural networks for prediction of the subcellular location of proteins. Nucleic Acids Res 26:2230–2236

  • Sadovsky MG (2003) The method to compare nucleotide sequences based on minimum entropy principle. Bull Math Biol 65:309–322

  • Troyanskaya OG, Arbell O, Koren Y, Landau GM, Bolshoy A (2002) Sequence complexity profiles of prokaryotic genomic sequences: A fast algorithm for calculating linguistic complexity. Bioinformatics 18(5):679–688

  • Wang J, Zheng X (2008) Comparison of protein secondary structures based on backbone dihedral angles. J Theor Biol 250:382–387

  • Xiao X, Shao S, Ding Y, Huang Z, Huang Y, Chou KC (2005) Using complexity measure factor to predict protein subcellular location. Amino Acids 28:57–61

  • Xie D, Li A, Wang M, Fan Z, Feng H (2005) LOCSVMPSI: a web server for subcellular localization of eukaryotic proteins using SVM and profile of PSI-BLAST. Nucleic Acids Res 33:105–110

  • Yu CS, Lin CJ, Hwang JK (2004) Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions. Protein Sci 13:1402–1406

  • Yuan Z (1999) Prediction of protein subcellular locations using Markov chain models. FEBS Lett 451:23–26

  • Ziv J, Lempel A (1977) A universal algorithm for sequential data compression. IEEE T Inform Theory 23:337–343

  • Ziv J, Lempel A (1978) Compression of individual sequences via variable-rate coding. IEEE T Inform Theory 24:530–536

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China.

Author information

Corresponding author

Correspondence to Jun Wang.

About this article

Cite this article

Zheng, X., Liu, T. & Wang, J. A complexity-based method for predicting protein subcellular location. Amino Acids 37, 427–433 (2009). https://doi.org/10.1007/s00726-008-0172-0

  • DOI: https://doi.org/10.1007/s00726-008-0172-0
