Analysing the localisation sites of proteins through neural networks ensembles

  • Original Article
Neural Computing & Applications

Abstract

Scientists working in proteomics are currently seeking integrated, customised and validated research solutions to expedite their work in proteomics analyses and drug discovery. Some drugs and most of their cell targets are proteins, because proteins dictate biological phenotype. In this context, the automated analysis of protein localisation is more complex than the automated analysis of DNA sequences; nevertheless, the benefits to be derived are of the same or greater importance. To achieve this goal, choosing the right kind of method is crucial, especially when the data set is drastically imbalanced. In this paper we investigate the performance of some commonly used classifiers, such as k-nearest neighbours and feed-forward neural networks, with and without cross-validation, on a class of imbalanced problems from the bioinformatics domain. Furthermore, we construct ensemble-based schemes using the notion of diversity, and we empirically test their performance on the same problems. The experimental results favour neural network ensembles, as these generalise well and improve significantly on the single-classifier methods they were compared with.
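The abstract names three ingredients without giving details: cross-validation on imbalanced data, ensembles of classifiers, and diversity among ensemble members. The sketch below illustrates each in plain Python. It is not the authors' method: the round-robin stratified split, plurality voting, and the pairwise disagreement rate used as a diversity measure are all assumptions chosen for illustration, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Split sample indices into k folds with similar class proportions.

    Assumption: a simple round-robin deal per class, so that rare classes
    are spread across folds instead of landing in a single one.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def majority_vote(member_predictions):
    """Combine per-member label lists into one label per sample.

    Assumption: plurality voting; ties go to the label seen first,
    that is, the label predicted by the earliest member in the list.
    """
    combined = []
    for votes in zip(*member_predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def disagreement(pred_a, pred_b):
    """Fraction of samples on which two members predict different labels.

    Assumption: used here as a simple pairwise diversity measure.
    """
    differing = sum(a != b for a, b in zip(pred_a, pred_b))
    return differing / len(pred_a)


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, actual)) / len(actual)


# Toy, made-up predictions from three hypothetical ensemble members.
truth = ["A", "A", "A", "A", "B", "B"]
member_1 = ["A", "A", "A", "B", "B", "A"]
member_2 = ["A", "A", "B", "A", "B", "B"]
member_3 = ["A", "B", "A", "A", "A", "B"]

ensemble = majority_vote([member_1, member_2, member_3])
print("member accuracies:",
      [accuracy(m, truth) for m in (member_1, member_2, member_3)])
print("ensemble accuracy:", accuracy(ensemble, truth))
print("disagreement(1, 2):", disagreement(member_1, member_2))
```

In this toy case each member errs on different samples, so the vote corrects every individual mistake. That is the intuition behind preferring diverse members: errors that do not coincide can be outvoted, whereas members that fail on the same samples gain nothing from being combined.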

Acknowledgements

We would like to thank Dr Maria Roubelakis of Oxford University for assistance in biological aspects of this work.

Author information

Corresponding author

Correspondence to Aristoklis D. Anastasiadis.

About this article

Cite this article

Anastasiadis, A.D., Magoulas, G.D. Analysing the localisation sites of proteins through neural networks ensembles. Neural Comput & Applic 15, 277–288 (2006). https://doi.org/10.1007/s00521-006-0029-y

  • DOI: https://doi.org/10.1007/s00521-006-0029-y
