Clustering of protein expression data: a benchmark of statistical and neural approaches

  • Focus
Soft Computing

Abstract

Clustering is fundamental to the exploratory analysis of bioinformatics data. It may rely on algorithms that are reproducible but make assumptions about, for instance, the ability to estimate the global structure by successful local agglomeration; alternatively, it may use pattern recognition methods that are sensitive to initial conditions. This paper reviews two clustering methodologies and highlights the differences that result from changes in data representation, using a protein expression data set for breast cancer (n = 1,076). The two methodologies are a reproducible approach to model-free clustering and a probabilistic competitive neural network. The results from the two methods are compared with existing studies of the same data set, and the preferred clustering solutions are profiled for clinical interpretation.
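The contrast between reproducible agglomeration and initialisation-sensitive methods can be illustrated with a minimal sketch. It does not reproduce the two methodologies benchmarked in the paper, whose details are not given in this abstract. Instead it assumes two generic stand-ins: average-linkage agglomerative clustering, which is deterministic, and k-means with random initial centres, whose result may depend on the seed. The toy `points`, the function names and the choice of three clusters are all hypothetical.

```python
import random
from itertools import combinations

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's k-means; the initial centres depend on the seed."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def agglomerate(points, k):
    """Average-linkage agglomeration; no randomness, so it is reproducible."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        total = sum(sq_dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the closest pair of clusters (a < b, so deleting b is safe).
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def partition(labels):
    """Label-invariant view of a clustering: a set of sets of point indices."""
    groups = {}
    for i, l in enumerate(labels):
        groups.setdefault(l, []).append(i)
    return frozenset(frozenset(g) for g in groups.values())

# Hypothetical toy data: three well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]

agglo = partition(agglomerate(points, 3))
kmeans_solutions = {partition(kmeans(points, 3, seed)) for seed in range(20)}
print("distinct k-means partitions over 20 seeds:", len(kmeans_solutions))
print("agglomerative solution found by k-means:", agglo in kmeans_solutions)
```

Comparing partitions rather than raw labels avoids counting two runs as different when only the cluster numbering changes. On real data, the spread of solutions across seeds gives a rough indication of how sensitive a method is to its starting point.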

References

  • Abd El-Rehim D, Ball G, Pinder S, Rakha E, Paish C, Robertson J, Macmillan D, Blamey R, Ellis IO (2005) High-throughput protein expression analysis using tissue microarray technology of a large well-characterised series identifies biologically distinct classes of breast cancer confirming recent cDNA expression analyses. Int J Cancer 116:340–350

  • Bacciu D, Starita A (2008) Competitive repetition suppression (CoRe) clustering: a biologically inspired learning model with application to robust clustering. IEEE Trans Neural Netw 19(11):1922–1941

  • Bacciu D, Micheli A, Starita A (2007) Simultaneous clustering and feature ranking by competitive repetition suppression learning with application to gene data. In: International conference on computational intelligence in medicine and healthcare (CIMED’07)

  • Bacciu D, Jarman IH, Etchells TA, Lisboa PJG (2009) Patient stratification with competing risks by multivariate Fisher distance. In: International joint conference on neural networks (IJCNN), Atlanta, USA

  • Ben-Hur A, Elisseeff A, Guyon I (2002) A stability based method for discovering structure in clustered data. In: Altman RB, Lauderdale K (eds) Pacific symposium on biocomputing 2002, Kauai, Hawaii, USA, vol 7, pp 6–17

  • Bishop YMM, Fienberg SE, Holland PW (1975) Discrete multivariate analysis: theory and practice. MIT Press, Cambridge, MA

  • Etchells TA, Lisboa PJG (2006) Orthogonal search-based rule extraction (OSRE) from trained neural networks: a practical and efficient approach. IEEE Trans Neural Netw 17(2):374–384

  • Friedman HP, Rubin J (1967) On some invariant criteria for grouping data. J Am Stat Assoc 62(320):1159–1178

  • Green AR, Garibaldi JM, Soria D, Ambrogi F, Ball G, Lisboa PJG, Etchells TA, Boracchi P, Biganzoli E, Macmillan RD, Blamey RW, Ellis IO (2007) Identification of sub-classes of breast cancer through consensus derived from automated clustering methods. Eur J Cancer Suppl 5(3):18

  • Langfelder P, Zhang B, Horvath S (2008) Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 24(5):719–720

  • Lisboa PJG, Ellis IO, Green AR, Ambrogi F, Dias MB (2008) Cluster-based visualisation with scatter matrices. Pattern Recogn Lett 29(13):1814–1823

  • Soria D, Garibaldi JM, Ambrogi F, Green AR, Powe DG, Rakha EA, Macmillan RD, Blamey RW, Ball G, Lisboa PJG, Etchells TA, Boracchi P, Biganzoli E, Ellis IO (2010) A methodology to identify consensus classes from clustering algorithms applied to immunohistochemical data from breast cancer patients. Comput Biol Med 40:318–330

  • Vellido A, Lisboa PJG, Vicente D (2006) Robust analysis of MRS brain tumour data using t-GTM. Neurocomputing 69(7–9):754–768

Acknowledgments

The authors acknowledge the support of Dr. A. Green and other members of the Breast Cancer Research Team of the University of Nottingham, UK, and of Dr. G. Ball of Nottingham Trent University, UK, in collecting the high-throughput protein expression dataset used in this study, as well as financial support from the European Network of Excellence Biopattern (FP6-2002-IST-1 No. 508803).

Author information

Corresponding author

Correspondence to I. H. Jarman.

About this article

Cite this article

Jarman, I.H., Etchells, T.A., Bacciu, D. et al. Clustering of protein expression data: a benchmark of statistical and neural approaches. Soft Comput 15, 1459–1469 (2011). https://doi.org/10.1007/s00500-010-0596-9
