Exploring Protein Functional Relationships Using Genomic Information and Data Mining Techniques

  • Jack Y. Yang
  • Mary Qu Yang
  • Okan K. Ersoy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2714)


An approach that uses both supervised and unsupervised learning methods for exploring protein functional relationships is reported; we refer to it as the Maximum Contrast (MC) tree. The tree is constructed by a hierarchical decomposition of the feature space. This step is performed regardless of the complex nature of protein functions, i.e. the decomposition does not require knowledge of the protein functional class labels. To test the algorithm, we constructed a library of protein phylogenetic profiles for the proteins of the yeast Saccharomyces cerevisiae using 60 species. The results show that the algorithm compares favorably with other classification algorithms, such as the decision tree algorithms C4.5 and C5, and with support vector machines.
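The two-stage idea in the abstract (decompose the feature space without labels, then use labels for classification) can be illustrated with a minimal sketch. The abstract does not state the MC tree's splitting criterion, so the rule used here (split on the genome whose presence/absence bit divides the node most evenly) is only an assumed stand-in, and the profiles, labels, and function names are hypothetical toy values rather than the authors' data or implementation.

```python
from collections import Counter


def build_tree(profiles, indices, min_size=2):
    """Unsupervised step: recursively split proteins without using labels.

    Assumed criterion (not from the paper): pick the genome (feature)
    whose presence/absence bit splits the node most evenly.
    """
    best_feature, best_balance = None, 0
    for f in range(len(profiles[0])):
        present = sum(profiles[i][f] for i in indices)
        balance = min(present, len(indices) - present)
        if balance > best_balance:
            best_feature, best_balance = f, balance
    # Stop when the node is small or no feature separates its members.
    if len(indices) <= min_size or best_feature is None:
        return {"leaf": True, "members": list(indices)}
    absent = [i for i in indices if profiles[i][best_feature] == 0]
    present = [i for i in indices if profiles[i][best_feature] == 1]
    return {
        "leaf": False,
        "feature": best_feature,
        "absent": build_tree(profiles, absent, min_size),
        "present": build_tree(profiles, present, min_size),
    }


def label_leaves(node, labels):
    """Supervised step: give each leaf the majority label of its members."""
    if node["leaf"]:
        counts = Counter(labels[i] for i in node["members"])
        node["label"] = counts.most_common(1)[0][0]
    else:
        label_leaves(node["absent"], labels)
        label_leaves(node["present"], labels)


def classify(node, profile):
    """Route a test profile down the tree and return the leaf's label."""
    while not node["leaf"]:
        node = node["present"] if profile[node["feature"]] else node["absent"]
    return node["label"]


# Toy phylogenetic profiles: one bit per genome (1 = homolog present).
profiles = [
    (1, 1, 0, 0),
    (1, 1, 0, 1),
    (1, 1, 0, 0),
    (0, 0, 1, 1),
    (0, 0, 1, 0),
    (0, 0, 1, 1),
]
# Hypothetical functional classes for the six toy proteins.
labels = ["ribosomal"] * 3 + ["transport"] * 3

tree = build_tree(profiles, list(range(len(profiles))), min_size=3)
label_leaves(tree, labels)
print(classify(tree, (1, 0, 0, 1)))
print(classify(tree, (0, 1, 1, 0)))
```

Because the tree is built before any labels are consulted, the same decomposition could be relabeled for a different functional classification without being rebuilt; only `label_leaves` would need to run again.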


Support Vector Machine, Feature Space, Leaf Node, Class Label, Test Instance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Jack Y. Yang (1)
  • Mary Qu Yang (1)
  • Okan K. Ersoy (1)
  1. School of Electrical and Computer Engineering, Purdue University, West Lafayette, USA
