Comparison of Protein Descriptors Used in Hierarchical Multi-label Classification Based on Gene Ontology

Conference paper
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 150)


Proteins are the main building blocks of the cell and are responsible for its biological processes, so precise knowledge of protein function is of great significance. Many methods exist for comparing proteins and determining their function: some use structure alignment, others use sequence alignment, and others use protein descriptors. Here, we use two protein descriptors, the Voxel descriptor and the Ray-based descriptor, to encode the structural and biological features of proteins. In biology there is a trend toward hierarchical organization, for example of protein functions, cell components and the living world as a whole. Many classification systems arrange proteins in a tree structure. However, because a protein often has more than one parent, a Directed Acyclic Graph (DAG) hierarchy is used instead. Gene Ontology (GO) is a system for the structural and hierarchical representation of proteins and gene products that supports a DAG hierarchy, and CLUS is a system that handles hierarchical data. In this paper, we compare the two descriptors for predicting protein function. First, the protein descriptors are extracted from the structural coordinates found in the Protein Data Bank (PDB) and from the protein backbone, respectively. Next, the GO class hierarchy is added to each protein that has descriptor data. The resulting file is used as input to the CLUS system, which generates a decision tree model trained on the protein structure. The output of the system is the set of GO classes to which the protein belongs. The results show that predicting protein function with the Voxel descriptor gives better results than predicting it with the Ray descriptor.
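As an illustration of why a DAG hierarchy matters when GO classes are attached to a protein, the following is a minimal Python sketch, not the authors' pipeline and not the CLUS input format. It assumes the usual hierarchy constraint of hierarchical multi-label classification: a protein that belongs to a class also belongs to all of that class's ancestors. The term names, the `PARENTS` table and both helper functions are hypothetical placeholders, not real GO identifiers or CLUS functions.

```python
from typing import Dict, Iterable, List, Set

# Toy class hierarchy, stored as child -> list of parents.
# All term names are hypothetical placeholders, NOT real GO identifiers.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    # Two parents: this is what makes the hierarchy a DAG rather than a tree.
    "atp_dependent_enzyme": ["binding", "catalysis"],
}


def with_ancestors(terms: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Return the given terms together with all of their ancestors in the DAG."""
    closed: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term in closed:
            continue  # already visited via another parent path
        closed.add(term)
        stack.extend(parents[term])
    return closed


def label_vector(terms: Iterable[str], parents: Dict[str, List[str]],
                 ordering: List[str]) -> List[int]:
    """Encode a protein's classes as a 0/1 vector over a fixed class ordering."""
    closed = with_ancestors(terms, parents)
    return [1 if term in closed else 0 for term in ordering]


ORDER = sorted(PARENTS)  # fixed, reproducible class ordering

if __name__ == "__main__":
    print(ORDER)
    print(label_vector(["atp_dependent_enzyme"], PARENTS, ORDER))
    print(label_vector(["binding"], PARENTS, ORDER))
```

A label vector of this kind could be stored next to each protein's descriptor values before training. How the hierarchy is actually written into the file read by CLUS is not described in the abstract, so that step is left out here.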


Gene Ontology, CLUS, Voxel protein descriptor, Ray-based protein descriptor, Predicting protein function





Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2012

Authors and Affiliations

  1. Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, Macedonia
