A Novel Algorithm for Hub Protein Identification in H.Sapiens Using Global Amino Acid Features

  • B. L. Aswathi
  • Baharak Goli
  • Achuthsankar S. Nair
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 178)


Identifying hub proteins solely from the amino acid sequences in a proteome remains an open problem in computational biology, and it has received increasing attention as the volume of sequence information has grown. In this context, we investigate whether hub proteins can be predicted from amino acid sequence information alone. We propose a novel hub identification algorithm that relies on conformational, physicochemical and pattern characteristics of amino acid sequences. To extract the most informative features, we used two feature selection techniques that are widely used in data preprocessing for machine learning, CFS (Correlation-based Feature Selection) and ReliefF. The performance of two types of neural network classifiers, a radial basis function (RBF) network and a multilayer perceptron, was evaluated with these filtering approaches. The proposed model predicted hub proteins from amino acid sequences alone with 92.98% and 92.61% accuracy for the multilayer perceptron and the RBF network, respectively, using CFS, and with 94.69% and 90.89% accuracy, respectively, using ReliefF.
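
The filter-style feature ranking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the actual conformational, physicochemical and pattern features, so plain amino acid composition stands in for them here, and the ranking uses the basic two-class Relief procedure rather than full ReliefF (which averages over k nearest neighbours). The function names `composition` and `relief_weights` and the toy data are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Fraction of each standard amino acid in seq.

    Assumption: composition is only a stand-in for the paper's global
    features, which the abstract does not enumerate.
    """
    seq = seq.upper()
    total = sum(seq.count(a) for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [seq.count(a) / total for a in AMINO_ACIDS]


def relief_weights(X, y, n_iter=100, seed=0):
    """Basic two-class Relief (a simplification of ReliefF).

    A feature gains weight when it differs on the nearest example of the
    other class (nearest miss) and loses weight when it differs on the
    nearest example of the same class (nearest hit).
    """
    rng = random.Random(seed)
    n_feat = len(X[0])

    # Scale every feature difference by the observed range of that feature.
    spans = []
    for j in range(n_feat):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(n_feat))

    weights = [0.0] * n_feat
    for _ in range(n_iter):
        i = rng.randrange(len(X))
        hits = [k for k in range(len(X)) if k != i and y[k] == y[i]]
        misses = [k for k in range(len(X)) if y[k] != y[i]]
        if not hits or not misses:
            continue  # cannot score an example without both neighbours
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(n_feat):
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n_iter
    return weights


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
toy_X = [[0.0, 0.5], [0.1, 0.1], [0.9, 0.4], [1.0, 0.2]]
toy_y = [0, 0, 1, 1]
toy_weights = relief_weights(toy_X, toy_y)
```

In a pipeline of the kind the abstract outlines, each protein sequence would be mapped to a feature vector, the features would be ranked by weights like these (or selected as a subset by CFS), and only the retained features would be passed to the neural network classifiers.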


Protein hubness; Protein-protein interaction networks; Protein-protein interaction; Feature selection methods; Machine learning





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • B. L. Aswathi (1)
  • Baharak Goli (1)
  • Achuthsankar S. Nair (1)
  1. Department of Computational Biology and Bioinformatics, University of Kerala, Trivandrum, India
