Classification of DNA Minor and Major Grooves Binding Proteins According to the NLSs by Data Analysis Methods

Applied Biochemistry and Biotechnology

Abstract

High-mobility group (HMG) proteins are a superfamily of DNA-binding proteins that bind to the minor groove of DNA and bend it, whereas most transcription factors, such as centromere protein B (CENP-B), octamer-binding protein 1 (Oct-1), growth factor independence 1 (Gfi-1), and WRKY, bind to the major groove. The aim of this study is to classify proteins by their DNA-binding features. Nuclear localization signals (NLSs) play an important role in importing DNA-binding proteins into the nucleus, where they carry out their functions; they were therefore considered as a feature related to the DNA-binding mode of these proteins. NLSs were predicted with two prediction web servers, and their sequence-order features were then extracted with Chou’s pseudo amino acid composition (PseAAC) and ProtParam. A multilayer perceptron, a type of artificial neural network, was used to analyze the features by calculating the correlation coefficient with 30-fold cross-validation. Principal component analysis, performed in the Minitab software, was used as a second data analysis method. By calculating the eigenvalues and considering five principal components, the sequence length of the NLS was identified as the best feature for classifying DNA-binding proteins. The minimum mean squared error (MSE; 0.1098) and the highest R² (0.963) indicate a significant difference between the NLS lengths of DNA major groove and minor groove binding proteins. The results show that DNA major groove and minor groove binding proteins can be classified using their NLS sequences as a feature.
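As a rough illustration of the kind of sequence-derived features discussed in the abstract, the sketch below computes the length and plain amino acid composition of an NLS. It is not the authors' pipeline: the function name `nls_features` and the `basic_fraction` feature are assumptions made for this example, and plain composition is only the part that PseAAC extends with sequence-order terms. The example input is the well-known SV40 large T antigen NLS (PKKKRKV), used here without any groove-binding label.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def nls_features(seq: str) -> dict:
    """Return simple features of an NLS sequence (hypothetical helper).

    Features: sequence length, fraction of Lys/Arg residues, and the
    fraction of each of the 20 standard amino acids.
    """
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")

    counts = Counter(seq)
    n = len(seq)
    features = {
        "length": n,
        # Lysine and arginine dominate classical NLSs.
        "basic_fraction": (counts["K"] + counts["R"]) / n,
    }
    for aa in AMINO_ACIDS:
        features[f"aac_{aa}"] = counts[aa] / n
    return features


if __name__ == "__main__":
    sv40 = "PKKKRKV"  # SV40 large T antigen NLS
    f = nls_features(sv40)
    print(f["length"], round(f["basic_fraction"], 3))  # prints: 7 0.714
```

A feature table built this way, one row per predicted NLS, could then be passed to a classifier or to principal component analysis; which features and tools the study actually used is described only at the level given in the abstract.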



Acknowledgments

Support of this research by the University of Isfahan is acknowledged.

Author information

Corresponding author

Correspondence to Hassan Mohabatkar.

Cite this article

Amanzadeh, E., Mohabatkar, H. & Biria, D. Classification of DNA Minor and Major Grooves Binding Proteins According to the NLSs by Data Analysis Methods. Appl Biochem Biotechnol 174, 437–451 (2014). https://doi.org/10.1007/s12010-014-0926-y

  • DOI: https://doi.org/10.1007/s12010-014-0926-y