Abstract
High-mobility group proteins are a superfamily of DNA-binding proteins that bind to the minor groove of DNA and bend it, whereas most transcription factors, such as centromere protein B (CENP-B), octamer (Oct)-1, growth factor independence 1 (Gfi-1), and WRKY, bind to the major groove. The aim of this study is to classify proteins by their DNA-binding features. Nuclear localization signals (NLSs) play an important role in importing DNA-binding proteins into the nucleus, where they carry out their functions; NLSs were therefore considered as a feature relevant to the DNA-binding mode of proteins. NLSs were predicted by two prediction web servers, and their sequence-order features were then extracted using Chou’s pseudo amino acid composition (PseAAC) and ProtParam. A multilayer perceptron, a type of artificial neural network, was used to analyze the features by calculating the correlation coefficient with 30-fold cross-validation. The data were also analyzed by principal component analysis in the Minitab software. Based on the calculated eigenvalues and five principal components, the sequence length of the NLS was identified as the best feature for classifying DNA-binding proteins. The minimum mean squared error (MSE, 0.1098) and the highest R² (0.963) indicate a significant difference between the NLS lengths of DNA major groove and minor groove binding proteins. The results show that DNA major groove and minor groove binding proteins can be classified using their NLS sequences as a feature.
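The two evaluation quantities named in the abstract, MSE and R², together with a k-fold split, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes R² denotes the coefficient of determination (some tools report a squared correlation coefficient instead), and the function names, the contiguous fold layout, and the toy labels are all hypothetical.

```python
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed meaning of R²)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical toy data: 0 = major groove binder, 1 = minor groove binder.
y_true = [0, 0, 1, 1]
y_pred = [0.1, 0.2, 0.9, 0.8]
print(mse(y_true, y_pred), r_squared(y_true, y_pred))
```

With 30 folds, as in the abstract, each sample is held out exactly once, and the per-fold errors would typically be averaged to give the reported score.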
Acknowledgments
Support of this research by the University of Isfahan is acknowledged.
Cite this article
Amanzadeh, E., Mohabatkar, H. & Biria, D. Classification of DNA Minor and Major Grooves Binding Proteins According to the NLSs by Data Analysis Methods. Appl Biochem Biotechnol 174, 437–451 (2014). https://doi.org/10.1007/s12010-014-0926-y
DOI: https://doi.org/10.1007/s12010-014-0926-y