Abstract
In plants, the GIGANTEA (GI) protein is involved in diverse biological functions, including carbon and sucrose metabolism, cell wall deposition, transpiration and hypocotyl elongation, which makes GI an important class of proteins. So far, GI proteins have mostly been identified with resource-intensive experimental methods. In this study, we therefore attempted to develop a computational model for fast and accurate prediction of GI proteins. Ten supervised learning algorithms, i.e., SVM, RF, JRIP, J48, LMT, IBK, NB, PART, BAGG and LGB, were employed for prediction, with amino acid composition (AAC), FASGAI features and physico-chemical (PHYC) properties used as numerical inputs for the learning algorithms. Under five-fold cross validation, the highest performance, 96.75% AUC-ROC and 86.7% AUC-PR, was observed for SVM coupled with the AAC + PHYC feature combination. With leave-one-out cross validation, 97.29% AUC-ROC and 87.89% AUC-PR were achieved, respectively. When the model was evaluated with an independent dataset of 18 GI sequences, 17 were correctly predicted. We have also performed proteome-wide identification of GI proteins in wheat, followed by functional annotation using Gene Ontology terms. A prediction server, “GIpred”, is freely accessible at http://cabgrid.res.in:8080/gipred/ for proteome-wide recognition of GI proteins.
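To illustrate what an AAC feature vector looks like, the following is a minimal Python sketch that maps a protein sequence to the fractions of the 20 standard amino acids. It is not the authors' implementation: the function name `amino_acid_composition`, the alphabetical ordering of residues, and the choice to ignore non-standard residue codes are all assumptions made here for illustration.

```python
from collections import Counter

# The 20 standard amino acids, ordered alphabetically by one-letter code.
# (The ordering is an assumption; any fixed order works for a classifier.)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a list of 20 fractions, one per standard amino acid.

    Non-standard codes (e.g. X, B, Z) are ignored, which is an
    assumption of this sketch rather than a documented GIpred rule.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    toy = "MASSLLKKAX"  # made-up sequence, not a real GI protein
    for aa, frac in zip(AMINO_ACIDS, amino_acid_composition(toy)):
        if frac > 0:
            print(aa, round(frac, 3))
```

Each sequence becomes a fixed-length vector of 20 numbers regardless of its length, which is what allows such features to be passed to standard classifiers like SVM.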
Availability of data and material
All the datasets used in this study are available at http://cabgrid.res.in:8080/gipred/dataset.html.
Code availability
A prediction server “GIpred” has been developed for the prediction of GIGANTEA proteins. The server is freely accessible at http://cabgrid.res.in:8080/gipred/index.html.
Funding
This study was supported by the ICAR CABin Scheme Network project on Agricultural Bioinformatics and Computational Biology (F. No. Agril. Edn. 14/2/2017-A&P dated 02.08.2017), received from the Indian Council of Agricultural Research (ICAR), New Delhi. The funder had no role in the design of the study; in the collection, analysis, and interpretation of data; or in writing the manuscript.
Author information
Contributions
Conceptualization: PKM; Data curation: SD, SS and TKS; Formal analysis: PKM; Investigation: PKM and SD; Methodology: PKM and SD; Software: TKS and PKM; Validation: SS, SD, TKS and SP; Visualization: PKM, SS and TKS; Writing (original draft): PKM, SD, SP and SS; Writing (review and editing): PKM and SP.
Ethics declarations
Conflicts of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Meher, P.K., Dash, S., Sahu, T.K. et al. GIpred: a computational tool for prediction of GIGANTEA proteins using machine learning algorithm. Physiol Mol Biol Plants 28, 1–16 (2022). https://doi.org/10.1007/s12298-022-01130-6
DOI: https://doi.org/10.1007/s12298-022-01130-6