Computational approaches for the classification of seed storage proteins

Abstract

Seed storage proteins make up a major part of the protein content of the seed and play an important role in seed quality. They determine the total protein content and affect the nutritional quality and the functional properties relevant to food processing. Transgenic plants are being used to develop improved lines for incorporation into plant breeding programs, and the nutrient composition of seeds is a major target of molecular breeding programs. Hence, classification of these proteins is crucial for the development of superior varieties with improved nutritional quality. In this study we applied machine learning algorithms to the classification of seed storage proteins. We present an algorithm based on the nearest neighbor approach and compare its performance with the J48 decision tree, a multilayer perceptron (MLP) neural network, and a support vector machine (SVM) implemented in libSVM. The model based on our algorithm achieved higher classification accuracy than the other methods.
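
As a rough illustration of what nearest neighbor classification of protein sequences can look like, the following minimal Python sketch assigns a query sequence the label of its closest training sequence. It is not the authors' algorithm, whose details are not given in this abstract. The feature representation (amino acid composition), the Euclidean distance, the function names, the class labels, and the toy sequences are all assumptions made up for illustration.

```python
import math
from collections import Counter

# The 20 standard amino acids, used as feature dimensions (an assumption:
# the abstract does not state which features the authors used).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino acid composition (fractions) of seq."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbour_predict(train, query_seq):
    """Return the label of the training sequence closest to query_seq.

    train is a list of (sequence, label) pairs.
    """
    query = composition(query_seq)
    best_label, best_dist = None, float("inf")
    for seq, label in train:
        dist = euclidean(composition(seq), query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical toy data: invented sequences and labels, not real proteins.
TOY_TRAIN = [
    ("QQQPQQPFPQQPQQ", "prolamin-like"),
    ("PQQPFPQQQQPLPQ", "prolamin-like"),
    ("KKEELLRRGGDDAA", "globulin-like"),
    ("EEKRLLGDAAKKEE", "globulin-like"),
]

if __name__ == "__main__":
    print(nearest_neighbour_predict(TOY_TRAIN, "QQPQQPFQQQ"))
```

A comparison like the one described above would evaluate such a classifier against the other methods on the same labeled data, for example with cross-validation. This sketch covers only the prediction step.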

Acknowledgments

The first author is thankful to the Director, Indian Institute for Horticultural Research (IIHR), for providing facilities for carrying out the study. The research of the second author is supported by a grant from the Foundation for Scientific Research and Technological Innovation (FSRTI). The authors are thankful to Dr. M. Naresh Kumar, Head, Information Management Systems, National Remote Sensing Centre (ISRO), India, for his suggestions for improving this work. The authors would also like to thank the anonymous reviewers for many helpful suggestions that led to the improvement of this paper.

Author information

Corresponding author

Correspondence to V. Radhika.

About this article

Cite this article

Radhika, V., Rao, V.S.H. Computational approaches for the classification of seed storage proteins. J Food Sci Technol 52, 4246–4255 (2015). https://doi.org/10.1007/s13197-014-1500-x

  • DOI: https://doi.org/10.1007/s13197-014-1500-x

Keywords

  • Classification
  • Nearest neighbour algorithm
  • Correlation based feature selection
  • Machine learning
  • Seed storage proteins
  • Bio-informatics