Protein Analysis: From Sequence to Structure

  • Chapter
Advances in Bioinformatics

Abstract

Proteins are the primary molecules that control most cellular processes. The sequence of a protein is linked to its structure, which in turn is linked to its function. Understanding and integrating protein sequence, structure, and function information is necessary to address many challenging areas of biology, including protein engineering, structural biology, and drug discovery. Bioinformatics deals with protein sequences and structures and with their prediction and analysis. Access to these data and the availability of high-throughput analysis tools supplement experimental work and help us understand proteins better. Predicting the three-dimensional structures of proteins and studying their structural features are essential for understanding various diseases, and they aid in disease diagnosis and drug discovery. In this chapter, we discuss various databases and in silico tools and methods related to protein sequence and structure analysis.

Author information

Corresponding author

Correspondence to Anju Pappachan.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Jani, J., Pappachan, A. (2021). Protein Analysis: From Sequence to Structure. In: Singh, V., Kumar, A. (eds) Advances in Bioinformatics. Springer, Singapore. https://doi.org/10.1007/978-981-33-6191-1_4

  • DOI: https://doi.org/10.1007/978-981-33-6191-1_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-33-6190-4

  • Online ISBN: 978-981-33-6191-1

  • eBook Packages: Computer Science, Computer Science (R0)
