Abstract
Bioinformatics is the application of computing to the storage and analysis of vast amounts of biological data. These data are available as sequences and as protein and nucleic acid structures. A sequence is a one-dimensional representation, while a structure adds three-dimensional data to the sequence. A biological database organises its data so that they can be easily accessed and analysed. Biological databases can be classified into sequence databases and structure databases. Sequence databases apply to both protein and nucleic acid sequences, while structure databases apply only to proteins. The first database was developed after the insulin protein sequence was made available in 1956; insulin was the first protein to be sequenced. During the sixties, the first nucleic acid sequence, that of yeast tRNA, was determined. Three-dimensional structures of proteins were then determined, and the Protein Data Bank was established with only 10 entries. It has since grown into a large database with over 10000 entries. In 1986, the SWISS-PROT protein sequence database was developed; it holds about 70000 protein sequences covering more than 5000 model organisms (Babu 1997).
References
Babu MM (1997) Biological databases and protein sequence analysis. https://www.mrc-lmb.cam.ac.uk/genomes/madanm/pdfs/biodbseq.pdf. Accessed 13 Sept 2019
Baxevanis AD, Bateman A (2018) The importance of biological databases in biological discovery. Curr Protoc Bioinform 50(1):1.1.1–1.1.8
Benson DA, Cavanaugh M, Clark K et al (2013) GenBank. Nucl Acids Res 41(Database issue):D36–D42
Berman H, Westbrook J, Feng Z, Gilliland G, Bhat T, Weissig H, Shindyalov IN, Zhuang P (2000) The protein data bank. Nucl Acids Res 28:235–242
Bourne P (2005) Will a biological database be different from a biological journal? PLoS Comput Biol 1:179–181
Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW (2016) GenBank. Nucl Acids Res 44:D67–D72
Cochrane G, Karsch-Mizrachi I, Takagi T (2016) The international nucleotide sequence database collaboration. Nucl Acids Res 44:D48–D50
EMBL-EBI (2020) Primary and secondary databases. https://www.ebi.ac.uk/training/online/course/bioinformatics-terrified-2018/primary-and-secondary-databases. Accessed 12 Dec 2019
Enago Academy (2019) Biological databases: an overview and future perspective. https://www.enago.com/academy/biological-databases-an-overview-and-future-perspectives/. Accessed 06 Dec 2019
GenBank (2020) GenBank Overview. https://www.ncbi.nlm.nih.gov/genbank/. Accessed 07 Nov 2019
Gibson R, Alako B, Amid C, Cerdeño-Tárraga A, Cleland I, Goodgame N, Hoopen PT, Jayathilaka S, Kay S, Leinonen R et al (2016) Biocuration of functional annotation at the European nucleotide archive. Nucl Acids Res 44:D58–D66
Henneges C, Hinselmann G, Jung S, Madlung J, Schutz W et al (2009) Ranking methods for the prediction of frequent top scoring peptides from proteomics data. J Proteomics Bioinform 2:226–235
Holzinger A, Jurisica I (2014) Knowledge discovery and data mining in biomedical informatics: the future is in integrative, interactive machine learning solutions. Lecture notes in computer science, pp 1–18
Karthick RNS, Muthukumaran J (2008) Prediction of three dimensional model and active site analysis of inducible serine protease inhibitor-2 (ISPI-2), Galleria mellonella. J Comput Sci Syst Biol 1:119–125
Kodama Y, Shumway M, Leinonen R (2012) International nucleotide sequence database collaboration. The sequence read archive: explosive growth of sequencing data. Nucl Acids Res 40(Database issue):D54–D56
Kulikova T, Aldebert P, Althorpe N, Baker W, Bates K, Browne P, Broek A, Cochrane G, Duggan K, Eberhardt R, Faruque N, García-Pastor MP, Harte N, Kanz C, Leinonen R, Lin Q, Lombard V, Lopez R, Mancuso R, Apweiler R (2004) The EMBL nucleotide sequence database. Nucl Acids Res 32:D27–D30. https://doi.org/10.1093/nar/gkh120
Liu Z, Liu Y, Liu S, Ding X, Yang Y et al (2009) Analysis of the sequence of ITS1-5.8S-ITS2 regions of the three species of fructus Evodiae in Guizhou Province of China and identification of main ingredients of their medicinal chemistry. J Comput Sci Syst Biol 2:200–207
Manach C (2016) Metabolomics databases. In: Max Rubner conference, 10–12 October 2016. Max Rubner-Institut, Karlsruhe, Germany
Mashima J, Kodama Y, Fujisawa T, Katayama T, Okuda Y, Kaminuma E, Ogasawara O, Okubo K, Nakamura Y, Takagi T (2017) DNA data bank of Japan. Nucl Acids Res 45(D1):D25–D31
Moreau Y, Tranchevent L-C (2012) Computational tools for prioritizing candidate genes: boosting disease gene discovery. Nat Rev Genet 13:523–536
Oswaldo Cruz Institute (2001) Characteristics of biological data. http://www.dbbm.fiocruz.br/class/Lecture/d17/db_overview/characteristics_of_biological_data.htm. Accessed 01 Nov 2019
Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM, Murphy MR, O’Leary NA, Pujar S, Rajput B, Rangwala SH, Riddick LD, Shkeda A, Sun H, Tamez P, Tully RE, Wallin C, Webb D, Weber J, Wu W, DiCuccio M, Kitts P, Maglott DR, Murphy TD, Ostell JM (2014) RefSeq: an update on mammalian reference sequences. Nucl Acids Res 42(D1):D756–D763
Riad AM, Hassan AE, Hassan QF (2009) Investigating performance of XML web services in real-time business systems. J Comput Sci Syst Biol 2:266–271
Shanthi V, Ramanathan K, Sethumadhavan R (2009) Role of the cation-π interaction in therapeutic proteins: a comparative study with conventional stabilizing forces. J Comput Sci Syst Biol 2:051–068
Singh S, Gupta SK, Nischal A, Khattri S, Nath R et al (2010) Comparative modeling study of the 3-D structure of small delta antigen protein of hepatitis delta virus. J Comput Sci Syst Biol 3:001–004
Toomula S, Kumar A, Kumar DS, Bheemidi VS (2011) Biological databases-integration of life science data. J Comput Sci Syst Biol 4(5):088–092
The UniProt Consortium (2017) UniProt: the universal protein knowledgebase. Nucl Acids Res 45(D1):D158–D169
UniProt Consortium (2020) The universal protein resource – UniProt. [Flyer obtained online]. Accessed 06 Mar 2019
Varsale AR, Wadnerkar AS, Mandage RH, Jadhavrao PK (2010) Cheminformatics. J Proteomics Bioinform 3:253–259
Wooley JC, Lin HS (eds) (2005) Catalyzing inquiry at the interface of computing and biology. National Research Council (US) Committee on Frontiers at the Interface of Computing and Biology. National Academies Press, Washington (DC)
wwPDB (2020) Worldwide Protein Data Bank (PDB). http://www.wwpdb.org/. Accessed 23 Aug 2019
Yadav G, Mohanty D (2017) Databases developed in India for biological sciences. J Proteins Proteomics 8(3):159–167
Zou D, Ma L, Yu J, Zhang Z (2015) Biological databases for human research. Genomics Proteomics Bioinform 13(1):55–63
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Domdouzis, K., Lake, P., Crowther, P. (2021). Biological Databases. In: Concise Guide to Databases. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-030-42224-0_18
DOI: https://doi.org/10.1007/978-3-030-42224-0_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-42223-3
Online ISBN: 978-3-030-42224-0
eBook Packages: Computer Science (R0)