Abstract
The genome is the carrier of hereditary information and, in effect, the operating system of an organism. Genomic data take the form of sequentially arranged nucleotide base pairs. Mining of genome resources relies chiefly on computational tools, an approach known as computational genome annotation, which may be either structural or functional. Structural annotation identifies hypothetical genes in a DNA sequence using computational algorithms, whereas functional annotation assigns functions to the predicted genes through sequence similarity searches against genes of known function. This chapter focuses on genomic resources and on the mining of genomic databases with computational tools.
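To make the idea of structural annotation concrete, the following is a minimal sketch of one of its simplest forms: scanning a DNA sequence for open reading frames (ORFs) as candidate genes. It is an illustration only, not the method used in this chapter. The function name `find_orfs`, the `min_codons` threshold, and the restriction to the forward strand with the standard ATG start codon and TAA/TAG/TGA stop codons are all assumptions made for the example; real gene finders also model the reverse strand, introns, and sequence statistics.

```python
from typing import List, Tuple

# Assumption: standard start and stop codons, forward strand only.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> List[Tuple[int, int, str]]:
    """Return (start, end, sequence) for each ORF found on the forward strand.

    `start` is 0-based and inclusive; `end` is exclusive and includes the
    stop codon. `min_codons` counts every codon from start to stop inclusive.
    """
    seq = dna.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, seq[start:end]))
                start = None  # close the ORF and keep scanning this frame
    return orfs


if __name__ == "__main__":
    for start, end, orf in find_orfs("CCATGAAATTTTAGGG"):
        print(start, end, orf)
```

Functional annotation would then take each predicted sequence and compare it against genes of known function, typically with a similarity search tool, which is outside the scope of this sketch.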
Acknowledgments
The authors (Mohd. Sayeed Akhtar and Ibrahim A. Alaraidh) are highly grateful to the Department of Botany, Gandhi Faiz-e-Aam College, Shahajahanpur, U.P., India, and the Botany and Microbiology Department, Science College, King Saud University, Riyadh, Kingdom of Saudi Arabia.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Akhtar, M.S., Swamy, M.K., Alaraidh, I.A., Panwar, J. (2017). Genomic Data Resources and Data Mining. In: Hakeem, K., Malik, A., Vardar-Sukan, F., Ozturk, M. (eds) Plant Bioinformatics. Springer, Cham. https://doi.org/10.1007/978-3-319-67156-7_10
DOI: https://doi.org/10.1007/978-3-319-67156-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67155-0
Online ISBN: 978-3-319-67156-7
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)