Genomic Data Resources and Data Mining

Chapter in Plant Bioinformatics

Abstract

The genome carries an organism's hereditary information and serves, in effect, as its operating system. Genomic data take the form of sequentially arranged nucleotide base pairs. Mining genomic resources relies chiefly on computational tools, an approach known as computational genome annotation. Computational genome annotation may be either structural or functional. Structural annotation identifies putative (hypothetical) genes in a DNA sequence using computational algorithms. Functional annotation assigns functions to these predicted genes through sequence similarity searches against genes of known function. This chapter focuses on genomic resources and on mining genomic databases with computational tools.
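The two annotation steps described above can be illustrated with a toy Python sketch. It makes several loud assumptions. It uses the standard genetic code. It treats any ATG-to-stop open reading frame (ORF) of at least `min_codons` codons as a candidate gene, whereas real gene finders are far more sophisticated. It also replaces a BLAST-style similarity search with a simple k-mer Jaccard score against a small reference set. The names `find_orfs` and `annotate`, the thresholds, and the reference labels are all hypothetical and illustrative. This is not the chapter's method.

```python
from __future__ import annotations
from dataclasses import dataclass

# Assumption: standard genetic code start/stop codons.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class ORF:
    strand: str  # "+" or "-"
    start: int   # 0-based start, relative to that strand's sequence
    end: int     # exclusive end; the stop codon is included
    seq: str


def find_orfs(dna: str, min_codons: int = 30) -> list[ORF]:
    """Structural annotation sketch: ATG...stop ORFs on both strands
    in all three frames, at least min_codons long (stop included)."""
    dna = dna.upper()
    orfs = []
    for strand, s in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] == START:
                    # Scan forward in frame for the first stop codon.
                    for j in range(i, len(s) - 2, 3):
                        if s[j:j + 3] in STOPS:
                            if (j + 3 - i) // 3 >= min_codons:
                                orfs.append(ORF(strand, i, j + 3, s[i:j + 3]))
                            i = j  # resume scanning after this stop codon
                            break
                    else:
                        break  # no downstream stop codon in this frame
                i += 3
    return orfs


def kmers(seq: str, k: int = 5) -> set[str]:
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def annotate(orf_seq: str, reference: dict[str, str],
             k: int = 5, min_sim: float = 0.5) -> tuple[str, float]:
    """Functional annotation sketch: transfer the label of the most
    similar reference sequence of known function, if it is similar enough."""
    query = kmers(orf_seq, k)
    best_label, best_sim = None, 0.0
    for label, ref_seq in reference.items():
        sim = jaccard(query, kmers(ref_seq, k))
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_label is not None and best_sim >= min_sim:
        return best_label, best_sim
    return "hypothetical protein", best_sim


if __name__ == "__main__":
    gene = "ATG" + "GCT" * 40 + "TAA"  # toy 42-codon gene
    reference = {"example protein (hypothetical label)": gene}
    for orf in find_orfs("CC" + gene + "GG"):
        print(orf.strand, orf.start, orf.end, annotate(orf.seq, reference))
    # prints: + 2 128 ('example protein (hypothetical label)', 1.0)
```

The ORF scan stands in for structural annotation. The k-mer comparison stands in for a similarity search. Predictions with no sufficiently similar reference keep the generic label "hypothetical protein", which mirrors how unmatched gene models are commonly labelled. In practice, the similarity step would be delegated to a dedicated search tool run against a curated database.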

Acknowledgments

The authors (Mohd. Sayeed Akhtar and Ibrahim A. Alaraidh) are grateful to the Department of Botany, Gandhi Faiz-e-Aam College, Shahjahanpur, U.P., India, and to the Botany and Microbiology Department, Science College, King Saud University, Riyadh, Kingdom of Saudi Arabia.

Author information

Correspondence to Mohd Sayeed Akhtar.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Akhtar, M.S., Swamy, M.K., Alaraidh, I.A., Panwar, J. (2017). Genomic Data Resources and Data Mining. In: Hakeem, K., Malik, A., Vardar-Sukan, F., Ozturk, M. (eds) Plant Bioinformatics. Springer, Cham. https://doi.org/10.1007/978-3-319-67156-7_10
