Genomic Data Resources and Data Mining

Akhtar, Mohd Sayeed; Swamy, Mallappa Kumara; Alaraidh, Ibrahim A.; Panwar, Jitendra

doi:10.1007/978-3-319-67156-7_10

Abstract

The genome is regarded as the carrier of hereditary information and the operating system of an organism. Genomic data take the form of sequentially arranged nucleotide base pairs. Mining genome resources relies chiefly on computational tools, an approach known as computational genome annotation, which may be structural or functional. Structural annotation identifies hypothetical genes in a DNA sequence using computational algorithms, whereas functional annotation assigns functions to the predicted genes through sequence similarity searches against genes of known function. This chapter focuses on genomic resources and on mining genomic databases with computational tools.
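As a rough illustration of the structural step described in the abstract, the sketch below scans a DNA string for open reading frames (an ATG start codon followed in the same frame by a stop codon) on the forward strand only. This is a deliberately simplified stand-in for gene identification and is not a method taken from the chapter; the names `find_orfs` and `min_codons` are hypothetical, and practical gene predictors rely on far richer models.

```python
# Toy structural annotation: report forward-strand open reading frames (ORFs).
# Assumption: a candidate gene is simply ATG ... in-frame stop codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end, frame) tuples; end is exclusive.

    min_codons counts every codon in the ORF, including start and stop.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next ATG
    return orfs
```

For example, `find_orfs("ATGAAATAG")` returns `[(0, 9, 0)]`: one ORF of three codons in frame 0. Functional annotation would then compare each predicted region against sequences of known function, which this sketch does not attempt.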

Acknowledgments

The authors (Mohd. Sayeed Akhtar and Ibrahim A. Alaraidh) are highly grateful to the Department of Botany, Gandhi Faiz-e-Aam College, Shahjahanpur, U.P., India, and the Botany and Microbiology Department, Science College, King Saud University, Riyadh, Kingdom of Saudi Arabia.

Author information

Authors and Affiliations

Department of Botany, Gandhi Faiz-e-Aam College, Shahjahanpur, 242001, Uttar Pradesh, India
Mohd Sayeed Akhtar
Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, 43400, Serdang, Selangor, Malaysia
Mallappa Kumara Swamy
Department of Biotechnology, Padmashree Institute of Management and Sciences, Kommagatta, Kengeri, Bangalore, 560060, Karnataka, India
Mallappa Kumara Swamy
Botany and Microbiology Department, King Saud University, Science College, P.O. Box 2455, Riyadh, 11451, Saudi Arabia
Ibrahim A. Alaraidh
Department of Biological Sciences, Birla Institute of Technology and Science, Pilani, 333031, Rajasthan, India
Jitendra Panwar

Corresponding author

Correspondence to Mohd Sayeed Akhtar.

Editor information

Editors and Affiliations

Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
Khalid Rehman Hakeem
Department of Microbiology and Molecular Biology, Chungnam National University, Daejeon, Korea (Republic of)
Adeel Malik
Department of Bioengineering, Faculty of Engineering, Ege University, Bornova, İzmir, Turkey
Fazilet Vardar-Sukan
Centre for Environmental Studies & Botany Department, Ege University, Bornova, İzmir, Turkey
Munir Ozturk

About this chapter

Cite this chapter

Akhtar, M.S., Swamy, M.K., Alaraidh, I.A., Panwar, J. (2017). Genomic Data Resources and Data Mining. In: Hakeem, K., Malik, A., Vardar-Sukan, F., Ozturk, M. (eds) Plant Bioinformatics. Springer, Cham. https://doi.org/10.1007/978-3-319-67156-7_10

DOI: https://doi.org/10.1007/978-3-319-67156-7_10
Published: 22 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67155-0
Online ISBN: 978-3-319-67156-7
eBook Packages: Biomedical and Life Sciences (R0)
