Data Mining, Quality and Management in the Life Sciences

  • Protocol
Data Mining Techniques for the Life Sciences

Part of the book series: Methods in Molecular Biology (MIMB, volume 2449)

Abstract

With the ever-growing emphasis on open science and its benefits to the scientific community, a research project no longer simply ends with a scientific publication. Data sharing and reproducibility of results have taken center stage in life science research, supported by the FAIR principles, which firmly underline the importance of open data. Today's data-intensive, multidisciplinary research has also highlighted the significance of how data are mined and managed. Here we describe some of the features adopted by EMBL-EBI data resources to support data mining, data quality, and data management. We also highlight how EMBL-EBI has responded to the COVID-19 pandemic through its data resources.

Acknowledgments

The authors would like to thank Gaia Cantelli and Jessica Vamathevan for help with preparing the manuscript and Spencer Phillips for artwork.

Author information

Corresponding author

Correspondence to Amonida Zadissa.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Zadissa, A., Apweiler, R. (2022). Data Mining, Quality and Management in the Life Sciences. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 2449. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2095-3_1

  • DOI: https://doi.org/10.1007/978-1-0716-2095-3_1

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2094-6

  • Online ISBN: 978-1-0716-2095-3

  • eBook Packages: Springer Protocols
