Omics Data Management and Annotation

Part of the book series: Methods in Molecular Biology (MIMB, volume 719)

Abstract

Technological breakthroughs in Omics, including next-generation sequencing, produce avalanches of data that require effective data management to ensure integrity, security, and maximal knowledge extraction. Requirements for a data management system include flexible input formats, diverse data entry mechanisms and views, user friendliness, attention to standards, a defined hardware and software platform, and robustness. Relevant solutions developed by the scientific community include Laboratory Information Management Systems (LIMS) and standardization protocols that facilitate data sharing and management. In project planning, Omics annotation sources must be chosen with special care, since many of them overlap and require sophisticated integration heuristics. The data modeling step defines and categorizes the data into objects (e.g., genes, articles, disorders) and creates an application flow. A data storage/warehouse mechanism must be selected, such as a file-based system or a relational database, the latter typically used for larger projects. Omics project life cycle considerations must include the definition and deployment of new versions, incorporating either full or partial updates. Finally, quality assurance (QA) procedures must validate data and feature integrity, as well as system performance expectations. We illustrate these data management principles with examples from the life cycle of the GeneCards Omics project (http://www.genecards.org), a comprehensive, widely used compendium of annotative information about human genes. For example, the GeneCards infrastructure has recently been changed from text files to a relational database, enabling better organization and views of the growing data. Omics data handling benefits from the wealth of Web-based information, the vast amount of public domain software, increasingly affordable hardware, and effective use of data management and annotation principles as outlined in this chapter.
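
The data modeling, storage, and QA steps named in the abstract can be illustrated with a minimal sketch. It assumes a tiny relational schema built with Python's standard sqlite3 module; the table names, column names, helper functions, and placeholder records (GENE_A, disorder X, source_1) are all hypothetical and are not taken from GeneCards or any other real resource.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- Each biological object type gets its own table.
        CREATE TABLE genes (
            gene_id INTEGER PRIMARY KEY,
            symbol  TEXT NOT NULL UNIQUE
        );
        CREATE TABLE disorders (
            disorder_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL UNIQUE
        );
        -- Link table: one row per (gene, disorder, annotation source).
        CREATE TABLE gene_disorder (
            gene_id     INTEGER NOT NULL REFERENCES genes(gene_id),
            disorder_id INTEGER NOT NULL REFERENCES disorders(disorder_id),
            source      TEXT NOT NULL,
            PRIMARY KEY (gene_id, disorder_id, source)
        );
    """)
    return conn


def load_annotation(conn, symbol, disorder, source):
    """Record that `source` links `symbol` to `disorder`.

    Overlapping sources may report the same object, so every insert
    uses OR IGNORE to keep one row per gene, disorder, and link.
    """
    conn.execute("INSERT OR IGNORE INTO genes (symbol) VALUES (?)", (symbol,))
    conn.execute("INSERT OR IGNORE INTO disorders (name) VALUES (?)", (disorder,))
    conn.execute(
        """INSERT OR IGNORE INTO gene_disorder (gene_id, disorder_id, source)
           SELECT g.gene_id, d.disorder_id, ?
           FROM genes g, disorders d
           WHERE g.symbol = ? AND d.name = ?""",
        (source, symbol, disorder),
    )


def qa_orphan_genes(conn):
    """QA check: list gene symbols that have no disorder annotation."""
    rows = conn.execute(
        """SELECT g.symbol
           FROM genes g
           LEFT JOIN gene_disorder gd ON g.gene_id = gd.gene_id
           WHERE gd.gene_id IS NULL
           ORDER BY g.symbol"""
    ).fetchall()
    return [row[0] for row in rows]
```

In this sketch, loading the same gene-disorder pair from two sources yields one gene row, one disorder row, and two link rows, so the provenance of each annotation is kept while the objects themselves are not duplicated. A real project would need far richer integration heuristics than exact string matching on symbols and names.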

Acknowledgments

We thank the members of the GeneCards team: Iris Bahir, Tirza Doniger, Tsippi Iny Stein, Hagit Krugh, Noam Nativ, Naomi Rosen, and Gil Stelzer. The GeneCards project is funded by Xennex Inc., the Weizmann Institute of Science Crown Human Genome Center, and the EU SYNLET (FP6 project number 043312) and SysKID (FP7 project number 241544) grants.

Author information

Corresponding author

Correspondence to Doron Lancet.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Harel, A., Dalah, I., Pietrokovski, S., Safran, M., Lancet, D. (2011). Omics Data Management and Annotation. In: Mayer, B. (eds) Bioinformatics for Omics Data. Methods in Molecular Biology, vol 719. Humana Press. https://doi.org/10.1007/978-1-61779-027-0_3

  • DOI: https://doi.org/10.1007/978-1-61779-027-0_3

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-61779-026-3

  • Online ISBN: 978-1-61779-027-0

  • eBook Packages: Springer Protocols
