Bioinformatics: Databasing and Gene Annotation

  • Lyle D. Burgoon
  • Timothy R. Zacharewski
Part of the Methods in Molecular Biology™ book series (MIMB, volume 460)


“Omics” experiments amass large amounts of data requiring integration of several data sources for data interpretation. For instance, microarray, metabolomic, and proteomic experiments may at most yield a list of active genes, metabolites, or proteins, respectively. More generally, the experiments yield active features that represent subsequences of the gene, a chemical shift within a complex mixture, or peptides, respectively. Thus, in the best-case scenario, the investigator is left to identify the functional significance, but more likely the investigator must first identify the larger context of the feature (e.g., which gene, metabolite, or protein is being represented by the feature). To completely annotate function, several different databases are required, including sequence, genome, gene function, protein, and protein interaction databases. Because of the limited coverage of some microarrays or experiments, biological data repositories may be consulted, in the case of microarrays, to complement results. Many of the data sources and databases available for gene function characterization, including tools from the National Center for Biotechnology Information, Gene Ontology, and UniProt, are discussed.


Key Words: bioinformatics; databases; functional genomics; gene annotation; protein interaction; toxicogenomics

1 Introduction

Genomic experiments amass large data sets, requiring the integration of supportive information from several other sources, including the most recent gene annotations, to facilitate biological interpretation. Typically, after microarray analysis and identification of the most active, or significant, genes, further investigation must be performed to elucidate the relevant pathways and networks involved in eliciting the phenotype (e.g., toxicity). Thus, investigators must integrate complementary information including gene names, abbreviations, and aliases for literature searches; cellular and extracellular locations; functional annotation; disease processes the gene participates in; and biological interaction data (e.g., protein-protein interactions) in order to comprehensively interpret the data. This information is oftentimes available in a variety of biological databases each serving a particular purpose or devoted to a specific data domain.

This chapter describes six broad categories of databases as they relate to genomic data integration: genome-level, sequence-level, protein-level, functional annotation, protein interaction, and microarray databases (Fig. 1). Metabolomic-related domains are excluded because reporting standards have yet to emerge, although they are in development. All of the databases exist in a complex data exchange continuum, where some databases rely entirely upon others for their information, others are nearly independent of the rest, and the remainder integrate data from several different levels.
Fig. 1

The Biological Database Universe. Six biological database levels are depicted as they pertain to genomic data analysis and interpretation. Genome-level databases catalog data with respect to the sequence of the full genome. Sequence-level databases catalog sequence reads from cells, including genomic sequence and expressed sequence tags (ESTs). Annotation databases provide functional information about genes and their products. Protein-level databases provide information on protein sequences, families, and domain structures. Protein interaction databases provide interaction data concerning proteins, genes, chemicals, and small molecules. Microarray databases include local laboratory information management systems (LIMS) and data repositories. The arrows depict possible interactions between different database domains, where information from one level may exist in another to allow for cross-domain integration.

In general, genome sequences, from databases such as Ensembl (1,2), Entrez Genomes (3), and the University of California Santa Cruz (UCSC) Genome Browser (4), are the root of the universe. From these genomic templates, expressed sequence tags (ESTs) and cDNAs in GenBank (3) can be clustered together and associated with genes (i.e., UniGene; Ref. 3), and exemplary, representative full-length sequences can be identified from GenBank and mapped back to locations in the genome (i.e., RefSeq; Ref. 3). These genes are then annotated in databases such as Entrez Gene (5), where functional information (Gene Ontology; Ref. 6), and disease information (Online Mendelian Inheritance in Man [OMIM]; Ref. 3) are integrated to provide a more comprehensive summary of the function of a gene. Similarly, elements from sequence-level databases (e.g., ESTs) can be associated with features printed on a microarray and related to a gene through its GenBank Accession number facilitating the annotation of gene expression profiles from the microarray experiments. Integration of genomic and proteomic data is also possible through sequence relationships, from the mRNA to the translated protein sequence. This facilitates further functional predictions, by providing protein domain and family information that may reveal functional characteristics, and protein-protein interaction data from databases such as BIND (Biomolecular Interaction Network Database) (7) and DIP (the Database of Interacting Proteins) (8).

Currently, there is significant effort in the development of public repositories such as the Chemical Effects in Biological Systems Knowledgebase (CEBS) (9), ArrayExpress (10,11), and the Gene Expression Omnibus (GEO) (12) to facilitate data integration across multiple domains and to ensure public accessibility, as well as to support the development of comprehensive networks and computational models capable of predicting toxicity.

2 Genome-Level Databases

Genome-level databases manage, at the very least, genome sequence data. However, they differ in their integration of other types of data and often in their assignment of computationally defined genes. The three primary genome-level databases are the Ensembl database (1,2), the Entrez Genomes database (3), and the UCSC Genome Browser (4). Each uses a different technique for predicting genes and gene structures (e.g., untranslated regions [UTR], regulatory regions, introns, and exons) from genome sequence data.

The Ensembl database uses several methods for the prediction of genes and gene structures. These methods are biased toward the alignment of species-specific proteins and cDNAs, and they fall back on orthologous protein and cDNA alignments when necessary (13). Aligning proteins and cDNAs to the genome sequence facilitates the identification of exonic and intronic sequences and UTRs (Fig. 2). A putative transcription start site (TSS) can be obtained by defining the end of the upstream region.
Fig. 2

Ensembl genome annotation. This simplified view illustrates the method used by the Ensembl genome annotation system for identifying gene structures, such as the untranslated region (UTR), exons, and introns, by combining genome, mRNA, and protein alignments.

The National Center for Biotechnology Information (NCBI) Entrez Genomes database annotates genes based on the RefSeq database of exemplary reference sequences. RefSeq sequences are first aligned to the genomic sequence using the MegaBLAST algorithm to identify genes; mRNAs and ESTs are then aligned through MegaBLAST to identify additional genes.

The UCSC Genome Browser uses the NCBI genome builds for its annotation; thus, there are no differences between the human genome builds at UCSC and NCBI. However, prior to the December 2001 human genome freeze, the UCSC created its own genome builds, separate from the NCBI. At that time, the primary difference between the two methods was in their genome assemblies: Entrez Genomes used sequence entries from the GenBank database to drive assemblies, whereas the UCSC Genome Browser used BAC clones and mRNA sequences, resulting in differences in the genome assemblies (14). For other genomes, such as the mouse (i.e., C57BL/6), rat (i.e., Brown Norway rat), chimpanzee, rhesus monkey, and dog (i.e., Boxer), the UCSC uses builds from the respective genome authorities (see the UCSC Genome Browser Web site for further details).

To annotate the genome builds, NCBI uses the MegaBLAST algorithm for alignments to genomes, whereas the UCSC efforts use the BLAT (BLAST-like alignment tool) for alignment of mRNA, EST, and RefSeq sequences to the genome. This means that although both sources use the same build for the human genome (i.e., the NCBI genome build), there could still be differences in annotation (i.e., assignment of genes and functions to the genomic sequence). Assuming both use the same GenBank and RefSeq versions, differences may be attributed to the different alignment algorithms. In addition, the UCSC Genome Browser also incorporates gene predictions from other sources, such as Ensembl and Acembly (4), and users can also upload their own annotations for display in the browser.

3 Sequence-Level Databases

Sequence-level databases manage EST and cDNA sequence read data. Some databases, such as GenBank and RefSeq, deal with these sequences directly, whereas others manage them on a larger scale, where multiple sequences are grouped together, as in UniGene. Generally, these databases provide the first level of annotation for microarray studies, as the sequences are directly represented on the microarrays as printed features.

When a sequence read is generated, it is generally submitted to the GenBank database and assigned a GenBank Accession Number, a unique identifier that represents that sequence and is typically the most commonly used identifier for probes represented on cDNA microarrays (3). The UniGene database creates nonredundant gene clusters based on GenBank sequences (3). Clusters are built by sequence alignment and annotated relative to genes in the Entrez Gene database. Consequently, UniGene clusters can be thought of as collections of GenBank sequences that most likely describe the same gene.
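The relationship between printed features, GenBank accessions, clusters, and genes can be sketched as a chain of lookups. The following is a minimal illustration only: every identifier (feature names, accessions, cluster names, gene names) is an invented placeholder, and the dictionaries stand in for tables that would normally be downloaded from the databases.

```python
from collections import defaultdict

# All identifiers below are invented placeholders, not real database records.
FEATURE_TO_ACCESSION = {
    "feature_001": "ACC0001",
    "feature_002": "ACC0002",
    "feature_003": "ACC0003",
    "feature_004": "ACC0004",  # deliberately absent from every cluster
}

# Each cluster collects accessions believed to describe the same gene.
CLUSTER_MEMBERS = {
    "cluster_A": {"ACC0001", "ACC0002"},
    "cluster_B": {"ACC0003"},
}

CLUSTER_TO_GENE = {"cluster_A": "GeneX", "cluster_B": "GeneY"}


def build_accession_index(cluster_members):
    """Invert cluster membership so each accession points at its cluster."""
    index = {}
    for cluster_id, accessions in cluster_members.items():
        for accession in accessions:
            index[accession] = cluster_id
    return index


def annotate_features(feature_to_accession, cluster_members, cluster_to_gene):
    """Group microarray features by gene; unclustered features fall under None."""
    accession_index = build_accession_index(cluster_members)
    features_by_gene = defaultdict(list)
    for feature, accession in sorted(feature_to_accession.items()):
        cluster_id = accession_index.get(accession)
        gene = cluster_to_gene.get(cluster_id)
        features_by_gene[gene].append(feature)
    return dict(features_by_gene)


annotated = annotate_features(FEATURE_TO_ACCESSION, CLUSTER_MEMBERS, CLUSTER_TO_GENE)
```

Features whose accession belongs to no cluster are collected under the key `None`, which makes the unannotated portion of an array easy to inspect.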

The RefSeq database provides exemplary transcript and protein sequences based either on hand curation or on information from a genome authority (e.g., the Jackson Labs) (3,15). RefSeq accession numbers follow a PREFIX_NUMBER format (e.g., NM_123456 or NM_123456789). All curated RefSeq transcript accessions are prefixed by NM, whereas XM prefixes represent accessions that have been generated by automated methods. Some of the NM transcript accessions are generated by automated methods but are mature and have undergone some level of review. RefSeq records also contain one of seven status codes, illustrating the state of maturity of the annotation (Table 1).
Table 1

RefSeq Status Codes and Their Level of Annotation*

RefSeq status code    Level of annotation
Genome annotation     Records that are aligned to the annotated genome
Inferred              Predicted to exist based on genome analysis, but no known mRNA/EST exists within GenBank
Model                 Predicted based on computational gene prediction methods; a transcript sequence may or may not exist within GenBank
Predicted             Sequences from genes of unknown function
Provisional           Sequences that represent genes with known functions but have not been verified by NCBI personnel
Validated             Provisional sequences that have undergone a preliminary review by NCBI personnel
Reviewed              Validated sequences that represent genes of known function that have been verified by NCBI personnel
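The PREFIX_NUMBER accession format described in this section can be checked programmatically. The sketch below is a hypothetical helper, not an NCBI tool: it covers only the NM and XM transcript prefixes discussed above, and it also accepts an optional ".version" suffix, which is an assumption added for convenience.

```python
import re

# Only the two transcript prefixes discussed in the text are described here.
PREFIX_MEANING = {
    "NM": "curated or reviewed transcript",
    "XM": "transcript generated by automated methods",
}

# PREFIX_NUMBER with an optional ".version" suffix (the suffix is an assumption
# of this sketch; the text describes only the PREFIX_NUMBER part).
ACCESSION_PATTERN = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def parse_refseq_accession(accession):
    """Split an accession into prefix, number, and version; None if malformed."""
    match = ACCESSION_PATTERN.match(accession.strip())
    if match is None:
        return None
    prefix, number, version = match.groups()
    return {
        "prefix": prefix,
        "number": number,
        "version": int(version) if version else None,
        "meaning": PREFIX_MEANING.get(prefix, "prefix not covered in this sketch"),
    }
```

For example, `parse_refseq_accession("XM_123456789")` reports the XM prefix and its automated origin, whereas a bare GenBank-style string without the underscore returns `None`.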

4 Annotation Databases

Annotation databases provide functional information for genes and may also catalogue the structure of the gene. They serve as an initial point for data interpretation of microarray data and hypothesis generation.

Entrez Gene is a part of NCBI’s Entrez suite of bioinformatics tools. It provides information on genes that have a RefSeq or have been annotated by a genome annotation authority (e.g., Jackson Labs for mice) for several toxicologically relevant species, including human, mouse, rat, and dog (5). Consequently, an entry within Entrez Gene may have an associated NM (mature) or XM (nonreviewed) RefSeq, or it may have no exemplary RefSeq sequence associated with it.

Entrez Gene serves as a focal point for the integration of gene annotation data from many sources, including databases outside NCBI. Some data integration is achieved through hyperlinks to the appropriate database entries, and other data are catalogued on the detail page for that gene. Table 2 lists several of the annotation categories and their sources. The most basic form of gene annotation is the gene name and abbreviation, which can be used to initiate the functional annotation of a gene through literature searches. Entrez Gene also integrates data from the RefSeq, Gene Ontology (GO), Gene Expression Omnibus (GEO), Gene References into Function (GeneRIF), and GenBank databases. The RefSeq sequences, both mRNA and protein, facilitate sequence-based searching, such as identifying homologous genes or identifying putative function based on protein domains. The GO database catalogues genes by their molecular function, cellular location, and biological process. Information regarding the tissue expression of genes can be obtained from the GenBank database, where the tissue localization for an EST is recorded, as well as from GEO, NCBI’s gene expression repository (3). GeneRIFs provide curated functional data and literature references, although they may not reflect the most up-to-date functional annotation available in the literature. Investigators are encouraged to facilitate GeneRIF updates by submitting suggestions directly to the NCBI through its update form.
Table 2

Entrez Gene Annotation Categories and Sources*

Annotation categories                   Sources
Gene names and abbreviations/symbols    Publications and genome authorities
RefSeq sequence                         RefSeq database
Genome position and gene structures     Genome databases
Gene function                           Gene Ontology (GO) database, Gene References into Function (GeneRIF)
Expression data                         Gene Expression Omnibus (GEO), EST tissue expression from GenBank

*Adapted from Maglott, D., Ostell, J., Pruitt, K. D., and Tatusova, T. (2005) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 33(Database Issue), D54–58.

For human studies, the Online Mendelian Inheritance in Man (OMIM) database, the online version of Mendelian Inheritance in Man (16), provides linkages between human genes and diseases (3,17). Output pages from Entrez Gene provide links to OMIM, which is also searchable through the NCBI Entrez system. For many of the diseases within OMIM, a synopsis of the clinical presentation is provided in addition to links to the genes associated with the disease. PubMed citations are also made available through OMIM, with hyperlinks to the PubMed database entries. OMIM also contains information on known allelic variants and some polymorphisms (17).

The Gene Ontology (GO) (6) database is another source of functional gene annotation. The database consists of an ontology (i.e., a catalogue of existents/ideas/concepts and their interrelationships; Ref. 18) in which terms are arranged in a directed acyclic graph (DAG; Fig. 3). DAGs are graph structures that contain no loops; thus, a child node (i.e., an object or concept) may not also serve as its own predecessor (i.e., parent, grandparent, great-grandparent, etc.). Any child node within a DAG may have any number of parents and any number of paths leading to it. For example, Fig. 3 shows two paths leading to the same child, GO:0045814 (negative regulation of gene expression, epigenetic). The DAG illustrates that epigenetic negative regulation of gene expression is both a regulation process and critical in development. GO entries that exist at the same level relative to the root, or starting node, do not necessarily reflect the same level of specificity. The level of specificity must be assessed on a per-DAG basis and not relative to other DAGs. Thus, a fourth-order node (a node that is four levels below the root node) in one DAG has no specificity relationship with regard to a sixth-order node in a different DAG. Each node within the GO may have an associated list of genes. As the annotation for a gene improves, its node associations may change. For example, if gene X were previously assigned to GO:0040029 (regulation of gene expression, epigenetic), and new experimental data suggested gene X was a negative regulator of gene expression through an epigenetic mechanism, it would be reassigned to GO:0045814 (negative regulation of gene expression, epigenetic).
Fig. 3

Example of a Gene Ontology (GO) directed acyclic graph (DAG). This DAG shows two paths to reach the same GO entry, GO:0045814. It is important to note that the DAG travels from the most general case and becomes more specific with entries that are farther down the DAG.

The GO Consortium maintains the mappings between genes and GO terms. It is important to note that each gene may have multiple associated GO terms and that a GO number has no significance other than being a unique identifier.
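The DAG properties described for GO (multiple parents, multiple paths to one child, and no term acting as its own ancestor) can be illustrated with a toy graph. In this sketch only the two GO identifiers are taken from the text; the `root` and `toy:` nodes are hypothetical stand-ins for intermediate terms and do not reproduce the actual ontology.

```python
# Child -> parents. Only the two GO identifiers come from the text; the other
# node names are hypothetical stand-ins for intermediate terms.
PARENTS = {
    "root": [],
    "toy:regulation": ["root"],
    "toy:development": ["root"],
    "GO:0040029": ["toy:regulation"],
    "GO:0045814": ["GO:0040029", "toy:development"],
}


def paths_to_root(term, parents):
    """Return every path that leads from the root down to `term`."""
    if not parents[term]:
        return [[term]]
    paths = []
    for parent in parents[term]:
        for path in paths_to_root(parent, parents):
            paths.append(path + [term])
    return paths


def ancestors(term, parents):
    """Collect all predecessors of `term` (parents, grandparents, and so on)."""
    found = set()
    stack = list(parents[term])
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(parents[node])
    return found


def is_acyclic(parents):
    """The graph is a DAG only if no term appears among its own ancestors."""
    return all(term not in ancestors(term, parents) for term in parents)


paths = paths_to_root("GO:0045814", PARENTS)
```

Here `paths` holds two routes to GO:0045814, one through each toy branch, and the two routes have different lengths, which mirrors the point that depth below the root is not a measure of specificity.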

5 Protein-Level Databases

In many instances, the gene annotation databases mentioned above provide hyperlinks to protein annotation databases to identify the proteins encoded by the genes of interest. Recently, several protein-level databases were merged into one primary protein resource, the Universal Protein Resource (UniProt). UniProt combines the Swiss-Prot, TrEMBL, and PIR-PSD databases into one resource, consisting of three related databases: (1) the UniProt Archive, (2) the UniProt Knowledgebase, and (3) the UniRef database.

The UniProt Archive (UniParc) is a database of nonredundant protein sequences obtained from (1) translation of sequences within the gene sequence-level databases (e.g., GenBank), (2) RefSeq, (3) FlyBase, (4) WormBase, (5) Ensembl, (6) the International Protein Index, (7) patent applications, and (8) the Protein Data Bank (19). The UniProt Knowledgebase (UniProt) provides functional annotation of the sequences within the UniParc. Examples of the annotation include the protein name; a listing of protein domains and families from the InterPro database, which contains protein family, domain, and functional information (20); the Enzyme Commission identifier; and Gene Ontology identifiers. Proteins represented within the UniParc and UniProt Knowledgebase are then gathered automatically to create the UniProt reference database (UniRef), a database of exemplary reference sequences based on sequence identity. Three different versions of the UniRef database exist (i.e., UniRef100, UniRef90, and UniRef50), where the number denotes the percent identity required for sequences to be merged, from across all species represented in the parent databases, into a single reference protein sequence. Thus, UniRef50 requires only 50% identity for proteins to be merged. UniRef50 and UniRef90 provide faster sequence searches for identifying probable protein domains and functions by decreasing the size of the search space.
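The effect of the identity threshold on the size of the search space can be seen in a toy example. The sketch below is not the procedure used to build UniRef: it assumes short, pre-aligned, equal-length sequences (invented for illustration) and a simple greedy assignment to the first matching representative.

```python
def percent_identity(seq_a, seq_b):
    """Percent of matching positions; assumes pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this toy measure needs equal-length sequences")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def greedy_cluster(sequences, threshold):
    """Assign each sequence to the first representative it matches at >= threshold."""
    clusters = []  # list of (representative sequence, member names)
    for name, seq in sequences:
        for rep_seq, members in clusters:
            if percent_identity(seq, rep_seq) >= threshold:
                members.append(name)
                break
        else:
            # No representative was similar enough, so start a new cluster.
            clusters.append((seq, [name]))
    return [members for _, members in clusters]


# Invented sequences chosen so the identities to p1 are easy to verify by eye.
TOY_SEQUENCES = [
    ("p1", "MKTAYIAKQR"),
    ("p2", "MKTAYIAKQR"),  # identical to p1
    ("p3", "MKTAYIAKQL"),  # 9 of 10 positions match p1
    ("p4", "MKTAYWWWWW"),  # 5 of 10 positions match p1
    ("p5", "GGGGGGGGGG"),  # shares no positions with p1
]

cluster_counts = {t: len(greedy_cluster(TOY_SEQUENCES, t)) for t in (100, 90, 50)}
```

With these five toy sequences the number of clusters falls from four at 100% to three at 90% and two at 50%, so a search against the representatives alone touches fewer sequences as the threshold is lowered.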

The RefSeq database also contains reference protein sequences, similar in concept to the reference mRNA sequences. These are available through the Entrez Gene system when querying for a gene. For more information on RefSeq, see Section 3.

6 Protein Interaction Databases

Protein interaction databases such as the Biomolecular Interaction Network Database (BIND), the Database of Interacting Proteins (DIP), the Molecular Interaction database (MINT), and the IntAct database provide information on the interaction of proteins with other proteins, genes, and small molecules. Both BIND (21) and DIP (8) manage data from protein interaction experiments, including yeast two-hybrid and co-immunoprecipitation experiments. These data are submitted to the databases directly or are extracted from the literature by database curators. The data are provided to the public through queries on the Web sites or in interaction files available in the Proteomics Standards Initiative (PSI) Molecular Interaction (PSI-MI) XML format.

Visualization of these data sets is made possible through tools such as Osprey (22) and Cytoscape (23), which generate protein interaction networks based on input data from protein interaction databases or from other sources. Cytoscape has the additional functionality of allowing the overlay of gene expression data on the protein interaction map (23). These visualization tools provide initial support in the elucidation of pathways that may be altered after treatment, facilitating the generation of new hypotheses and the identification of biomarkers of exposure and toxicity.
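The idea of overlaying expression data on an interaction network can be sketched without any visualization software. The example below is a simplified illustration and does not use the Osprey or Cytoscape interfaces; the protein names and fold-change values are invented placeholders.

```python
from collections import defaultdict

# Hypothetical interaction pairs and fold changes; all names are placeholders.
INTERACTIONS = [("ProtA", "ProtB"), ("ProtB", "ProtC"), ("ProtD", "ProtE")]
FOLD_CHANGE = {"ProtA": 2.5, "ProtB": 0.4, "ProtC": 1.1}


def build_network(pairs):
    """Build an undirected adjacency map from pairwise interactions."""
    network = defaultdict(set)
    for left, right in pairs:
        network[left].add(right)
        network[right].add(left)
    return dict(network)


def overlay_expression(network, fold_change):
    """Attach an expression value (or None if unmeasured) to every node."""
    return {
        node: {"partners": sorted(partners), "fold_change": fold_change.get(node)}
        for node, partners in network.items()
    }


def connected_component(network, start):
    """Collect all nodes reachable from `start`; a crude stand-in for a pathway."""
    seen, stack = {start}, [start]
    while stack:
        for neighbor in network[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


network = build_network(INTERACTIONS)
overlay = overlay_expression(network, FOLD_CHANGE)
```

Nodes without an expression measurement carry `None`, which keeps proteins that are absent from the array visible in the network rather than silently dropping them.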

7 Microarray Databases

Microarray databases ensure data are being properly managed, support analysis, archive data for long-term use, and facilitate sharing with collaborators or deposition in public repositories. The Minimum Information About a Microarray Experiment (MIAME) standards provide guidance on the types of information that must be captured and reported in support of a microarray study in order to ensure independent investigators can replicate and properly interpret the data (24). This includes information regarding the clones, genes, protocols, and samples associated with the study. Several journals require microarray submissions to adhere to the MIAME standard, and the MGED (Microarray Gene Expression Data) Society is encouraging journals to require that microarray data sets, in support of published articles, also be submitted to repositories as a condition of publication, similar to requirements that novel sequences be submitted to GenBank prior to publication (25,26). Submission of microarray data sets to the NCBI Gene Expression Omnibus (GEO) (12) or the ArrayExpress (10,11) at the European Bioinformatics Institute (EBI) fulfills this requirement. Recently, more specialized repository efforts have been undertaken, such as the Chemical Effects in Biological Systems (CEBS) Knowledgebase (9,27), which will catalogue gene expression data from chemical exposures with the associated pathology and toxicology data.
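A local database can enforce this kind of reporting requirement with a simple completeness check before a study is shared or deposited. The sketch below is only an illustration: the four categories are those named above, the record layout and field values are hypothetical, and the check is far less detailed than the MIAME checklist itself.

```python
# The four categories come from the text; this is not the MIAME checklist itself.
REQUIRED_CATEGORIES = ("clones", "genes", "protocols", "samples")


def missing_categories(record):
    """Return the required categories that are absent or empty in a study record."""
    return [category for category in REQUIRED_CATEGORIES if not record.get(category)]


# Hypothetical study record with one category left empty.
example_record = {
    "clones": ["clone_1"],
    "genes": ["GeneX"],
    "protocols": [],
    "samples": ["sample_1", "sample_2"],
}

problems = missing_categories(example_record)
```

An empty `problems` list would indicate that every category has at least one entry; it says nothing about whether the entries themselves are sufficient for replication.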

With the emergence of more pharmacology and toxicology domain-specific data management systems, the International Life Sciences Institute (ILSI) Health and Environmental Sciences Institute (HESI) Technical Committee on the Application of Genomics to Mechanism-Based Risk Assessment, in cooperation with the MGED Society, began work on a toxicology-specific MIAME standard (MIAME/Tox) (28). MIAME/Tox is expected to further specify the minimum information required to replicate a toxicogenomics experiment, which will also serve to facilitate data sharing among the toxicogenomics community. Moreover, it is expected that these databases will be extended to include the management of complementary proteomic and metabolomic data as well as other toxicologically relevant data such as chemical/drug structure information, absorption, distribution, metabolism, and excretion.

8 Conclusion

The use of genomic technologies in the mechanistic understanding of drug and chemical effects in biological tissues requires effective gene annotation. Several annotation sources exist; however, no database captures all of the data, making toxicogenomic data interpretation and network development difficult. For example, information concerning the function of a gene exists within Entrez Gene, whereas protein family and structure information exists within UniProt, and protein interaction data exist within databases such as BIND, DIP, and MINT. Ideally, the integration of data from these disparate sources into a single database would allow a more comprehensive interpretation of the available data. Moreover, a centralized comprehensive knowledgebase would also facilitate the identification of mechanistically based biomarkers for human toxicity and the development of computational models with greater predictive power, which could be used to support and improve quantitative risk assessments.


References

  1. Clamp, M., Andrews, D., Barker, D., Bevan, P., Cameron, G., Chen, Y., et al. (2003) Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res. 31, 38–42.
  2. Hubbard, T., Andrews, D., Caccamo, M., Cameron, G., Chen, Y., Clamp, M., et al. (2005) Ensembl 2005. Nucleic Acids Res. 33(Database Issue), D447–453.
  3. Wheeler, D. L., Church, D. M., Edgar, R., Federhen, S., Helmberg, W., Madden, T. L., et al. (2004) Database resources of the National Center for Biotechnology Information: update. Nucleic Acids Res. 32, D35–40.
  4. Karolchik, D., Baertsch, R., Diekhans, M., Furey, T. S., Hinrichs, A., Lu, Y. T., et al. (2003) The UCSC Genome Browser Database. Nucleic Acids Res. 31, 51–54.
  5. Maglott, D., Ostell, J., Pruitt, K. D., and Tatusova, T. (2005) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res. 33(Database Issue), D54–58.
  6. Harris, M. A., Clark, J., Ireland, A., Lomax, J., Ashburner, M., Foulger, R., et al. (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, D258–261.
  7. Bader, G. D. and Hogue, C. W. (2000) BIND—a data specification for storing and describing biomolecular interactions, molecular complexes and pathways. Bioinformatics 16, 465–477.
  8. Xenarios, I., Rice, D. W., Salwinski, L., Baron, M. K., Marcotte, E. M., and Eisenberg, D. (2000) DIP: the database of interacting proteins. Nucleic Acids Res. 28, 289–291.
  9. Waters, M., Boorman, G., Bushel, P., Cunningham, M., Irwin, R., Merrick, A., et al. (2003) Systems toxicology and the Chemical Effects in Biological Systems (CEBS) knowledge base. EHP Toxicogenomics 111, 15–28.
  10. Brazma, A., Parkinson, H., Sarkans, U., Shojatalab, M., Vilo, J., Abeygunawardena, N., et al. (2003) ArrayExpress—a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. 31, 68–71.
  11. Rocca-Serra, P., Brazma, A., Parkinson, H., Sarkans, U., Shojatalab, M., Contrino, S., et al. (2003) ArrayExpress: a public database of gene expression data at EBI. C. R. Biol. 326, 1075–1078.
  12. Edgar, R., Domrachev, M., and Lash, A. E. (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210.
  13. Curwen, V., Eyras, E., Andrews, T. D., Clarke, L., Mongin, E., Searle, S. M., and Clamp, M. (2004) The Ensembl automatic gene annotation system. Genome Res. 14, 942–950.
  14. Rouchka, E. C., Gish, W., and States, D. J. (2002) Comparison of whole genome assemblies of the human genome. Nucleic Acids Res. 30, 5004–5014.
  15. Pruitt, K. D. and Maglott, D. R. (2001) RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. 29, 137–140.
  16. McKusick, V. A. (1998) Mendelian Inheritance in Man. A Catalog of Human Genes and Genetic Disorders. 12th ed. Johns Hopkins University Press, Baltimore.
  17. Hamosh, A., Scott, A. F., Amberger, J., Bocchini, C., Valle, D., and McKusick, V. A. (2002) Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 30, 52–55.
  18. Cox, C. (1999) Nietzsche: Naturalism and Interpretation. University of California Press, Berkeley.
  19. Bairoch, A., Apweiler, R., Wu, C. H., Barker, W. C., Boeckmann, B., Ferro, S., et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res. 33(Database Issue), D154–159.
  20. Mulder, N. J., Apweiler, R., Attwood, T. K., Bairoch, A., Barrell, D., Bateman, A., et al. (2003) The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Res. 31, 315–318.
  21. Alfarano, C., Andrade, C. E., Anthony, K., Bahroos, N., Bajec, M., Bantoft, K., et al. (2005) The Biomolecular Interaction Network Database and related tools 2005 update. Nucleic Acids Res. 33(Database Issue), D418–424.
  22. Breitkreutz, B. J., Stark, C., and Tyers, M. (2003) Osprey: a network visualization system. Genome Biol. 4, R22.
  23. Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., et al. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504.
  24. Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., et al. (2001) Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nat. Genet. 29, 365–371.
  25. Ball, C. A., Brazma, A., Causton, H., Chervitz, S., Edgar, R., Hingamp, P., et al. (2004) Submission of microarray data to public repositories. PLoS Biol. 2, E317.
  26. Ball, C. A., Sherlock, G., and Brazma, A. (2004) Funding high-throughput data sharing. Nat. Biotechnol. 22, 1179–1183.
  27. Waters, M. D., Olden, K., and Tennant, R. W. (2003) Toxicogenomic approach for assessing toxicant-related disease. Mutat. Res. 544, 415–424.
  28. Mattes, W. B., Pettit, S. D., Sansone, S. A., Bushel, P. R., and Waters, M. D. (2004) Database development in toxicogenomics: issues and efforts. Environ. Health Perspect. 112, 495–505.

Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Lyle D. Burgoon (1)
  • Timothy R. Zacharewski (1)
  1. Department of Biochemistry & Molecular Biology, Michigan State University, East Lansing
