Introduction

The relationship between genetic variation and phenotype is at the heart of the model organism approach to the study of human disease. In recent years the mouse has become the model organism of choice for the study of human disease, partly as a consequence of its physiologic and genomic similarities, but also because of developments in mouse genetics that now provide powerful tools for the manipulation of the mouse genome (Rosenthal and Brown 2007). The last five years have also seen rapid advances in the instrumentation and technology available for detailed phenotyping, and these factors together provide enormous potential for the advancement of our understanding of gene function in health and disease.

The torrent of phenotype data currently being generated from both gene-driven and phenotype-driven experimental approaches to functional genomics will accelerate over the next few years. With the accumulation of data now emerging from the large ethyl-nitrosourea (ENU) mutagenesis projects (Auwerx et al. 2004) and the ambitious whole mouse genome mutagenesis projects represented by the International Knockout Mouse Consortium (Collins et al. 2007), there is the risk that this will overwhelm our ability to retain, share, and exploit the resulting information. The challenges presented by the collection and analysis of this volume of phenotype data are unprecedented, not only because of the quantity, but also the range and depth of the information. This requires specifically tailored approaches to the capture and representation of radically different types of data, for example, craniofacial morphology and blood chemistry (Brown et al. 2006; Gkoutos et al. 2005). The dominant approach to this set of problems is exemplified by that adopted by the EUMORPHIA consortium using EmPRESS (Green et al. 2005), where phenotype is represented by a standard assay, which then defines a set of measurements or descriptions derived from formal description frameworks and ontologies (Mallon et al. 2008). The power of this approach is that it allows for high-resolution data to be captured on individual mice for one or more assays and then combined to provide data that can be compared with that from background or control strains. Relating this accumulated variant phenotype data to genetic information is then a matter for new computational tools and resources, many of which are newly available or under development (Chen et al. 2007; Groth et al. 2007; Swertz et al. 2004).

Crucial to the utility of this data is that it is presented in a formalized way to facilitate data sharing, which requires that databases use standard data structures and semantics. Currently, two databases present raw data for individual mouse strains: the Mouse Phenome Database (Bogue et al. 2007) (http://www.jax.org/phenome) and the EuroPhenome Database (http://www.europhenome.org) (Mallon et al. 2008).

Pathology is an essential aspect of phenotyping that requires labor-intensive workup and detailed knowledge of laboratory mouse anatomy, physiology, and genetics to be fully effective. There are two major problems with recording this aspect of phenotype: standardization of pathology data, and the availability of pathology expertise to derive and interpret that data. The latter is a well-recognized problem: “The importance of pathology in mouse phenotyping cannot be underestimated. However, the laborious nature of pathology analysis and the dependence on a small cadre of experts continues to represent a significant stumbling block to unraveling the mouse phenome” (Brown et al. 2006). Such expertise is not easy to find (Barthold et al. 2007; Cardiff et al. 2008; Valli et al. 2007), and the perils of “DIY pathology” are well illustrated in the article by Cardiff et al. (2008). The gold standard is represented by the systematic pathology segment of the German Mouse Clinic’s phenotyping process, in which there is standardized morphologic phenotyping of potential mouse models (Mossbrugger et al. 2007).

The depth of data captured, data structure, and description semantics are not yet fully standardized and require not only community agreement on the minimal information needed to record a phenotype but also data capture tools that allow for rapid and accurate recording of data in a form in which it may be uploaded to central databases (Mouse Phenotype Database Integration Consortium 2007). The terminology for lesions in widespread use is a mixture of veterinary and human diagnostic names that do not always correspond, although recent recommendations by the Mouse Models of Human Cancer Consortium (MMHCC) have gone some way toward standardization of nomenclature for neoplastic diseases (Cardiff et al. 2000; Kogan et al. 2002; Nikitin et al. 2004a, b; Shappell et al. 2004). Unfortunately, adoption of these recommendations has been slow among pathologists working in different environments and traditions. Much needed resources are being developed to provide standard reference vocabularies for mouse anatomy at the gross level (mouse anatomy ontology) and disease processes (mouse pathology ontology) useful at both the gross and microscopic levels. Integration of these with annotated and labeled line drawings, gross photographs or photomicrographs, and literature references provides tools that can be rapidly used for reference and for training the next generation of mouse specialist pathologists. These are adjuncts to, not replacements for, traditional training and mentorship approaches (Barthold et al. 2007; Sundberg et al. 2004). Unfortunately these types of resources are spread all over the world at many different institutions and if online are often unlinked.

Diagnostic laboratories face record-keeping problems that can be overwhelming. Linking traditional diagnostic case record-keeping to flat files of anatomy (Hayamizu et al. 2005) and pathology ontologies (Schofield et al. 2005) provides one way to rapidly code case materials, standardize the diagnoses, and retrieve all case materials. Adding a disease diagnostic field with an assessment of disease severity provides a definitive answer that is also semiquantitative. Such data are necessary for quantitative trait locus (QTL) analysis as well as for defining the pathogenesis of a novel disease in a mouse model system.

Web-based systems can now integrate all of these activities to allow a pathologist to review slides from a case and rapidly enter the diagnosis, which is automatically and accurately coded and can be exported in a defined format, e.g., XML, to other databases. More importantly, hyperlinks to the appropriate web site provide access to photomicrographs of representative cases in other genetically engineered mice or inbred strains and provide the pathologist with reference information, descriptions, and original papers on the disease process. This approach provides tools for verification of a diagnosis, training for those not familiar with laboratory mice, and a means to improve the quality of the service to the molecular biologist submitting the samples. Furthermore, because panels of veterinary and physician pathologists volunteer to maintain the quality of these databases, it is possible to access expertise not otherwise available to accurately define diseases and make appropriate comparisons with human diseases.

In this article we describe MoDIS (Mouse Disease Information System), an open-source and freely available system that integrates all of the above-mentioned tools in a Microsoft Access-based database (The Jackson Laboratory, Bar Harbor, ME; http://research.jax.org/faculty/sundberg/index.html). MoDIS provides a valuable tool for setting up a mouse pathology phenotyping program.

Materials and methods

Case materials

Routine disease surveillance cases received by The Jackson Laboratory’s Laboratory Animal Health Disease Surveillance Program from 1987 to 2000 were used to develop the original medical records database (Sundberg and Sundberg 1990, 2000), which used a traditional free-text diagnostic field. In 2006 this was converted so that the MPATH pathology ontology could be used to standardize detailed histopathologic phenotyping methods for defining and describing diseases. This conversion was done to expedite large-scale phenotyping and haplotype mapping of chronic diseases in the most important inbred strains of mice used today in biomedical research. Complete systematic necropsies (Seymour et al. 2004) were performed on 15 males and 15 females of each of the 31 inbred strains designated in the Mouse Phenome Database (http://phenome.jax.org) at 12 and 20 months of age. All tissues were screened by one pathologist (JPS) to standardize the first screen interpretation (R. Yuan et al., unpublished). Additional protocols from the major international research consortiums doing phenotyping can be accessed through a common website (www.interphenome.org) (Mouse Phenotype Database Integration Consortium 2007).

Databases

MoDIS is implemented on a Microsoft Access database (Microsoft Corp., Redmond, WA) platform and is the descendant of earlier versions of our pathology medical records database, which were converted from dBASE III Plus (Ashton-Tate, Torrance, CA) (Sundberg and Sundberg 1990) to FoxPro 2.6 for Windows (Microsoft) (Sundberg and Sundberg 2000) and then to its current form. Accumulated practical experience of using this database for pathology data capture in a large institutional setting has greatly improved its development. MoDIS stores information from individual mice and cross-references to images captured in an image database, e.g., Extensis Portfolio (http://www.extensis.com). The database is structured to record information associated with husbandry, pedigree, strain, assays performed, location of material, and pathologic diagnosis. The primary entity in the database is a necropsy case consisting of one mouse. A mouse can have many diagnoses and several special tests, such as immunohistochemistry or microbiology, associated with the record. Each mouse case can have many types of materials associated with it (histology, photographs, frozen tissue) (Fig. 1).

Fig. 1

Diagram of work and data flow. Investigators/collaborators submit a mouse. The animal is necropsied, at which point samples are collected for histopathology and special tests. When all results are received they are added to the case file, including diagnoses for all lesions and final diagnoses. The finalized report can then be emailed to the submitter or printed out and signed to create a legal diagnostic medical report

The diagnostic information includes a “Disease Description” field, in which an extendable controlled vocabulary containing high-level diagnostic terms is used by the pathologist to input a summative pathologic diagnosis. This allows for locally preferred terminology to be defined and recorded. The recording of standardized pathologic terminology uses terms from the MPATH and MA ontologies for each lesion. It is possible to record several Disease Descriptions and several MA/MPATH pairs of terms for each mouse.

An additional field grades the severity of each lesion on a scale built from commonly used adjectives: no lesions (0), mild (1), moderate (2), severe (3), and extreme (4). This provides an estimate of the variation in severity within a group of mice of the same strain and genotype or treatment group, and yields a semiquantitative set of parameters for comparison that can be used in quantitative trait locus analyses.
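The record structure described in this section can be illustrated with a minimal sketch in Python. This is not the MoDIS schema itself, which is implemented in Microsoft Access; the class names, field names, and placeholder term identifiers below are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    """Severity grades following the 0-4 scale described in the text."""
    NO_LESIONS = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3
    EXTREME = 4


@dataclass
class Lesion:
    """One lesion coded as an anatomy/pathology (MA/MPATH) pair plus a grade."""
    ma_term: str      # anatomy term; placeholder strings are used in this sketch
    mpath_term: str   # pathology term; placeholder strings are used in this sketch
    severity: Severity


@dataclass
class NecropsyCase:
    """A necropsy case, i.e. one mouse (hypothetical field names)."""
    accession: str
    strain: str
    disease_descriptions: List[str] = field(default_factory=list)
    lesions: List[Lesion] = field(default_factory=list)
    special_tests: List[str] = field(default_factory=list)
    materials: List[str] = field(default_factory=list)

    def max_severity(self) -> Severity:
        """Return the highest lesion grade, or NO_LESIONS if none are recorded."""
        return max((l.severity for l in self.lesions), default=Severity.NO_LESIONS)


# Illustrative values only; none of these come from the MoDIS database.
example_case = NecropsyCase(accession="EX-0001", strain="StrainA")
example_case.disease_descriptions.append("example summative diagnosis")
example_case.lesions.append(Lesion("MA:placeholder-1", "MPATH:placeholder-1", Severity.MILD))
example_case.lesions.append(Lesion("MA:placeholder-2", "MPATH:placeholder-2", Severity.SEVERE))
example_case.materials.extend(["histology", "photographs"])
```

In this sketch one case holds several Disease Descriptions and several MA/MPATH pairs, mirroring the one-to-many relationships described above, and the integer-valued grades can be passed directly to numerical analyses.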

Ontologies and controlled vocabularies

MoDIS uses the MPATH (Schofield et al. 2005) and MA (Hayamizu et al. 2005) ontologies downloaded from the OBO Foundry web site (http://www.obofoundry.org/) as flat files. Terms from the two ontologies are used to specify the intersection of anatomy and pathology for each lesion. Strains and genotypes are recorded in compliance with standard nomenclature (http://www.informatics.jax.org/mgihome/nomen/gene.shtml). Special tests, organisms found in testing, submitters, and housing locations may also be recorded in free text or in a locally controlled vocabulary (CV). A local CV can also be built for high-level summative disease diagnoses.
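As a rough illustration of how ontology flat files of this kind can be read and used to code a lesion, the following Python sketch parses the `[Term]` stanzas of an OBO-format file and pairs an anatomy term with a pathology term. It is not the MoDIS implementation; the function names are hypothetical, and the sample identifiers and term names are placeholders rather than real MA or MPATH entries.

```python
from typing import Dict


def parse_obo_terms(text: str) -> Dict[str, str]:
    """Map term id -> term name for every [Term] stanza in OBO flat-file text."""
    terms: Dict[str, str] = {}
    in_term = False
    term_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; only [Term] stanzas are collected.
            in_term = (line == "[Term]")
            term_id = None
            continue
        if not in_term or ":" not in line:
            continue
        tag, _, value = line.partition(":")   # split on the first colon only
        value = value.strip()
        if tag == "id":
            term_id = value
        elif tag == "name" and term_id is not None:
            terms[term_id] = value
    return terms


def describe_lesion(ma_id: str, mpath_id: str,
                    ma_terms: Dict[str, str], mpath_terms: Dict[str, str]) -> str:
    """Combine an anatomy term and a pathology term; raises KeyError if unknown."""
    return f"{ma_terms[ma_id]}: {mpath_terms[mpath_id]}"


# Placeholder stanzas, invented for this sketch.
SAMPLE_MA = """format-version: 1.2

[Term]
id: MA:placeholder-1
name: example organ

[Typedef]
id: part_of
name: part of
"""

SAMPLE_MPATH = """format-version: 1.2

[Term]
id: MPATH:placeholder-1
name: example lesion
"""

ma_terms = parse_obo_terms(SAMPLE_MA)
mpath_terms = parse_obo_terms(SAMPLE_MPATH)
label = describe_lesion("MA:placeholder-1", "MPATH:placeholder-1", ma_terms, mpath_terms)
```

Restricting data entry to identifiers present in the parsed files is one simple way to ensure that every recorded lesion uses a term that exists in the current ontology release.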

Results

MoDIS is currently designed for local installation but with the facility to output files to other databases and programs. The local MoDIS database at The Jackson Laboratory now contains nearly 40,000 records and forms the core resource for other databases and resources at The Jackson Laboratory, such as the Mouse Tumor Biology Database (Begley et al. 2007), and elsewhere. These clinical records can be quickly searched for individual cases or case series by disease, organ, and strain to output to comma-separated values (CSV) or a Microsoft Excel file that can be further sorted and analyzed. All case materials are linked by a common accession number code for each animal in other databases of the laboratory. Individual or groups of images representing example sections have been placed online with annotations. Summaries of studies are also being put online as they are completed and curated [Mouse Phenome Database (MPD), http://phenome.jax.org (Bogue et al. 2007; Mouse Phenotype Database Integration Consortium 2007); Mouse Tumor Biology Database (MTB), http://tumor.informatics.jax.org (Krupke et al. 2008); and Pathbase, http://www.Pathbase.net (Schofield et al. 2004a, b)]. Complete tables of spontaneous background diseases, although currently published only for strain disease surveillance (Mikaelian et al. 2004; Sundberg and Ichiki 2005a), will soon be online with links to the specific photomicrographs of lesions from the strain in question.
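The search-and-export workflow described above can be sketched in a few lines of Python. This is an illustration only, not the Access queries used in MoDIS; the record fields, function names, and sample values are assumptions invented for the example.

```python
import csv
import io
from typing import Dict, List, Optional

FIELDS = ["accession", "strain", "organ", "disease"]


def search_cases(records: List[Dict[str, str]],
                 disease: Optional[str] = None,
                 organ: Optional[str] = None,
                 strain: Optional[str] = None) -> List[Dict[str, str]]:
    """Return records matching every criterion that is not None."""
    def matches(rec: Dict[str, str]) -> bool:
        return ((disease is None or rec["disease"] == disease)
                and (organ is None or rec["organ"] == organ)
                and (strain is None or rec["strain"] == strain))
    return [rec for rec in records if matches(rec)]


def to_csv(records: List[Dict[str, str]]) -> str:
    """Serialize records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Invented records for illustration; not taken from the MoDIS database.
RECORDS = [
    {"accession": "EX-0001", "strain": "StrainA", "organ": "skin", "disease": "example disease 1"},
    {"accession": "EX-0002", "strain": "StrainB", "organ": "skin", "disease": "example disease 1"},
    {"accession": "EX-0003", "strain": "StrainA", "organ": "liver", "disease": "example disease 2"},
]

case_series = search_cases(RECORDS, organ="skin", strain="StrainA")
csv_text = to_csv(case_series)
```

The resulting CSV text can be written to a file and opened in a spreadsheet program for further sorting and analysis.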

Pathologic diagnosis recording is complex and depends to a great extent on the tradition in which the diagnostician was trained. This problem has generated much discussion recently about the standardization of the semantics of pathologic diagnoses. From experience we believe that the solution is to use standard defined ontologies for formal recording but to leave the clinician with the ability to make local annotations in other formalisms. This means that eventual export of key data to central public databases such as EuroPhenome (Mallon et al. 2008) can be semantically and syntactically compatible with accepted standards and can be achieved automatically. The MPATH ontology is continuously under review by a panel of pathologists at annual meetings of the Pathbase European Mouse Pathology Consortium. The pathologists (both veterinarians and physicians) review new terms, edit them, and arrive at a consensus on the terms and their definitions. Similarly, MA is under constant revision and refinement. Thus, there is an ongoing system of checks and balances on the terminology used as well as a formal means to upgrade the system, especially to expand to a higher level of sophistication and utility.

MoDIS as a training resource

When MoDIS is online and the ontology flat files used for coding are current and linked to Pathbase (http://www.pathbase.net), it is possible to move from an ontology term to its formal definition, with literature or web references, and, where appropriate, to annotated images of similar lesions posted on Pathbase by pathologists who work with mice worldwide (Fig. 2).

Fig. 2

Steps to move from a diagnosis to definitions, references, and images. Once a diagnosis is arrived at and MoDIS is linked to Pathbase, one can move to the MPATH definition and from there to images of lesions given this diagnosis. In this way one can quickly verify the tentative diagnosis using virtual mentoring

Discussion

Standardization and integration

The capture of detailed primary data from phenotyping experiments, with appropriate structure, is a sine qua non for high-throughput large-scale studies. The development of more structured descriptions of phenotypes will allow data to be processed and interpreted in a consistent manner and facilitate the development of new computational software. The capture of primary data from individual mice allows for its reanalysis and reuse in the light of new hypotheses and new information and maximizes the added value of the studies. Development of tools that address this is a recognized need. For example, the MPHASYS system (Calder et al. 2007) is designed to capture and integrate phenotype information, though not in a format that recognizes ontologies and structure. This example emphasizes the importance of semantic and syntactic standardization for the general application of such data capture tools, and the importance of adherence to community consensus standards, which we have begun to implement in MoDIS. Removal of multiple manual curation steps in data entry reduces the risk of error and the cost of large database curation, which can be substantial. Therefore, data capture tools need to be intuitive and readily usable by the “phenotyper,” in this case the pathologist, and consideration of the user’s expertise is an important aspect of their design.

Training and referencing

To help pathologists interpret lesions that develop in inbred, genetically engineered, or experimentally manipulated mice, small pathology programs are being set up in medical centers and universities worldwide. While ideally these programs would be peer-driven, mentor-based programs (Sundberg et al. 2004), in fact they are usually run by isolated junior-level pathologists or clinicians with marginal training in histopathology. While many books are now available (Bannasch and Goessner 1994; Bannasch and Gossner 1994; Frith and Ward 1988; Kaufman et al. in press; Kaufman 1992; Kaufman and Bard 1999; Maronpot et al. 1999; Mohr 2001; Mohr et al. 1996; Smith et al. 2002; Sundberg 1994; Sundberg and Boggess 2000; Sundberg and Ichiki 2005b; Ward et al. 2000), these are only a partial substitute for a team of pathologists with whom one can consult. Formal national training programs (Barthold et al. 2007; Sundberg et al. 2007), mentored by established senior pathologists or other organ or disease experts, can provide support for these junior pathologists. The high volume of case materials with which many are faced, combined with the fact that senior pathologists with expertise in rodent pathology are not readily available at many institutions, results in less than optimal working conditions. We provide here a freely available system that addresses the problems of record-keeping and case retrieval, continuing education, and confirmation (second opinions) of cases. The system is relatively simple and can be easily integrated into larger databases, or its data can be downloaded in formats suitable for larger, more comprehensive databases. This database and full documentation on how to use it are available free online (http://research.jax.org/faculty/sundberg/index.html).

Investigators usually want a summative diagnosis rather than a list of lesions. Pathologists understand that lesions can be independent of each other or linked, and that they can be specific to the strain used, specific to the husbandry conditions, or related to the experimental manipulation done on the animals. For those mapping complex genetic traits, keeping lesions separated by organ system and quantified is critical. By adding a field for disease severity that simply converts the adjectives commonly used by pathologists to describe lesions (mild, moderate, severe, and extreme) to a graded scale (1, 2, 3, and 4, respectively), one obtains a simple semiquantitative scale for all lesions. If this is done consistently, one can immediately run a quantitative trait analysis. We have used this successfully for many years for mapping inflammatory bowel disease severity and resistance genes in the mouse (Bleich et al. 2004; Bristol et al. 2000; Farmer et al. 2001; Mahler et al. 1998, 1999, 2002).
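The conversion from descriptive adjectives to a graded scale is simple enough to sketch directly. The Python example below is an illustration under stated assumptions rather than the procedure used in the cited mapping studies: the function names are hypothetical, the observations are invented, and a per-group mean is shown only as one possible summary that could precede a quantitative trait analysis.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# Adjective-to-grade mapping as given in the text, with 0 for no lesions.
GRADE = {"no lesions": 0, "mild": 1, "moderate": 2, "severe": 3, "extreme": 4}


def grade(adjective: str) -> int:
    """Convert a severity adjective to its numeric grade (KeyError if unknown)."""
    return GRADE[adjective.strip().lower()]


def mean_severity_by_group(observations: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Average the numeric grade of one lesion type within each group.

    `observations` holds (group, adjective) pairs, where a group might be a
    strain, genotype, or treatment group.
    """
    scores = defaultdict(list)
    for group, adjective in observations:
        scores[group].append(grade(adjective))
    return {group: mean(values) for group, values in scores.items()}


# Invented observations for illustration only.
observations = [
    ("StrainA", "mild"),
    ("StrainA", "Severe"),
    ("StrainB", "no lesions"),
    ("StrainB", "moderate"),
]
summary = mean_severity_by_group(observations)
```

Rejecting unknown adjectives with an error, rather than silently assigning a default grade, is a deliberate choice here: the scale is only useful for mapping if it is applied consistently.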

Future developments

While the use of small locally instituted MoDIS databases can be useful for a wide range of users, migration to a server-based relational database management system (RDBMS) such as MySQL would open up a range of further possibilities with regard to sophistication of structure, access, and interoperability. Live linkage of ontologies to the Ontology Lookup Service (OLS) (http://www.ebi.ac.uk/ontology-lookup/) would maintain currency with the standard ontologies and remove any requirement for manual updating. Standards for the reporting of environmental and husbandry conditions as well as other assays are now being developed and inclusion of compatibility with these standards will enhance interoperability with larger databases and computational tools to generate a data capture, coding, and uploading resource of wide utility.