Background

Recording detailed information on collection, processing and storage of samples is crucial both for efficient reporting on any biomedical study and for subsequent data analysis [1]. Collecting and storing this information in a systematic way is particularly important in the context of high-throughput applications, such as proteomics and genomics technologies. Thus, systems facilitating patient and sample data management are in high demand.

We have developed an open-source software system for recording, storing, and providing access to information on biosamples. This system, the Patient and Sample System for Information Management (PASSIM), allows researchers to track information pertinent to sample collection, processing, location, transportation and storage conditions. PASSIM addresses confidentiality issues by storing non-identifiable sample information separately from the records of research participants. The system is web-based: non-identifiable information is kept on a server and can be securely accessed online, for queries or new submissions, by authorised users via a web browser. PASSIM is simple and generic, and thus can be customized for various types of biological studies.

Several publicly available systems include sample-related information in their data models (MiMiR [2], MIMAS [3], ArrayExpress [4]) in order to deepen the integration of sample and experiment data. These data models work well within specific domains (mostly microarray analysis), but do not support effective analysis that integrates various "-omics" data. In principle it might be possible to generalise one of these systems to other types of high-throughput data, but that would further complicate what is already a complex system. We believe that, to make such a system simpler and more generic, the module used for storage of experiment metadata and results should be separate from the one for the sample information, though the two should be interoperable. To the best of our knowledge, very few systems of this type are publicly available, e.g. caTISSUE and Open Infrastructure for Outcomes (OIO) [5-7].

The system we present here is a generic version of a system developed for an international collaborative project, Molecular Phenotyping to Accelerate Genomic Epidemiology (MolPAGE). MolPAGE includes 18 academic institutions, biotechnology and pharmaceutical companies (see [8]). In this paper we briefly describe the design principles and the functionality of PASSIM, and discuss how the biomedical community can benefit from using such a system and learn from our experience.

Implementation

General structure

PASSIM has two main modes:

  1) Data submission: entering and editing the information on samples and individuals;

  2) Data access: browsing and querying this information, and generating reports.

The submission form is concise. Many of its parameters can be reused across a wide spectrum of studies, and more specific ones can be modified or added to the form. PASSIM also supports retrieval of the information, and thus serves as an effective means of communication and data transfer between sample collection sites and experimentalists.

To reconcile the needs of local researchers (who might wish to retain linkage to non-anonymized subject IDs, e.g. as part of ongoing studies) with the need to avoid the security breaches that could result from making such data available over the web, we adopted a two-tier solution consisting of two subsystems:

  1) Stand-alone Person Management Tool (PMT), used on-site by the staff collecting the samples [see Additional file 1];

  2) Sample Management Database (Sample DB), accessible through the web-based interface [see Additional file 2, figure 1].

Figure 1. The general structure of PASSIM.

The PMT is intended for registering confidential information about the research subjects from whom samples have been taken. It assigns a unique anonymous identifier to each individual, and this identifier is then used to identify the individual in the Sample DB. Each sample collection site hosts its own local copy of the PMT. Note that keeping identifiable information separate from de-identified information might not be a suitable solution for studies that require identifiable private information to be included in the accessible dataset.
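The identifier assignment described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `AnonymousIdRegistry`, the site code and the `SITE-00001` format are hypothetical and are not taken from the PASSIM source. The point it shows is that the mapping from confidential identity to anonymous ID stays in the local tool, while only the anonymous ID is passed on.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for the PMT's identifier assignment; not PASSIM's actual code. */
final class AnonymousIdRegistry {
    private final String siteCode; // assumed short code for the collection site
    // Confidential identity -> anonymous ID; this map is meant to stay at the local site.
    private final Map<String, String> idByIdentity = new HashMap<>();
    private int nextSerial = 1;

    AnonymousIdRegistry(String siteCode) {
        this.siteCode = siteCode;
    }

    /** Returns the anonymous ID already assigned to this person, or assigns a new one. */
    String register(String confidentialIdentity) {
        String existing = idByIdentity.get(confidentialIdentity);
        if (existing != null) {
            return existing;
        }
        // Assumed format: site code, a hyphen, and a zero-padded serial number.
        String anonymousId = String.format(Locale.ROOT, "%s-%05d", siteCode, nextSerial++);
        idByIdentity.put(confidentialIdentity, anonymousId);
        return anonymousId;
    }
}
```

Registering the same person twice returns the same identifier, so repeated visits by one subject would map to a single record in the Sample DB.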

The Sample DB is accessible online through a web-based interface (built with JavaServer Pages technology and the Apache Tomcat servlet container) and contains non-confidential information about samples. It allows registration of samples and aliquots as well as the subsequent tracing of aliquot locations (see Figure 2). The system is built so that it can work with any traditional relational database; it was tested on the Oracle and MySQL DBMS.
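As a rough illustration of what switching between the two tested DBMS involves at the JDBC level, the sketch below pairs each one with its driver class and URL pattern. The enum `Dbms` and its `url` method are hypothetical names introduced here; how PASSIM itself stores its connection settings is not described in this paper. The driver class names and URL formats are those of the standard MySQL Connector/J 5.x and Oracle thin drivers.

```java
import java.util.Locale;

/** Hypothetical helper listing JDBC settings for the two DBMS mentioned in the text. */
enum Dbms {
    MYSQL("com.mysql.jdbc.Driver", "jdbc:mysql://%s:%d/%s"),
    ORACLE("oracle.jdbc.OracleDriver", "jdbc:oracle:thin:@%s:%d:%s");

    final String driverClass;        // class provided by the vendor's JDBC driver jar
    private final String urlPattern; // host, port, database name (or Oracle SID)

    Dbms(String driverClass, String urlPattern) {
        this.driverClass = driverClass;
        this.urlPattern = urlPattern;
    }

    /** Builds the JDBC connection URL for this DBMS. */
    String url(String host, int port, String database) {
        return String.format(Locale.ROOT, urlPattern, host, port, database);
    }
}
```

With the matching driver jar on the classpath, such a URL could be passed to `java.sql.DriverManager.getConnection(url, user, password)`; the rest of the application would not need to know which DBMS is behind it.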

Figure 2. The Sample Database interface.

Object model

The sample management system is designed around three main classes: PERSONS, SAMPLES and ALIQUOTS (see Figure 3).

Figure 3. Central classes and some attributes of the sample management system. Each sample in the database is associated with only one person; similarly each aliquot is associated with only one sample. There can be an arbitrary number of samples per person and an arbitrary number of aliquots per sample. The total number of tables is 22.

All descriptions are entered using controlled vocabularies. Relations between persons (such as "parent", "sibling", etc.) are modelled with two additional tables, RELATIONS and RELATION_TYPES (Figure 3), which allow an arbitrary number of relations to be stored for each person. Details on the storage and transport conditions, and on the state of the sample at its reception, are shared between samples and aliquots.
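The one-to-many structure of the three central classes can be sketched in Java as below. This is a simplified illustration, not the PASSIM schema: the class and field names are assumptions, most attributes are omitted, and the relation type is a plain string rather than a reference to a RELATION_TYPES entry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** A research subject, known here only by an anonymous ID. */
final class Person {
    final String anonymousId;
    private final List<Sample> samples = new ArrayList<>();
    private final List<Relation> relations = new ArrayList<>();

    Person(String anonymousId) { this.anonymousId = anonymousId; }

    /** Creates a sample that belongs to exactly this person. */
    Sample addSample(String type) {
        Sample sample = new Sample(this, type);
        samples.add(sample);
        return sample;
    }

    /** Records a typed relation (e.g. "sibling") to another person. */
    void addRelation(String relationType, Person relative) {
        relations.add(new Relation(relationType, relative));
    }

    List<Sample> samples() { return Collections.unmodifiableList(samples); }
    List<Relation> relations() { return Collections.unmodifiableList(relations); }
}

final class Relation {
    final String type;     // would come from a controlled vocabulary
    final Person relative;

    Relation(String type, Person relative) {
        this.type = type;
        this.relative = relative;
    }
}

final class Sample {
    final Person person;   // each sample has exactly one person
    final String type;
    private final List<Aliquot> aliquots = new ArrayList<>();

    Sample(Person person, String type) {
        this.person = person;
        this.type = type;
    }

    /** Creates an aliquot that belongs to exactly this sample. */
    Aliquot addAliquot(String location) {
        Aliquot aliquot = new Aliquot(this, location);
        aliquots.add(aliquot);
        return aliquot;
    }

    List<Aliquot> aliquots() { return Collections.unmodifiableList(aliquots); }
}

final class Aliquot {
    final Sample sample;   // each aliquot has exactly one sample
    final String location;

    Aliquot(Sample sample, String location) {
        this.sample = sample;
        this.location = location;
    }
}
```

Because samples and aliquots can only be created through their parent, every aliquot can be traced back to a single sample and, through it, to a single anonymous person.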

In addition to the three main tables, there is a table USERS that contains information about all the users who can log into the system and their access rights. The fields Login name and User password hold the information for logging into the system and must not be empty. Passwords are stored in the database in unencrypted form, so that an administrator can edit them or remind a user of a forgotten password. There are 4 types of access rights: view-only, access for editing by individual or group users, and full access, with an option of editing the administrative tables. For details on the differentiation of the access rights, please see Additional file 1: Table 1. PASSIM is designed for collaborating groups that collect samples in a number of locations and then send them to a different location for analysis. The system allows these samples to be registered and traced, and each group of users has appropriate access to the information that is relevant for the group.
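One possible reading of these access levels is sketched below. The four enum constants, the notion of a record "submitter" and the rule in `canEdit` are assumptions made for illustration; the actual rights and their codes are defined in Additional file 1: Table 1 and may differ.

```java
/** Four access levels under one reading of the text; the real codes may differ. */
enum AccessRight { VIEW_ONLY, EDIT_OWN, EDIT_GROUP, FULL }

final class SystemUser {
    final String login;
    final String group;
    final AccessRight right;

    SystemUser(String login, String group, AccessRight right) {
        this.login = login;
        this.group = group;
        this.right = right;
    }
}

/** Hypothetical record header: who submitted an entry and for which group. */
final class OwnedRecord {
    final String submitterLogin;
    final String submitterGroup;

    OwnedRecord(String submitterLogin, String submitterGroup) {
        this.submitterLogin = submitterLogin;
        this.submitterGroup = submitterGroup;
    }
}

final class AccessPolicy {
    /** Decides whether a user may edit a record; every user may view it. */
    static boolean canEdit(SystemUser user, OwnedRecord record) {
        switch (user.right) {
            case FULL:
                return true;
            case EDIT_GROUP:
                return user.group.equals(record.submitterGroup);
            case EDIT_OWN:
                return user.login.equals(record.submitterLogin);
            default: // VIEW_ONLY
                return false;
        }
    }
}
```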

Table 1 User rights.

Functionality and customisability

Like the object model, the web interface of the Sample DB is based on three main pages: "Persons", "Samples" and "Aliquots", where the corresponding information can be entered, edited or deleted. There is an option for batch submission of several aliquots of the same sample, or of several samples taken from the same subject. The properties of an existing aliquot entry can be copied to a newly created aliquot entry. This "submitter-friendly" design makes submission easier and reduces the risk of errors introduced by retyping the same information. It is also possible to edit the same parameter for many aliquots simultaneously, which may help coordinate transportation and storage of samples across locations.
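The copy, batch submission and batch editing features described above could look roughly like the following sketch. The names `AliquotEntry` and `AliquotBatch`, the fields shown and the `-A1`, `-A2` ID suffixes are hypothetical; the web interface performs these operations on database rows rather than on in-memory objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified aliquot entry with only a few of its properties. */
final class AliquotEntry {
    final String aliquotId;
    final String sampleId;
    String storageCondition;
    String location;

    AliquotEntry(String aliquotId, String sampleId, String storageCondition, String location) {
        this.aliquotId = aliquotId;
        this.sampleId = sampleId;
        this.storageCondition = storageCondition;
        this.location = location;
    }

    /** Creates a new entry that inherits every property except the ID. */
    AliquotEntry copyAs(String newAliquotId) {
        return new AliquotEntry(newAliquotId, sampleId, storageCondition, location);
    }
}

final class AliquotBatch {
    /** Registers count aliquots of one sample, all sharing the template's properties. */
    static List<AliquotEntry> createFrom(AliquotEntry template, int count) {
        List<AliquotEntry> batch = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            // Assumed ID scheme: sample ID plus a running aliquot number.
            batch.add(template.copyAs(template.sampleId + "-A" + i));
        }
        return batch;
    }

    /** Sets the same location on many aliquots at once, e.g. after a shipment. */
    static void relocate(List<AliquotEntry> aliquots, String newLocation) {
        for (AliquotEntry aliquot : aliquots) {
            aliquot.location = newLocation;
        }
    }
}
```

Since the shared properties are typed once in the template, a mistake can only be made in one place, and a later correction can be applied to the whole batch in one step.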

In addition to submission capabilities, the system provides advanced search engine capabilities. The data can be filtered by such properties as date of birth, gender, source, type, disease state, location and storage conditions. For complex queries there is an option of generating a report using a pre-downloaded copy of the Sample Management Database.
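Filtering by several properties at once amounts to combining criteria with a logical AND. The sketch below shows this on in-memory rows; `SampleRow` and `SampleSearch` are hypothetical names, only a few of the listed properties are included, and the real system would presumably translate such criteria into a database query instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical flattened view of a sample with a few searchable properties. */
final class SampleRow {
    final String id;
    final String gender;
    final String type;
    final String diseaseState;
    final String location;

    SampleRow(String id, String gender, String type, String diseaseState, String location) {
        this.id = id;
        this.gender = gender;
        this.type = type;
        this.diseaseState = diseaseState;
        this.location = location;
    }
}

final class SampleSearch {
    /** Returns the rows that satisfy every criterion (logical AND). */
    static List<SampleRow> filter(List<SampleRow> rows, List<Predicate<SampleRow>> criteria) {
        List<SampleRow> matches = new ArrayList<>();
        for (SampleRow row : rows) {
            boolean accepted = true;
            for (Predicate<SampleRow> criterion : criteria) {
                if (!criterion.test(row)) {
                    accepted = false;
                    break;
                }
            }
            if (accepted) {
                matches.add(row);
            }
        }
        return matches;
    }
}
```

An empty list of criteria returns all rows, which corresponds to browsing the database without any filter.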

Another important feature of PASSIM is that the permitted values of the parameters specific to aliquots, samples or research subjects can be changed via the same web interface. Thus, users with sufficient access rights can always adjust the metadata terms (pre-defined vocabularies) to fit a preferred ontology or controlled vocabulary. Additional file 2: Table 2 provides more detailed information on which properties of each parameter are editable. A user intending to edit a parameter name needs administrator access rights (A0 or A1). More parameters or metadata terms can be added, but this requires direct modification of the database as well as some changes to the programs.

Table 2 Configuration of web pages.

Such a design makes the system adaptable to developing and changing biological vocabularies. A complete guide to the configuration of access rights and web pages is available at the PASSIM website [9].
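The idea of an editable controlled vocabulary can be illustrated with the sketch below. The class `ControlledVocabulary` and the boolean `hasSufficientRights` flag are assumptions introduced for the example; in PASSIM the terms live in database tables and the rights check is tied to the logged-in user.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical controlled vocabulary for one parameter, e.g. sample type. */
final class ControlledVocabulary {
    final String parameterName;
    private final Set<String> terms = new LinkedHashSet<>();

    ControlledVocabulary(String parameterName, String... initialTerms) {
        this.parameterName = parameterName;
        terms.addAll(Arrays.asList(initialTerms));
    }

    /** A submitted value is valid only if it is one of the pre-defined terms. */
    boolean accepts(String value) {
        return terms.contains(value);
    }

    /** Adds a permitted value; refused unless the caller has sufficient rights. */
    void addTerm(String term, boolean hasSufficientRights) {
        if (!hasSufficientRights) {
            throw new SecurityException("insufficient rights to edit " + parameterName);
        }
        terms.add(term);
    }
}
```

Submission forms would then offer only the accepted terms, so that a vocabulary can be aligned with a preferred ontology without changing the forms themselves.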

Discussion

The initial specifications for the system were developed by the MolPAGE Consortium members. The main aim of MolPAGE is to develop methods to support genomic epidemiology: that is the measurement, manipulation and analysis of "omics"-scale data in large-scale epidemiological samples. The specifications defined a limited number of properties and variables for individuals, samples and aliquots, which were to be recorded. The sample collection took place at 4 collection sites across 3 different countries. The Patient management system, installed at the sample collection sites, was populated with the clinical data. Then, a unique identifier was generated for each patient and this identifier was transferred to the Sample database. This anonymous identifier constituted a basis of the sample and aliquot IDs. The centralised Sample management system was used through a secure web-interface by both the submitters of the sample information and by the partners analysing the samples. The access rights were diversified to meet the needs of various groups of users. The work within the MolPAGE Consortium revealed a few areas for further development of the system, among which were generation of reports, batch uploading and batch editing.

As PASSIM proved successful in MolPAGE, we implemented a generic version of the system so that the broader scientific community can use it in other biomedical projects of a similar nature. The rationale behind PASSIM is to provide information management support for consistent reporting on biomedical research. The system can potentially assist in a wide range of studies in which the results cannot be interpreted accurately without sufficient sample information, such as studies of genetic or plasma biomarkers. Laboratory information management systems (LIMS) are conventionally designed to capture the whole experimental routine from sample collection to data analysis, and are often not the best choice when only sample-related data and metadata need to be handled. PASSIM, on the other hand, is a much lighter software solution than a LIMS, designed for capturing, storing and browsing sample-related metadata.

Apart from expanded functionality, the application of PASSIM in the MolPAGE project had another important outcome: an object model that can serve as a basis for a simple home-made relational database, or as a model for a standardized data exchange format.

Standardization of reporting on the results is important in many biomedical studies, for instance in epidemiological studies. It imposes new requirements on day-to-day routine information management [10], thus calling for an effective means for the capture and retrieval of sample-related data. At the moment, there are a number of initiatives governing the manner in which an investigator reports on a newly discovered biomarker or a newly developed diagnostic test [11-13]. There are also the Clinical Data Interchange Standards Consortium [14] and the Clinical Document Architecture of the Health Level Seven programme [15]. Should scientific journals endorse the standards for reporting on such studies (as has been done, for example, for microarray studies [16]), the level of detail required for related publications would necessitate the use of LIMS or similar tools for metadata recording in any biomedical research group. Unfortunately, commercial software solutions are expensive and not every lab can afford such a system. We feel that PASSIM, or systems derived from our approach, can close this gap. In the future, we plan to link it to a system for storage of high-throughput experiment data, which is currently under development.

Conclusion

The open-source nature of PASSIM means that, first, it is an affordable solution for data management and, second and more importantly, its source code is available for external inspection and modification. It can be customized for the needs of a particular laboratory. To the best of our knowledge, it is the only open-source system of this kind.

Availability and requirements

The PASSIM system, along with supporting information, can be obtained from http://passim.sourceforge.net. The on-line tutorial assists in training potential users of the system. The installation guide and system information help to set up and customize PASSIM for a particular project.

Both parts of PASSIM (the Sample Management Database and the Person Management Tool) can also be downloaded from http://bioinf.mii.lu.lv/PASSIM/.

Project name: Patient and Sample System for Information Management (PASSIM)

Project home page: http://passim.sourceforge.net

Operating system(s): platform independent

Programming language: Java

Other requirements: Tomcat 5.0 or later, JDK 1.4.2 or later, Apache Ant 1.6.5 or later; the supplied version of the system is configured for MySQL, and an additional JDBC driver is required for other databases.

License: open source, non-restricted

Any restrictions to use by non-academics: no restrictions