Standardized data collection
To ensure comparability of the data collected in different centers, a common standard must be defined that all participating centers follow. The standard for the TRI database is based on a consensus for patient assessment and outcome measurement reached by tinnitus experts from many countries at an international tinnitus conference in Regensburg in 2006 [26]. The participants at this conference agreed that, for the sake of inter-study comparability, a minimum set of standardized assessments during the diagnostic and therapeutic evaluation of tinnitus is needed. Methods for tinnitus patient assessment and treatment outcome measurement in the database were chosen according to this consensus. These core assessments consist of a detailed tinnitus and medical history, an otological examination, psychoacoustic measures of tinnitus, and a variety of validated questionnaires assessing tinnitus severity and quality of life; they became part of standardized case report forms (CRFs). The CRF is now available in a number of languages, including English, Flemish, French, German, Italian, Portuguese, and Spanish. In most cases, validated translations of the questionnaires were available and included in the CRF; where necessary, new translations were provided. Following accepted translation procedures [27, 28], translators had to be experts in the medical field of tinnitus, native speakers of the target language (e.g., Italian), and fluent in the source language (i.e., English). These experts provided a forward translation that was checked by a second person with the same level of expertise. Occasional disagreements were resolved by consensus (reconciliation process). The TRI database also allows analysis of the psychometric properties of these newly translated questionnaires.
Data from all forms of treatment interventions (e.g., pharmacological, psychotherapeutic, auditory, or brain stimulation interventions) can be entered into the database. Treatment forms included so far encompass pharmaceutical interventions (63% of all patients), brain stimulation techniques (i.e., transcranial magnetic or electrical stimulation; 23% of all patients), and cognitive behavioral therapeutic approaches (about 12%). Further treatments may be added in the future; the only preconditions are that treatment interventions be clearly defined and performed in a standardized way. To accommodate the varying durations of the different treatment interventions, the CRF is organized flexibly in terms of both the number of visits and the intervals between them. The database also supports the use of additional assessment methods. However, the core assessments are identical for all visits and all interventions to allow cross-comparison. Table 1 gives an example of a standardized CRF in English and how it is used for a pharmacological trial of 12 weeks' duration. The design of the CRF allows the documentation of both cross-sectional and longitudinal data. Up to now, 13 centers from 8 countries (Argentina, Brazil, Belgium, France, Germany, Italy, Spain, and New Zealand) have participated in the database project, and translations of the CRF into further languages are being prepared. Finally, the database is open to anyone interested in participating in the project (either by collecting data or by addressing research questions and accessing the whole dataset). Scientific agreements define the rights and duties of each participating center. As a general rule, every participating center has full access to its own dataset. In addition, each center may gain access to the whole dataset under predefined conditions: research questions and requests for access to the whole dataset are discussed within the TRI database scientific committee and must be approved by it. Further information on how to participate may be found at http://database.tinnitusresearch.org or by sending an email to database@tinnitusresearch.org.
Table 1 Overview of the standardized content of a CRF used for pharmacological studies according to the consensus. Measurements at each visit are graded as essential to collect (A) or highly recommended (B).
Database construction and technical details
A novel approach to entering data for anonymous patients was created in strict observation of the guidelines of Good Clinical Practice (GCP) and Food and Drug Administration (FDA) regulations. This approach identifies each patient with a unique hash code consisting of a 40-character string. Since this string is impractical for routine data entry, a substitute rule was defined to identify the patient more easily: a combination of the patient number (6 digits) and the center identification number (ID; 3 digits), linked by a hyphen. Thus, all patients remain anonymous within the database, and the only people able to recognize the patients behind the numbers are the data entry staff members.
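The publication does not specify how the 40-character hash is derived. As a minimal sketch, assuming the string is a SHA-1 hex digest (which is exactly 40 characters long) computed over salted, center-internal patient data, the pseudonymization and the readable substitute identifier could look as follows; the function names and salting scheme are illustrative, not the actual (PHP-based) TRI implementation:

```python
import hashlib

def pseudonymize(center_internal_id: str, salt: str) -> str:
    """Derive a 40-character hash code (here: a SHA-1 hex digest) from
    center-internal patient data. The salt is an assumed safeguard against
    simple dictionary attacks; the real derivation is not published."""
    return hashlib.sha1((salt + center_internal_id).encode("utf-8")).hexdigest()

def readable_id(patient_number: int, center_id: int) -> str:
    """Build the human-readable substitute identifier: a 6-digit patient
    number and a 3-digit center ID, linked by a hyphen."""
    return f"{patient_number:06d}-{center_id:03d}"

# Example: patient 42 at center 7.
print(pseudonymize("patient-42-local-record", salt="center7-secret"))  # 40 hex characters
print(readable_id(42, 7))  # '000042-007'
```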
The system environment consists of a MySQL v5.x database and an application written in PHP v4.x (later, after migration to a more modern hosting server, PHP v5.x), with sparing use of JavaScript functionality. For all insert, modification, and update operations, PHP-based transactions were newly designed to enable a complete usage and error tracking system. Revisions and other changes are thus easy to trace; this tracking also forms part of the user administration, in which user status can be adjusted (e.g., active, inactive, banned).
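The tracking schema itself is not described in the paper. The following minimal Python/SQLite sketch illustrates the general audit-trail pattern such a system might follow: every write operation records who changed what, and when, within the same transaction, so that data and log remain consistent. All table, column, and function names here are hypothetical:

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id TEXT PRIMARY KEY, thi_score INTEGER);
    -- Audit table: one row per change, enabling full revision tracking.
    CREATE TABLE audit_log (
        ts TEXT, username TEXT, action TEXT,
        table_name TEXT, record_id TEXT, details TEXT
    );
""")

def record_change(user: str, action: str, table: str, record_id: str, details: str):
    """Write one audit entry; called for every insert/update/delete."""
    conn.execute(
        "INSERT INTO audit_log VALUES (?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), user, action, table, record_id, details),
    )

def update_score(user: str, patient_id: str, new_score: int):
    """Update and audit in one transaction, so both succeed or both roll back."""
    with conn:  # implicit BEGIN/COMMIT
        conn.execute("UPDATE patients SET thi_score = ? WHERE id = ?",
                     (new_score, patient_id))
        record_change(user, "UPDATE", "patients", patient_id,
                      f"thi_score -> {new_score}")

with conn:
    conn.execute("INSERT INTO patients VALUES ('000042-007', 58)")
    record_change("entry_staff_1", "INSERT", "patients", "000042-007", "new patient")
update_score("entry_staff_1", "000042-007", 44)
```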
To minimize the effort of entering all CRFs manually, a system was designed using the German application FormPro® to automatically scan the CRFs with a high-volume scanner and import the recognized data (after corrections, where necessary) into the database.
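FormPro's export format is not described here. Assuming the recognition step produces a CSV file of field values, the import into the database might look like the following sketch, in which the column names, the visits table, and the plausibility rule are all hypothetical:

```python
import csv
import sqlite3

conn = sqlite3.connect("tri.db")  # hypothetical local database file
conn.execute("CREATE TABLE IF NOT EXISTS visits "
             "(patient_id TEXT, visit_no TEXT, thi_score INTEGER)")

def import_formpro_export(csv_path: str) -> None:
    """Import recognized CRF fields from an assumed FormPro CSV export.
    Rows failing a basic plausibility check are set aside for manual
    correction, mirroring the 'after some corrections' step in the text."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            try:
                score = int(row["thi_score"])
                if not 0 <= score <= 100:
                    raise ValueError(f"THI score out of range: {score}")
            except (KeyError, ValueError) as err:
                print("manual correction needed:", row, err)
                continue
            conn.execute(
                "INSERT INTO visits (patient_id, visit_no, thi_score) VALUES (?, ?, ?)",
                (row["patient_id"], row["visit_no"], score),
            )
    conn.commit()
```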
To optimize and simplify data evaluation, some of the most relevant calculations were implemented directly within the system. The remaining data structures were adapted to fit the mathematical needs of data analysis.
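The paper does not itemize which calculations are built in. A plausible example is the server-side scoring of the Tinnitus Handicap Inventory (THI), one of the validated questionnaires commonly used in tinnitus outcome measurement; its standard scoring rule is sketched below (that the TRI system precomputes exactly this score is an assumption):

```python
def thi_total(answers: list[str]) -> int:
    """Compute the THI total score (0-100) from the 25 item responses.
    Standard THI scoring: 'yes' = 4 points, 'sometimes' = 2, 'no' = 0."""
    points = {"yes": 4, "sometimes": 2, "no": 0}
    if len(answers) != 25:
        raise ValueError("THI requires exactly 25 answered items")
    return sum(points[a] for a in answers)

# Example: a patient answering 'sometimes' to all 25 items scores 50.
print(thi_total(["sometimes"] * 25))  # 50
```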
To guard against any kind of data loss or unwanted data change, the system was completed by a separately developed backup system that performs both incremental and full backups every night and places the daily evaluation backup file (packed and encrypted) on a separate, specially secured SFTP server for download by the data evaluation staff.
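The backup tooling is not named in the paper. A sketch of the nightly full-backup step, assuming standard utilities (mysqldump for the dump, gzip for packing, gpg for symmetric encryption), might look like this:

```python
import gzip
import subprocess
from datetime import date

def nightly_full_backup(db: str, passphrase: str) -> str:
    """Dump the database, pack and encrypt it, and return the file name.
    The tool choices here are assumptions for illustration only."""
    dump = subprocess.run(["mysqldump", "--single-transaction", db],
                          check=True, capture_output=True).stdout
    packed = f"tri_backup_{date.today().isoformat()}.sql.gz"
    with gzip.open(packed, "wb") as f:
        f.write(dump)
    # Symmetric encryption; key management is out of scope for this sketch.
    # (GnuPG >= 2.1 may additionally need '--pinentry-mode loopback'.)
    subprocess.run(["gpg", "--batch", "--yes", "--passphrase", passphrase,
                    "--symmetric", packed], check=True)
    return packed + ".gpg"

# The encrypted file would then be uploaded to the secured SFTP server,
# e.g., via paramiko's SFTPClient or a scripted sftp call.
```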
As a final enhancement, a criteria-based validation system was developed: only validated data are incorporated into the evaluation backup file.
Although the use of a virtual private network (VPN) had initially been recommended, an HTTPS-based approach using encrypted user logins and passwords was considered sufficiently secure. For additional security, however, the underlying Linux operating system (openSUSE 10.3, later 11.1) was hardened to prevent external hacking or cracking attempts. Furthermore, the system's logging level was raised from debug (9) to paranoid (10), with log files automatically sent to the system administrator for both automatic and manual tracking.
Data handling/quality
High emphasis has been placed on the standardization of data collection and on assuring the quality of data handling. After completing a CRF, each center sends the original to the central database management at the University of Regensburg, where the digital data entry is conducted; a copy of each CRF remains in the individual study center. The Center for Clinical Studies at the University Hospital in Regensburg developed a data validation strategy and generated a detailed data handling plan, which defines the actions to be taken in case of missing values, implausible or illegible data, incomplete data, or self-evident corrections according to GCP guidelines. The data handling plan is the foundation for data entry and contains several approaches to ensure the validity of the entered data.
In addition, automatic computer-based checks during data entry (i.e., defined value ranges, field type controls) and regular manual checks for missing and implausible values in the database minimize errors and improve data quality. Based on subject identification numbers, each missing or implausible value can be located and a query to the relevant study center generated (query management).
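As an illustration of such automatic checks, the following sketch flags missing and out-of-range values and generates one query per finding, keyed by the subject identification number. The field definitions and value ranges here are hypothetical examples, not the actual check rules of the TRI system:

```python
# Hypothetical field definitions: allowed value ranges per CRF item.
FIELD_RANGES = {"thi_score": (0, 100), "tq_score": (0, 84), "age": (18, 120)}

def check_record(subject_id: str, record: dict) -> list[dict]:
    """Return one query per missing or implausible value, keyed by the
    subject identification number for query management."""
    queries = []
    for field, (lo, hi) in FIELD_RANGES.items():
        value = record.get(field)
        if value is None:
            queries.append({"subject": subject_id, "field": field,
                            "problem": "missing value"})
        elif not lo <= value <= hi:
            queries.append({"subject": subject_id, "field": field,
                            "problem": f"implausible value {value} (expected {lo}-{hi})"})
    return queries

# Example: an out-of-range THI score and a missing age trigger two queries.
print(check_record("000042-007", {"thi_score": 142, "tq_score": 30}))
```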
Statistical analysis approaches
The primary goal of the database is the definition of subgroups of tinnitus patients who respond to a specific treatment, according to their tinnitus-specific characteristics and concomitant medical conditions. Analyses start with descriptive statistics using counts, proportions (percentages), means and standard deviations, and medians and ranges. The focus will be on a backward-oriented analysis strategy aiming to characterize patients responding to a given intervention. Responders will be identified via changes in their tinnitus scores from baseline to end of treatment, typically after 8 or 12 weeks. The criteria for minimally important clinical changes will be established by cross-validation of the different assessment instruments.
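Because the minimally important clinical change is still to be established, the following sketch uses an assumed placeholder threshold (a 5-point score reduction) purely to illustrate the responder definition:

```python
# Placeholder: the minimally important clinical change has not yet been
# established; a 5-point score reduction is assumed here for illustration.
MIN_IMPORTANT_CHANGE = 5

def is_responder(baseline_score: float, final_score: float) -> bool:
    """Classify a patient as a responder if the tinnitus score improved
    (i.e., decreased) by at least the minimally important change."""
    return (baseline_score - final_score) >= MIN_IMPORTANT_CHANGE

print(is_responder(baseline_score=40, final_score=31))  # True
print(is_responder(baseline_score=40, final_score=38))  # False
```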
Responders will then be compared to non-responders in order to detect differences in demographic or clinical variables. These analyses will be performed for responders versus non-responders across therapies as well as for each single intervention separately. Significant factors indicating treatment response can then be tested in specifically designed future studies. Due to the exploratory nature of the project and the large number of potentially important variables, the a priori specification of a detailed statistical analysis plan is not feasible; the plan will be developed as the database grows and results from initial analyses become available. Ideally, the results will form the basis for a decision support tool using a set of predefined clinical and demographic criteria to help tailor the proper therapy for a given patient.
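A typical responder versus non-responder comparison on a continuous clinical variable might be sketched as follows; the data values are invented purely for illustration:

```python
from scipy import stats

# Invented example data: tinnitus duration in years for each group.
responders = [2.1, 0.8, 3.5, 1.2, 4.0, 2.7]
non_responders = [6.3, 8.1, 5.5, 9.2, 7.4, 6.8]

# Continuous variable: compare group means with Welch's t-test (no
# equal-variance assumption). For categorical variables, a chi-square
# test would be the analogous choice.
t, p = stats.ttest_ind(responders, non_responders, equal_var=False)
print(f"t = {t:.2f}, p = {p:.4f}")
```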