Introduction

The successful development of pharmaceutical agents for the treatment of dry eye syndrome (keratoconjunctivitis sicca) requires a definitive demonstration that the drug can induce a significant improvement in both a sign and a symptom of the disease. To this end, the reproducible and sensitive assessment of signs and symptoms of dry eye is central to the drug development process. Although our knowledge of dry eye has grown immensely over the last two decades [1], “gold standard” clinical tests to diagnose dry eye syndrome or gauge therapeutic effectiveness for dry eye do not exist, since no agents have been approved by the FDA to date for the treatment of both a sign and symptom of the disease.

Common symptoms experienced by the dry eye patient vary significantly and include foreign body sensation, discomfort, dryness, stinging and blurred vision. The diverse array of dry eye symptoms and their impact on quality of life has prompted the use of numerous patient questionnaires, such as the ocular surface disease index (OSDI®) [2], the dry eye questionnaire [3] and the IDEEL [4], to objectively quantify symptom improvement in response to a treatment strategy. Demonstrating improvement in a sign of the disease adds a level of complexity beyond symptom assessment because there is a lack of agreement as to “what sign” is most relevant. Typically, a measure of tear production or corneal staining represents the endpoint upon which drug efficacy will be assessed, but the clear lack of uniform validated methods [1, 5] to assess either of these endpoints has limited their utility for assessing therapeutic efficacy.

Attempts to develop a simple, reproducible clinical test for the measurement of functional tear production, which may include secretion, flow and/or residual volume, date back over 100 years. At the present time, tear clearance/fluorophotometry, Schirmer strips and phenol red threads are all recognized as applicable methods. However, each of these techniques appears to have limitations that may prevent the accurate interpretation of a drug’s clinical benefit. Here, we will attempt to highlight the strengths and weaknesses of these tests to gain objective insight into their utility in a drug development paradigm.

The Schirmer test

Clinically, the Schirmer test is the most commonly used test, and Schirmer scores, representing the length of wetting (in mm) on the strip, are routinely used as a key diagnostic criterion for dry eye. The test involves the insertion of a small piece of filter paper into the lower fornix of the eye. There are two variations of the Schirmer test: Schirmer I measures total tear secretion (basal and reflex), whereas Schirmer II is a measure of reflex secretion only and involves nasal stimulation following insertion of the strip. A variation of the Schirmer I that may allow measurement of basal secretion involves the application of topical anesthetic prior to strip insertion. Although performing the Schirmer I with anesthetic may provide a more accurate picture of basal secretion, the utility and overall effectiveness of anesthetic administration in conjunction with the Schirmer is controversial.

Numerous investigators have examined the question of anesthetic use and although it is clear that tear secretion generally decreases following topical application, there are many variables associated with this procedure. Two of the most relevant are the degree of anesthesia achieved and the efficiency of blotting residual fluid from the cul-de-sac after instillation. The general consensus in the literature is that Schirmer I with anesthesia does not measure true basal secretion [6–7]. Incomplete anesthesia (including sensation of the lower lid) and psychogenic variables often serve to maintain a level of reflex tearing after instillation of the anesthetic [6–9]. Additionally, use of anesthetic has been implicated in disruption of cell junctions which may increase surface staining leading to erroneous conclusions during the evaluation of surface integrity [10]. As corneal and/or conjunctival staining is routinely performed after the Schirmer test, the inclusion of anesthetic may inadvertently result in misclassification of the presence and/or severity of dry eye.

Whether with or without the use of anesthetic, numerous review and research papers have documented high variability, low reproducibility and poor correlation with other signs and symptoms of dry eye [11–16]. Although some of the variability associated with the Schirmer I test may be minimized in moderate to severe aqueous deficient dry-eyed individuals [13], poor reproducibility severely limits the utility of this test as a means of quantifying efficacy relating to dry eye therapy. In an attempt to minimize variability, numerous variations of the Schirmer test (in addition to anesthetic use) have been proposed, including closing the eyes, using dim light, reducing the test time and the use of different filter materials. To date, no such alteration in methodology has resulted in a consistent improvement in Schirmer reproducibility or diagnostic sensitivity.

Additional limitations associated with the Schirmer test have been described in the past 20 years [11, 16]. Included in this list is the fact that the testing time (5 min) is too long [17, 18]; the paper strip may absorb tears unevenly (depending on tear composition); there is potential for evaporative loss; the test causes discomfort and there is no true agreement for lower “cut off” wetting length limits in non dry-eyed patients or those with dry eye disease [19].

If we consider all the vagaries discussed above, it should not be surprising that there remains a lack of consensus as to what Schirmer score truly is indicative of dry eye syndrome. Scores collected without anesthesia ranging from 5 to <10 mm and <8 mm with anesthesia are generally regarded as abnormal, signifying the presence of dry eye. Much of the difficulty in defining wetting limits for diagnostic purposes can be summarized by a statement paraphrased from Cho [11] “Schirmer values are too variable such that no definite limit for normal tear production can be determined.” Despite significant effort, there have been only a very small number of studies that have found a wetting cut off point that is correlated with another sign or symptom of dry eye. Furthermore, the range of values is such that regardless of a cut off point, false negative and/or positive identification of dry-eyed subjects is common. Despite the paucity of reproducible data, the recent DEWS 2007 report [19] has suggested a reasonable cutoff value of ≤ 5 mm in 5 min for a dry eye diagnosis.
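The trade-off between false positive and false negative identification at a given cutoff can be illustrated with a minimal sketch. The function names, the example scores and the diagnostic labels below are hypothetical and are not taken from any cited study; only the ≤ 5 mm in 5 min cutoff comes from the text above.

```python
def is_schirmer_positive(wetting_mm, cutoff_mm=5.0):
    """Flag a Schirmer score as abnormal when wetting is at or below the cutoff."""
    return wetting_mm <= cutoff_mm


def error_rates(scores_mm, has_dry_eye, cutoff_mm=5.0):
    """Return (false_positive_rate, false_negative_rate) for a given cutoff.

    scores_mm   : wetting lengths in mm
    has_dry_eye : reference diagnosis for each subject (True/False)
    """
    if len(scores_mm) != len(has_dry_eye):
        raise ValueError("scores and diagnoses must have the same length")
    n_dry = sum(1 for d in has_dry_eye if d)
    n_normal = len(has_dry_eye) - n_dry
    if n_dry == 0 or n_normal == 0:
        raise ValueError("need at least one dry-eyed and one non-dry-eyed subject")
    false_pos = sum(
        1 for s, d in zip(scores_mm, has_dry_eye)
        if not d and is_schirmer_positive(s, cutoff_mm)
    )
    false_neg = sum(
        1 for s, d in zip(scores_mm, has_dry_eye)
        if d and not is_schirmer_positive(s, cutoff_mm)
    )
    return false_pos / n_normal, false_neg / n_dry


# Hypothetical scores (mm in 5 min) and reference diagnoses, for illustration only.
example_scores = [3, 4, 8, 12, 5, 15, 20, 6]
example_dry = [True, True, True, True, False, False, False, False]

for cutoff in (5.0, 10.0):
    fpr, fnr = error_rates(example_scores, example_dry, cutoff)
    print(f"cutoff {cutoff:.0f} mm: false positive rate {fpr:.2f}, "
          f"false negative rate {fnr:.2f}")
```

With overlapping score distributions such as these, raising the cutoff reduces missed dry-eyed subjects only at the cost of flagging more non-dry-eyed subjects, which is the pattern described above.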

Phenol red thread test

The phenol red thread (PRT) test was introduced in 1982 [20] and was developed to overcome many of the disadvantages of the Schirmer test described in the previous section, including high variability, poor reproducibility, and low sensitivity for detecting dry eyes [20]. The test consists of a cotton thread impregnated with phenol red, which is pH sensitive. When wet with tears, which are slightly alkaline, the thread turns from yellow to red. The “red” portion of the thread is analogous to wetting of the Schirmer strip and the length of red color in mm is recorded. Standard clinical data suggest that for a 15-s test, wetting lengths should normally be between 9 and 20 mm. Patients with dry eyes have wetting values of less than 9 mm.

Methodologically, the PRT test is similar to the Schirmer test, although there are some potential advantages that include the fact that there is little to no sensation from the thread, thus less potential for reflex. Furthermore, the test time is only 15 s per eye, the eyes remain open and are free to blink, and no anesthetic is required [16, 20–24].

Despite these potential advantages, the PRT is used rarely in clinical practice or in clinical development. Two possible reasons for this are that the threads are difficult to handle initially due to their light and flexible nature and the threads are only manufactured in Japan, making their supply costly and often requiring special ordering.

Although the PRT test is not a standard clinical test, its potential advantages over the Schirmer I test have generated interest in assessing its reproducibility and utility. Several studies have found the PRT test to be more repeatable than the Schirmer test (with and without anesthetic) as well as more reliable in diagnosing dry eye [20, 24–25]. Chiang et al. [25] compared 66 normal eyes and 14 dry eyes (DE) using both Schirmer I and PRT. In 28 eyes, both tests were performed on successive days. Comparing normal to dry eyes, the following data were reported: normal PRT = 20.3 ± 8.7 mm vs DE PRT = 8.1 ± 8.0 mm (P < 0.005); normal Schirmer = 10.0 ± 7.9 mm vs DE Schirmer = 14.6 ± 9.8 mm (P = 0.33). Based on these data, the authors concluded that the likelihood of a false positive was 3% using PRT and 18% using Schirmer. An estimate of the reproducibility of the measurements was achieved through comparison of data collected on two successive days. The Pearson coefficient was 0.89 for PRT and only 0.39 for Schirmer. Patel et al. [23] concluded that the PRT test could accurately differentiate aqueous dry-eyed subjects from non-dry-eyed subjects, although in that study, the thread was left in place for 120 s. In this same study, it was concluded that the PRT test could not differentiate between dry eye and non-dry eye if both aqueous and lipid deficient dry-eyed individuals were lumped together.
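The test-retest comparison described above rests on the Pearson correlation between measurements from two successive days. A minimal sketch of that calculation follows; the function name and the wetting lengths are hypothetical and are not data from Chiang et al. [25].

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical wetting lengths (mm) for the same five eyes on successive days.
day1_mm = [20.0, 8.0, 15.0, 25.0, 11.0]
day2_mm = [18.0, 9.0, 16.0, 23.0, 12.0]

print(f"test-retest r = {pearson_r(day1_mm, day2_mm):.2f}")
```

A coefficient close to 1 indicates that eyes keep their rank and relative spacing from one day to the next. It does not by itself show that the absolute values agree, so a high coefficient is a necessary rather than sufficient indication of repeatability.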

From a limited number of studies, it generally appears that the PRT outperforms the Schirmer test in the areas of reproducibility and reliability. Global data interpretation, however, must proceed with caution, as several of the studies addressing repeatability were performed on non-dry-eyed subjects, thus calling into question the true utility of the findings with respect to use in a dry-eyed population. Analyses by Nichols et al. [13, 26] of the reliability and correlation of clinical measurements of dry eye found that positive correlations do exist between (a) Schirmer and fluorescein staining, (b) PRT and both fluorescein and rose Bengal staining, and (c) Schirmer and PRT. What is disturbing, however, is that although these positive correlations were found, only 31% of dry eye patients had confirmatory tests for dry eye as evidenced by two positive tests out of the six possible tests performed to diagnose dry eye. These tests included fluorescein staining, rose Bengal staining, PRT, Schirmer, tear meniscus height and tear film break-up time (TBUT). These findings highlight the poor correlation that exists between all the tests as well as the difficulty in selecting entry criteria for a therapeutic drug study in the dry eye patient population. Saleh [15] suggests that the poor correlation between tests results from the fact that each test utilizes a different mechanism to assess the ocular surface. Therefore, due to the multifactorial nature of dry eye, many mechanisms do not apply to the individual dry-eye patient. Saleh [15] further demonstrated that in a cataract population being screened for surgery, neither Schirmer nor PRT results agreed with symptoms (28/103 patients were symptomatic of dry eye based on questionnaire) and that PRT results showed no correlation with Schirmer results.

In addition to clinical utility, the question that has surfaced repeatedly is “what does the PRT actually measure?” It was originally proposed that the PRT test measured tear volume. Numerous studies now suggest that the PRT measures a representation of fluid stored in the lower cul-de-sac plus a component of tear secretion (basal and/or low grade reflex) [16, 23–24]. A true measure of reflex secretion would require an absorption capacity far in excess of what a PRT is capable of [16]. In addition, in 15 s, only approximately 0.5 μl of tears or 7.5% of the average normal volume is collected [16], thus raising the question of how reliable an overall estimate of volume could be achieved with the PRT test. It has been suggested that the ability of the PRT to differentiate aqueous dry-eyed from non-dry-eyed subjects is due to the test’s ability to first absorb the tears naturally present in the lacrimal lake and then continue to measure the replenishment of fluid into the lake as a result of basal flow and/or mild stimulation. Subjects with aqueous deficiency cannot replenish their fluid as quickly, hence less wetting of the thread occurs [23]. Blades and Patel [24] have argued that the length of the PRT test be extended past 15 s, as they and others have demonstrated that wetting is not linear and, with the threads they employed, reached equilibrium in 120 s. Further adding complexity to our understanding of “what is being measured” is the possibility that the composition of the tear fluid can influence wetting length. Both lipids and mucins can influence flow and it has been demonstrated that in comparison to migration of saline through a thread, variable tear composition likely increases the variability of both PRT and Schirmer measurements [24].

Increased test time may positively influence more than just differential tear composition and reflex tearing. It is argued that leaving the thread in for a longer time will increase the accuracy of the test. For example, the measuring scale for PRT has a resolution of 1 mm, thus if mean migration in 15 s = 9.2 mm, then resolving power = (100 × 1/9.2)% = 11%. If the test runs for 60 s and mean wetting is 18 mm, then resolving power reduces to (100 × 1/18)% = 5.6%, thus increasing the ability to detect change and/or differences among test subjects. Perhaps finding the time that provides the optimal balance between sufficient time for wetting versus minimal irritation (causing reflex tearing) is key. This may apply to both Schirmer and PRT tests [18].
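The resolving-power arithmetic above can be reproduced with a short sketch. The function name is hypothetical; the 1 mm scale resolution and the mean wetting lengths of 9.2 mm (15 s) and 18 mm (60 s) are the values quoted in the text.

```python
def resolving_power_percent(mean_wetting_mm, scale_resolution_mm=1.0):
    """Smallest readable step on the scale as a percentage of mean wetting length."""
    if mean_wetting_mm <= 0:
        raise ValueError("mean wetting length must be positive")
    return 100.0 * scale_resolution_mm / mean_wetting_mm


# Test durations and mean wetting lengths quoted in the text.
for seconds, mean_mm in [(15, 9.2), (60, 18.0)]:
    print(f"{seconds} s test: resolving power = "
          f"{resolving_power_percent(mean_mm):.1f}% of mean wetting")
```

A smaller percentage means that a one-division change on the scale represents a smaller fraction of the measured value, so longer wetting lengths make it easier to detect a given relative change.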

Age and ethnic considerations may also be important in assessing the utility of functional testing for dry eye. Several studies have demonstrated that PRT wetting values for non-dry-eyed subjects are influenced by age (PRT wetting decreases as age increases) and by race [PRT values are lower in American Caucasians compared to Japanese (living in Japan)] [20, 24].

In summary, although rarely used, the PRT may offer an advantage over the Schirmer test with respect to increased measurement reproducibility. To this end, it may provide both better diagnostic utility as well as serve as a more meaningful tool for therapeutic drug evaluation. The PRT test, however, still demonstrates variability and compromised reproducibility due to patient variation in the volume, depth, and shape of the lacrimal lake, the temperature of the environment in which the test is performed and the variation in tear composition among patients. Additionally, the threads are difficult to handle and insert thus necessitating appropriate training. Further research on timing of the test and perhaps the use of a closed eye [27] may further refine reproducibility.

Tear film fluorophotometry/fluorescein clearance

As noted above, neither the Schirmer nor the PRT appears ideal for the accurate and reproducible quantitation of tear production. However, alternative, more dynamic methods have been established utilizing the rate of disappearance of a tracer as the functional readout. Lacrimal scintigraphy is one such method, which involves the application of a radioactive tracer such as technetium-99m (99mTc) into the lower marginal tear strip [28, 29]. The distribution of the tracer is monitored by a gamma counter, and the rate of change or transit time of the tracer through the system provides an estimate of tear turnover. Although various quantitative algorithms have been proposed for this approach [30], the widespread clinical use of a radioisotope in humans is both undesirable and impractical. Thus, methods utilizing fluorescence rather than radioactive decay were developed to enable assessment of tear turnover.

Following topical ocular application, fluorescein sodium is thought to be distributed homogeneously on the corneal surface and conjunctival sac after several blinks. Immediately thereafter, tears containing fluorescein are removed by flow and are replaced by fresh tears not containing fluorescein. The measurement of fluorescein disappearance via fluorophotometry or other fluorescein clearance tests is used to determine tear turnover, which is defined as the percentage decrease of fluorescein concentration in tears per unit of time (% min⁻¹). Basal tear turnover, defined as the tear turnover at the lowest level of reflex tear production possible under physiologic conditions, is then utilized as an indirect quantitative assessment of tear production.

Numerous methodological variations of fluorescein clearance have been proposed. Early studies with in vivo fluorophotometry utilized a modified slit lamp; however, commercially available instruments (e.g., the Fluorotron Master) have greatly aided in the standardization of clearance measurements. Typically, 1 to 5 μl of sodium fluorescein are applied to the ocular surface and fluorescein concentration is measured at various intervals for a duration of 15 to 30 min [30–33]. The rate of decay of fluorescence is calculated for the entire test duration. Typically, a biphasic decay is observed, with the first 5 min representing initial reflex tearing, after which (from 5 min onward) basal conditions of secretion are represented [34]. It is this second phase of the curve that is used to extrapolate basal tear turnover. Tear turnover data can then be transformed to flow and/or volume estimates [30].
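As an illustration of how basal tear turnover might be extracted from the second phase of such a curve, the sketch below fits a straight line to the logarithm of fluorescein concentration for readings taken from 5 min onward. It assumes monoexponential decay in that phase and reports turnover as the percentage decrease per minute, 100 × (1 − e^(−k)); the function name, the 5-min boundary as a fixed parameter and the synthetic readings are assumptions for illustration, not the algorithm of any specific instrument or cited study.

```python
import math


def basal_tear_turnover(times_min, concentrations, basal_start_min=5.0):
    """Estimate basal tear turnover (% per min) from the late decay phase.

    Fits ln(concentration) = a - k * t by least squares, using only
    readings at or after basal_start_min, and converts the decay
    constant k into a percentage decrease per minute.
    """
    points = [
        (t, math.log(c))
        for t, c in zip(times_min, concentrations)
        if t >= basal_start_min and c > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two positive readings in the basal phase")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        raise ValueError("basal-phase readings must span more than one time point")
    slope = sum((t - mean_t) * (y - mean_y) for t, y in points) / sxx
    k = -slope  # decay constant per minute
    return 100.0 * (1.0 - math.exp(-k))


def synthetic_biphasic(t, fast_k=0.40, basal_k=0.15, switch_min=5.0, c0=100.0):
    """Hypothetical biphasic decay: fast reflex phase, then slower basal phase."""
    if t < switch_min:
        return c0 * math.exp(-fast_k * t)
    return c0 * math.exp(-fast_k * switch_min) * math.exp(-basal_k * (t - switch_min))


# Synthetic readings every 2 min for 20 min, in arbitrary concentration units.
times = [float(t) for t in range(0, 21, 2)]
readings = [synthetic_biphasic(t) for t in times]
print(f"basal tear turnover = {basal_tear_turnover(times, readings):.1f} % per min")
```

Excluding the first 5 min keeps the reflex-tearing phase from inflating the estimate. In practice the boundary between phases varies between subjects, and some reports express turnover as 100 × k rather than as the fractional decrease used here.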

As alternative approaches to fluorophotometry, two indirect or ex vivo measures have been proposed. The first, introduced by Xu and Tsubota [35], is a modification of the Schirmer test with anesthesia. Five minutes after co-application of fluorescein and anesthetic, Schirmer strips are inserted. The length of wetting is recorded after 5 min and the intensity of strip staining is compared to a standard color plate. Although clinically practical, this approach has been criticized for being semi-quantitative at best. Thus, as a potential improvement, Afonso et al. [7] described a method in which, following instillation of fluorescein, tears were collected with a porous polyester rod. Dye is eluted from the rod and then quantified in a fluorescence multiplate reader.

Data reported in the literature using each of these techniques suggest that measurement of tear clearance provides a more reliable and objective endpoint for the diagnosis of dry eye compared to Schirmer or PRT [30–39]. However, numerous limitations remain. Tear film fluorophotometry remains predominantly a research tool due to the cost of the instrumentation required to perform such studies. In addition, precision and accuracy are still subject to error from several sources. First, any disturbance of the corneal surface, such as occurs in dry eye, may alter the corneal uptake of fluorescein from the tear film. This can result in a change to the monoexponential decay rate of fluorescein used in the calculation of tear production [30, 38]. Therefore, in clinical studies used to assess the efficacy of drugs to treat dry eye, an improvement of the ocular surface may in and of itself change the measured rate of tear clearance, which could be falsely attributed to the effect of the test drug. Additional sources of error in tear film fluorophotometry may arise from reflex lacrimation, changes in tear film thickness, and corneal autofluorescence.

With respect to the alternative measures of tear clearance, reflex tearing following Schirmer insertion and/or tear collection poses a significant variable. Additionally, expensive instrumentation is required for quantitation of fluorescein extracted from the polyester rods, and color comparison of dyed Schirmer strips is complicated by the fact that fluorescein intensity will be affected by the length of strip wetting [39]. Applicable to all tear clearance methods are anatomical influences (lid laxity, blink abnormalities, functional tear outflow obstruction) and optimal timing and dye volume variables. The last point to address is the fact that tear turnover measured by any of these tests is not a direct or independent measure of tear production [7, 30]. As described by Afonso et al. [7], delayed tear clearance occurs in both aqueous tear deficient dry eye and “Schirmer normal” subjects. This group and others have suggested that tear clearance values may correlate well with ocular irritation symptoms and/or inflammation of the ocular surface [7, 32], but whether or not true measures of production or volume can be made remains to be tested.

Conclusion

As evidenced by the DEWS 2007 [1] report, considerable progress has been made in our understanding of the epidemiology, pathophysiology, and diagnosis of dry eye. Clinically, it appears clear that no single test is capable of repeatedly differentiating dry-eyed from non-dry-eyed individuals. In this review, we have highlighted the fact that although tests such as the Schirmer, PRT, and fluorophotometry may all be employed to assess clinical endpoints, numerous variables associated with each may contribute to poor test reproducibility and poor correlation to the clinical improvement that accompanies the use of a therapeutic agent. Fluorophotometry offers the greatest potential for providing an accurate assessment of tear production. Although unconventional, the PRT appears to offer the next best accuracy and reproducibility, followed by the Schirmer test. It is particularly disconcerting that of all these tests, few if any correlate with each other to a significant degree. From the perspective of therapeutic development, these diagnostic shortcomings translate into a significant obstacle, as evidenced by the fact that regulatory agency approval of a therapeutic agent that improves both a sign and symptom endpoint has been elusive. Given the limitations of obtaining and measuring select endpoints to assess tear function regardless of the test used, it is reasonable to ask whether the main reason we do not yet have an approved drug to treat dry eye syndrome is that a given drug simply lacks clinical robustness or, rather, that there are significant quantitative limitations in the methods currently used to demonstrate the clinical robustness of a particular drug. Of concern is that, regardless of the tear function test used, none of these tests may ultimately possess the reproducibility, sensitivity, and/or robustness to serve as a reliable quantitative assessment of tear volume.
Continued efforts in method development will surely be required in order to validate and characterize tear function endpoints used in clinical trials to support the approval of a drug for dry eye therapy.