Methodological requirements for valid tissue-based biomarker studies that can be used in clinical practice
- Cite this article as:
- True, L.D. Virchows Arch (2014) 464: 257. doi:10.1007/s00428-013-1531-0
Paralleling the growth of ever more cost-efficient methods to sequence the whole genome in minute fragments of tissue has been the identification of increasingly numerous molecular abnormalities in cancers: mutations, amplifications, insertions and deletions of genes, and patterns of differential gene expression, e.g., overexpression of growth factors and underexpression of tumor suppressor genes. These abnormalities can be translated into assays to be used in clinical decision making. In general terms, the result of such an assay is subject to a large number of variables regarding the characteristics of the available sample, particularities of the assay used, and the interpretation of the results. This review discusses the effects of these variables on assays of tissue-based biomarkers, classified by macromolecule: DNA, RNA (including micro RNA, messenger RNA, and long noncoding RNA), protein, and phosphoprotein. Since the majority of clinically applicable biomarkers are immunohistochemically detectable proteins, this review focuses on protein biomarkers. However, the principles outlined are mostly applicable to any other analyte. A variety of preanalytical variables affect the results obtained, including analyte stability (which differs among analytes, i.e., DNA, RNA, or protein), period of warm and of cold ischemia, fixation time, tissue processing, sample storage time, and storage conditions. In addition, assay variables play an important role, including reagent specificity (notably but not uniquely an issue concerning antibodies used in immunohistochemistry), technical components of the assay, quantitation, and assay interpretation. Finally, appropriateness of an assay for clinical application is an important issue. Reference is made to publicly available guidelines to improve biomarker development in general and requirements for clinical use in particular.
Strategic goals are formulated in order to improve on the quality of biomarker reporting, including issues of analyte quality, experimental detail, assay efficiency and precision, and assay appropriateness.
Paralleling the growth of ever more cost-efficient methods to sequence the whole genome of pieces of tissue as small as 0.01 g has been the identification of increasingly numerous genetic abnormalities in cancers: mutations, amplifications, insertions, deletions, and patterns of differential expression of genes, e.g., overexpression of growth factors and underexpression of tumor suppressor genes. Knowledge of tumor-associated molecular abnormalities has stimulated the large-scale screening and development of reagents (small drugs and antibody derivatives) that bind to the abnormal genes or gene products. Pharmaceutical agents that are specific for these molecular targets have greatly increased in number. These drugs have a greater tumor-specific effect than conventional cytotoxic agents, with less severe and less frequent side effects. Following the growth of this industry has been the impetus to develop tissue-based biomarkers. Generically defined, tissue biomarkers include any tissue-based feature, ranging from the histological grade of a cancer to point mutations. We shall focus on molecular biomarkers in this review.
Biomarkers can be categorized as either prognostic or predictive. Prognostic markers predict clinical outcome after x years. The number of years appropriate to evaluate the value of a biomarker should be proportionate to the natural history of the cancer. The fate of most patients with lung cancer will become apparent within 2 years; in contrast, the outcome of the average patient with prostate cancer would not be known with certainty for more than 5 years. Predictive markers predict response of a tumor to a drug that putatively targets a key functional molecule involved in the biology of that tumor. Given the high cost of many targeted therapies, many early phase clinical trials are requiring or, for markers that are not integral to a clinical trial, encouraging the identification and validation of predictive biomarkers of these novel agents. However, identifying the biomarker that can serve as a detector of abnormalities in a functionally critical step of the progression of the cancer can be challenging, especially if the molecular pathway contains many regulatory genes. The mandate to identify a biomarker that can readily be assessed is particularly high when the predictive biomarker is integral to the trial. An integral marker is essential for a clinical trial, i.e., is used to qualify a patient for a specific therapy.
The performance of biomarkers is of concern to the regulatory agencies, the drug manufacturers, the health care deliverers, and patients, all of whom seek to maximize benefit and minimize the cost and complications of therapy based on expression of the biomarker. However, the performance of many biomarkers has been relatively poor.
There is a major challenge with the immunohistochemical data used for tissue-based biomarkers. Biopharmaceutical companies rely upon the biomedical literature to identify candidate predictive biomarkers, including biomarkers that are integral to early phase clinical trials of novel drugs, and molecular targets for therapy. However, no more than 50 % of studies can be replicated. Consequently, drug development costs are often wasted on drugs for which the evidence of functional efficacy is poor. A theme of this review is how the quality of biomarker studies can be improved.
As attention has been directed to biomarkers, the biomedical community has become aware of factors independent of the biologic state of the tumor that can lead to erroneous results of tissue-based assays. These can be categorized as preanalytical and analytical variables that affect the expression and expression levels of biomarkers. Their effects do not reflect the biology of the tumor but are artifacts of tissue handling and of the way the assay is performed, and they can skew assay results away from the true biologic state of the analyte.
This introductory section discusses the effects of these variables on assays of tissue-based biomarkers, classified by macromolecule—DNA, RNA (including micro RNA, messenger RNA, long noncoding RNA), protein, and phosphoprotein. Many studies of the effect of preanalytical and analytical variables on macromolecules in tissue have been published. A repository of manuscripts of these studies, which were published in peer-reviewed journals, is on the website of the former Office of Biospecimen and Biorepository Research of NCI. Hundreds of studies have analyzed the effect of preanalytical and analytical variables. A web-accessible library of publications, the Biospecimen Research Database (https://brd.nci.nih.gov/BRN/brnHome.seam), organizes the publications in a table organized by type of specimen (blood, serum, plasma, urine, saliva, and tissue) and type of assay for each macromolecule (DNA, RNA, protein, small molecule, morphology, cell count/volume, and peptide).
General approach of this review
This review is focused on the most common materials used by pathology labs in clinical practice: human tissue obtained at surgery, fixed in neutral buffered formalin, and embedded in paraffin (FFPE tissue). Although the main emphasis is on the method most commonly used in clinical labs to evaluate biomarkers in FFPE tissue (immunoperoxidase stains of protein analytes assessed visually by pathologists), comments will be provided about other analytes and methods of assessment of tissue-based biomarkers. Finally, examples are based on studies of prostate cancer, the cancer with which the author is most familiar.
DNA. Although DNA is relatively resistant to degradation, its informativeness can be affected by tissue handling. Formaldehyde fragments DNA, resulting in shorter, less informative fragments wherein mutations might be missed, or, conversely, be misidentified. A relevant recent finding is that DNA sequences differ in different somatic tissues, based on whole genome sequences. This finding raises the prospect that even the sequence of DNA used as “normal control DNA” may be misleading when used as reference DNA for the DNA sequence of a tumor in that individual.
mRNA. This most labile class of molecule has been used to generate molecular signatures of the effect of environment on tumor tissue. For example, RNA signatures of warm ischemia (the length of time devascularized tissue is at body temperature), cold ischemia (the length of time tissue sits at ambient temperature until stabilization by freezing or fixation), or even preoperative diet (low vs. high protein content) have been generated. Micro RNA is a more stable nucleic acid than is messenger RNA.
Protein. One of the most widely used biomarker assays is immunohistochemical localization and assessment of expression level of specific proteins in FFPE tissue. At least 62 preanalytical variables that might affect immunohistochemical reactivity (IHR) have been evaluated. These variables cover the time course from tissue acquisition through fixation and storage of the paraffin blocks to handling of sections prior to immunostaining. The extent that preanalytical conditions affect protein levels varies with the analyte and with the study. Below is a summary of those variables that are most frequently encountered, that contribute most to preanalytical variability, and that should be considered in immunohistochemical assays.
Delay in acquiring the tissue from the surgery
Laparoscopic procedures have resulted in longer warm ischemia time (the time between tissue devascularization and removal of tissue from the patient) than in the prelaparoscopic era. Though changes in RNA profile are measurable, few studies have assessed the effect of warm ischemia on proteins. An evolving consensus is that warm ischemia time should not exceed 2 h.
Delay in fixation
In general, fixation can be delayed up to 12 h without compromising IHR. In one study, there was no effect on estrogen receptor (ER) or progesterone receptor (PR) IHR with delay in fixation of up to 8 h at 19 °C or up to 12 h at 4 °C. Beyond 6 h the effect of “cold” ischemia varies with the analyte.
Time in fixative
Fixation for at least 24 h appears to be sufficient for preservation of all analytes. Though loss of immunoreactivity by underfixation is the most frequent concern, overfixation can affect staining of some analytes. For example, there is loss of signal when fixation of ER exceeds 4 days.
Postfixation tissue processing
Although publications are discrepant, immunoreactivity can be variably affected, in an analyte-independent manner, at the following stages of tissue and section handling: duration and temperature of dehydration of fixed tissue, reagents used for clearing tissue and the temperature of clearing, duration and temperature of paraffin embedding, and the temperature and duration of heating to enhance adherence of sections to slides.
Duration of block storage
The length of time that FFPE blocks can be stored without loss of immunoreactivity varies by analyte and study. Some blocks have been stored as long as 25 years without loss of IHR of ER. IHR of some other analytes decreases after 2 years of block storage.
Duration and temperature of storage of sections
Preanalytical variables include paraffin coating of sections, desiccation, storage in nitrogen, and storage at –80 °C. The extent of effect varies with the study and the analyte.
Labs are unlikely to change or control for some conditions despite weak evidence that the variable may affect IHR. These include the nature and pH of fixative (recommended: 10 % neutral formalin buffered to pH 5–7), tissue to fixative ratio (recommended: at least 1:10), and size of tissue specimen (recommended: one dimension is no greater than 0.3 cm). The effect on IHR of other variables is minimal or negligible, such as type of tissue processor. And the effect of some variables has not been assessed, e.g., storage of specimens in different fixatives. Publications concur on the effect of fixation delay, fixation, dehydration, paraffin impregnation, drying of slides, and storage conditions on immunostains, commenting that some recommendations by oncology interest groups differed from published guidelines. For example, delay in fixation, recommended not to exceed 1 h, can exceed 12 h without discernible effect. And sections mounted on slides should probably be stained within 6 days of preparation, though two separate guidelines cite maximum times of 1 and 6 weeks, respectively.
These studies have limitations. The methods used by the different investigators were not standardized. And a majority of studies have not been validated by independent investigations. What is needed are metrics to assess the state of retention of the in vivo state of proteins in tissue, prior to the effect on these analytes of preanalytical conditions. An initial set of markers that provide correlation, albeit weak, with quality of ER has been published. Loss of phosphorylated tyrosine immunoreactivity and increased acetylated lysine and HIF1A immunoreactivity correlate with decreased ER staining. Unknown is whether these markers, or any set of markers, can be used to assess the degree of retention of the native state of all analytes. One suspects not, hypothesizing that host factors (variance in the functionality and metabolism of proteolytic enzymes) and analyte factors (primary and secondary structure, access to proteases) influence the quality of ischemia markers.
Primary antibody. Performance characteristics of the primary antibody include sensitivity and specificity for detecting the analyte of interest in sections of FFPE tissue. Rigorous assessment of specificity involves biochemical and cell biological techniques that clinical labs are virtually never prepared to undertake. These techniques include confirmation that engineered overexpression or knockdown of the analyte in cultures and preparations of cells (such as cell line microarrays) correlates with IHR. There are suggested guidelines for assessing the quality of primary antibodies for use in IHC studies. However, since most clinical labs are not prepared for detailed evaluations of the performance of antibodies, directors of IHC labs must rely on information provided by the antibody vendors. Too often, this information lacks data relevant to IHC studies of FFPE tissue. For example, antibodies that produce a single band on Western gels and are localized to cells of interest in frozen tissue may not immunoreact with cells of interest in FFPE tissue.
Technical component of assays. As currently practiced, immunoperoxidase stains involve at least four sequential chemical reactions. In principle, conditions should be optimized for each step, i.e., antibodies should saturate binding sites and the kinetics of peroxidase catalysis of substrate diaminobenzidine should be optimized for the variables that affect the amount of substrate that is deposited on the tissue section: temperature, pH, concentration of electron donor hydrogen peroxide, and duration of reaction. Such systematic optimization of all steps of IHC has been exemplified by a study correlating a biochemically assayed protein extract from pituitary with immunoperoxidase stains quantified by densitometry. Such optimization of the chemistry is virtually never done. Assays of protein in tissue that involve fewer reactions promise to be more reproducible. Quantum dot-conjugated immunostains are, in principle, subject to less variance since there is no enzyme substrate reaction and fewer steps are needed. Mass spectrometry analysis of protein content in cells of interest, e.g., cancer cells, is another method that is more quantitative than immunoperoxidase stains. However, there is currently no efficient way to isolate cells of interest from tissue in a volume sufficient for mass spectrometry.
Interpretive component of assays. Currently, immunoperoxidase stains are analyzed either visually by pathologists or quantified by image analysis. The advantage of visual interpretation is ready accessibility of stains without needing to acquire and devote effort using an image analysis system. However, visual cognition of the human eye varies with the task. While we can readily identify and categorize many objects that contrast with their microenvironment, such as cancer cells in tissue, human vision performs poorly in assessing degrees of difference of elements of an image, such as grey levels (http://bcs.mit.edu/people/adelson.html). Furthermore, tissue elements adjacent to the objects being analyzed influence our visual cognition. Finally, there are biologic limitations to our ability to visually “measure” light intensity. For example, the sensitivity of our retina to light differs by the wavelength of light.
Quantitation. In contrast to visual assessment, the optical density of the pixels that comprise the digitized image of an immunostain can be determined with great accuracy using an image analysis system. However, two components of image analysis limit transferability and reproducibility of the data. Both components involve a pathologist (or, plausibly, a pathology-competent individual). First, the cells of interest need to be identified and distinguished from histological mimics. No current image analysis system can reliably distinguish low-grade prostate carcinoma from such mimics as adenosis, basal cell hyperplasia, and prostatic intraepithelial neoplasia. Second, a threshold must be set to distinguish pixels that are deemed “positive” (reaction product) from those that are “negative.” Interpathologist variance in these cognitive tasks is virtually never assessed. There are additional sources of variance in image analysis systems that should be considered, and either controlled for or discarded, as appropriate. Finally, imaging systems need to include in their evaluation cells of interest that are stained below the positive threshold level. By using multiple stains (to identify nuclei, which would have to be annotated by the pathologist as cancer nuclei, and the analyte of interest), imaging systems can address many of these challenges.
Visual scoring. How immunostain interpretations are reduced to single values that can be used by clinicians in making treatment decisions is an additional source of variance. Immunostains are characterized by two parameters: the intensity of staining and the percentage of cells stained at each level of intensity. The challenge has been incorporating the two parameters of immunostains into a single number. One frequently used calculation is the “HScore” (a sum of the products of level of stain intensity and percent of cells stained at each intensity level). A consequence of using the HScore is that stains that differ markedly can have the same value. A different way to report the data is as categories of staining. Though categorical values lack the pseudo precision of HScores, their reproducibility is, in principle, higher.
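The sensitivity of image analysis results to the positive/negative threshold can be illustrated with a minimal sketch. The optical density values and the two threshold settings below are hypothetical and chosen only for illustration; they are not taken from any study, and real image analysis systems work on full digitized images rather than a short list of pixels.

```python
def positive_fraction(optical_densities, threshold):
    """Return the fraction of pixels whose optical density exceeds the threshold."""
    if not optical_densities:
        raise ValueError("at least one pixel value is required")
    positive = sum(1 for od in optical_densities if od > threshold)
    return positive / len(optical_densities)


# Hypothetical optical densities for eight pixels within annotated cancer cells.
pixels = [0.05, 0.10, 0.18, 0.22, 0.30, 0.35, 0.45, 0.60]

# Two hypothetical thresholds, as might be set by two different pathologists.
fraction_low_threshold = positive_fraction(pixels, 0.20)
fraction_high_threshold = positive_fraction(pixels, 0.32)

print(fraction_low_threshold, fraction_high_threshold)
```

With identical pixel data, the two threshold settings classify five and three of the eight pixels as positive, respectively, which is why reporting the threshold and its interobserver variance matters.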
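The HScore calculation described above can be sketched in a few lines. The sketch assumes the commonly used intensity scale of 0 (negative) to 3 (strong), which gives a score between 0 and 300; the scale and the two staining patterns are illustrative assumptions, not data from this review.

```python
def h_score(percent_by_intensity):
    """Sum of (intensity level x percent of cells stained at that level).

    percent_by_intensity maps an intensity level (assumed here to be 0-3)
    to the percentage of cells stained at that level.
    """
    total = sum(percent_by_intensity.values())
    if abs(total - 100) > 1e-6:
        raise ValueError("percentages must sum to 100")
    return sum(level * pct for level, pct in percent_by_intensity.items())


# Hypothetical stain A: every cell stains weakly.
uniform_weak = {0: 0, 1: 100, 2: 0, 3: 0}

# Hypothetical stain B: most cells are negative, a minority stain strongly.
focal_strong = {0: 60, 1: 10, 2: 0, 3: 30}

print(h_score(uniform_weak), h_score(focal_strong))
```

Both hypothetical patterns yield the same score of 100, although one is diffusely weak and the other focally strong, which illustrates how markedly different stains can collapse to a single value.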
Guidelines and specifications
The literature can be made more robust by adhering to a number of specifications, such as reporting all of the components of immunohistochemical stains. A specification termed MISFISHIE has been published. This specification includes detailed characterization of primary antibodies used in IHC studies. However, the performance, specificity, and sensitivity of primary antibodies are not part of the MISFISHIE specification. Publications have made the point that specificity and sensitivity of primary antibodies should be validated. With respect to the quality of published IHC studies, and thus the probability that the results can be replicated, one cannot but think that data on preanalytical variables that affect assay results should be a required part of the publication. Based upon reviews of IHC studies, publications should consider addressing the effect (if any) of warm and cold ischemia time, length of fixation, and tumor necrosis. These parameters could be reported as an addendum or as an incorporated table. This table could also be used by the editorial staff of the journals publishing IHC studies. A second reason for the inability to replicate IHC studies has to do with the interpretation of immunostains. Little attention, in general, has been paid to how they are interpreted. One major source of variance in interpretation is that the majority of stains are assessed visually rather than by the optical density of pixels in a digitized image file. Visual cognition is sufficiently poor that one questions whether more than two, perhaps three, categories of stain intensity should be used. A second variable is that there is virtually never a set of standard curves relating the intensity (or luminosity, when fluorochromes are used as the detection agents) to reference expression levels of the analytes.
A third variable has to do with heterogeneity of tumors and a question of whether the samples chosen for the immunohistochemical analysis truly are representative of the state of expression of analytes in tumors. For example, the sequences of DNA from different regions of a large renal cell carcinoma differ. The challenge is to identify the region of the tumor that has genomic changes that correlate with tumor progression, i.e., growth rate, patterns of metastasis, and resistance to therapy.
Predictive biomarkers, used to select patients for targeted therapy, will play an increasingly prominent role in cancer treatment as more drugs targeted to pathway molecules are made available. Several additional considerations pertain to these biomarkers, notably assay performance and validation, assay utility, and clinical utility.
Guidelines for tissue-based assays
A number of guidelines have been published within the past decade. MISFISHIE provides a checklist for components of tissue-based assays that is sufficiently detailed for an independent investigator to be likely to replicate the study successfully. REMARK criteria list the expected components of a clinical trial that has a tissue biomarker component. Recently published is a set of guidelines for RNA array-based data that might be used as prognostic or predictive biomarkers. Despite availability of these guidelines, there has been little incentive for most of the biomedical community to use them, presumably since adherence to them requires more effort and expense, both by the investigators who study the biomarkers and by the reviewers of manuscripts that report biomarker studies.
Use of biomarkers in practice
To effectively incorporate predictive biomarkers into practice requires close interaction between clinical trialists/oncologists, laboratorians/pathologists, outcome statisticians, and regulatory agencies. Appreciation has grown that the level of evidence of the predictive power of a biomarker provides a basis for assessing the value of the marker in clinical practice. Four categories of level of evidence have been proposed in studies of breast cancer biomarkers. Levels of evidence range from anecdotal observations to validation of the performance of the biomarker in an independent cohort where clinicians and pathologists are blinded to the clinical outcome of patients in the study. The concept of level of evidence can be expanded to include evidence of clinical utility and validity in addition to the strength of evidence that the biomarker is predictive. Stated another way, a predictive biomarker is of value if it categorizes a significant number of patients. Conversely, a biomarker is of little value if most patients do or do not respond to the targeted treatment.
Once the performance of an assay has been characterized and deemed adequate for the intended purpose, it should be validated. Validation of assays poses challenges for clinical labs, which may not have the specimens, resources, or experience to efficiently validate a tissue-based assay. An additional consideration for immunohistochemical assays is the lack of a standard method for validating quantitative immunostains. In lieu of being able to generate standard deviations, interobserver variance in quantitative values is often reported as kappa values, which are nonparametric.
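As an illustration of how a kappa value summarizes interobserver agreement, the following minimal sketch computes Cohen's kappa for two observers who assign each case to a staining category. The choice of Cohen's unweighted kappa, the category labels, and the scores are assumptions made for illustration only; the studies cited may use other variants, such as weighted kappa.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's unweighted kappa for two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: proportion of cases given the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category for every case
    return (observed - expected) / (1 - expected)


# Hypothetical categorical scores from two pathologists on six cases.
pathologist_1 = ["neg", "weak", "strong", "weak", "neg", "strong"]
pathologist_2 = ["neg", "weak", "weak", "weak", "neg", "strong"]

kappa = cohens_kappa(pathologist_1, pathologist_2)
print(round(kappa, 2))
```

In this hypothetical example the observers agree on five of six cases, chance agreement is one third, and kappa is 0.75.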
A final step in accepting an assay for clinical use is approval by regulatory agencies, such as the FDA.
Future directions (a wish list)
Details of experiments. That publications that include tissue-based biomarkers contain all methodological information about the biomarker assay sufficient for an independent investigator to conduct the assay and interpret its results identically to how this was done in the original publication. As discussed above, such a specification (MISFISHIE), with a spreadsheet that incorporates most components of tissue-based assays in checklist form, has been published.
Efficiency. That assays be done and interpreted in the most effort-efficient manner possible for that assay.
Precision of tissue-based assays. That variability of interpretation of the assay be reported in all publications. This goal involves more than one investigator independently interpreting the assay. Although the type of assay with presumably the greatest interobserver variance is an immunoperoxidase assay that is visually interpreted, assays that use image analysis should be included since pathologists make several usually subjective decisions, as previously discussed:
• Selection of the region of interest that is digitized and analyzed.
• Setting a threshold for binarizing results.
• A method that evaluates objects that are “negative,” i.e., that stain at or below the level of background staining.
Analogous to assays of analytes in solution, the results of each pathologist could be reported as a range of values, which would reflect the precision of the assay.
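One way such a range might be reported is sketched below. The sketch assumes that several readings of the same stained section are available as numeric scores (for example, HScores); the function name, the choice of summary statistics, and the readings are hypothetical.

```python
import statistics


def summarize_readings(readings):
    """Summarize repeated readings of one assay as mean, range, and spread."""
    if len(readings) < 2:
        raise ValueError("at least two readings are needed to estimate precision")
    mean = statistics.mean(readings)
    sd = statistics.stdev(readings)  # sample standard deviation
    return {
        "mean": mean,
        "low": min(readings),
        "high": max(readings),
        "sd": sd,
        # Coefficient of variation, undefined when the mean is zero.
        "cv_percent": 100 * sd / mean if mean else float("nan"),
    }


# Hypothetical scores assigned to the same section by three pathologists.
summary = summarize_readings([120, 140, 160])
print(summary)
```

Reporting the low and high values alongside the mean makes the spread between observers visible to the clinician instead of hiding it behind a single number.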
Accuracy of tissue-based assays. That a set of standards be developed that includes the dynamic range of the assay and the range of the majority of values of the type of cancer for which the assay is used.
Analyte quality. That internal standards that assess the quality of protein in tissue be developed.
Assay appropriateness. That results of the assay be “fit for purpose.” Included in this wish is the expectation that the range of interobserver and intraobserver or test variance of the assay that is acceptable to the clinician be reported.
This work was supported in part by the National Cancer Institute Pacific Northwest Prostate Cancer Specialized Program of Research Excellence (SPORE; P50 CA 097186-06).
Conflict of interest
I declare that I have no conflict of interest.