Matching genes across microarray platforms is a critical step in meta-analysis. Standard practice uses UniGene to match genes. Numerous studies have found poor correlations between platforms when using UniGene matching.
We profiled samples from 33 breast cancer patients on two microarray platforms (Affymetrix and cDNA) and investigated gene matching between them. Our results confirmed that UniGene-based matching led to poor correlations of gene expression between platforms. Using RefSeq, a database maintained by the National Center for Biotechnology Information (NCBI), we developed and implemented a new method to refine gene matching. Correlations between gene expression measurements were substantially higher after RefSeq-based refinement. Our approach differs from previously reported sequence-matching approaches and retains useful expression measurements, making it a sensible way to match probes across platforms.
We conclude that UniGene alone is insufficient to match genes across platforms. Refined matching based on RefSeq significantly improves the quality of matches.
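The general idea of refining UniGene-based matches with RefSeq can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the refinement rule, so the sketch assumes that a UniGene-matched pair is kept only when the Affymetrix probe set and the cDNA clone share at least one RefSeq accession. All probe names, cluster IDs, accessions and expression values are invented for illustration, as are the function names `unigene_pairs` and `refseq_refined`.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists of expression values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def unigene_pairs(affy, cdna):
    """Pair each Affymetrix probe set with each cDNA clone in the same UniGene cluster."""
    return [
        (a, c)
        for a, a_info in affy.items()
        for c, c_info in cdna.items()
        if a_info["unigene"] == c_info["unigene"]
    ]


def refseq_refined(pairs, affy, cdna):
    """Assumed refinement rule: keep pairs that share at least one RefSeq accession."""
    return [(a, c) for a, c in pairs if affy[a]["refseq"] & cdna[c]["refseq"]]


# Hypothetical annotations and expression values across four samples.
affy = {
    "A1": {"unigene": "Hs.1", "refseq": {"NM_0001"}, "expr": [1.0, 2.0, 3.0, 4.0]},
    "A2": {"unigene": "Hs.1", "refseq": {"NM_0002"}, "expr": [4.0, 1.0, 3.0, 2.0]},
}
cdna = {
    "C1": {"unigene": "Hs.1", "refseq": {"NM_0001"}, "expr": [1.1, 2.1, 2.9, 4.2]},
}

pairs = unigene_pairs(affy, cdna)            # both probe sets match C1 by UniGene
refined = refseq_refined(pairs, affy, cdna)  # only the pair sharing a RefSeq accession remains

for a, c in pairs:
    tag = "kept" if (a, c) in refined else "dropped"
    print(a, c, tag, round(pearson(affy[a]["expr"], cdna[c]["expr"]), 3))
```

In this toy data, two probe sets fall in the same UniGene cluster as one cDNA clone, but only one of them shares a RefSeq accession with it. The refinement drops the other pair, which is also the poorly correlated one, mirroring the pattern the abstract describes.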
We would like to acknowledge Stephen Tirrell, James Stec, Mark Ayers and Jeffrey S Ross from Millennium Pharmaceuticals (Cambridge, MA, USA) for performing the microarray hybridisation. Millennium Pharmaceuticals also provided research funding to Dr Pusztai to conduct the clinical trial.
This research was supported in part by the University of Texas SPORE in Lung Cancer grant CA070907 and Prostate Cancer grant CA90270.
The authors have no conflicts of interest that are directly relevant to the content of this article.
Cite this article
Ji, Y., Coombes, K., Zhang, J. et al. RefSeq Refinements of UniGene-Based Gene Matching Improve the Correlation of Expression Measurements Between Two Microarray Platforms. Appl-Bioinformatics 5, 89–98 (2006). https://doi.org/10.2165/00822942-200605020-00003
- cDNA Clone
- Expression Measurement
- cDNA Array
- Gene Expression Measurement
- Affymetrix Array