Signal, Image and Video Processing, Volume 3, Issue 1, pp 53–61

The effect of microarray image compression on expression-based classification

  • Qian Xu
  • Jianping Hua
  • Zixiang Xiong
  • Michael L. Bittner
  • Edward R. Dougherty (corresponding author)
Original Paper


Current gene-expression microarrays carry enormous amounts of information. Compression is necessary for efficient distribution and storage. This paper examines JPEG2000 compression of cDNA microarray images and addresses the accuracy of classification and feature selection based on decompressed images. Among other options, we choose JPEG2000 because it is the latest international standard for image compression and offers lossy-to-lossless compression while achieving high lossless compression ratios on microarray images. The performance of JPEG2000 has been tested on three real data sets at different compression ratios, ranging from lossless to 45:1. The effects of JPEG2000 compression/decompression on differential expression detection and phenotype classification have been examined. There is less than a 4% change in differential expression detection at compression ratios as high as 20:1, with detection accuracy suffering less than 2% for moderate- to high-intensity genes, and there is no significant effect on classification at ratios as high as 35:1. The supplementary material is available at


Keywords: Microarray · Classification · Compression



Copyright information

© Springer-Verlag London Limited 2008

Authors and Affiliations

  • Qian Xu (1)
  • Jianping Hua (2)
  • Zixiang Xiong (1)
  • Michael L. Bittner (2)
  • Edward R. Dougherty (1, 2), corresponding author

  1. Department of Electrical and Computer Engineering, Texas A&M University, College Station, USA
  2. Translational Genomics Research Institute, Phoenix, USA