Application of Pearson correlation coefficient (PCC) and Kolmogorov-Smirnov distance (KSD) metrics to identify disease-specific biomarker genes

Huang, Hung-Chung; Zheng, Siyuan; Zhao, Zhongming

doi:10.1186/1471-2105-11-S4-P23

Poster presentation
Open access
Published: 23 July 2010

BMC Bioinformatics, volume 11, article number P23 (2010)

Hung-Chung Huang^1,2,
Siyuan Zheng^1,2,3 &
Zhongming Zhao^1,2,3

Background

DNA microarrays have been widely applied in cancer research to improve diagnosis and prediction of disease states. Traditionally, most microarray studies aim to identify differentially expressed genes (DEGs) by comparing average gene expression levels between two groups (e.g., treated vs. control or disease vs. non-disease) using statistical methods such as the t-test and Significance Analysis of Microarrays (SAM) [1, 2].

Materials and methods

In this study, we defined the gene expression profile (GEP) of a gene as the distribution of the log₂ values of its normalized expression signal intensities across the samples in microarrays from similar studies. We hypothesized that biomarker genes that distinguish disease samples from normal samples would show distinct GEPs between the comparison groups. We applied the Pearson correlation coefficient (PCC) and Kolmogorov-Smirnov distance (KSD) metrics to compare GEPs between normal and disease states, and then applied this approach to disease-related (e.g., cancer) studies to discover disease genes as biomarker candidates. The profiles of these biomarker genes in normal and disease samples might be used to diagnose or monitor a patient's disease state through routine gene expression analysis.
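As a rough illustration of comparing two GEPs, the sketch below computes a KSD and a PCC for one gene from its log₂ expression values in normal and disease samples. The abstract does not state how the two metrics were computed, so the choices here are assumptions: KSD is taken as the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical cumulative distributions), and PCC is taken between relative-frequency histograms built on shared bins. The function names and the bin count are hypothetical.

```python
import math
from bisect import bisect_right


def ks_distance(a, b):
    """Two-sample KS distance: largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    points = sorted(set(sa) | set(sb))
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in points
    )


def histogram(values, lo, hi, n_bins):
    """Relative frequencies in n_bins equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # top edge goes in last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sx = math.sqrt(sum((p - mx) ** 2 for p in x))
    sy = math.sqrt(sum((q - my) ** 2 for q in y))
    return cov / (sx * sy) if sx and sy else 0.0


def compare_geps(normal_log2, disease_log2, n_bins=10):
    """Return (PCC, KSD) for one gene given its log2 values in each group.

    Assumption: PCC is computed between the two binned distributions,
    which is only one possible reading of the method.
    """
    lo = min(min(normal_log2), min(disease_log2))
    hi = max(max(normal_log2), max(disease_log2))
    h_normal = histogram(normal_log2, lo, hi, n_bins)
    h_disease = histogram(disease_log2, lo, hi, n_bins)
    return pearson(h_normal, h_disease), ks_distance(normal_log2, disease_log2)
```

Under these assumptions, a gene whose expression distribution is the same in both groups gives a PCC near 1 and a KSD near 0, while a gene whose distributions do not overlap gives a KSD of 1 and a low PCC.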

Results and conclusion

We applied the PCC and KSD metrics to three prostate cancer related microarray datasets. They were generated from the same study and were available in the GEO database (a total of 81 normal samples and 90 prostate cancer samples) [3]. Using the cutoff values KSD > 0.4 and PCC < 0.7, we found 230 biomarker candidate genes. Our Gene Ontology (GO) analysis found that the top-ranked biomarker candidate genes for prostate cancer were highly enriched in molecular function categories such as “cytoskeletal protein binding”. We used the two top-ranked genes (ACTA1, encoding an actin subunit, and HPN, encoding hepsin) to demonstrate that prostate cancer might be diagnosed and monitored by marker genes. Furthermore, we picked the top 20 significantly up-regulated and the top 20 down-regulated genes based on PCC and KSD sorting. We found that gene pairs comprising one up-regulated and one down-regulated gene always had the best prediction performance (Table 1). Our study provided a promising tool to identify potential biomarker genes for disease diagnosis and prognosis.
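The cutoff step can be sketched as a simple filter over per-gene scores. Only the two thresholds (KSD > 0.4 and PCC < 0.7) come from the text. The ordering rule (largest KSD first, then smallest PCC) is an assumption, since the abstract does not say how the two metrics were combined for ranking, and the gene labels and scores below are made-up placeholders, not results from the study.

```python
KSD_CUTOFF = 0.4  # from the text: keep genes with KSD > 0.4
PCC_CUTOFF = 0.7  # from the text: keep genes with PCC < 0.7


def select_candidates(scores, ksd_cutoff=KSD_CUTOFF, pcc_cutoff=PCC_CUTOFF):
    """Return gene names passing both cutoffs.

    scores maps gene name -> (pcc, ksd). The sort order (KSD descending,
    then PCC ascending) is an assumed ranking, not the authors' stated one.
    """
    kept = [
        (gene, pcc, ksd)
        for gene, (pcc, ksd) in scores.items()
        if ksd > ksd_cutoff and pcc < pcc_cutoff
    ]
    kept.sort(key=lambda row: (-row[2], row[1]))
    return [gene for gene, _, _ in kept]


# Placeholder scores for illustration only.
toy_scores = {
    "GENE_A": (0.10, 0.85),
    "GENE_B": (0.95, 0.12),
    "GENE_C": (0.55, 0.45),
    "GENE_D": (0.60, 0.90),
    "GENE_E": (0.80, 0.70),
}

candidates = select_candidates(toy_scores)
print(candidates)
```

In this toy input, GENE_B fails both cutoffs and GENE_E fails the PCC cutoff, so three placeholder genes remain.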

Table 1. Top 10 gene pairs ranked by prediction accuracy for prostate cancer (PCA) diagnosis.

References

1. Jafari P, Azuaje F: An assessment of recently published gene expression data analyses: reporting experimental design and statistical factors. BMC Med Inform Decis Mak 2006, 6: 27. 10.1186/1472-6947-6-27
2. Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 2001, 98: 5116–5121. 10.1073/pnas.091062498
3. Chandran UR, Ma C, Dhir R, Bisceglia M, Lyons-Weiler M, Liang W, Michalopoulos G, Becich M, Monzon FA: Gene expression profiles of prostate cancer reveal involvement of multiple molecular pathways in the metastatic process. BMC Cancer 2007, 7: 64. 10.1186/1471-2407-7-64

Author information

Authors and Affiliations

Functional Genomics Shared Resource, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
Hung-Chung Huang, Siyuan Zheng & Zhongming Zhao
Bioinformatics Resource Center, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
Hung-Chung Huang, Siyuan Zheng & Zhongming Zhao
Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN, 37232, USA
Siyuan Zheng & Zhongming Zhao

Authors

Hung-Chung Huang
Siyuan Zheng
Zhongming Zhao

Corresponding author

Correspondence to Zhongming Zhao.

Rights and permissions

Open Access. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Huang, HC., Zheng, S. & Zhao, Z. Application of Pearson correlation coefficient (PCC) and Kolmogorov-Smirnov distance (KSD) metrics to identify disease-specific biomarker genes. BMC Bioinformatics 11 (Suppl 4), P23 (2010). https://doi.org/10.1186/1471-2105-11-S4-P23
