Optimal Samples Selection from Gene Expression Microarray Data Using Relational Algebra and Clustering Technique
Real data in the natural and social sciences is often very high-dimensional. Handling datasets in high-dimensional spaces presents complicated problems, such as degraded performance in data access, data manipulation, and query processing. Dimensionality reduction tackles these problems and helps us visualize the intrinsic properties hidden in the dataset. The proposed method first generates a decision attribute by computing the class label of each gene using a clustering technique. It then computes the score of each sample of the microarray cancerous gene data with respect to the decision attribute, using the division operation of relational algebra, and selects the samples whose score is below the average score as the initial reduct. The reduced dataset is grouped into k clusters by the k-means algorithm, where k is the number of distinct values of the decision attribute, and the matching factor of the reduct is computed from the overlap of these clusters with the original classes of the genes. The remaining samples are then added iteratively, one at a time in order of increasing score, provided that the computed matching factor improves; the final reduct obtained in this way is the optimal set of samples.
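The selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-sample scores are taken as given (the abstract does not spell out how the division operation yields them), the matching factor is assumed to be cluster purity (the fraction of genes that fall in a cluster whose majority class is their own), and the clustering step is passed in as a caller-supplied function. The names `matching_factor`, `select_samples`, and `threshold_clusters` are hypothetical, and `threshold_clusters` is only a toy stand-in for k-means with k = 2.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def matching_factor(clusters: Sequence[int], classes: Sequence[int]) -> float:
    """Overlap between clusters and original gene classes.

    Assumption: overlap is measured as purity, i.e. each cluster is matched
    to its majority class and the matched genes are counted.
    """
    by_cluster: Dict[int, Counter] = {}
    for cluster, cls in zip(clusters, classes):
        by_cluster.setdefault(cluster, Counter())[cls] += 1
    matched = sum(max(counts.values()) for counts in by_cluster.values())
    return matched / len(classes)


def select_samples(
    scores: Sequence[float],
    classes: Sequence[int],
    cluster_fn: Callable[[List[int]], Sequence[int]],
) -> Tuple[List[int], float]:
    """Greedy reduct construction over sample (column) indices.

    cluster_fn takes the selected sample indices and returns one cluster
    label per gene, computed from those samples only.
    """
    mean_score = sum(scores) / len(scores)
    order = sorted(range(len(scores)), key=lambda j: scores[j])

    # Initial reduct: samples whose score is below the average score.
    reduct = [j for j in order if scores[j] < mean_score]
    best = matching_factor(cluster_fn(reduct), classes) if reduct else 0.0

    # Try the remaining samples one at a time, in order of increasing score,
    # and keep a sample only if the matching factor strictly improves.
    for j in (j for j in order if scores[j] >= mean_score):
        trial = reduct + [j]
        factor = matching_factor(cluster_fn(trial), classes)
        if factor > best:
            reduct, best = trial, factor
    return reduct, best


def threshold_clusters(data: Sequence[Sequence[float]], columns: List[int]) -> List[int]:
    """Toy stand-in for k-means (k = 2): split genes on mean expression."""
    return [int(sum(row[j] for j in columns) / len(columns) > 0.5) for row in data]


# Toy data: 4 genes (rows) x 3 samples (columns); all values are made up.
data = [
    [0.4, 0.0, 1.0],
    [0.6, 0.0, 0.0],
    [0.4, 1.0, 1.0],
    [0.6, 1.0, 0.0],
]
classes = [0, 0, 1, 1]   # decision attribute (gene class labels)
scores = [1, 6, 8]       # hypothetical per-sample scores

reduct, best = select_samples(scores, classes, lambda cols: threshold_clusters(data, cols))
print(reduct, best)
```

In a real pipeline, `cluster_fn` would run k-means on the columns of the expression matrix named by the current reduct, with k set to the number of distinct decision-attribute values. Requiring a strict improvement is one reading of "provided the matching factor improved"; a non-strict test would admit more samples.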
Keywords: Root Mean Square Error, Singular Value Decomposition, Relational Algebra, Decision Attribute, Gene Expression Dataset