Data Shrinking Based Feature Ranking for Protein Classification

  • Sumeet Dua
  • Sheetal Saini
Part of the Communications in Computer and Information Science book series (CCIS, volume 31)


High-throughput data domains such as proteomics and expression mining frequently challenge data mining algorithms with unprecedented dimensionality and size, and high dimensionality degrades the performance of these algorithms. Many dimensionality reduction techniques, including feature ranking and selection, have been used to mitigate the curse of dimensionality. Protein classification and feature ranking are both classical problem domains that have been explored extensively. We propose a novel data shrinking-based method for ranking and selecting feature descriptors of the physicochemical properties of a protein, which are commonly used in discriminative methods for protein classification. We apply the proposed methodology to demonstrate the discriminative power of the top-ranked features for protein structural classification. Our experimental study shows that the top-ranked feature descriptors produce competitive and superior classification results.
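The abstract does not spell out the data shrinking procedure itself, so the following is only a minimal sketch of the surrounding workflow: score each feature descriptor, rank the descriptors, and keep the top-ranked ones for a downstream classifier. The scoring rule used here is a simple Fisher-style ratio (between-class over within-class variance), swapped in purely for illustration; it is not the authors' method. The function names `fisher_scores`, `rank_features`, and `select_top_k`, and the toy data, are hypothetical.

```python
from statistics import mean, pvariance


def fisher_scores(rows, labels):
    """Score each feature column by between-class over within-class variance.

    Stand-in scoring rule for illustration only; NOT the data shrinking
    criterion of the paper, which the abstract does not specify.
    """
    n_features = len(rows[0])
    classes = sorted(set(labels))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        overall = mean(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [v for v, y in zip(column, labels) if y == c]
            between += len(values) * (mean(values) - overall) ** 2
            within += len(values) * pvariance(values)
        # Small constant guards against division by zero.
        scores.append(between / (within + 1e-12))
    return scores


def rank_features(rows, labels):
    """Return feature indices ordered from most to least discriminative."""
    scores = fisher_scores(rows, labels)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def select_top_k(rows, ranking, k):
    """Keep only the k top-ranked feature columns, in ranked order."""
    keep = ranking[:k]
    return [[row[j] for j in keep] for row in rows]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
rows = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [1.0, 2.0]]
labels = ["a", "a", "b", "b"]

ranking = rank_features(rows, labels)
reduced = select_top_k(rows, ranking, 1)
```

In a real study the reduced rows would then be passed to a classifier, and accuracy would be compared across different values of `k`.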


Protein Classification; Feature Ranking; Feature Selection; Data Shrinking





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Sumeet Dua (1)
  • Sheetal Saini (1)
  1. Department of Computer Science, Louisiana Tech University, Ruston, USA
