Unsupervised Feature Selection Methodology for Analysis of Bacterial Taxonomy Profiles

  • Conference paper
Pattern Recognition (MCPR 2021)

Abstract

Unsupervised feature selection has recently received much attention in the scientific community because of its wide applicability to practical problems in which the data are unlabeled. One such problem is profiling the structure of bacterial communities in the oceans, which requires identifying and selecting relevant features from unlabeled marine sediment samples. This paper introduces a methodology for identifying and selecting a set of relevant features in this field. To select the subset, we rely on a synergy among ranking-based unsupervised feature selection methods, an internal validation index introduced in this paper, and a clustering algorithm. According to the results of our analyses, the proposed methodology can select the features that best reveal cluster structures in this kind of data.
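
The combination of a feature ranking, a clustering algorithm, and an internal validation index can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the ranker is plain feature variance, the clustering algorithm is a basic k-means, the index is the standard Calinski-Harabasz index (standing in for the index the paper introduces), and the function names `kmeans`, `calinski_harabasz`, and `select_features` are hypothetical. The sketch scores each top-m prefix of the ranking and keeps the prefix with the best index value.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every object to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def calinski_harabasz(X, labels):
    """Between-cluster over within-cluster dispersion (higher is better)."""
    n = len(X)
    ids = np.unique(labels)
    k = len(ids)
    if k < 2:
        return 0.0
    overall = X.mean(axis=0)
    between = within = 0.0
    for j in ids:
        members = X[labels == j]
        center = members.mean(axis=0)
        between += len(members) * ((center - overall) ** 2).sum()
        within += ((members - center) ** 2).sum()
    return (between / (k - 1)) / (within / (n - k))

def select_features(X, k):
    """Score every top-m prefix of the ranking and keep the best one."""
    ranking = np.argsort(-X.var(axis=0))  # assumed ranker: highest variance first
    scores = []
    for m in range(1, X.shape[1] + 1):
        subset = X[:, ranking[:m]]
        scores.append(calinski_harabasz(subset, kmeans(subset, k)))
    best = int(np.argmax(scores)) + 1
    return ranking[:best], scores

# Synthetic data (not from the paper): features 0 and 1 separate two groups,
# features 2 to 5 are pure noise.
rng = np.random.default_rng(42)
group = np.repeat([0, 1], 50)
informative = rng.normal(size=(100, 2)) + 8.0 * group[:, None]
noise = rng.normal(size=(100, 4))
X = np.hstack([informative, noise])

selected, scores = select_features(X, k=2)
print("selected features:", sorted(selected.tolist()))
```

On data like this the noise features add within-cluster scatter without adding separation, so the index drops once they enter the subset. Real taxonomy profiles are unlikely to behave this cleanly, and a standard index may be biased by the number of features, which is one reason a purpose-built index can be preferable.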

Notes

  1. A feature is consistent if it takes similar values for objects that are close to each other and dissimilar values for objects that are far apart.

  2. Maximum or minimum, depending on the internal evaluation index used.

  3. The value computed by these indices increases or decreases monotonically with the number of features.

Acknowledgements

The first author gratefully acknowledges the Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE) for the collaboration grant awarded for developing this research. We also thank E. Ernestina Godoy-Lozano and collaborators for providing the data for the analysis presented in this paper.

Corresponding author

Correspondence to Saúl Solorio-Fernández.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Solorio-Fernández, S., Carrasco-Ochoa, J.A., Martínez-Trinidad, J.F. (2021). Unsupervised Feature Selection Methodology for Analysis of Bacterial Taxonomy Profiles. In: Roman-Rangel, E., Kuri-Morales, Á.F., Martínez-Trinidad, J.F., Carrasco-Ochoa, J.A., Olvera-López, J.A. (eds) Pattern Recognition. MCPR 2021. Lecture Notes in Computer Science, vol 12725. Springer, Cham. https://doi.org/10.1007/978-3-030-77004-4_5

  • DOI: https://doi.org/10.1007/978-3-030-77004-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-77003-7

  • Online ISBN: 978-3-030-77004-4

  • eBook Packages: Computer Science, Computer Science (R0)
