Skip to main content

A Time Efficient Approach for Distributed Feature Selection Partitioning by Features

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9422))

Abstract

With the advent of high dimensionality, feature selection has become indispensable in real-world scenarios. However, most of the traditional methods only work in a centralized manner, which —ironically— increase the running time requirements when they are applied to this type of data. For this reason, we propose a distributed filter approach for vertically partitioned data. The idea is to split the data by features and apply a filter at each partition performing several rounds to obtain a final subset of features. Different than existing procedures to combine the partial outputs of the different partitions of data, we propose a merging process according to the theoretical complexity of these feature subsets instead of classification error. Experimental results tested in five datasets show that the running time decreases considerably. Moreover, regarding the classification accuracy, our approach was able to match, and in some cases even improve, the standard algorithms applied to the non-partitioned datasets.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Technology agency for sciency and research. Kent ridge bio-medical dataset repository. http://datam.i2r.a-star.edu.sg/datasets/krbd/

  2. Ananthanarayana, V.S., Subramanian, D.K., Murty, M.N.: Scalable, distributed and dynamic mining of association rules. In: Prasanna, V.K., Vajapeyam, S., Valero, M. (eds.) HiPC 2000. LNCS, vol. 1970, p. 559. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  3. Bache, K., Linchman, M.: UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences (2013). http://archive.ics.uci.edu/ml/. Accessed January 2015

  4. Bolón-Canedo, V., Sánchez-Maroño, N., Alonso-Betanzos, A.: A distributed feature selection approach based on a complexity measure. In: International Work Conference on Artificial Neural Networks (2015, in press)

    Google Scholar 

  5. Bolón-Canedo, V., Sánchez-Maroño, N., Cerviño-Rabuñal, J.: Toward parallel feature selection from vertically partitioned data. In: European Symposium on Artificial Neural Networks, Computacional Intelligence and Machine Learning (2014)

    Google Scholar 

  6. Chan, P.K., Stolfo, S.J.: Toward parallel and distributed learning by meta-learning. In: AAAI workshop in Knowledge Discovery in Databases, pp. 227–240 (1993)

    Google Scholar 

  7. Das, K., Bhaduri, K., Kargupta, H.: A local asynchronous distributed privacy preserving feature selection algorithm for large peer-to-peer networks. Knowl. Inf. Syst. 24(3), 341–367 (2010)

    Article  Google Scholar 

  8. Dash, M., Liu, H.: Consistency-based search in feature selection. Artif. Intel. 151(1), 155–176 (2003)

    Article  MATH  MathSciNet  Google Scholar 

  9. de Haro García, A.: Scaling data mining algorithms. Application to instance and feature selection. Ph.D. thesis, Universidad de Granada (2011)

    Google Scholar 

  10. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.: The weka data mining software: an update. ACM SIGKDD Explor. Newsl. 11(1), 10–18 (2009)

    Article  Google Scholar 

  11. Hall, M.A.: Correlation-based feature selection for machine learning. Ph.D. thesis, The University of Waikato (1999)

    Google Scholar 

  12. Hall, M.A., Smith, L.A.: Practical feature subset selection for machine learning. Comput. Sci. 98, 181–191 (1998)

    Google Scholar 

  13. Ho, T.K., Basu, M.: Data Complexity in Pattern Recognition. Springer, London (2006)

    MATH  Google Scholar 

  14. Jain, A., Zongker, D.: Feature selection: evaluation, application, and small sample performance. IEEE Trans. Patter Anal. Mach. Intel. 19(2), 153–158 (1997)

    Article  Google Scholar 

  15. Kononenko, I.: Estimating attributes: analysis and extensions of RELIEF. In: Bergadano, F., De Raedt, L. (eds.) ECML 1994. LNCS, vol. 784, pp. 171–182. Springer, Heidelberg (1994)

    Chapter  Google Scholar 

  16. McConnell, S., Skillicorn, D.B.: Building predictors from vertically partitioned data. In: Procededings of the 2004 Conference of the Centre for Advanced Studies on Collaborative Research, pp. 150–162. IBM Press (2004)

    Google Scholar 

  17. Skillicorn, D.B., McConell, S.M.: Distributed prediction from vertically partitioned data. J. Parallel Distrib. Comput. 68(1), 16–36 (2008)

    Article  MATH  Google Scholar 

  18. Tsoumakas, G., Vlahavas, I.: Distributed data mining of large classifier ensembles. In: Proceedings Companion Volume of the Second Hellenic Conference on Artificial Intelligence, pp. 249–256 (2002)

    Google Scholar 

  19. Vanderbilt University. Gene expression model selector. http://www.gems-system.org/

  20. Zhao, Z., Liu, H.: Searching for interacting features. IJCAI 7, 1156–1161 (2007)

    Google Scholar 

Download references

Acknowledgements

This research has been economically supported in part by the Ministerio de Economía y Competitividad of the Spanish Government through the research project TIN 2012-37954, partially funded by FEDER funds of the European Union; and by the Consellería de Industria of the Xunta de Galicia through the research project GRC2014/035. V. Bolón-Canedo acknowledges support of the Xunta de Galicia under postdoctoral Grant code ED481B 2014/164-0.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to L. Morán-Fernández .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Morán-Fernández, L., Bolón-Canedo, V., Alonso-Betanzos, A. (2015). A Time Efficient Approach for Distributed Feature Selection Partitioning by Features. In: Puerta, J., et al. Advances in Artificial Intelligence. CAEPIA 2015. Lecture Notes in Computer Science(), vol 9422. Springer, Cham. https://doi.org/10.1007/978-3-319-24598-0_22

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-24598-0_22

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24597-3

  • Online ISBN: 978-3-319-24598-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics