A Scalable Feature Selection Algorithm for Large Datasets – Quick Branch & Bound Iterative (QBB-I)

  • Conference paper

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 27)

Abstract

Feature selection algorithms seek to efficiently find an optimal subset of relevant features in the data. As the number of features and the size of the data increase, new methods are needed that reduce complexity while maintaining the goodness of the selected features. We review popular feature selection algorithms that use the consistency measure, such as the probabilistic-search-based Las Vegas Filter (LVF) and the complete-search-based Automatic Branch and Bound (ABB). The hybrid Quick Branch and Bound (QBB) algorithm first runs LVF to find a smaller subset of valid features and then performs ABB on the reduced feature set. QBB is reasonably fast and robust and handles interdependent features, but it does not scale well to large data. In this paper, we propose an enhanced QBB algorithm called QBB Iterative (QBB-I). QBB-I partitions the dataset into two, and performs QBB on the first partition to find a candidate feature subset. This subset is tested against the second partition using the consistency measure; the inconsistent rows, if any, are added to the first partition, and the process is repeated until the optimal feature set is found. Our tests with the ASSISTments intelligent tutoring dataset, comprising over 150,000 log records, and other standard datasets show that QBB-I is significantly more efficient than QBB while selecting the same subset of features.
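The iterative scheme in the abstract can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: `smallest_consistent_subset` is a brute-force stand-in for QBB (which in the paper runs LVF followed by ABB), and all function names here are our own.

```python
from collections import defaultdict
from itertools import combinations

def inconsistency_count(rows, labels, subset):
    # Consistency measure: group rows by their projection onto the
    # candidate subset; within each group, count rows that disagree
    # with the group's majority label, and sum over all groups.
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[tuple(row[i] for i in subset)].append(label)
    return sum(len(labs) - max(labs.count(l) for l in set(labs))
               for labs in groups.values())

def smallest_consistent_subset(rows, labels, threshold=0):
    # Brute-force stand-in for QBB, searching from the smallest
    # subsets upward; only practical for tiny illustrative data.
    n = len(rows[0])
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            if inconsistency_count(rows, labels, subset) <= threshold:
                return list(subset)
    return list(range(n))

def qbb_iterative(rows, labels, split=0.5):
    # QBB-I sketch: search for a subset on the first partition only,
    # then move over any second-partition rows that are inconsistent
    # with that subset, and repeat until no inconsistent rows remain.
    cut = max(1, int(len(rows) * split))
    d1, l1 = list(rows[:cut]), list(labels[:cut])
    d2, l2 = list(rows[cut:]), list(labels[cut:])
    while True:
        subset = smallest_consistent_subset(d1, l1)
        # Labels observed in the first partition for each projected pattern.
        seen = defaultdict(set)
        for row, label in zip(d1, l1):
            seen[tuple(row[i] for i in subset)].add(label)
        moved, rest_rows, rest_labels = [], [], []
        for row, label in zip(d2, l2):
            key = tuple(row[i] for i in subset)
            if key in seen and label not in seen[key]:
                moved.append((row, label))  # conflicts with partition 1
            else:
                rest_rows.append(row)
                rest_labels.append(label)
        if not moved:
            return subset
        for row, label in moved:
            d1.append(row)
            l1.append(label)
        d2, l2 = rest_rows, rest_labels
```

On an XOR-labelled toy dataset, the first partition alone admits a one-feature subset, the second partition contradicts it, and the loop converges to both features; this mirrors how QBB-I avoids running the expensive search on the full data unless inconsistencies force it to grow the first partition.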



Author information

Corresponding author

Correspondence to Prema Nedungadi.


Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Nedungadi, P., Remya, M.S. (2014). A Scalable Feature Selection Algorithm for Large Datasets – Quick Branch & Bound Iterative (QBB-I). In: Kumar Kundu, M., Mohapatra, D., Konar, A., Chakraborty, A. (eds) Advanced Computing, Networking and Informatics - Volume 1. Smart Innovation, Systems and Technologies, vol 27. Springer, Cham. https://doi.org/10.1007/978-3-319-07353-8_15

  • DOI: https://doi.org/10.1007/978-3-319-07353-8_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07352-1

  • Online ISBN: 978-3-319-07353-8

  • eBook Packages: Engineering, Engineering (R0)
