Directed Undersampling Using Active Learning for Particle Identification

  • Conference paper
Recent Innovations in Computing

Abstract

Accurate particle identification is an ongoing task at the European Organization for Nuclear Research (CERN), where the challenge remains that the targeted particles/events are tiny minorities compared with the overwhelming presence of common particles such as protons. This paper presents DUAL, a directed undersampling method based on active learning, to handle the high class imbalance present in the particle identification dataset. The proposed approach is used to reduce the training set size while maintaining classifier performance. Compared against various imbalance learning approaches, the experimental results show that using DUAL as a data reduction technique with a random forest classifier improves classification performance in terms of Macro-\(F_1\) score and decreases the time needed to train the models, which is highly relevant when dealing with large-scale datasets. Although DUAL was evaluated only on the particle identification dataset, we believe it could be adopted as a generic method for multi-class imbalanced classification problems at big-data scale.
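The abstract does not spell out DUAL's selection rule, so the following is only a minimal sketch of the general idea it names: using an active learning loop to pick a reduced training subset, then training a random forest on that subset and scoring it with Macro-\(F_1\). Everything specific here is an assumption made for illustration: the helper name `uncertainty_undersample`, the least-confidence query rule, the seed, batch, and round sizes, and the synthetic imbalanced dataset that stands in for the particle identification data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def uncertainty_undersample(X, y, n_seed_per_class=20, batch_size=100,
                            n_rounds=5, random_state=0):
    """Pick a reduced training subset by pool-based uncertainty sampling.

    Hypothetical helper for illustration; NOT the paper's DUAL procedure.
    Returns the sorted indices of the selected samples.
    """
    rng = np.random.default_rng(random_state)

    # Seed set: a few random samples per class, so every class is represented.
    selected = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        take = min(n_seed_per_class, idx.size)
        selected.extend(rng.choice(idx, size=take, replace=False))
    selected = np.array(selected)
    pool = np.setdiff1d(np.arange(len(y)), selected)

    for _ in range(n_rounds):
        if pool.size == 0:
            break
        model = RandomForestClassifier(n_estimators=50, random_state=random_state)
        model.fit(X[selected], y[selected])

        # Least-confidence score: 1 minus the highest predicted class probability.
        proba = model.predict_proba(X[pool])
        uncertainty = 1.0 - proba.max(axis=1)

        # Move the most uncertain pool samples into the selected set.
        k = min(batch_size, pool.size)
        picked = pool[np.argsort(uncertainty)[-k:]]
        selected = np.concatenate([selected, picked])
        pool = np.setdiff1d(pool, picked)

    return np.sort(selected)


# Synthetic 4-class imbalanced data (an assumption; not the CERN dataset).
X, y = make_classification(n_samples=4000, n_features=10, n_informative=6,
                           n_classes=4, weights=[0.85, 0.08, 0.05, 0.02],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

keep = uncertainty_undersample(X_train, y_train)

# Train on the reduced subset only, then evaluate with Macro-F1.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train[keep], y_train[keep])
macro_f1 = f1_score(y_test, clf.predict(X_test), average="macro")

print(f"kept {len(keep)} of {len(y_train)} training samples")
print(f"Macro-F1 on the held-out split: {macro_f1:.3f}")
```

Macro-\(F_1\) averages the per-class \(F_1\) scores with equal weight, so rare classes count as much as the dominant one. That is why it suits heavily imbalanced problems like this one. Whether a reduced subset matches or beats training on the full set depends on the data and on the query rule, so any such comparison should be measured rather than assumed.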

Notes

  1. https://www.kaggle.com/naharrison/particle-identification-from-detector-responses

  2. Extreme Gradient Boosting (XGB), AdaBoost (ADB), Gradient Boosting (GB).

  3. https://scikit-learn.org/stable/supervised_learning.html

  4. https://scikit-learn.org/stable/modules/grid_search.html

Author information

Corresponding author

Correspondence to Zakarya Farou.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Farou, Z., Ouaari, S., Domian, B., Horváth, T. (2022). Directed Undersampling Using Active Learning for Particle Identification. In: Singh, P.K., Singh, Y., Chhabra, J.K., Illés, Z., Verma, C. (eds) Recent Innovations in Computing. Lecture Notes in Electrical Engineering, vol 855. Springer, Singapore. https://doi.org/10.1007/978-981-16-8892-8_12
