Abstract
Accurate particle identification is an ongoing task at the European Organization for Nuclear Research (CERN), where the main challenge is that the targeted particles and events are tiny minorities compared with the overwhelming number of common particles such as protons. This paper presents DUAL, a directed undersampling method based on active learning, to handle the high class imbalance in the particle identification dataset. The proposed approach reduces the training set size while maintaining classifier performance. Compared against various imbalanced-learning approaches, the experimental results show that using DUAL as a data reduction technique with a random forest classifier improves the Macro-\(F_1\) score and decreases training time, which is particularly relevant for large-scale datasets. Although DUAL was evaluated only on the particle identification dataset, we believe it could be adopted as a generic method for multi-class imbalanced classification problems at big-data scale.
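The abstract does not spell out how DUAL selects samples, so the following is only a minimal sketch of the general idea of uncertainty-directed undersampling, not the authors' algorithm. It assumes a margin-based uncertainty criterion (a common active-learning query strategy) and assumes that class probabilities have already been produced by some classifier, for example a random forest. The names `macro_f1`, `margin`, `directed_undersample`, and `keep_per_class` are hypothetical and introduced only for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (Macro-F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def margin(probs):
    """Gap between the two largest class probabilities (small = uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def directed_undersample(y, proba, keep_per_class):
    """Keep, per class, at most `keep_per_class` samples with the smallest margin.

    ASSUMPTION: margin-based selection stands in for whatever criterion
    DUAL actually uses; `proba` is assumed to come from a fitted classifier.
    """
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    kept = []
    for indices in by_class.values():
        indices.sort(key=lambda i: margin(proba[i]))
        kept.extend(indices[:keep_per_class])
    return sorted(kept)


# Toy data: six majority samples ("proton") and two minority samples ("kaon").
y = ["proton"] * 6 + ["kaon"] * 2
proba = [
    [0.99, 0.01], [0.55, 0.45], [0.90, 0.10], [0.60, 0.40],
    [0.97, 0.03], [0.52, 0.48], [0.30, 0.70], [0.45, 0.55],
]
kept = directed_undersample(y, proba, keep_per_class=3)
score = macro_f1([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0])
```

On this toy data the majority class shrinks from six samples to the three closest to the decision boundary, while both minority samples are retained. Macro-F1 weights every class equally, which is why it is a more informative metric than accuracy when one class dominates.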
Notes
- 1.
- 2. Extreme Gradient Boosting (XGB), AdaBoost (ADB), Gradient Boosting (GB).
- 3.
- 4.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Farou, Z., Ouaari, S., Domian, B., Horváth, T. (2022). Directed Undersampling Using Active Learning for Particle Identification. In: Singh, P.K., Singh, Y., Chhabra, J.K., Illés, Z., Verma, C. (eds) Recent Innovations in Computing. Lecture Notes in Electrical Engineering, vol 855. Springer, Singapore. https://doi.org/10.1007/978-981-16-8892-8_12
DOI: https://doi.org/10.1007/978-981-16-8892-8_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8891-1
Online ISBN: 978-981-16-8892-8
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)