Abstract
The unsupervised search for overdense regions in high-dimensional feature spaces, where locally high population densities may signal anomalous contamination of an otherwise more uniform population, is relevant to applications ranging from fundamental research to industrial use cases. Motivated by the specific needs of searches for new phenomena in particle collisions, we propose a novel approach that targets signals of interest populating compact regions of the feature space. The method consists of a systematic scan of subspaces of a standardized copula of the feature space, in which the minimum p-value of a hypothesis test of local uniformity is sought by greedy descent. We characterize the performance of the proposed algorithm and show its effectiveness in several experimental situations.
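The idea summarized above can be illustrated with a minimal sketch. It is not the published implementation: the abstract does not specify the test statistic, the box parametrization, or the search moves, so the binomial tail test against a globally uniform null, the random boundary shifts, and all names (`to_copula`, `box_p_value`, `greedy_descent`) are assumptions made here for illustration, on a synthetic two-feature dataset.

```python
import math
import random


def to_copula(column):
    """Rank-transform one feature so its marginal is uniform on (0, 1)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    u = [0.0] * n
    for rank, i in enumerate(order):
        u[i] = (rank + 0.5) / n
    return u


def box_p_value(points, box):
    """Binomial tail P(count >= observed) under a uniform null (assumed test).

    `box` maps a feature index to its (low, high) interval in copula space.
    """
    n = len(points)
    volume = math.prod(hi - lo for lo, hi in box.values())
    k = sum(all(lo <= p[d] <= hi for d, (lo, hi) in box.items()) for p in points)
    if k == 0 or volume >= 1.0:
        return 1.0
    log_p, log_q = math.log(volume), math.log1p(-volume)
    tail = 0.0
    for j in range(k, n + 1):
        # Log of the binomial coefficient, computed via lgamma to avoid overflow.
        log_comb = math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        tail += math.exp(log_comb + j * log_p + (n - j) * log_q)
    return min(tail, 1.0)


def greedy_descent(points, box, rng, steps=200, max_shift=0.05, min_width=0.02):
    """Shift one box boundary at a time; keep a move only if the p-value drops."""
    best = box_p_value(points, box)
    for _ in range(steps):
        d = rng.choice(sorted(box))
        lo, hi = box[d]
        shift = rng.uniform(-max_shift, max_shift)
        if rng.random() < 0.5:
            lo = min(max(lo + shift, 0.0), hi - min_width)
        else:
            hi = max(min(hi + shift, 1.0), lo + min_width)
        trial = {**box, d: (lo, hi)}
        p = box_p_value(points, trial)
        if p < best:
            box, best = trial, p
    return box, best


# Synthetic data (hypothetical): uniform background plus a compact signal cluster.
rng = random.Random(0)
background = [(rng.random(), rng.random()) for _ in range(400)]
signal = [(rng.gauss(0.5, 0.01), rng.gauss(0.5, 0.01)) for _ in range(100)]
raw = background + signal

# Transform each feature separately, then reassemble the points.
points = list(zip(*(to_copula(col) for col in zip(*raw))))

start_box = {0: (0.35, 0.65), 1: (0.35, 0.65)}  # box around the cluster
far_box = {0: (0.0, 0.3), 1: (0.0, 0.3)}        # same volume, away from it
p_start = box_p_value(points, start_box)
p_far = box_p_value(points, far_box)
best_box, p_best = greedy_descent(points, start_box, rng)
print(p_start, p_far, p_best)
```

The rank transform makes every marginal uniform by construction, so any remaining overdensity in a multi-dimensional box reflects the joint structure of the features. The null used here assumes independent features; a test of local uniformity, as described in the abstract, would instead compare the box with its immediate surroundings.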
ArXiv ePrint: 2106.05747
Rights and permissions
Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Dorigo, T., Fumanelli, M., Maccani, C. et al. RanBox: anomaly detection in the copula space. J. High Energ. Phys. 2023, 8 (2023). https://doi.org/10.1007/JHEP01(2023)008
DOI: https://doi.org/10.1007/JHEP01(2023)008