
Application of Machine and Deep Learning Methods to the Analysis of IACTs Data

Chapter in Intelligent Astrophysics

Part of the book series: Emergence, Complexity and Computation (ECC, volume 39)

Abstract

The Imaging Atmospheric Cherenkov technique opened a previously inaccessible window for the study of astrophysical sources of radiation in the very high-energy (TeV) regime and is playing a significant role in the discovery and characterization of very high-energy gamma-ray emitters. However, the data collected by Imaging Atmospheric Cherenkov Telescopes (IACTs) are, even for the most powerful sources, dominated by the overwhelming background due to cosmic-ray nuclei and cosmic-ray electrons. For this reason, the analysis of IACT data demands a highly efficient background rejection technique able to single out the gamma-ray induced signal. On the other hand, the analysis of ring images produced by muons in an IACT provides a powerful and precise method to calibrate the overall optical throughput and to monitor the telescope optical point-spread function; a robust muon tagger able to collect large and highly pure samples of muon events is therefore required for calibration purposes. Gamma/hadron discrimination and muon tagging through Machine and Deep Learning techniques are the main topics of the present work.


References

  1. Li, J., Du, Q., Sun, C.: An improved box-counting method for image fractal dimension estimation. Pattern Recognit. 42(11), 2460–2469 (2009)


2. Pagliaro, A., D’Anna, F., D’Alí Staiti, G.: A multiscale, lacunarity and neural network method for \(\gamma \)/h discrimination in extensive air showers. In: Proceedings of the 32nd International Cosmic Ray Conference (2011)


  3. Huang, Z., Leng, J.: Analysis of Hu’s moment invariants on image scaling and rotation. In: Proceedings of 2nd International Conference on Computer Engineering and Technology (2010)


  4. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981)


5. Weiss, K., Khoshgoftaar, T.M., Wang, D.: A survey of transfer learning. J. Big Data 3(1), 9 (2016)


  6. Jin, J., Dundar, A., Culurciello, E.: Flattened convolutional neural networks for feedforward acceleration (2014). arXiv:1412.5474

  7. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)


  8. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)


9. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: efficient convolutional neural networks for mobile vision applications (2017). arXiv:1704.04861

10. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size (2016). arXiv:1602.07360

11. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Fei-Fei, L.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y

  12. Wang, J., Perez, L.: The effectiveness of data augmentation in image classification using deep learning. In: Computer Vision and Pattern Recognition (2017)


  13. Srivastava, N., Salakhutdinov, R.R.: Discriminative transfer learning with tree-based priors. In: Advances in Neural Information Processing Systems, pp. 2094–2102 (2013)


14. Agarwal, R., Diaz, O., Lladó, X., Yap, M.H., Martí, R.: Automatic mass detection in mammograms using deep convolutional neural networks. J. Med. Imaging 6(3), 031409 (2019)


15. LeCun, Y., Boser, B., Denker, J.S., et al.: Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems (1990)


  16. LeCun, Y., Bottou, L., Bengio, Y., et al.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)


  17. Denton, E.L., Zaremba, W., Bruna, J., LeCun, Y., Fergus, R.: Exploiting linear structure within convolutional networks for efficient evaluation. In: NIPS (2014)


  18. Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural networks. In: NIPS (2015)


19. Sharma, M., Nayak, J., Koul, M.K., Bose, S., Mitra, A.: Gamma/hadron segregation for a ground based imaging atmospheric Cherenkov telescope using machine learning methods: Random Forest leads. Res. Astron. Astrophys. 14(11), 1491 (2014)



Author information

Corresponding author: Antonio Pagliaro.


Appendices: Deep Learning architectures

A. Flattened Network

The Flattened Network [6] consists of a consecutive sequence of one-dimensional filters across all directions. The architecture is simple and lightweight, and we use it to test whether such a simple network suffers from overfitting when trained on a limited number of images. Models affected by overfitting fit the data at hand almost perfectly but fail to generalize to unseen data; this generally happens when the system cannot discriminate genuine information from bias or background noise embedded in the data.

Each output channel f requires a filter \(W_f \in \mathbb {R}^{C\times {X}\times {Y}}\), applied as:

$$\begin{aligned} F_f(x,y)=I * W_f = \sum _{c=1}^{C} \sum _{x'=1}^{X} \sum _{y'=1}^{Y} I(c, x - x',y - y') W_f(c,x',y') \end{aligned}$$

where f indexes the F output channels, \(I \in \mathbb {R}^{C\times {N}\times {M}}\) is the input map, and N and M are its spatial dimensions. We assume the stride parameter to be one.

A standard way to accelerate multi-dimensional convolution is filter separation, which is only possible under certain constraints. If the filter \(W_f\) has unit rank, the rank-one filter \(\widehat{W}_f\) can be separated into the cross-product of three one-dimensional filters as follows:

$$\begin{aligned} \widehat{W}_f = \alpha _f\times \beta _f \times \gamma _f \end{aligned}$$

It must be highlighted that separability of filters is a strong condition: in practice the rank of \(W_f\) is usually larger than one, and this restriction may affect the performance of the network on classification tasks. In our work we used Flattened Networks in which one or more convolutional layers are replaced by a sequence of one-dimensional convolutions, as sketched below.
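To make the construction concrete, the following minimal PyTorch sketch shows one flattened layer, in which a single \(C\times X\times Y\) convolution is replaced by three consecutive one-dimensional factors; the layer sizes and the depthwise handling of the two spatial factors are our illustration, not the exact configuration of [6]:

```python
import torch
import torch.nn as nn

class FlattenedConv(nn.Module):
    """A full C x X x Y convolution replaced by three rank-one factors
    applied in sequence: across channels, then vertically, then horizontally."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # 1x1 convolution: the "lateral" 1-D filter across channels
        self.lateral = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        # k x 1 depthwise convolution: vertical 1-D filter per channel
        self.vertical = nn.Conv2d(out_ch, out_ch, kernel_size=(k, 1),
                                  padding=(k // 2, 0), groups=out_ch, bias=False)
        # 1 x k depthwise convolution: horizontal 1-D filter per channel
        self.horizontal = nn.Conv2d(out_ch, out_ch, kernel_size=(1, k),
                                    padding=(0, k // 2), groups=out_ch, bias=False)

    def forward(self, x):
        return self.horizontal(self.vertical(self.lateral(x)))

# Example: a batch of 8 single-channel 64x64 camera images
x = torch.randn(8, 1, 64, 64)
print(FlattenedConv(1, 16)(x).shape)  # torch.Size([8, 16, 64, 64])
```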

Fig. 8

MobileNets are based on Depthwise Convolutional Filters (b) and Pointwise Convolution (c), which yield a noticeable reduction in parameters with respect to architectures based on Standard Convolution Filters (a)

B. MobileNets

MobileNets [9] are a class of efficient models based on streamlined architectures that employ depthwise separable convolutions to build lightweight deep neural networks. Howard et al. [9] also introduced two global hyper-parameters that trade off latency against accuracy. MobileNets have several application domains, although they are primarily intended to let developers and computer scientists train and test CNNs on mobile devices and in embedded vision applications. Depthwise separable filters are the base on which MobileNets are built: they factorize a convolution into a depthwise convolution followed by a \(1\times 1\) pointwise convolution. A standard convolution layer both filters the input and combines it into a new set of outputs in a single step; the depthwise separable convolution splits this into two layers, the first devoted to filtering the inputs and the second to combining them into a new set of outputs. The scheme is depicted in Fig. 8, where standard convolution filters are replaced by depthwise convolutional filters plus a pointwise convolution, allowing a large fraction of the architecture parameters to be removed. As Fig. 8 suggests, a standard convolution has a computational cost of:

$$\begin{aligned} D_k \cdot D_k \cdot M \cdot N \cdot D_f \cdot D_f \end{aligned}$$

where M is the number of input channels, N the number of output channels, \(D_k\cdot D_k\) the kernel size and \(D_f\cdot D_f\) the feature map size. A depthwise convolution, with one filter per input channel, has a cost of:

$$\begin{aligned} D_k \cdot D_k \cdot M \cdot D_f \cdot D_f \end{aligned}$$

The depthwise convolution filters the input channels but does not combine them into a new set of outputs, so an additional layer is needed: the results of the depthwise filtering are combined linearly by a \(1 \times 1\) (pointwise) convolution. The combination of the two is called a depthwise separable convolution, whose computational cost is:

$$\begin{aligned} D_k \cdot D_k \cdot M \cdot D_f \cdot D_f + M \cdot N \cdot D_f \cdot D_f \end{aligned}$$
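Dividing the separable cost by the standard cost gives the reduction factor derived in [9]:

$$\begin{aligned} \frac{D_k \cdot D_k \cdot M \cdot D_f \cdot D_f + M \cdot N \cdot D_f \cdot D_f}{D_k \cdot D_k \cdot M \cdot N \cdot D_f \cdot D_f} = \frac{1}{N} + \frac{1}{D_k^2} \end{aligned}$$

so that with \(D_k = 3\) and a reasonably large number of output channels N, the cost drops by a factor approaching 9.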

As described in [9], the reduction in computation is achieved by expressing convolution as this two-step process of filtering and combining. By adopting \(3 \times 3\) depthwise separable convolutions, MobileNet needs between 8 and 9 times less computation than standard convolution. In our work we are interested in assessing the performance of MobileNets on IACT data. For a more in-depth description of the overall architecture the reader is referred to [9].
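A minimal PyTorch sketch of the depthwise separable block follows; it is our illustration of the two-step scheme, whereas the actual MobileNet block in [9] also interleaves batch normalization and ReLU after each convolution:

```python
import torch
import torch.nn as nn

class DepthwiseSeparable(nn.Module):
    """Depthwise 3x3 convolution (one filter per input channel, groups=in_ch)
    followed by a 1x1 pointwise convolution that combines the channels."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Parameter count versus a standard 3x3 convolution, for M=32, N=64:
#   standard:  3*3*32*64      = 18432
#   separable: 3*3*32 + 32*64 = 2336   (about 8x fewer)
block = DepthwiseSeparable(32, 64)
print(sum(p.numel() for p in block.parameters()))  # 2336
```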

C. SqueezeNet

Like MobileNets, the SqueezeNet [10] architecture leverages a reduction in the number of parameters to deliver adequate classification accuracy with short latency. Iandola et al. [10] aimed at a model with fewer parameters in order to train more efficiently, to incur less overhead in client-server Deep Learning applications, and to fit on the hardware available for embedded deployment. They focused their efforts on so-called model compression, a line of work that has recently arisen around the objective of compressing existing CNN models in a lossy way: Denton et al. [17] applied SVD (Singular Value Decomposition) to pre-trained CNN models, while Han et al. [18] used a pruning algorithm to shrink network size. When it was introduced, SqueezeNet represented an innovation in the field of CNN model compression because of the so-called Fire module, out of which the architecture itself is built.

In greater detail, Iandola et al. [10] designed SqueezeNet around three strategies. The first is to replace \(3\times 3\) filters with \(1\times 1\) filters (a \(1\times 1\) convolution filter has nine times fewer parameters than a \(3\times 3\) one). The second is to decrease the number of input channels seen by the remaining \(3\times 3\) filters. The third is to downsample late in the network, so that convolution layers have large activation maps across most of the network. The first two strategies decrease the number of parameters in the architecture, while the third aims to maximize accuracy within the reduced parameter budget. A Fire module consists of a squeeze convolution layer of \(1\times 1\) filters that feeds an expand layer combining \(1\times 1\) and \(3\times 3\) convolution filters; a simplified scheme of this module is given in Fig. 9, and a code sketch follows below. The module is tuned through three hyper-parameters. In our work we adopt the SqueezeNet architecture with a standalone convolution layer, followed by a series of 8 Fire modules and a final convolution layer; max pooling with stride 2 is performed at several points along the network. The reader interested in a further, more detailed description is referred to the reference paper [10].
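To make the module concrete, here is a minimal PyTorch sketch of a Fire module; the three hyper-parameter names follow the notation of [10], and the example channel counts mirror the fire2 module of the original SqueezeNet:

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Fire module: a 1x1 'squeeze' layer limits the channels fed to the
    'expand' layer, which concatenates 1x1 and 3x3 convolution outputs."""
    def __init__(self, in_ch, s1x1, e1x1, e3x3):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, s1x1, kernel_size=1)
        self.expand1 = nn.Conv2d(s1x1, e1x1, kernel_size=1)
        self.expand3 = nn.Conv2d(s1x1, e3x3, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1(s)),
                          self.relu(self.expand3(s))], dim=1)

# fire2 of SqueezeNet: 96 input channels -> 16 squeezed -> 64 + 64 expanded
y = Fire(96, 16, 64, 64)(torch.randn(1, 96, 55, 55))
print(y.shape)  # torch.Size([1, 128, 55, 55])
```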

Fig. 9

The SqueezeNet CNN architecture allows for a huge reduction of parameters, mainly based on the squeeze (a) and expand (b) steps; the use of \(1\times 1\) convolution filters makes the architecture more lightweight

D. GoogLeNet

GoogLeNet is crafted to be an efficient deep neural network for computer vision tasks, and its performance on classification has been widely assessed [8]. Szegedy et al. [8] proposed an architecture whose main hallmark is the improved utilization of the computing resources inside the network. The main idea behind GoogLeNet is to work out how sparse structures in a convolutional network can be approximated by dense components. The authors engineered the network layer by layer: units of the last layer whose activations are highly correlated are grouped into clusters of visual features (boundaries, edges, contours, motifs); these clusters form the units of the next layer and are at the same time connected to the units of the previous layer. The resulting Inception module is described in Fig. 10. The current version of Inception includes layers with filters of size \(1\times 1\), \(3\times 3\) and \(5\times 5\); max-pooling filters are also added, as they have proved successful in state-of-the-art CNNs. Each unit of the earlier layers corresponds to a region of the input image. As depicted in Fig. 10, the visual information coming out of the previous layer passes through both convolution and pooling filters, whose outputs are combined by filter concatenation, and Inception modules are stacked on top of each other. The outputs of the modules vary with depth: features of higher abstraction are captured by higher layers, and their spatial concentration is expected to decrease. It must be highlighted that, for technical reasons such as memory efficiency, Inception modules are only used at the higher layers. The overall network counts 22 layers when only layers with parameters are considered (pooling layers are not counted); counting each block inside the Inception modules, the network reaches roughly 100 layers (the exact number depends on the infrastructure set-up). The reader interested in further insights into the GoogLeNet architecture is referred to the reference paper [8].

Fig. 10

The Inception module allows for multiple outputs coming out of the previous layer to be combined using a filter concatenation
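The following minimal PyTorch sketch implements the naive version of the module in Fig. 10; the channel counts mirror the inception (3a) stage of [8], while the \(1\times 1\) reduction convolutions that the full module places before the \(3\times 3\) and \(5\times 5\) branches are omitted for brevity:

```python
import torch
import torch.nn as nn

class Inception(nn.Module):
    """Naive Inception module: parallel 1x1, 3x3 and 5x5 convolutions plus
    3x3 max pooling, joined by filter concatenation along the channel axis."""
    def __init__(self, in_ch, c1, c3, c5, cp):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, c3, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, c5, kernel_size=5, padding=2)
        self.pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                  nn.Conv2d(in_ch, cp, kernel_size=1))

    def forward(self, x):
        # All branches preserve the spatial size, so outputs can be concatenated
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.pool(x)],
                         dim=1)

# inception (3a): 192 input channels -> 64 + 128 + 32 + 32 = 256 output channels
y = Inception(192, 64, 128, 32, 32)(torch.randn(1, 192, 28, 28))
print(y.shape)  # torch.Size([1, 256, 28, 28])
```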

E. ResNet-50

He et al. [7] proposed Deep Residual Learning to ease the training of deep networks, which are prone to the degradation problem: when the depth of a network increases, accuracy saturates and then degrades rapidly. This degradation is not caused by overfitting and, more surprisingly, adding more layers lowers even the training accuracy. He et al. [7] addressed the issue of training accuracy degradation by introducing Deep Residual Learning. Rather than asking a Deep Learning model to fit a particular mapping between input and output directly, they approach the improvement of training accuracy from a different perspective and let a stack of layers fit a residual mapping. Denoting the original mapping as H(x), Deep Residual Learning lets the layer stack fit the residual function defined as:

$$\begin{aligned} F(x) := H(x) - x \end{aligned}$$
Fig. 11

A building block of Residual Net

Following this approach, the original mapping is recast as \(F(x) + x\). Being a simple summation, this formulation can be realised with feedforward networks with shortcut connections, as in Fig. 11. Here the shortcut connections perform identity mapping, and their outputs are added to the outputs of the stacked layers; since they are identity mappings, they introduce no new parameters. The intuition behind Deep Residual Learning is that multiple nonlinear layers can asymptotically approximate the residual function, and the reformulation is designed to counter the training accuracy degradation problem. He et al. named the resulting networks ResNet (Residual Network) after the Deep Residual Learning formulation. The standard version of ResNet has 34 parameter layers, with shortcut connections turning the plain network into its residual counterpart. Deeper versions use a bottleneck building block in which each residual function F involves three layers rather than two. In our work we conducted different experiments with ResNet-50, which has 50 parameter layers.
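A minimal PyTorch sketch of the building block of Fig. 11 closes the section; this is our illustration of the two-layer block, whereas ResNet-50 itself stacks the three-layer bottleneck variant:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: the stacked layers fit F(x) = H(x) - x, and the
    identity shortcut restores H(x) = F(x) + x at no extra parameter cost."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # F(x): the residual mapping learned by the stacked layers
        f = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        # H(x) = F(x) + x: the identity shortcut adds no parameters
        return self.relu(f + x)

y = ResidualBlock(64)(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 64, 56, 56])
```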


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Bruno, A., Pagliaro, A., La Parola, V. (2021). Application of Machine and Deep Learning Methods to the Analysis of IACTs Data. In: Zelinka, I., Brescia, M., Baron, D. (eds) Intelligent Astrophysics. Emergence, Complexity and Computation, vol 39. Springer, Cham. https://doi.org/10.1007/978-3-030-65867-0_5

