MLaaS4HEP: Machine Learning as a Service for HEP

Machine Learning (ML) will play a significant role in the success of the upcoming High-Luminosity LHC (HL-LHC) program at CERN. An unprecedented amount of data at the exascale will be collected by LHC experiments in the next decade, and this effort will require novel approaches to train and use ML models. In this paper, we discuss a Machine Learning as a Service pipeline for HEP (MLaaS4HEP) which provides three independent layers: a data streaming layer to read High-Energy Physics (HEP) data in their native ROOT data format; a data training layer to train ML models using distributed ROOT files; and a data inference layer to serve predictions using pre-trained ML models via the HTTP protocol. Such a modular design opens up the possibility to train ML models at large scale by reading ROOT files from remote storage facilities, e.g. the World-Wide LHC Computing Grid (WLCG) infrastructure, and to feed the data to the user's favorite ML framework. The inference layer, implemented as TensorFlow as a Service (TFaaS), may provide easy access to pre-trained ML models in existing infrastructure and applications inside or outside of the HEP domain. In particular, we demonstrate the usage of the MLaaS4HEP architecture for a physics use-case, namely, the tt̄ Higgs analysis in CMS, originally performed using custom-made Ntuples. We provide details on the training of the ML model using distributed ROOT files, discuss the performance of the MLaaS and TFaaS approaches for the selected physics analysis, and compare the results with traditional methods.


Introduction
With the CERN LHC program underway, we started seeing an exponential acceleration of data growth in the HEP field. By the end of Run II, the CERN experiments were already operating at the petabyte (PB) scale, producing O(100) PB of data each year. The new HL-LHC program will extend this further, to the exabyte scale. The usage of ML in HEP is on the rise too. It has been successfully used in online and offline reconstruction programs, and there is much to be gained from applying it to detector simulation, object reconstruction and identification, MC generation, and beyond [1]. One of the main obstacles to using ML frameworks and bringing Computer Science expertise in ML to HEP lies in the different data-formats used by ML practitioners and HEP users. In the IT world, ML relies on flat-format data representations, e.g. CSV or NumPy data formats, while in HEP the data are stored in the tree-based data-structures of the ROOT [2] data-format. As was pointed out in the HEP ML Community White Paper [1], the ROOT data-format is practically unused outside of HEP, which creates an artificial gap between the ML and HEP communities. Recent Kaggle challenges, e.g. the ATLAS challenge on the Higgs boson discovery [3] and the cross-experiment tracking ML challenge [4], relied on the CSV data-format to present the input dataset to ML competitors, whereas within the HEP community these datasets are readily accessible in the ROOT data-format without any pre-processing or transformation. To close this gap, we present in this paper a novel approach to use HEP ROOT data natively for training purposes, reading ROOT files from remote storage via XrootD [5], and presenting pre-trained models as a service accessible via the HTTP protocol. Such a modular Machine Learning as a Service design opens up the possibility to train ML models on PB-size datasets remotely accessible from WLCG sites, without requiring data transformations or data locality.

Related work and solutions
Machine Learning as a Service is a well-known concept in industry, and major IT companies offer such solutions to their customers. For example, Amazon ML, Microsoft Azure ML Studio, Google Prediction API and ML Engine, and IBM Watson are prominent implementations of this concept, see [6]. Usually, Machine Learning as a Service is used as an umbrella term for various ML tasks such as data pre-processing, model training and evaluation, and inference through REST APIs. Even though they offer solid interfaces, these services are most often designed to cover standard use-cases. For instance, data are expected to be fed in flat data formats. All data pre-processing operations are performed automatically, e.g. specific services identify categorical versus numerical fields and select the best methods to perform further data pre-processing. Model predictions are limited to well-established patterns, such as binary classification, multi-class classification, and regression. Quite often, Machine Learning as a Service providers offer pre-defined models that cover standard use-cases, e.g. image classification.
In HEP, however, the usage of such services is quite limited, for several reasons. Among them, the HEP ROOT data-format cannot be used directly in any of these services, and the required pre-processing operations may be more complex than those offered by commercial service providers. For instance, the two HEP Kaggle challenges [3], [4] use custom HEP metrics in their evaluation procedure, which are not available in out-of-the-box industry solutions. In addition, the ML workflows in both competitions are far from trivial, e.g. the pre-processing step required writing custom code to apply event selection and perform additional HEP-specific steps. Therefore, after rounds of evaluations, we found that out-of-the-box commercial solutions are most often ineffective for HEP use-cases (cost-wise and functionality-wise). This might change in the future, as various initiatives, e.g. the CERN OpenLab framework, continue to work in close cooperation with almost all of the aforementioned service providers.
At the same time, various R&D activities are underway within HEP. For example, the hls4ml project [7] targets ML inference on FPGAs, while the SonicCMS project [8] provides Services for Optimal Network Inference on Coprocessors. Both target the optimization of the inference phase rather than the whole ML pipeline, i.e. from reading the data to training models and serving predictions. At the moment, to our knowledge, there is no complete product that can be used as Machine Learning as a Service in HEP.
The novelty of the proposed solution is threefold. Firstly, we propose to use HEP ROOT files directly, either locally or remotely, without requiring data transformation operations to convert them to a flat data format. Secondly, the training layer can use external third-party ML frameworks, from well-established ML libraries like scikit-learn to Deep-Learning (DL) frameworks such as TensorFlow, PyTorch, etc. Thirdly, the inference phase can be provided by the HTTP RESTful APIs of TensorFlow as a Service (TFaaS) [9] or by any other solution. The latter does not require significant changes in existing HEP infrastructures, frameworks, and applications, since clients communicate with the TFaaS server(s) over the HTTP protocol.

MLaaS4HEP architecture
A typical ML workflow consists of several steps: acquire the data necessary for training, use an ML framework to train the model, and utilize the trained model for predictions. In our Machine Learning as a Service solution, MLaaS4HEP [10], this workflow is abstracted as data streaming, data training, and inference phases. Each of these steps can be either tightly integrated into the application design, or composed and used individually. The choice is mostly driven by particular use cases. In HEP we can define these layers as follows, see Fig. Even though the implementation of these layers can differ from one experiment to another (or among other scientific domains/fields using ROOT files), it can be easily generalized and become part of the foundation for a generic Machine Learning as a Service framework. In the following sub-sections we discuss the individual layers and outline particular problems that should be addressed in their implementation.

Data Streaming Layer
The Data Streaming Layer is responsible for streaming data from local or remote data storage. Originally, reading ROOT files was mostly possible only from C++ frameworks, but recent developments of ROOT I/O now allow ROOT data to be accessed easily from Python, both locally and remotely via the XrootD protocol. The main development was done in the uproot [11] framework, backed by the DIANA-HEP initiative [12]. The uproot library uses NumPy [13] calls to rapidly cast data blocks in the ROOT file as NumPy arrays, and provides integration with the XrootD protocol. Among its features, it allows partial reading of ROOT TBranches, reading of non-flat TTrees and of non-TTree objects such as histograms, and more. It relies on data caching and parallel processing to achieve high throughput. In our benchmarks we were able to read HEP events at a rate of ∼O(10) kHz from both local and remote storage¹.
In our implementation of Machine Learning as a Service, see Sect. 3.4, this layer was composed as a Data Generator capable of reading either local or remote file(s) in batches of a pre-defined size. The batch size can be easily fine-tuned based on the complexity of the events and the available bandwidth. The output of the Data Generator was a NumPy array with flat and Jagged Array attributes; see the next section for further discussion.
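As an illustration, the batching behaviour of such a Data Generator can be sketched in a few lines of Python. This is a minimal sketch, not the MLaaS4HEP code: `data_generator` is a hypothetical name, and a plain Python iterable of events stands in for the ROOT reader (in uproot 4 and later, `TTree.iterate` with a `step_size` argument plays a comparable role for local or XrootD-served files).

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def data_generator(events: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `batch_size` events.

    `events` stands in for a ROOT reader over local or remote (XrootD) files.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(events)
    while True:
        # Take the next `batch_size` events (fewer at the end of the stream).
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

For example, 2500 events with a batch size of 1000 give batches of 1000, 1000 and 500 events; the last batch is simply shorter, so the consumer never has to know the total number of events in advance.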

Data Training Layer
This layer encapsulates HEP data and presents it to the ML framework used by the application. The main obstacle here is the non-flat representation of HEP data, which ML frameworks do not handle directly. In particular, the ROOT data-format can be represented in so-called Jagged Arrays², see Fig. 2. The HEP tree-based data representation is optimized for data storage, but it is not directly suitable for ML frameworks. Therefore, a data transformation is required to feed tree-based data structures into an ML framework as a flat data structure. We explored two possible transformations: a vector representation with padded values, see Fig. 3, and a matrix representation in one of several possible phase spaces, see Fig. 4.
The idea of the vector representation approach is to determine the dimensionality of each Jagged Array attribute via a one-time pass across the data, and then to compose the final vector with sufficient space allocated for the Jagged Array attribute values based on their dimensionality. If a Jagged Array attribute in a given event is shorter than its dimensionality, the remaining slots are filled with padding values. For instance, a physics event is composed of a set of particles. A priori we may not know how many particles will be created in an event, and therefore how much space we need to allocate for particle attributes, even though each attribute has a fixed size (e.g. particle momentum can be represented by three numerical values p_x, p_y and p_z). However, knowing the distributions of the particles in all events of a given physics dataset allows us to choose the dimensionality of their Jagged Array attributes. For instance, we can run the MC process and determine how many electrons per event we may have. The maximum number of electrons in this distribution gives the dimensionality of the corresponding Jagged Array attributes. Using these dimensionality numbers we can represent every event as a flat vector of a fixed size. The number of filled Jagged Array values varies from event to event, and the extra slots are filled with pre-defined pad values, e.g. NaN³. Additionally, the one-time pass across a series of events can be used to determine the min, max, and mean values of Jagged Array attributes, which can later be used for normalization purposes.
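The two passes of the vector representation can be sketched as follows, assuming events are available as Python dictionaries in which flat attributes hold a number and Jagged Array attributes hold a list, and assuming every event carries the same attribute names. The function names are hypothetical; a real implementation would operate on NumPy or Awkward arrays rather than Python lists.

```python
import math
from typing import Dict, List, Sequence, Union

# Hypothetical event layout: flat attributes map to a number, Jagged Array
# attributes map to a list of numbers (one entry per particle).
Event = Dict[str, Union[float, List[float]]]


def jagged_dims(events: Sequence[Event]) -> Dict[str, int]:
    """First pass: dimensionality (maximum length) of each jagged attribute."""
    dims: Dict[str, int] = {}
    for event in events:
        for name, value in event.items():
            if isinstance(value, (list, tuple)):
                dims[name] = max(dims.get(name, 0), len(value))
    return dims


def to_flat_vector(event: Event, dims: Dict[str, int],
                   pad: float = math.nan) -> List[float]:
    """Second pass: flatten one event into a fixed-size vector.

    Jagged attributes are truncated/padded to their dimensionality with `pad`;
    attributes are visited in sorted name order so every vector has the same
    layout (this assumes all events carry the same attribute names).
    """
    vector: List[float] = []
    for name in sorted(event):
        value = event[name]
        if name in dims:
            values = [float(v) for v in value][: dims[name]]
            vector.extend(values + [pad] * (dims[name] - len(values)))
        else:
            vector.append(float(value))
    return vector
```

For instance, if three events contain 2, 1 and 3 jets, the jet attribute gets dimensionality 3, and the event with a single jet of pT 70 becomes `[70.0, nan, nan, 1.0]` (jet pT slots first, then the flat jet count).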
The matrix representation of Jagged Arrays, see Fig. 4, can use a phase space, if one is present in the dataset. For example, spatial coordinates or attribute components are often part of HEP datasets, and can therefore be used for Jagged Array mappings. This approach resolves the ambiguity of the vector representation (in terms of dimensionality choice), but it has its own problem: the choice of granularity of the phase-space matrix. For example, if the X-Y phase space (where X and Y refer to an arbitrary pair of attributes) is used for the matrix representation, we do not know a priori which cell size to use in this space. A given matrix granularity may introduce a collision problem for Jagged Array attribute values, e.g. when two particles point into the same cell in X-Y space. Such ambiguity may be resolved either by using a finer matrix granularity (smaller cells) or by adding other phase spaces, e.g. using matrices in the X-Y, Y-Z and X-Z phase spaces and concatenating them into a final vector. But such enhancements increase the sparsity of the final matrix and therefore require more computing resources at training time.
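The collision problem can be made concrete with a small sketch that bins the particles of one event into an eta-phi grid. The function names, the choice to sum colliding values, and the ranges used below are illustrative assumptions, not part of MLaaS4HEP.

```python
from typing import List, Sequence, Tuple


def _cell(v: float, lo: float, hi: float, n_bins: int) -> int:
    """Index of the cell containing v, clamped to [0, n_bins - 1]."""
    return min(max(int((v - lo) / (hi - lo) * n_bins), 0), n_bins - 1)


def phase_space_matrix(points: Sequence[Tuple[float, float, float]],
                       x_range: Tuple[float, float],
                       y_range: Tuple[float, float],
                       n_bins: int) -> Tuple[List[List[float]], int]:
    """Map (x, y, value) triples of one event, e.g. (eta, phi, pT) per
    particle, onto an n_bins x n_bins grid.

    Returns the grid and the number of collisions (particles landing in an
    already occupied cell). Colliding values are summed, which is only one
    possible convention and loses the individual values.
    """
    grid = [[0.0] * n_bins for _ in range(n_bins)]
    occupied = set()
    collisions = 0
    for x, y, value in points:
        i = _cell(x, *x_range, n_bins)
        j = _cell(y, *y_range, n_bins)
        if (i, j) in occupied:
            collisions += 1
        occupied.add((i, j))
        grid[i][j] += value
    return grid, collisions
```

With eta in (-2.5, 2.5) and phi in (-3.2, 3.2), two nearby particles at (0.13, 0.2) and (0.17, 0.21) share a cell on a 10×10 grid but are separated on a 100×100 grid; the finer grid, however, has 10,000 cells for a handful of particles, which is exactly the sparsity trade-off described above.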
In our prototype, discussed in Sect. 3.4, we used the vector representation with padded values and applied a two-pass procedure over the data. The first pass reads the data streams and determines the dimensionality of the Jagged Arrays along with the min, max, and mean values used for normalization. The second pass reads the data from the streaming layer, transforms them, and feeds them to the underlying ML framework.
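A minimal sketch of the normalization statistics gathered in the first pass and their use in the second pass is shown below. It assumes min-max scaling to [0, 1], computed per position of the flat vector while skipping NaN padding; the text does not state which scaling MLaaS4HEP applies, so both the scaling choice and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Stats = List[Tuple[float, float, float]]  # (min, max, mean) per position


def column_stats(vectors: Sequence[Sequence[float]]) -> Stats:
    """First pass: (min, max, mean) of each vector position, skipping NaN."""
    stats: Stats = []
    for column in zip(*vectors):
        values = [v for v in column if not math.isnan(v)]
        if values:
            stats.append((min(values), max(values), sum(values) / len(values)))
        else:  # this position is padding in every event
            stats.append((0.0, 0.0, 0.0))
    return stats


def normalize(vector: Sequence[float], stats: Stats) -> List[float]:
    """Second pass: min-max scale each value to [0, 1]; NaN padding is kept."""
    out: List[float] = []
    for v, (lo, hi, _mean) in zip(vector, stats):
        if math.isnan(v):
            out.append(v)
        elif hi > lo:
            out.append((v - lo) / (hi - lo))
        else:  # constant position: nothing to scale
            out.append(0.0)
    return out
```

In a streaming setting the statistics would be accumulated chunk by chunk during the first pass rather than computed from an in-memory list; the mean is kept alongside min and max, as in the specs described above, even though this particular scaling does not use it.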
In Neural Network models it is natural to replace padded NaN values with zeros, since the input values are multiplied by the weight matrix elements: a zero input then contributes nothing, whereas a NaN would propagate through the whole computation. But knowledge of the locations of padded values in the vector representation approach may be valuable in certain circumstances. For instance,

Data Inference Layer
The choice of a data inference layer should be driven by the underlying technology, i.e. the ML framework. It can be either tightly integrated with application frameworks (both the CMS and ATLAS experiments followed this approach in their CMSSW-DNN [14] and LTNN [15] solutions) or developed as a Service (aaS) solution. The former has the advantage of reducing the latency of the inference step per processed event, while the latter can be easily generalized and made independent from the internal infrastructure. As such, it can be easily integrated into cloud platforms, be used as a repository of pre-trained models, and serve models across experiment boundaries. We decided to implement the latter solution via the TensorFlow as a Service architecture [9].
Fig. 4 A matrix representation of a Jagged Array in a certain phase space, e.g. eta-phi (panels: eta-phi matrix; matrix transformed into a branch vector with 0's in empty cells).
We evaluated several ML frameworks and decided to use TensorFlow (TF) graphs [16] for the inference phase. A TF model represents a computational graph in a static form, i.e. the mathematical computations, graph edges and data flow are well-defined at run time. A TF model can be read from different programming languages thanks to the APIs provided by the TF library. Moreover, TF graphs are very well optimized for GPUs and TPUs. We opted for the Go programming language to implement the inference part of the MLaaS4HEP framework based on the following factors: the Go language natively supports concurrency via goroutines and channels; it is developed and used by Google and is very well integrated with the TF library; and it produces a single static executable, which significantly simplifies its deployment on premises and on various (cloud) service providers. We also opted for a REST interface, through which clients may upload their TF models to the TFaaS server and use them for their inference needs. Both Python and C++ clients were developed on top of the REST APIs (end-points), and other clients can be easily developed thanks to the HTTP protocol used by the TFaaS Go RESTful implementation.
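To illustrate how a client can talk to such a REST interface over plain HTTP, the sketch below builds a JSON POST request with Python's standard library. The URL and the `{"keys": ..., "values": ...}` payload layout are hypothetical placeholders rather than the documented TFaaS end-point schema, which should be taken from the TFaaS repository [9].

```python
import json
import urllib.request
from typing import Any, List


def build_inference_request(url: str, keys: List[str],
                            values: List[float]) -> urllib.request.Request:
    """Build (without sending) an HTTP POST carrying one event as JSON.

    The payload layout {"keys": [...], "values": [...]} is a hypothetical
    example; check the TFaaS documentation for the real schema.
    """
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    body = json.dumps({"keys": keys, "values": values}).encode("utf-8")
    return urllib.request.Request(url, data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


def predict(request: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `predict(build_inference_request("http://tfaas.example.org/predict", ["x1", "x2"], [0.5, 1.5]))` would send one event, assuming a server is listening at that hypothetical address. Because only HTTP and JSON are involved, the same exchange could equally be done from C++, Go or `curl`.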
We performed several benchmarks using the TFaaS server running on CentOS 7 Linux with 16 cores and 30 GB of RAM. The benchmarks were done in two modes: 1000 calls with 100 concurrent clients, and 5000 calls with 200 concurrent clients. We tested both the JSON and Protocol Buffers [17] data formats for sending and fetching data to/from the TFaaS server. In both scenarios, we achieved a throughput of ∼500 req/sec. These numbers were obtained by serving a mid-size pre-trained model with 1024x1024 hidden layers.
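A benchmark of this kind can be sketched as a thread pool that issues a fixed number of calls from a fixed number of concurrent clients and reports calls per second. The helper below is a hypothetical illustration, not the tool used for the numbers above; in practice `call` would wrap one HTTP request to the server.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def measure_throughput(call: Callable[[], object], n_calls: int,
                       n_clients: int) -> float:
    """Issue `n_calls` invocations of `call` from `n_clients` concurrent
    threads and return the observed throughput in calls per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        # Consuming the results waits for every call and re-raises errors.
        for _ in pool.map(lambda _i: call(), range(n_calls)):
            pass
    elapsed = time.perf_counter() - start
    return n_calls / elapsed
```

For example, `measure_throughput(send_one_request, 1000, 100)` would mirror the first benchmark mode, where `send_one_request` is a hypothetical function sending one inference request. Python threads suit this I/O-bound client load because the interpreter lock is released while waiting for the network.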
Even though a single TFaaS server may not be as efficient as an integrated solution, it can be easily scaled horizontally, e.g. using Kubernetes or other cluster solutions, and may provide the desired throughput for concurrent clients. It also decouples the application layer/framework from the inference phase, which can then be easily integrated into any existing infrastructure by querying the TFaaS server for inference results over the HTTP protocol. TFaaS can also be used as a repository of pre-trained models, which can be easily shared across experiment or domain boundaries. For instance, the current implementation of TFaaS allows visual inspection of uploaded models, versioning, tagging, etc. A simple search engine could be put on top of TFaaS with little effort. For a full list of planned improvements see Sect. 5.

MLaaS4HEP: proof-of-concept prototype
We implemented the data-streaming and data-training layers in the Python programming language and put them into the MLaaS4HEP repository [10]. The data-training layer was abstracted to support any kind of Python-based ML framework, from TensorFlow to PyTorch and others⁴.
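One way to keep a training layer framework-agnostic is to depend only on a small interface. The sketch below assumes a `train_on_batch(x, y)` method returning a scalar loss; Keras models provide a method with this name, whereas for PyTorch one would write a thin wrapper around the optimizer step. The `BatchTrainer` and `train` names are hypothetical.

```python
from typing import Callable, Iterable, List, Protocol, Sequence, Tuple

Batch = Tuple[Sequence, Sequence]  # (inputs, labels)


class BatchTrainer(Protocol):
    """Minimal interface the training loop relies on."""

    def train_on_batch(self, x: Sequence, y: Sequence) -> float: ...


def train(model: BatchTrainer, make_batches: Callable[[], Iterable[Batch]],
          epochs: int = 1) -> List[float]:
    """Framework-agnostic loop: one fresh pass over the batches per epoch.

    Assumes `train_on_batch` returns a scalar loss.
    """
    losses: List[float] = []
    for _ in range(epochs):
        for x, y in make_batches():
            losses.append(float(model.train_on_batch(x, y)))
    return losses
```

Passing a factory that returns a fresh batch iterator, instead of a single iterator, lets each epoch re-read the data stream, which fits a generator-based streaming layer whose iterators can only be consumed once.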
The data inference layer was implemented in the Go programming language and kept separately in the TFaaS repository [9]. Both frameworks were released as Open-Source software. Moreover, the TFaaS middleware can be used outside of HEP to serve any kind of TF-based model uploaded to the TFaaS service via the HTTP protocol⁵.
When all layers of the MLaaS4HEP framework were developed, we successfully tested a working prototype of the system using ROOT files accessible through XrootD servers. The data were read in batches of 1000 events, each batch being approximately 4 MB in size. Each batch was fed into both TensorFlow (implemented via the Keras framework) and PyTorch models. The Data Generator, representing the data streaming layer, yields into the corresponding model a vector representation of the Jagged Array ROOT data structures along with a mask vector representing the positions of padded values, see Fig. 5. This was done to avoid misinterpreting padded values as real attribute values. The mask vector was used in both models to cast NaN values to zeros. We tested this prototype on a local machine and also successfully deployed it on a GPU node. The trained ML model was later uploaded to the TFaaS server, and its functionality was tested using Python, C++ and curl HTTP-based clients. Further details of this proof-of-concept prototype can be found in the MLaaS4HEP [10] and TFaaS [9] GitHub repositories, respectively.
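The padding-plus-mask idea can be sketched as follows: NaN padding is replaced by zeros for the model, while a separate mask records which slots held real values. The function name is hypothetical and the 1/0 mask convention is an assumption of this sketch.

```python
import math
from typing import List, Sequence, Tuple


def mask_and_fill(vector: Sequence[float],
                  fill: float = 0.0) -> Tuple[List[float], List[int]]:
    """Return (values, mask): NaN padding replaced by `fill`, and a mask with
    1 for real values and 0 for padded slots, so that a real value equal to
    `fill` is not confused with padding."""
    mask = [0 if math.isnan(v) else 1 for v in vector]
    values = [fill if m == 0 else float(v) for v, m in zip(vector, mask)]
    return values, mask
```

For the vector `[0.0, nan, 2.5]` the values become `[0.0, 0.0, 2.5]` and the mask `[1, 0, 1]`: after filling, the first two slots look identical, and only the mask tells the genuine zero apart from the padding.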

Real case scenario
In order to validate the MLaaS4HEP framework, we decided to test the infrastructure on a real physics use-case. This allows us to test the performance of the MLaaS4HEP framework and to validate its results from the physics point of view. Since we were not constrained in the choice of the physics use-case, we decided to use the tt̄H analysis (tt̄H(bb̄)) in the boosted, all-hadronic final state.

The description of physics analysis
The Higgs boson is considered the most relevant discovery of the last few years in High Energy Physics. Almost fifty years after its prediction, it was discovered by the ATLAS and CMS collaborations in 2012 at the CERN Large Hadron Collider (LHC) [18,19]. Since then, different analyses have been performed in order to measure its properties with high precision.
In the Standard Model framework, the Higgs boson is predicted to couple to fermions via a Yukawa-like interaction, with a coupling proportional to the fermion mass. The top quark, being the heaviest fermion, has the largest coupling to the Higgs boson. Direct measurements of the top-Higgs coupling exploit tree-level processes. The tt̄H production, see Fig. 6, plays an important role in excluding beyond-Standard-Model contributions. Its all-jets decay channel is the one with the highest branching ratio (≈25%). The W bosons produced by the tt̄ pair decay into pairs of light quarks, while the Higgs boson decays to a bb̄ pair. In the final state there are at least eight partons (more might arise from initial- and final-state radiation), four of which are bottom (b) quarks. Despite the highest branching ratio, the all-jets final state is very challenging. It is dominated by the large QCD multi-jet production at the LHC, and there are large uncertainties in this channel due to the presence of many jets. At the same time, it represents a unique possibility to fully reconstruct the tt̄H system, due to the absence of missing energy in this channel.
At the 13 TeV centre-of-mass energy, top quarks with very high transverse momentum (pT) and Higgs bosons can be produced. If their Lorentz boost is sufficiently high, their decay products are highly collimated into a single, wide jet, called a boosted jet. In particular, we are interested in the tt̄H analysis with a boosted all-jets final state, where at least one of the jets in the final state is a boosted jet, and where the Higgs boson decays into a pair of well-resolved jets.
To validate the MLaaS4HEP framework and its applicability to HEP we perform two steps. In the first part, see Sect. 4.2, we compare the ML models produced by the MLaaS4HEP framework with those produced by a traditional analysis based on the ROOT and TMVA [20] frameworks for the resolved-Higgs analysis.
In the second part, see Sect. 4.3, we test the scalability of the MLaaS4HEP framework using all available data before applying any cuts.
The approach used by the analysts in the resolved-Higgs analysis was to train a Boosted Decision Tree (BDT) using TMVA in order to identify tt̄H events containing a resolved-Higgs decay. Training events are taken from Monte Carlo simulation and are selected among the tt̄H sample and the two dominant background samples, namely QCD and tt̄. Both the signal and the background events are required to satisfy some constraints, such as having at least one boosted jet, containing no leptons, passing the signal trigger, etc. This selection aims to select boosted, all-jets-like events. The tt̄H events satisfying these constraints, and in which the resolved Higgs boson is matched to the system of two b-tagged jets, are considered signal events. On the contrary, unmatched tt̄H events, and all QCD and tt̄ events passing the aforementioned selection, are considered background events. For validating the MLaaS4HEP functionality we used a set of ROOT files obtained for the specific physics analysis discussed in Sect. 4.1. Initially, we performed the resolved-Higgs analysis using 8 ROOT files containing background events and 1 file containing signal events. Each file has 27 branches, with about 350 thousand events in the whole pool of files and a total size of about 28 MB. The ratio between the number of signal events and background events is approximately 10.8%.
We decided to use a generic ML model and compare the results obtained inside and outside MLaaS4HEP. For our goal it was sufficient to demonstrate that a model trained within the MLaaS4HEP framework is comparable, according to a pre-defined set of metrics, with a similar one produced by a traditional analysis. In particular, we used a Keras sequential Neural Network (with two hidden layers of 128 and 64 neurons, and with dropout regularization between layers), and we trained it on an analysis dataset for 5 epochs with a batch size of 100 events. We used 64% of the data for training, 16% for validation, and 20% for testing. We explored the following approaches: use MLaaS4HEP to read and normalize events, and to train the ML model; use MLaaS4HEP to read and normalize events, and use a Jupyter notebook to train the ML model outside MLaaS4HEP; use a Jupyter notebook to perform the entire pipeline without MLaaS4HEP. The results of this exercise are shown in Fig. 7, and show little or no difference among the different approaches.
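The 64/16/20 proportions correspond to holding out 20% of the data for testing and then 20% of the remaining 80% for validation (0.8 × 0.8 = 0.64 and 0.8 × 0.2 = 0.16). A minimal sketch of such a split, assuming a random shuffle with a fixed seed (the text does not specify how the split was drawn; `split_64_16_20` is a hypothetical name):

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_64_16_20(items: Sequence[T],
                   seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle, hold out 20% for test, then 20% of the rest for validation,
    giving 64% train / 16% validation / 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(0.20 * len(shuffled))
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = round(0.20 * len(rest))
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test
```

For 1000 events this yields 640 training, 160 validation and 200 test events, with no event appearing in more than one set.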
In order to properly train any ML model, we need the ability to read data in chunks and shuffle them accordingly in each training batch. Therefore, we adjusted the MLaaS4HEP codebase to provide this functionality, as follows. The user specifies a chunk size, and MLaaS4HEP ensures that each chunk has the same proportion of signal and background events as present in the ROOT files. Figs 8a, 8b, and 8c show the computed loss, accuracy and AUC metrics, respectively, after 5 epochs of training, using the same structure as the previous model and a chunk size of 10 thousand events.
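Building one training chunk then amounts to concatenating the events taken from each file, labelling them as signal or background, and shuffling events and labels with the same permutation. The sketch below is a hypothetical illustration with Python lists; it assumes each file contains a single class, as in the pool of one signal file and eight background files described above.

```python
import random
from typing import List, Sequence, Tuple


def build_chunk(per_file_events: Sequence[Sequence[list]],
                per_file_labels: Sequence[int],
                seed: int = 0) -> Tuple[List[list], List[int]]:
    """Merge the events taken from each file into one shuffled chunk.

    `per_file_labels[k]` is the class of every event from file k
    (1 = signal, 0 = background). Events and labels are shuffled together,
    so each event keeps its label.
    """
    pairs = [(event, label)
             for events, label in zip(per_file_events, per_file_labels)
             for event in events]
    random.Random(seed).shuffle(pairs)
    return [event for event, _ in pairs], [label for _, label in pairs]
```

Shuffling (event, label) pairs rather than the two lists separately guarantees that the permutation is shared, so the signal fraction of the chunk is exactly the one implied by the per-file shares.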
We observed that, while the accuracy and AUC go up, the loss goes down with the number of chunks used for fitting the model, indicating that the ML model is actually learning. We also observed that these trends are not smooth, namely, we see sawtooth-shaped patterns. We investigated this behaviour by dropping ROOT files from the pool one by one, and we found that a particular ROOT file, with ttH noDRmatch background, alone is responsible for this effect. In Figs 8d, 8e, 8f we show that, when we use all the files except the aforementioned one, the loss metric rapidly goes to 0, while the accuracy and the AUC climb up to 1. When we used only the ttH noDRmatch file as background, the performance is lower: for instance, the accuracy score (see Fig. 8e) is between 0.7 and 0.8 during the training, compared to between 0.9 and 1 in the former case.
Fig. 7 Comparison of the metrics score (loss, accuracy and AUC) for the training, validation and test set for three different cases: (i) using MLaaS4HEP to read and normalize events, and to train the ML model; (ii) using MLaaS4HEP to read and normalize events, and using a Jupyter notebook to perform the training of the ML model outside MLaaS4HEP; (iii) using a Jupyter notebook to perform the entire pipeline without using MLaaS4HEP.
Further, we trained our model with chunks made of 50% signal events and 50% background events, and showed that possible effects caused by class imbalance in the data can be avoided. This test confirmed the previous results: the ttH noDRmatch file was causing the spikes in the trend of the metrics in Figs 8a, 8b, 8c. When this background file is not present, the ML model almost perfectly distinguishes signal from background.
The effect of the ttH noDRmatch file on the overall performance is due to the fact that its events have a signature similar to that of signal events, the only difference being that in signal events the two b-tagged jets are matched to the Higgs boson. Such similarities at the attribute level influence the training process and are responsible for the observed spikes in the ML evaluation metrics.
Table 1 shows the comparison of different metrics using the original 27 features and a reduced set of 14 features. As can be seen, the performance of the ML model based on 27 features is better than that of the analysis⁶, while the situation is reversed for the ML model based on 14 features. The reasons for the discrepancy between the different models are two-fold: we did not perform any ML tuning, and the results of the physics analysis were based on the TMVA tool, which we treated as a black box. Moreover, we knew that in TMVA a weight was applied to each ROOT file according to the inverse of the luminosity. Therefore, we decided that the difference in the obtained AUC scores is acceptable for validating the MLaaS4HEP framework.
Fig. 8 Comparison of the metrics (loss, accuracy and AUC) score for the training plus validation, and test set using all the events of the pool of files, read in chunks of 10 thousand events (see plots 8a, 8b, 8c). Plots 8d, 8e, 8f show the comparison of the same metrics for three tests: one without the ttH noDRmatch ROOT file in the background files list, one with only the ttH noDRmatch ROOT file as background file, and finally a third test (which repeats the first two) with a symmetric composition (50% and 50%) of signal and background in each data chunk (line with the 'x' marker).

MLaaS4HEP performance
In this section we provide details of the MLaaS4HEP performance tests: the scalability of the framework and its benchmarks using different storage layers. For that purpose we used all available ROOT files without any physics cuts. This gave us 8 ROOT files (one of which contains signal events), with 74 branches (22 flat and 52 Jagged), a total size of about 10.1 GB and about 28.5 million events. Fig. 9 shows the steps performed by the MLaaS4HEP pipeline, in particular those inside the Streaming and Training layers.
The first step (denoted by 1 in Fig. 9) represents the reading part of the MLaaS4HEP pipeline, which creates a specs file. This file contains all the information about the ROOT files: the dimension of the Jagged branches, the minimum and the maximum of each branch, and the number of events in each ROOT file. The second part of the plot, denoted by 2, describes the following structure. We loop over the files, and in the first part of the loop we read from the i-th file a number of events equal to the chunk size, which is fixed a priori by the user. Then, the right proportion of events with respect to the whole amount of events is taken from it, namely n_i/N_tot × chunk size, where n_i is the number of events in the i-th file and N_tot is the total number of events in all the files. These events are converted into NumPy arrays, the dimensions related to Jagged Arrays are fixed, and the values are normalised. After this loop, once a chunk of events properly mixed from the different files has been created, the events are fed into the ML model (in this case a Keras model), which is trained and updated. At this point, if the files have not been completely read, the pipeline restarts from point 2 until all the events are read.
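The per-file share n_i/N_tot × chunk size is generally not an integer. The sketch below assumes the largest-remainder method to round the shares so that they always add up to the chunk size; the text does not state which rounding MLaaS4HEP uses, and `file_quotas` is a hypothetical name. Integer arithmetic (`divmod`) avoids floating-point rounding surprises.

```python
from typing import List, Sequence


def file_quotas(n_events: Sequence[int], chunk_size: int) -> List[int]:
    """Events to take from each file for one chunk: n_i / N_tot * chunk_size,
    rounded with the largest-remainder method so the quotas sum to
    chunk_size."""
    n_tot = sum(n_events)
    # Exact integer quotient and remainder of n_i * chunk_size / N_tot.
    parts = [divmod(n * chunk_size, n_tot) for n in n_events]
    quotas = [q for q, _ in parts]
    # Give the events lost to truncation to the largest remainders first.
    leftover = chunk_size - sum(quotas)
    by_remainder = sorted(range(len(parts)), key=lambda i: parts[i][1],
                          reverse=True)
    for i in by_remainder[:leftover]:
        quotas[i] += 1
    return quotas
```

For example, files with 300 and 700 events and a chunk size of 100 give quotas `[30, 70]`, while three equally sized files and a chunk size of 10 give one file 4 events and the other two 3 each. Since each file's share of every chunk matches its share of the whole pool, the signal fraction of each chunk matches that of the full dataset.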
As shown in Fig. 9, several steps have to be performed before producing a trained ML model, and each of them has to be tested in terms of performance. We performed all the tests running the MLaaS4HEP framework on macOS (2.2 GHz Intel Core i7 dual-core, 8 GB of RAM) and on a CERN Virtual Machine running CentOS 7 Linux (4 VCPU Intel Core Processor Haswell 2.4 GHz, 7.3 GB of RAM). The ROOT files are read from the local file-system and remotely from Grid sites. In particular, we read files remotely from three different data-centers located in Bologna (BO), Pisa (PI) and Bari (BA).
Table 2 summarizes the I/O numbers we obtained in the first step of the MLaaS4HEP pipeline using different regimes and a chunk size of 100K events.⁷ In Fig. 10 we show the reading I/O frequency as a function of chunk size for different trials. In all cases we found no significant peaks; based on these results we deduce that a chunk size of 100K might be a good choice. Larger chunk sizes can lead to problems, as in the case of CERN VMs, where we may reach the limits of the underlying hardware, e.g. due to a big memory footprint.
In the performance studies of the second step of the MLaaS4HEP pipeline we are interested in the data reading part, the data pre-processing step (which includes data transformation), and the time spent in the MLaaS4HEP training step.
As already mentioned, there is a loop over the files that allows us to build the chunk with the right proportion of events. If necessary, a chunk of events from the selected ROOT file is read, and the reading time is added to the whole time spent creating the chunk. In other words, the time spent creating a chunk is the sum of n reading actions (where n ranges from 0, when no ROOT file has to be read, to the total number of ROOT files, which happens in the first iteration of the loop) and of the time to pre-process the events. The frequencies for creating a single data chunk (in terms of number of events in the chunk over the time spent) are reported in Table 3. In Fig. 11 we show the frequency of creating a chunk as a function of chunk size for different trials.
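The two frequencies can be measured as sketched below: the time of the n reading actions plus the pre-processing time gives the chunk-creation frequency, and the pre-processing time alone gives the pre-processing frequency. The reader and pre-processing callables, the `buffered` argument for events already read in earlier iterations, and the function names are hypothetical stand-ins for the MLaaS4HEP steps.

```python
import time
from typing import Callable, Sequence, Tuple


def _rate(n: int, seconds: float) -> float:
    """Events per second, guarding against a zero-length interval."""
    return float("inf") if seconds == 0 else n / seconds


def timed_chunk(readers: Sequence[Callable[[], list]],
                preprocess: Callable[[list], list],
                buffered: Sequence = ()) -> Tuple[list, float, float]:
    """Create one chunk; return it with two frequencies in events per second:
    over the full creation time (reading + pre-processing) and over the
    pre-processing time only.

    `readers` holds one callable per ROOT file that must be read in this
    iteration; it may be empty when `buffered` already holds enough events.
    """
    start = time.perf_counter()
    raw = list(buffered)  # events read in earlier iterations
    for read in readers:
        raw.extend(read())
    t_read = time.perf_counter() - start
    chunk = preprocess(raw)
    t_total = time.perf_counter() - start
    n = len(chunk)
    return chunk, _rate(n, t_total), _rate(n, t_total - t_read)
```

Because the creation time includes the reading time, the pre-processing frequency is always at least as high as the creation frequency; the gap between them measures the cost of reading, which grows when remote files are used.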
We found that the time spent creating a chunk is almost the same whether we use macOS or the CERN VM, and whether we use local or remote files. Naturally, if remote files are used, the reading time increases, and so does the time for creating the chunk, but this difference is quite small. Taking the first case in Table 3, the frequency of 1.11 kHz translates into almost 87.8 seconds spent to create a chunk of 100 thousand events.
In conclusion, applying the Machine Learning as a Service architecture to a physics use-case demonstrated the following: the ML model produced with MLaaS4HEP gives results comparable with traditional analysis approaches; the MLaaS4HEP framework is capable of reading local and remote files; and its performance reaches 13.4 kHz for reading distributed ROOT files and 1.2 kHz for the pre-processing step, using 100 thousand events as chunk size. The performance of MLaaS4HEP can be further improved by the additional steps discussed in Sect. 5.

Future directions
We foresee that the Machine Learning as a Service approach can be widely applicable in HEP. As such, further improvements should be explored.

Data Streaming Layer
To improve the data streaming layer, a multi-threaded I/O layer can be implemented. This can be achieved by wrapping the data reader code-base into a service which delivers data chunks in parallel upon requests from the upstream layer. In addition, the chunks can be pre-fetched into a local cache to improve the I/O throughput.

Table 3
Frequency of chunk creation and of the pre-processing step for a chunk size of 100 thousand events, computed as the ratio of the number of events to the time spent in each step. The two steps differ by the reading part: the time for creating a chunk is the sum of the times for reading events from the ROOT files and the time for the pre-processing step.
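The multi-threaded, pre-fetching I/O layer proposed for the data streaming layer can be sketched with Python's standard `concurrent.futures` module. In this minimal example, which is an assumption rather than an existing MLaaS4HEP component, a thread pool keeps up to `prefetch` chunk reads in flight while the consumer (e.g. the training loop) processes the current chunk; `read_chunk` is a hypothetical stand-in for reading one chunk from a ROOT file.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def read_chunk(index, chunk_size=1000, n_features=4):
    """Hypothetical stand-in for reading chunk `index` from a (remote) ROOT file."""
    rng = np.random.default_rng(index)
    return rng.normal(size=(chunk_size, n_features))

def prefetching_reader(n_chunks, workers=4, prefetch=4):
    """Yield chunks 0..n_chunks-1 in order, while up to `prefetch` chunks are
    read in parallel by `workers` threads and held in a local cache."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = deque()
        next_index = 0
        while next_index < n_chunks or pending:
            # keep the cache filled with outstanding read requests
            while next_index < n_chunks and len(pending) < prefetch:
                pending.append(pool.submit(read_chunk, next_index))
                next_index += 1
            # hand the oldest chunk to the consumer, waiting only if not yet read
            yield pending.popleft().result()

for i, chunk in enumerate(prefetching_reader(n_chunks=6)):
    print(i, chunk.shape)  # a training step would consume `chunk` here
```

Threads are a reasonable choice here because real chunk reads are dominated by network and disk I/O; the ordered deque keeps the chunk sequence deterministic even though reads complete out of order.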

Data Training Layer
If data I/O parallelism can be achieved through the service-like functionality of the data streaming layer, further improvements can be obtained by implementing distributed training. There is plenty of R&D in this direction, from adopting the Dask Python framework [21] to using the MLflow framework [22] on an HDFS+Spark infrastructure, which explore both task and data parallelism approaches.
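Data parallelism, one of the approaches mentioned above, splits each chunk across workers, computes a gradient on each shard and combines the results. The minimal NumPy sketch below uses a linear least-squares model as a simplified stand-in; it does not use the Dask or MLflow APIs and is not the MLaaS4HEP training code. It shows that the size-weighted average of the shard gradients equals the gradient computed on the full chunk.

```python
import numpy as np

def mse_gradient(w, X, y):
    """Gradient of the mean squared error (1/n)||Xw - y||^2 with respect to w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def data_parallel_gradient(w, X, y, n_workers):
    """Split the chunk into shards, compute one gradient per shard (on separate
    workers in a real setup) and combine them weighted by shard size."""
    grads, sizes = [], []
    for X_s, y_s in zip(np.array_split(X, n_workers), np.array_split(y, n_workers)):
        grads.append(mse_gradient(w, X_s, y_s))
        sizes.append(len(y_s))
    return np.average(grads, axis=0, weights=sizes)

rng = np.random.default_rng(0)
X, y, w = rng.normal(size=(1000, 5)), rng.normal(size=1000), np.zeros(5)
print(np.allclose(data_parallel_gradient(w, X, y, 4), mse_gradient(w, X, y)))  # True
```

Weighting by shard size keeps the result exact even when the chunk does not divide evenly among workers.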
The current landscape of ML frameworks is changing rapidly, and MLaaS4HEP should be adapted to existing and future ML frameworks and innovations. For instance, the Open Neural Network Exchange (ONNX) format [23] opens the door to migrating models from one framework to another. So far we are working on the automatic
As discussed in Sect. 3.2, there are different approaches to feed a Jagged Array into an ML framework, and R&D in this direction is in progress. For instance, for AutoEncoder (AE) models the vector representation with padded values should always be kept together with a cast vector, since the AE model transforms the input vector into an internal dense representation and then decodes it back into the original representation. The latter transformation can use the cast vector to assign back the padded values and, if necessary, convert the vector representation of the data back to Jagged Array or ROOT TTree data structures.
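The padded vector representation and its inverse can be sketched with plain NumPy. In this minimal example, the fixed branch length `max_len`, the pad value and the jet-momentum numbers are illustrative assumptions, and a boolean mask is assumed to play the role of the cast vector: it records which entries are real, so that padded values can be dropped again after decoding and each event recovers its original length.

```python
import numpy as np

PAD = 0.0  # hypothetical pad value

def pad_jagged(jagged, max_len):
    """Convert one jagged branch (a list of variable-length per-event arrays)
    into a fixed-size (n_events, max_len) matrix plus a boolean mask marking
    real entries; events longer than max_len are truncated."""
    values = np.full((len(jagged), max_len), PAD, dtype=float)
    mask = np.zeros((len(jagged), max_len), dtype=bool)
    for i, event in enumerate(jagged):
        n = min(len(event), max_len)
        values[i, :n] = event[:n]
        mask[i, :n] = True
    return values, mask

def unpad(values, mask):
    """Invert pad_jagged: keep only the masked entries, restoring one
    variable-length array per event."""
    return [row[m] for row, m in zip(values, mask)]

jet_pt = [np.array([35.2, 21.7]), np.array([]), np.array([50.1, 40.3, 12.9])]
values, mask = pad_jagged(jet_pt, max_len=3)
decoded = values  # the output of an AutoEncoder decoder would be used here
print([r.tolist() for r in unpad(decoded, mask)])
# → [[35.2, 21.7], [], [50.1, 40.3, 12.9]]
```

Because the mask is stored alongside the padded matrix, the same round trip works whatever values the decoder produces in the padded positions.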

Data Inference Layer
On the inference side, several approaches can be used. As discussed above, the TFaaS service [9] can be used by HTTP-based clients, and it may become a repository of pre-trained models. Alternatively, if higher performance is required, a gRPC-based solution such as SONIC [26] can provide a fast inference layer on FPGA- and GPU-based infrastructures.
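An HTTP-based client for a TFaaS-like service only needs the Python standard library. The sketch below is hedged: the server URL, the `/json` path and the payload layout (`model`, `keys`, `values`) are assumptions made for illustration and should be checked against the TFaaS documentation; the model and feature names are hypothetical. Building the request is separated from sending it, so the payload can be inspected without a running server.

```python
import json
import urllib.request

# Assumptions: URL, "/json" path and payload layout are illustrative only.
TFAAS_URL = "http://localhost:8083/json"

def build_request(model, keys, values, url=TFAAS_URL):
    """Build an HTTP POST request carrying one event's features as JSON."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    payload = {"model": model, "keys": list(keys), "values": list(values)}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(model, keys, values, url=TFAAS_URL, timeout=10):
    """Send the request and decode the JSON response returned by the service."""
    request = build_request(model, keys, values, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

req = build_request("ttH_boosted", ["jet1_pt", "jet1_eta"], [350.0, -1.2])
print(req.get_method(), req.full_url)  # POST http://localhost:8083/json
```

Keeping the client to plain HTTP and JSON is what allows applications outside the Python or HEP ecosystem to query the same pre-trained models.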
In either case, we foresee as a next logical step a repository of pre-trained models with flexible search capabilities, extended model tagging, and versioning. This can be achieved by providing a dedicated service for ML models with a proper meta-data description.
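The core of such a model repository (versioning, tagging and metadata-based search) can be sketched in a few lines. The `ModelRepository` class, its methods and the metadata keys below are hypothetical and only illustrate the kind of catalogue a dedicated model service could expose; a real service would also store the model artifacts and persist the catalogue.

```python
from dataclasses import dataclass, field

@dataclass
class ModelRecord:
    name: str
    version: int
    tags: set = field(default_factory=set)
    metadata: dict = field(default_factory=dict)

class ModelRepository:
    """Hypothetical in-memory catalogue of pre-trained models."""
    def __init__(self):
        self._records = {}  # (name, version) -> ModelRecord

    def register(self, name, tags=(), **metadata):
        """Store a new version of `name` and return its record."""
        version = 1 + max((v for (n, v) in self._records if n == name), default=0)
        record = ModelRecord(name, version, set(tags), metadata)
        self._records[(name, version)] = record
        return record

    def latest(self, name):
        """Return the highest version registered under `name`."""
        return max((r for (n, _), r in self._records.items() if n == name),
                   key=lambda r: r.version)

    def search(self, tag=None, **metadata):
        """Return records carrying `tag` (if given) whose metadata match all pairs."""
        return [r for r in self._records.values()
                if (tag is None or tag in r.tags)
                and all(r.metadata.get(k) == v for k, v in metadata.items())]

repo = ModelRepository()
repo.register("ttH_boosted", tags={"cms", "tth"}, n_features=27)
repo.register("ttH_boosted", tags={"cms", "tth"}, n_features=14)
print(repo.latest("ttH_boosted").version)                           # 2
print([r.version for r in repo.search(tag="tth", n_features=27)])  # [1]
```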

MLaaS4HEP services
The proposed architecture allows the training and inference layers to be developed and deployed as independent services, where separate resource providers can be used and dynamically scaled if necessary; e.g. GPUs/TPUs can be provisioned on demand from commercial cloud(s) to train specific models, while the TFaaS inference service can reside on CERN premises. For instance, continuous training of complex DL models would become possible: when data produced by the experiment are placed on WLCG sites, the training service would receive a set of notifications about newly available data and re-train specific model(s). When a new model is ready, it can be pushed to TFaaS and be available to end-users immediately, without any intervention on the existing infrastructure. TFaaS can be further optimized to use FPGAs to speed up the inference phase. We foresee that such an approach may be more flexible and cost-effective for HEP experiments in the HL-LHC era. As such, we plan to perform additional R&D studies in this direction and to evaluate further MLaaS4HEP services using available resources.
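The continuous-training scenario described above can be sketched as a simple event loop. In this minimal example the notification source is a `queue.Queue`, while `train`, `publish` and the `min_new_files` threshold are hypothetical placeholders (for instance, `publish` could upload the model to TFaaS); nothing here reflects an existing MLaaS4HEP or TFaaS API.

```python
import queue

def continuous_training(notifications, train, publish, min_new_files=3):
    """Re-train whenever enough new files have been announced.

    `notifications` delivers names of newly available files; a None item
    stops the loop after flushing any pending files. `train` builds a model
    from the new files and `publish` makes it available (e.g. by pushing it
    to the inference service). Returns the last published model version.
    """
    pending, version = [], 0
    while True:
        item = notifications.get()
        if item is not None:
            pending.append(item)
        if pending and (len(pending) >= min_new_files or item is None):
            version += 1
            publish(train(pending), version)
            pending = []
        if item is None:
            return version

# Usage with a hypothetical stream of 7 new files followed by a stop signal
published = []
stream = queue.Queue()
for i in range(7):
    stream.put(f"new_data_{i}.root")
stream.put(None)
last = continuous_training(
    stream,
    train=lambda files: f"model trained on {len(files)} files",
    publish=lambda model, version: published.append((version, model)),
)
print(last, published)
```

Batching notifications through `min_new_files` avoids launching a full re-training for every single new file; whether `train` uses only the new files or all data seen so far is left to the caller.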

Summary
In this paper, we presented a novel approach to train HEP ML models using the native ROOT data-format. MLaaS4HEP consists of three layers: the data-streaming and data-training layers, which are part of the MLaaS4HEP framework [10], and the data-inference layer, based on the TensorFlow library [9]. All three layers are implemented as independent components. The data streaming layer relies on the uproot library for reading data from ROOT files (local or remote) and yielding NumPy (Jagged) arrays upstream. The data training layer transforms the input Jagged Array portion of the data into a vector representation and passes it to the ML framework provided by the user. Since the output of the data-training layer is a NumPy array, it can be used with any Python-based ML framework which supports such input, e.g. TensorFlow, PyTorch and others. Finally, the data-inference layer was implemented as an independent service (TFaaS) to serve TensorFlow models via the HTTP protocol. Such a flexible architecture allows ML training to be performed over large sets of distributed HEP ROOT data without physically downloading the data to local storage. It reads and transforms the ROOT Tree data representation (Jagged Array) into an intermediate flat data-format suitable as input for the underlying ML framework. We demonstrated that this architecture is capable of reading distributed, arbitrarily sized data sets, e.g. reading data from the WLCG infrastructure, and can potentially allow HEP ML models to be trained over large data sets at any scale. We used an official CMS tt̄ Higgs analysis (tt̄H(bb̄)) in the boosted, all-hadronic final state to validate the Machine Learning as a Service approach, and we have successfully shown that it achieves ML model performance on CMS NANOAOD data-files comparable to that of a traditional physics analysis based on extracting data from ROOT files into custom Ntuples and using open-source ML frameworks.

Fig. 1 MLaaS4HEP architecture diagram representing three independent layers: a Data Streaming Layer to read local or remote ROOT files, a Data Training Layer to feed tree-based HEP data into an ML framework, and a Data Inference Layer via TensorFlow as a Service

Fig. 2 Jagged Array data representation. It consists of flat attributes followed by Jagged attributes whose dimensions vary event by event
[Figure labels: flat branches; jagged branches represented as fixed-size branch vectors in some (eta-phi) space; transform jagged branches to NumPy matrix form (eta-phi phase)]

Fig. 5 A vector representation of a Jagged Array along with the corresponding mask vector

Fig. 9 Schematic representation of the steps performed in the MLaaS4HEP pipeline, in particular those inside the Streaming and Training layers (see text for details)
Fig. 11 Frequency for creating a chunk as a function of chunk size for different trials

Table 1
Comparison of the loss, accuracy and AUC score between the 27-feature and 14-feature cases, with the addition of the analysis group reference for the AUC metric (see text for details)

Table 2
Performance of the reading and specs computing phases with the chunk size fixed to 100 thousand events, using the macOS system and the CERN VM. Here, BO, BA and PI represent different Italian storage facilities with different WAN configurations (see text for more details)
Columns: time to go through reading + specs comp. | reading freq. | specs comp. freq.