Security of Ubiquitous Computing Systems, pp. 133–145
It Started with Templates: The Future of Profiling in Side-Channel Analysis
Abstract
Side-channel attacks (SCAs) are powerful attacks based on the information obtained from the implementation of cryptographic devices. Profiling side-channel attacks have received a lot of attention in recent years due to the fact that this type of attack defines the worst-case security assumptions. The SCA community realized that the same approach is actually used in other domains in the form of supervised machine learning. Consequently, some researchers started experimenting with different machine learning techniques and evaluating their effectiveness in the SCA context. More recently, we are witnessing an increase in the use of deep learning techniques in the SCA community, with strong first results in side-channel analyses, even in the presence of countermeasures. In this chapter, we consider the evolution of profiling attacks, and subsequently we discuss the impact they have made on the data preprocessing, feature engineering, and classification phases. We also speculate on the future directions and the best-case consequences for the security of small devices.
8.1 Introduction
In 1996, Kocher demonstrated the possibility of recovering secret data by introducing a method for exploiting leakage from the device under attack [338]. In other words, implementations of cryptographic algorithms leak relevant information about the data processed through physical side channels such as timing [338], power consumption [339], EM emanation [493], and sound [225].
Side-channel attacks (SCAs) exploit weaknesses in the physical implementation of cryptographic algorithms rather than in the algorithms themselves [389]. Those weaknesses stem from the physics of the underlying computing elements, i.e., CMOS cells, which makes such threats hard to eliminate.
Numerous evaluation techniques, which generally involve some form of digital signal processing and statistical computations, have been proposed in the literature. Some of the most important methods include Simple Power Analysis (SPA) [339], Differential Power Analysis (DPA), and Template Attacks (TA) [135].
The SPA technique implies that the attacker aims at reconstructing the secret key using just a single trace of side-channel information, and it often exploits the difference between basic public-key operations such as double-and-add or add-and-multiply [339]. Still, SPA is not possible if the observed signal-to-noise ratio (SNR) is not high enough. Consequently, deployed countermeasures render SPA futile most of the time.
DPA techniques are based on the evaluation of many traces with varying input data for the targeted algorithm. After that step, a brute-force attack, testing subkey hypotheses, is performed on a part of the algorithm (so-called "divide and conquer"). In the DPA approach, a large number of samples is used in order to reduce noise by averaging, and a single-bit power model is commonly adopted [339]. Correlation Power Analysis (CPA), on the other hand, uses a multi-bit power model in order to reduce the influence of noise on the possibility of executing a successful attack [115]. The main difference between these two techniques is that DPA is based on computing the difference between two trace sets, while CPA uses the correlation coefficient to evaluate the dependency. We often also say that the two use different side-channel distinguishers. Side-channel attacks using the above three techniques have been reported on a wide variety of cryptographic implementations, see, e.g., [154, 402, 410, 412, 434, 500], including some real-world applications [196].
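To make the CPA distinguisher concrete, the following minimal sketch simulates noisy Hamming-weight leakage of a substitution-box output and ranks all 256 key-byte hypotheses by the absolute correlation coefficient. The randomly generated S-box, trace count, and noise level are illustrative assumptions, not a real target.

```python
import numpy as np

rng = np.random.default_rng(0)

SBOX = rng.permutation(256)          # stand-in for a real cipher S-box
HW = np.array([bin(v).count("1") for v in range(256)])

TRUE_KEY = 0x3C
N = 2000
plaintexts = rng.integers(0, 256, N)
# Each simulated trace has a single point of interest leaking HW(Sbox(p ^ k*))
leakage = HW[SBOX[plaintexts ^ TRUE_KEY]]
traces = leakage + rng.normal(0, 1.0, N)

# CPA distinguisher: correlate the HW model under each key hypothesis
# with the measured traces; the correct key maximizes |rho|.
scores = np.empty(256)
for k in range(256):
    model = HW[SBOX[plaintexts ^ k]]
    scores[k] = abs(np.corrcoef(model, traces)[0, 1])

best_key = int(np.argmax(scores))
print(hex(best_key))                 # expected to recover TRUE_KEY (0x3c)
```

With this SNR and 2000 traces, the correct hypothesis stands out clearly; with fewer traces or more noise the ranking degrades gracefully, which is exactly what the distinguisher comparison literature studies.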
In contrast to DPA, TA requires a profiling stage, i.e., a step during which the cryptographic hardware is under full control of the adversary to estimate the probability distribution of the leaked information and make better use of all the information present in each sample [135]. In this way, TA can provide a promising model of the real device, instead of using some a priori model.
TA is the best (optimal) technique from an information-theoretic point of view if the attacker has an unbounded number of traces and the noise follows the Gaussian distribution [277, 367]. After the template attack, the stochastic attack emerged, using linear regression in the profiling phase [515]. In the years that followed, researchers recognized certain shortcomings of template attacks and tried to modify them in order to deal better with complexity and portability issues. An example of such an approach is the pooled template attack, where only one pooled covariance matrix is used in order to cope with statistical difficulties [142]. Alongside such techniques, the SCA community realized that a similar approach to profiling is used in other domains in the form of supervised machine learning. Consequently, some researchers started experimenting with different machine learning (ML) techniques and evaluating their effectiveness in the SCA context. Although mainly considering distinct scenarios and various ML techniques, all those papers tend to establish different use cases where ML techniques can outperform the template attack and establish themselves as the best choice for profiled SCA. More recently, we are witnessing the rise of deep learning (DL) techniques in the SCA community, with strong results in side-channel analyses, even in the presence of countermeasures.
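A minimal sketch of the pooled idea, assuming Gaussian leakage over a handful of points of interest: per-class means are estimated as usual, but a single covariance matrix is shared (pooled) across all classes. The toy leakage model, dimensions, and class structure below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

D = 5                                   # points of interest per trace
n_classes = 9                           # e.g. HW classes 0..8
means_true = np.outer(np.arange(n_classes), rng.normal(1.0, 0.2, D))
cov_true = np.eye(D) * 0.05

# Profiling: estimate per-class means and one pooled covariance.
Xp, yp = [], []
for c in range(n_classes):
    Xp.append(rng.multivariate_normal(means_true[c], cov_true, 200))
    yp += [c] * 200
Xp, yp = np.vstack(Xp), np.array(yp)

mu = np.array([Xp[yp == c].mean(axis=0) for c in range(n_classes)])
pooled = sum(np.cov(Xp[yp == c].T) for c in range(n_classes)) / n_classes
prec = np.linalg.inv(pooled)

def classify(x):
    # Gaussian log-likelihood with a shared covariance reduces to
    # picking the class with the smallest Mahalanobis distance.
    d = np.array([(x - m) @ prec @ (x - m) for m in mu])
    return int(np.argmin(d))

# Attack: one fresh trace drawn from class 6.
x_new = rng.multivariate_normal(means_true[6], cov_true)
pred = classify(x_new)
print(pred)                             # 6
```

Pooling trades some model flexibility for far fewer parameters to estimate, which is what makes it attractive when profiling traces are scarce.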
8.2 Profiled Side-Channel Attacks
Profiled side-channel attacks estimate the worst-case security risk by considering the most powerful side-channel attacker. In particular, one assumes that an attacker possesses an additional device over which he or she has nearly full control. From this device, he obtains leakage measurements and is able to control the used secret key or at least knows which one is used. Knowing the secret key enables him to calculate intermediate processed values that involve the secret key, for which he is estimating models. These models can then be used in the attacking phase to predict which intermediate values are processed and therefore carry information about the secret key. Commonly used models are the identity value or the Hamming weight/distance.
Uniformly Distributed Classes
Targeting intermediate variables, e.g., as they are loaded or manipulated on the device, mostly results in \(2^n\) uniformly distributed classes, where n is the number of bits of the intermediate variable.
Binomial Distributed Classes
The Hamming Weight (HW) or the Hamming Distance (HD) of a uniformly distributed intermediate variable results in n + 1 binomially distributed classes.
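For an 8-bit intermediate value, the resulting class sizes can be listed directly, showing how strongly imbalanced the HW classes are:

```python
from math import comb

# Number of the 256 values of an 8-bit intermediate that fall into each
# Hamming-weight class: n + 1 = 9 classes with binomial sizes C(8, k).
n = 8
class_sizes = [comb(n, k) for k in range(n + 1)]
print(class_sizes)        # [1, 8, 28, 56, 70, 56, 28, 8, 1]
print(sum(class_sizes))   # 256
```

The middle class (HW = 4) is 70 times more frequent than the extreme classes, which is the imbalance that class-balancing techniques such as SMOTE (Sect. 8.2.2) aim to counter.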
8.2.1 Definition of Profiling Attacks
In this section, we consider side-channel attacks on block ciphers, for which a divide-and-conquer approach can be utilized. Note that, since there exist operations within the block cipher which manipulate each block/chunk (e.g., bytes in the Advanced Encryption Standard (AES)) independently and, most importantly, involve only one block/chunk of the secret key, an attacker only needs to make hypotheses about a secret key block/chunk instead of the complete secret key at once.

Profiling phase: N traces (measurements) \(\mathbf x_{p_1},\ldots ,\mathbf x_{p_N}\), the secret key \(k_p^*\), and plaintexts/ciphertexts \(t_{p_1},\ldots ,t_{p_N}\), such that he can calculate \(y(t_{p_1},k_p^*), \ldots , y(t_{p_N},k_p^*)\).

Attacking phase: Q traces \(\mathbf x_{a_1},\ldots ,\mathbf x_{a_Q}\) (independent from the profiling traces) and plaintexts/ciphertexts \(t_{a_1},\ldots ,t_{a_Q}\).
8.2.2 Data Preprocessing
In the data preprocessing phase, the aim is to prepare the data in a way that increases the performance of side-channel analysis. There are several papers considering various data augmentation techniques in order to artificially generate measurements so as to increase the size of the profiling dataset. Cagli et al. propose two data augmentation techniques they call Shifting and Add-Remove [122]. They use convolutional neural networks (CNNs) and find data augmentation to significantly improve the performance of CNNs. Pu et al. use a data augmentation technique where they randomly shift each measurement in order to increase the number of measurements available in the profiling phase [489]. They report that even such simple augmentation can effectively improve the performance of profiling SCA. Picek et al. experiment with several data augmentation and class balancing techniques in order to decrease the influence of highly unbalanced datasets that occur when considering HW/HD models [478]. They show that by using a well-known machine learning technique called SMOTE, it is possible to reduce the number of measurements needed for a successful attack by up to 10 times. Kim et al. investigate how the addition of artificial noise to the input signal can be beneficial to the performance of the neural network [329].
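The random-shift idea can be sketched as follows; the shift range and the number of extra copies are arbitrary assumptions here, not the settings used in the cited works.

```python
import numpy as np

rng = np.random.default_rng(42)

def augment_shift(traces, n_copies=4, max_shift=5):
    """Return the original traces plus n_copies randomly shifted versions.

    Each copy circularly shifts every trace by a small random offset,
    mimicking (de)synchronization jitter seen in real measurements.
    """
    out = [traces]
    for _ in range(n_copies):
        shifts = rng.integers(-max_shift, max_shift + 1, len(traces))
        out.append(np.stack([np.roll(t, s) for t, s in zip(traces, shifts)]))
    return np.concatenate(out)

traces = rng.normal(size=(100, 700))      # 100 toy traces, 700 samples each
augmented = augment_shift(traces)
print(augmented.shape)                    # (500, 700)
```

The augmented set is five times larger; in practice one would shift windows cut from longer recordings rather than roll circularly, but the effect on the profiling model is the same in spirit.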
8.2.3 Feature Engineering

Feature selection: the most important subsets of features are selected. We can distinguish between filter, wrapper, and hybrid techniques.

Dimensionality reduction: the original features are transformed into new features. A common example of such a technique is Principal Component Analysis (PCA) [25].
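As an illustration of the second approach, PCA can be sketched with a singular value decomposition of the centered trace matrix; the data sizes and the number of kept components below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 50))            # 200 toy traces, 50 sample points

Xc = X - X.mean(axis=0)                   # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 5
X_reduced = Xc @ Vt[:k].T                 # project onto the top k components

print(X_reduced.shape)                    # (200, 5)
# Fraction of total variance retained by the kept components:
evr = (S[:k] ** 2).sum() / (S ** 2).sum()
print(round(float(evr), 3))
```

On real traces, most of the leakage variance often concentrates in a few components, which is why PCA-based approaches such as PSTA (Sect. 8.3.1) work well.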
When discussing feature engineering, it is important to mention the curse of dimensionality, which describes the exponential increase in volume associated with adding dimensions [71]. As a consequence, as the dimensionality of the problem increases, the classifier's performance increases until the optimal feature subset is reached. Further increasing the dimensionality without increasing the number of training samples results in a decrease in the classifier's performance.
 Pearson Correlation Coefficient. The Pearson correlation coefficient measures the linear dependence between two variables, x and y, in the range [−1, 1], where 1 is a total positive linear correlation, 0 is no linear correlation, and −1 is a total negative linear correlation. The Pearson correlation for a sample of the entire population is defined by [301]:
$$\displaystyle \begin{aligned} Pearson(x,y) = \frac{\sum_{i=1}^N \left((x_i-\bar x)(y_i - \bar y)\right)}{\sqrt{\sum_{i=1}^N(x_i-\bar x)^2} \sqrt {\sum_{i=1}^N(y_i-\bar y)^2}}, \end{aligned} $$
(8.3)
where \(\bar x\) and \(\bar y\) are the empirical means of x and y, respectively.
 SOSD. In [230], the authors proposed the sum of squared differences as a selection method, simply as:
$$\displaystyle \begin{aligned} SOSD(x,y) = \sum_{i,j>i}(\bar x_{y_i}-\bar x_{y_j})^2, \end{aligned} $$
(8.4)
where \(\bar x_{y_i}\) is the mean of the traces where the model equals \(y_i\). Because of the square term, SOSD is always positive. Another advantage of using the square is that it enlarges big differences.
 SOST. SOST is the normalized version of SOSD [230] and is thus equivalent to the pairwise Student's t-test:
$$\displaystyle \begin{aligned} SOST(x,y) = \sum_{i,j>i} \left( {(\bar x_{y_i}-\bar x_{y_j})} / {\sqrt{\frac{ \sigma_{y_i}^2}{n_{y_i}} + \frac{\sigma_{y_j}^2}{n_{y_j}}} }\right)^2 \end{aligned} $$
(8.5)
with \(n_{y_i}\) and \(n_{y_j}\) being the numbers of traces where the model equals \(y_i\) and \(y_j\), respectively.
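A sketch of the SOST score of Eq. (8.5), computed per sample point on simulated traces; the single artificially leaky point is an assumption used only to make the result visible.

```python
import numpy as np

rng = np.random.default_rng(3)

n_traces, n_points = 3000, 20
labels = rng.integers(0, 9, n_traces)             # e.g. HW classes 0..8
traces = rng.normal(0, 1, (n_traces, n_points))
traces[:, 12] += labels                           # point 12 carries the leakage

# SOST per sample point: sum of squared pairwise t-statistics between
# the class-conditional means (Eq. 8.5).
classes = np.unique(labels)
sost = np.zeros(n_points)
for i, a in enumerate(classes):
    for b in classes[i + 1:]:
        xa, xb = traces[labels == a], traces[labels == b]
        num = xa.mean(axis=0) - xb.mean(axis=0)
        den = np.sqrt(xa.var(axis=0, ddof=1) / len(xa)
                      + xb.var(axis=0, ddof=1) / len(xb))
        sost += (num / den) ** 2

print(int(np.argmax(sost)))                       # 12, the leaky point
```

Dropping the denominator yields SOSD (Eq. 8.4); the normalization is what makes SOST comparable across points with different noise levels.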
There are several more relevant works in the domain of feature selection and SCA. The work of Lerman et al. [367] compared template attacks and machine learning on dimensionality reduction. They concluded that template attacks are the method of choice as long as a limited number of features containing most of the relevant information can be identified in leakage traces. Zheng et al. looked into feature selection techniques, but they did not consider machine learning options [600]. Picek et al. conducted a detailed analysis of various feature selection techniques, where some are also based on machine learning (so-called wrapper and hybrid methods) [477]. They concluded that the feature selection techniques commonly used in SCA are rarely the best ones, and they identified L1 regularization as a powerful feature selector in many scenarios.
8.3 Template Attacks
In this section, we start by explaining the details of template attacks, and after that we give details about two techniques that emerged from template attacks—pooled template attacks and stochastic attacks.
8.3.1 Context of Template Attack
In the pioneering template attack article by Chari, Rao, and Rohatgi, it is shown that template attacks apply advanced statistical methods and can break implementations secure against other forms of side-channel attacks [135].
In some works template attacks are built to classify the state of a byte, e.g., a key byte in RC4 [135, 498]. The weakness of these papers is the need to create 256 templates for each byte. Additionally, the template building process can only be guided by partial attack results. In [498], the authors reduce the number of points of a trace by using an efficient algorithm instead of the standard principal component analysis method, which increases the speed of selecting points of interest. Also, by introducing a preprocessing phase with the use of discrete Fourier transformation on traces, the authors improve the template attack results in practice.
Agrawal et al. develop two new attack techniques that extend the previously mentioned research results [11]. The first is a single-bit template attack technique that creates templates from peaks observed in a DPA attack, resulting in a high probability value for a single DPA-targeted bit. Their second, the template-enhanced DPA attack technique, can be used to attack DPA-protected cards and consists of two steps: a profiling phase and a hypothesis testing phase. In the profiling phase, the attacker, who is in possession of a smart card with a biased RNG, builds templates, and in the hypothesis testing phase the attacker uses the previously built templates to mount a DPA-like attack on a target card which is identical to the test smart card but has a perfect RNG. The authors illustrate these two attack techniques on unprotected implementations of DES and AES on smart cards.
Archambeau et al. take template attack techniques a step further by transforming leakage traces in order to identify important features (i.e., transformed time instants) and their number automatically. In fact, they use the optimal linear combination of the relevant time samples and execute template attacks in the principal subspace of the mean traces, creating a new approach, the principal subspace-based template attack (PSTA) [25]. The authors validate this approach by attacking an RC4 stream cipher implementation and an FPGA implementation of AES.
In the literature, the main focus is on template attacks aiming at recovering the secret key of a cryptographic core from measurements of its dynamic power consumption. But with the scaling of technology, static power consumption grows faster and creates new issues for the security of smart card hardware. Therefore, Bellizia et al. proposed the Template Attack Exploiting Static Power (TAESP) in order to extract information from a hardware implementation of a cryptographic algorithm using the temperature dependence of static currents as a source of information leakage [70].
8.3.2 Standard Template Attack
8.3.3 Pooled Template Attack
8.3.4 Stochastic Attack
8.4 Machine Learning-Based Attacks
Machine learning encompasses a number of methods used for classification, clustering, regression, feature selection, and other knowledge discovery tasks [423]. A typical division of machine learning algorithms is into supervised, semi-supervised, and unsupervised approaches. Each of those paradigms can also be used in SCAs: supervised (profiling) attacks, semi-supervised (profiling) attacks, and unsupervised (non-profiling) attacks.
Supervised Techniques
The supervised approach assumes that the attacker first possesses a device similar to the one under attack. Having this additional device, he is then able to build a precise profiling model using a set of measurements while knowing the plaintext/ciphertext and the secret key of this device. In the second step, the attacker uses the earlier profiling model to reveal the secret key of the device under attack. For this, he additionally measures a new set of traces, but as the key is secret he has no further information about the intermediate processed data and thus builds hypotheses. The only information that the attacker transfers between the profiling phase and the attacking phase is the profiling model he builds.
When considering supervised machine learning and SCA, in recent years there have been numerous papers considering various targets, machine learning algorithms, and scenarios. In fact, the most common denominator for most of this work is the fact that it attacks AES [235, 274, 279, 285, 363, 364, 365, 367, 475, 476, 479, 481]. More recently, deep learning (DL) techniques started to capture the attention of the SCA community. Accordingly, the first results confirmed expectations, with most of the early attention being paid to convolutional neural networks [122, 329, 386, 482].
As far as we know, when considering machine learningbased attacks on other ciphers, there are only a few papers. Heuser et al. consider Internet of Things scenarios and lightweight ciphers where they compare 11 lightweight ciphers and AES in terms of their SCA resilience and conclude that lightweight ciphers cannot be considered to be significantly less resilient than AES [274, 276].
Semi-supervised Techniques
Semi-supervised learning is positioned in the middle between supervised and unsupervised learning. There, the basic idea is to take advantage of a large quantity of unlabeled data during a supervised learning procedure [517]. This approach assumes that the attacker is able to possess a device to conduct a profiling phase but has limited capacities. This may reflect a more realistic scenario in some practical applications, as the attacker may be limited by time or resources, or may face implemented countermeasures, which prevent him from taking an arbitrarily large number of side-channel measurements while knowing the secret key of the device.
The first application of semi-supervised SCA was by Lerman et al., where the authors conclude that the semi-supervised setting cannot compete with the supervised setting [366]. Note that the authors compared a supervised attack with n + m labeled traces for all classes with a semi-supervised attack with n labeled traces for one class and m unlabeled traces for the other, unknown classes (i.e., n + m traces in total). Picek et al. conduct an analysis of two semi-supervised paradigms (self-training and graph-based learning), where they show that it is possible to improve the accuracy of classifiers if semi-supervised learning is used [480]. What is especially interesting is that they show how semi-supervised learning is able to significantly improve the behavior of the template attack when the profiling set is (very) small.
8.4.1 Conducting Sound Machine Learning Analysis
Since it is not possible (in general) to expect machine learning techniques to give us theoretical observations or proofs of results, we need to rely on a set of procedures to run experiments such that the results are convincing and easy to reproduce. In what follows, we briefly discuss several steps to be considered in order to make the analysis more reproducible.
Datasets
When preparing the data for machine learning analysis, it is necessary to discuss the number of measurements, the number of features, and the number of classes (if known). Additionally, if the data come from different distributions, one needs to discuss those. If not all data from the datasets are used, it is necessary to state how the samples are chosen and how many are used in the experiments. One needs to define the level of noise appearing in the data in a clearly reproducible way, e.g., using the signal-to-noise ratio (SNR). Finally, if some feature engineering procedure is used, it needs to be clearly stated in order to know what features are used in the end.
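One common, reproducible way to report the SNR is the variance of the class-conditional means divided by the mean of the class-conditional variances, computed per sample point; the simulated traces below are an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

labels = rng.integers(0, 9, 5000)               # e.g. HW classes 0..8
traces = rng.normal(0, 2.0, (5000, 10))
traces[:, 3] += labels                          # only point 3 is informative

# SNR per point: Var over classes of the class means (signal)
# divided by the mean over classes of the class variances (noise).
classes = np.unique(labels)
means = np.array([traces[labels == c].mean(axis=0) for c in classes])
vars_ = np.array([traces[labels == c].var(axis=0) for c in classes])
snr = means.var(axis=0) / vars_.mean(axis=0)

print(int(np.argmax(snr)))                      # 3
```

Reporting such a per-point SNR curve (rather than just "noisy" or "clean") lets other researchers judge whether their reproduction uses comparably difficult data.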
Algorithms
When discussing the choice of algorithms, it is first necessary either to specify which framework and algorithms are used or to provide pseudocode (for example, when custom algorithms are used). As a rule of thumb, more than one algorithm should always be used: the algorithms should ideally belong to different machine learning approaches (e.g., a decision tree method like Random Forest and a kernel method like the Support Vector Machine (SVM)). Next, all parameters that uniquely define the algorithm need to be enumerated.
Experiments
Regarding the experiments, it is first necessary to discuss how the data are divided into training and testing sets. Then, for the training phase, one needs to define the test options (e.g., whether to use the whole dataset, cross-validation, etc.). After that, for each algorithm, one needs to define a set of parameter values for the tuning phase. There are different options for tuning, but we consider starting with the default parameters a reasonable approach, continuing to vary them until there is no more improvement. Naturally, this should be done in a reasonable way, since the tuning phase is the most expensive from a computational perspective and it is usually not practical to test all combinations of parameters.
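A sketch of such a training/testing protocol, here a 5-fold cross-validation loop around a toy nearest-centroid classifier; the data, the classifier, and the fold count are illustrative assumptions, not a prescribed setup.

```python
import numpy as np

rng = np.random.default_rng(9)

n, d, n_classes = 1000, 8, 4
y = rng.integers(0, n_classes, n)
X = rng.normal(0, 1, (n, d)) + y[:, None] * 2.0   # well-separated toy classes

def nearest_centroid(X_tr, y_tr, X_te):
    """Fit class centroids on the training fold, predict on the test fold."""
    centroids = np.array([X_tr[y_tr == c].mean(axis=0)
                          for c in range(n_classes)])
    d2 = ((X_te[:, None, :] - centroids[None]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# 5-fold cross-validation: each fold serves once as the held-out test set.
folds = np.array_split(rng.permutation(n), 5)
accs = []
for i in range(5):
    te = folds[i]
    tr = np.concatenate([folds[j] for j in range(5) if j != i])
    accs.append((nearest_centroid(X[tr], y[tr], X[te]) == y[te]).mean())

print(round(float(np.mean(accs)), 3))
```

Reporting the mean and spread across folds, instead of a single split, is what makes the tuning results comparable between papers.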
Results
For the tuning phase, it is usually sufficient to report the accuracy. For the testing results, one should report the accuracy but also some other metric like the area under the ROC curve (AUC) or the F-measure. The area under the ROC curve is used to measure classification performance and is calculated via the Mann-Whitney statistic [580]; the ROC curve plots the true positive rate against the false positive rate. An AUC close to 1 represents a good test, while a value close to 0.5 represents a random guess. The F-measure is the harmonic mean of the precision and recall, where precision is the ratio between true positives (TP, the number of examples predicted positive that are actually positive) and predicted positives. The recall is the ratio between true positives and actual positives [488]. Both the F-measure and the AUC can help in situations where accuracy can be misleading, i.e., where we are also interested in the numbers of false positives and false negatives.
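Both metrics can be computed by hand on a tiny assumed example: the AUC via the Mann-Whitney statistic (the fraction of positive/negative score pairs ranked correctly, ties counted as one half) and the F-measure from precision and recall at a fixed threshold.

```python
import numpy as np

# Assumed toy binary labels and classifier scores.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])

# AUC as the Mann-Whitney statistic: P(score_pos > score_neg).
pos, neg = scores[y_true == 1], scores[y_true == 0]
auc = ((pos[:, None] > neg[None]).sum()
       + 0.5 * (pos[:, None] == neg[None]).sum()) / (len(pos) * len(neg))

# F-measure at threshold 0.5: harmonic mean of precision and recall.
y_pred = (scores >= 0.5).astype(int)
tp = int(((y_pred == 1) & (y_true == 1)).sum())
fp = int(((y_pred == 1) & (y_true == 0)).sum())
fn = int(((y_pred == 0) & (y_true == 1)).sum())
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

print(auc, round(f_measure, 3))          # 0.875 0.75
```

Note that the AUC is threshold-free while the F-measure depends on the chosen decision threshold, which is why reporting both is informative.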
8.5 Performance Metrics
8.6 Countermeasures Against SCA

Noise Addition. Introducing external noise in the side channel, shuffling the operations, or inserting dummy operations in cryptographic implementations is often used as a countermeasure against SCAs. The basic objective is to reduce the signal-to-noise ratio (SNR) and thereby decrease the information gathered from measurements. Still, as already shown by Durvaux et al. [194], these countermeasures become insecure with increasing attack time.

Dynamic and Differential CMOS Logic. Tiri et al. [557] proposed Sense Amplifier Based Logic (SABL)—a logic style that uses a fixed amount of charge for every transition, including the degenerated events in which a gate does not change state.

Leakage Resilience. Another countermeasure, typically applied at the system level, focuses on restricting the number of usages of the same key for an algorithm. Still, generation and synchronization of new keys have practical issues. Dziembowski et al. introduced a technique called leakage resilience, which relocates this problem to the protocol level by introducing an algorithm to generate these keys [195].

Masking. One of the most efficient and powerful approaches against SCAs is masking [134, 243], which aims to break the correlation between the power traces and the intermediate values of the computations. This method achieves security by randomizing the intermediate values using secret sharing and carrying out all the computations on the shared values.
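A first-order Boolean masking sketch of this idea: the sensitive byte is split into two random shares, a linear operation is applied share-wise, and the shares are recombined only at the end, so the unmasked value never appears during the computation. The operations and values are purely illustrative.

```python
import secrets

def mask(x):
    """Split byte x into two Boolean shares (x ^ m, m) with a fresh mask m."""
    m = secrets.randbelow(256)
    return x ^ m, m

def xor_const_shared(shares, c):
    """XOR with a constant is linear: apply it to one share only."""
    s1, s2 = shares
    return s1 ^ c, s2

def unmask(shares):
    s1, s2 = shares
    return s1 ^ s2

x, c = 0xA7, 0x5B
shares = mask(x)                       # x itself is never processed below
result = unmask(xor_const_shared(shares, c))
print(hex(result))                     # 0xfc, i.e., 0xA7 ^ 0x5B
```

Nonlinear operations (e.g., S-boxes) are the hard part of masking schemes, since they require recomputation or share-wise gadgets rather than this simple share-local application.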
8.7 Conclusions
In this chapter, we discussed profiling side-channel attacks, starting with data preprocessing and feature engineering. Then we presented several template-like techniques and afterward machine learning techniques. Next, we discussed how to conduct a sound machine learning analysis that should result in reproducible experiments. We finished the chapter with a short discussion on how to assess the performance of SCA and some of the possible countermeasures that make such attacks more difficult.
References
 11.Dakshi Agrawal, Josyula R. Rao, Pankaj Rohatgi, and Kai Schramm. Templates as Master Keys. In CHES, volume 3659, pages 15–29. Springer, August 29 – September 1 2005. Edinburgh, UK.Google Scholar
 25.Cédric Archambeau, Éric Peeters, FrançoisXavier Standaert, and JeanJacques Quisquater. Template Attacks in Principal Subspaces. In CHES, volume 4249 of LNCS, pages 1–14. Springer, October 10–13 2006. Yokohama, Japan.Google Scholar
 70.Davide Bellizia, Milena Djukanovic, Giuseppe Scotti, and Alessandro Trifiletti. Template attacks exploiting static power and application to CMOS lightweight cryptohardware. I. J. Circuit Theory and Applications, 45(2):229–241, 2017.CrossRefGoogle Scholar
 71.Richard Ernest Bellman. Dynamic Programming. Dover Publications, Incorporated, 2003.Google Scholar
 115.Éric Brier, Christophe Clavier, and Francis Olivier. Correlation Power Analysis with a Leakage Model. In CHES, volume 3156 of LNCS, pages 16–29. Springer, August 11–13 2004. Cambridge, MA, USA.Google Scholar
 122.Eleonora Cagli, Cécile Dumas, and Emmanuel Prouff. Convolutional neural networks with data augmentation against jitterbased countermeasures  profiling attacks without preprocessing. In Cryptographic Hardware and Embedded Systems  CHES 2017  19th International Conference, Taipei, Taiwan, September 25–28, 2017, Proceedings, pages 45–68, 2017.Google Scholar
 134.Suresh Chari, Charanjit Jutla, Josyula Rao, and Pankaj Rohatgi. Towards sound approaches to counteract poweranalysis attacks. In Advances in Cryptology  CRYPTO’99, pages 791–791. Springer, 1999.Google Scholar
 135.Suresh Chari, Josyula R. Rao, and Pankaj Rohatgi. Template Attacks. In CHES, volume 2523 of LNCS, pages 13–28. Springer, August 2002. San Francisco Bay (Redwood City), USA.Google Scholar
 142.Omar Choudary and Markus G. Kuhn. Efficient template attacks. In Aurélien Francillon and Pankaj Rohatgi, editors, Smart Card Research and Advanced Applications  12th International Conference, CARDIS 2013, Berlin, Germany, November 27–29, 2013. Revised Selected Papers, volume 8419 of LNCS, pages 253–270. Springer, 2013.Google Scholar
 154.JeanSébastien Coron. Resistance against differential power analysis for elliptic curve cryptosystems. In Proceedings of the First International Workshop on Cryptographic Hardware and Embedded Systems, CHES ’99, pages 292–302, London, UK, UK, 1999. SpringerVerlag.Google Scholar
 194.François Durvaux, Mathieu Renauld, FrançoisXavier Standaert, Loic van Oldeneel tot Oldenzeel, and Nicolas VeyratCharvillon. Cryptanalysis of the ches 2009/2010 random delay countermeasure. IACR Cryptology ePrint Archive, 2012:38, 2012.Google Scholar
 195.Stefan Dziembowski and Krzysztof Pietrzak. Leakageresilient cryptography. In Foundations of Computer Science, 2008. FOCS’08. IEEE 49th Annual IEEE Symposium on, pages 293–302. IEEE, 2008.Google Scholar
 196.Thomas Eisenbarth, Timo Kasper, Amir Moradi, Christof Paar, Mahmoud Salmasizadeh, and Mohammad T. Manzuri Shalmani. On the Power of Power Analysis in the Real World: A Complete Break of the KeeLoq Code Hopping Scheme. In CRYPTO, volume 5157 of Lecture Notes in Computer Science, pages 203–220. Springer, August 17–21 2008. Santa Barbara, CA, USA.Google Scholar
 225.Daniel Genkin, Adi Shamir, and Eran Tromer. Acoustic cryptanalysis. Journal of Cryptology, 30(2):392–443, Apr 2017.CrossRefGoogle Scholar
 230.Benedikt Gierlichs, Kerstin LemkeRust, and Christof Paar. Templates vs. Stochastic Methods. In CHES, volume 4249 of LNCS, pages 15–29. Springer, October 10–13 2006. Yokohama, Japan.Google Scholar
 235.R. Gilmore, N. Hanley, and M. O’Neill. Neural network based attack on a masked implementation of aes. In 2015 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pages 106–111, May 2015.Google Scholar
 243.Louis Goubin and Jacques Patarin. Des and differential power analysis the “duplication” method. In Cryptographic Hardware and Embedded Systems, pages 728–728. Springer, 1999.Google Scholar
 274.A. Heuser, S. Picek, S. Guilley, and N. Mentens. Lightweight ciphers and their sidechannel resilience. IEEE Transactions on Computers, PP(99):1–1, 2017.Google Scholar
 275.Annelie Heuser, Michael Kasper, Werner Schindler, and Marc Stöttinger. A New Difference Method for SideChannel Analysis with HighDimensional Leakage Models. In Orr Dunkelman, editor, CTRSA, volume 7178 of Lecture Notes in Computer Science, pages 365–382. Springer, 2012.Google Scholar
 276.Annelie Heuser, Stjepan Picek, Sylvain Guilley, and Nele Mentens. Sidechannel analysis of lightweight ciphers: Does lightweight equal easy? In Radio Frequency Identification and IoT Security  12th International Workshop, RFIDSec 2016, Hong Kong, China, November 30  December 2, 2016, Revised Selected Papers, pages 91–104, 2016.Google Scholar
 277.Annelie Heuser, Olivier Rioul, and Sylvain Guilley. Good is Not Good Enough — Deriving Optimal Distinguishers from Communication Theory. In Lejla Batina and Matthew Robshaw, editors, CHES, volume 8731 of Lecture Notes in Computer Science. Springer, 2014.Google Scholar
 278.Annelie Heuser, Werner Schindler, and Marc Stöttinger. Revealing sidechannel issues of complex circuits by enhanced leakage models. In Wolfgang Rosenstiel and Lothar Thiele, editors, DATE, pages 1179–1184. IEEE, 2012.Google Scholar
 279.Annelie Heuser and Michael Zohner. Intelligent Machine Homicide  Breaking Cryptographic Devices Using Support Vector Machines. In Werner Schindler and Sorin A. Huss, editors, COSADE, volume 7275 of LNCS, pages 249–264. Springer, 2012.Google Scholar
 285.Gabriel Hospodar, Benedikt Gierlichs, Elke De Mulder, Ingrid Verbauwhede, and Joos Vandewalle. Machine learning in sidechannel analysis: a first study. Journal of Cryptographic Engineering, 1:293–302, 2011. 10.1007/s133890110023x.Google Scholar
 301.Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibsihrani. An Introduction to Statistical Learning. Springer Texts in Statistics. Springer, 2001.Google Scholar
 329.Jaehun Kim, Stjepan Picek, Annelie Heuser, Shivam Bhasin, and Alan Hanjalic. Make some noise: Unleashing the power of convolutional neural networks for profiled sidechannel analysis. Cryptology ePrint Archive, Report 2018/1023, 2018. https://eprint.iacr.org/2018/1023.
 338.Paul C. Kocher. Timing Attacks on Implementations of DiffieHellman, RSA, DSS, and Other Systems. In Proceedings of CRYPTO’96, volume 1109 of LNCS, pages 104–113. SpringerVerlag, 1996.Google Scholar
 339.Paul C. Kocher, Joshua Jaffe, and Benjamin Jun. Differential power analysis. In Proceedings of the 19th Annual International Cryptology Conference on Advances in Cryptology, CRYPTO ’99, pages 388–397, London, UK, UK, 1999. SpringerVerlag.Google Scholar
 363.Liran Lerman, Gianluca Bontempi, and Olivier Markowitch. Power analysis attack: An approach based on machine learning. Int. J. Appl. Cryptol., 3(2):97–115, June 2014.MathSciNetCrossRefGoogle Scholar
 364.Liran Lerman, Gianluca Bontempi, and Olivier Markowitch. A machine learning approach against a masked AES  Reaching the limit of sidechannel attacks with a learning model. J. Cryptographic Engineering, 5(2):123–139, 2015.CrossRefGoogle Scholar
 365.Liran Lerman, Stephane Fernandes Medeiros, Gianluca Bontempi, and Olivier Markowitch. A Machine Learning Approach Against a Masked AES. In CARDIS, Lecture Notes in Computer Science. Springer, November 2013. Berlin, Germany.Google Scholar
 366.Liran Lerman, Stephane Fernandes Medeiros, Nikita Veshchikov, Cédric Meuter, Gianluca Bontempi, and Olivier Markowitch. Semisupervised template attack. In Emmanuel Prouff, editor, COSADE 2013, Paris, France, 2013, Revised Selected Papers, pages 184–199. Springer, 2013.Google Scholar
 367. Liran Lerman, Romain Poussier, Gianluca Bontempi, Olivier Markowitch, and François-Xavier Standaert. Template attacks vs. machine learning revisited (and the curse of dimensionality in side-channel analysis). In Stefan Mangard and Axel Y. Poschmann, editors, Constructive Side-Channel Analysis and Secure Design – 6th International Workshop, COSADE 2015, Berlin, Germany, April 13–14, 2015, Revised Selected Papers, volume 9064 of Lecture Notes in Computer Science, pages 20–33. Springer, 2015.
 386. Houssem Maghrebi, Thibault Portigliatti, and Emmanuel Prouff. Breaking cryptographic implementations using deep learning techniques. In Security, Privacy, and Applied Cryptography Engineering – 6th International Conference, SPACE 2016, Hyderabad, India, December 14–18, 2016, Proceedings, pages 3–26, 2016.
 389. Stefan Mangard, Elisabeth Oswald, and Thomas Popp. Power Analysis Attacks: Revealing the Secrets of Smart Cards. Springer, December 2006. ISBN 0-387-30857-1, http://www.dpabook.org/.
 402. Rita Mayer-Sommer. Smartly Analyzing the Simplicity and the Power of Simple Power Analysis on Smartcards. In CHES, volume 1965 of LNCS, pages 78–92. Springer, May 14–16, 2001. http://citeseer.nj.nec.com/mayersommer01smartly.html.
 410. Thomas S. Messerges. Using Second-Order Power Analysis to Attack DPA Resistant Software. In CHES, volume 1965 of LNCS, pages 238–251. Springer-Verlag, August 17–18, 2000. Worcester, MA, USA.
 412. Thomas S. Messerges, Ezzy A. Dabbish, and Robert H. Sloan. Power Analysis Attacks of Modular Exponentiation in Smartcards. In Çetin Kaya Koç and Christof Paar, editors, CHES, volume 1717 of LNCS, pages 144–157. Springer, 1999.
 423. Thomas M. Mitchell. Machine Learning. McGraw-Hill, Inc., New York, NY, USA, 1st edition, 1997.
 434. Radu Muresan and Stefano Gregori. Protection Circuit against Differential Power Analysis Attacks for Smart Cards. IEEE Trans. Computers, 57(11):1540–1549, 2008.
 475. Stjepan Picek, Annelie Heuser, Cesare Alippi, and Francesco Regazzoni. When theory meets practice: A framework for robust profiled side-channel analysis. Cryptology ePrint Archive, Report 2018/1123, 2018. https://eprint.iacr.org/2018/1123.
 476. Stjepan Picek, Annelie Heuser, and Sylvain Guilley. Template attack versus Bayes classifier. Journal of Cryptographic Engineering, 7(4):343–351, November 2017.
 477. Stjepan Picek, Annelie Heuser, Alan Jovic, Lejla Batina, and Axel Legay. The secrets of profiling for side-channel analysis: feature selection matters. IACR Cryptology ePrint Archive, 2017:1110, 2017.
 478. Stjepan Picek, Annelie Heuser, Alan Jovic, Shivam Bhasin, and Francesco Regazzoni. The curse of class imbalance and conflicting metrics with machine learning for side-channel evaluations. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2019(1):209–237, November 2018.
 479. Stjepan Picek, Annelie Heuser, Alan Jovic, and Axel Legay. Climbing down the hierarchy: Hierarchical classification for machine learning side-channel attacks. In Marc Joye and Abderrahmane Nitaj, editors, Progress in Cryptology – AFRICACRYPT 2017: 9th International Conference on Cryptology in Africa, Dakar, Senegal, May 24–26, 2017, Proceedings, pages 61–78, Cham, 2017. Springer International Publishing.
 480. Stjepan Picek, Annelie Heuser, Alan Jovic, Axel Legay, and Karlo Knezevic. Profiled SCA with a new twist: Semi-supervised learning. Cryptology ePrint Archive, Report 2017/1085, 2017. https://eprint.iacr.org/2017/1085.
 481. Stjepan Picek, Annelie Heuser, Alan Jovic, Simone A. Ludwig, Sylvain Guilley, Domagoj Jakobovic, and Nele Mentens. Side-channel analysis and machine learning: A practical perspective. In 2017 International Joint Conference on Neural Networks, IJCNN 2017, Anchorage, AK, USA, May 14–19, 2017, pages 4095–4102, 2017.
 482. Stjepan Picek, Ioannis Petros Samiotis, Jaehun Kim, Annelie Heuser, Shivam Bhasin, and Axel Legay. On the performance of convolutional neural networks for side-channel analysis. In Anupam Chattopadhyay, Chester Rebeiro, and Yuval Yarom, editors, Security, Privacy, and Applied Cryptography Engineering, pages 157–176, Cham, 2018. Springer International Publishing.
 488. David Martin Ward Powers. Evaluation: from precision, recall and F-factor to ROC, informedness, markedness and correlation, 2007.
 489. Sihang Pu, Yu Yu, Weijia Wang, Zheng Guo, Junrong Liu, Dawu Gu, Lingyun Wang, and Jie Gan. Trace augmentation: What can be done even before preprocessing in a profiled SCA? In Thomas Eisenbarth and Yannick Teglia, editors, Smart Card Research and Advanced Applications, pages 232–247, Cham, 2018. Springer International Publishing.
 493. Jean-Jacques Quisquater and David Samyde. Electromagnetic analysis (EMA): Measures and countermeasures for smart cards. In Isabelle Attali and Thomas Jensen, editors, Smart Card Programming and Security, pages 200–210. Springer, 2001.
 498. Christian Rechberger and Elisabeth Oswald. Practical Template Attacks. In WISA, volume 3325 of LNCS, pages 443–457. Springer, August 23–25, 2004. Jeju Island, Korea.
 500. Mathieu Renauld, François-Xavier Standaert, Nicolas Veyrat-Charvillon, Dina Kamel, and Denis Flandre. A formal study of power variability issues and side-channel attacks for nanoscale devices. In Kenneth G. Paterson, editor, Advances in Cryptology – EUROCRYPT 2011 – 30th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Tallinn, Estonia, May 15–19, 2011, Proceedings, volume 6632 of Lecture Notes in Computer Science, pages 109–128. Springer, 2011.
 515. Werner Schindler, Kerstin Lemke, and Christof Paar. A Stochastic Model for Differential Side Channel Cryptanalysis. In CHES, volume 3659 of LNCS, pages 30–46. Springer, September 2005. Edinburgh, Scotland, UK.
 517. Friedhelm Schwenker and Edmondo Trentin. Pattern classification and clustering: A review of partially supervised learning approaches. Pattern Recognition Letters, 37:4–14, 2014.
 557. K. Tiri and I. Verbauwhede. A logic level design methodology for a secure DPA resistant ASIC or FPGA implementation. In Proceedings Design, Automation and Test in Europe Conference and Exhibition, volume 1, pages 246–251, February 2004.
 580. Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.
 600. Yingxian Zheng, Yongbin Zhou, Zhenmei Yu, Chengyu Hu, and Hailong Zhang. How to Compare Selections of Points of Interest for Side-Channel Distinguishers in Practice? In Lucas C. K. Hui, S. H. Qing, Elaine Shi, and S. M. Yiu, editors, ICICS 2014, Revised Selected Papers, pages 200–214, Cham, 2015. Springer International Publishing.
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.