Introduction

Various geometries of artificial pinning centers interacting with the vortex lattice have been investigated to study vortex motion and pinning interactions (Baert 1995; Cuppens et al. 2011; He et al. 2012; Jaccard et al. 1998; Kamran et al. 2015; de Lara et al. 2010; Latimer et al. 2012; Martin et al. 1999, 1997). All these studies have contributed to improving the current-carrying properties of superconducting materials for practical applications. The critical current density is one of the most important properties of a superconductor. A superconducting film patterned with an array of artificial pinning centers by nanolithography techniques can exhibit an enhanced current density. If each defect accommodates one flux quantum, the current density increases and the resistance decreases; as the vortex lattice becomes disordered, the current density decreases and the resistance increases. Consequently, the current–voltage (IV) characteristics of a superconducting film with pinning centers show abrupt changes. Such abrupt changes give IV curve measurements, also called transport measurements, substantial prominence in the field of superconductivity.

During our experimental work, we have found that transport measurements are tedious and cumbersome to obtain, especially when they are needed repeatedly (Kamran et al. 2016). We believe that an approximation model, such as the one based on artificial neural networks (ANN) proposed in this work, can help avoid repeated measurements by extrapolating the curves from a smaller subset of measurements to unforeseen parameters. The proposed methodology not only relieves researchers of this tedious procedure, saving time, cost, and energy, but is also applicable to various geometries of antidots, which gives this study significant importance.

ANN are often described as mathematical analogues of the human brain that learn in one situation and generalize to another (Guojin et al. 2007). They recognize patterns and establish relationships between the independent variables of a small subset of values. These relationships are then used to solve problems that require approximation and prediction over a larger sample space containing unforeseen values, where ANN typically prove extremely useful (Cybenko 1989). An ANN may simply be described as a network of interconnected nodes, each with a weight and a bias, collectively called the network coefficients. The purpose of such a network is to provide a mapping between inputs and outputs, where the latter are obtained after a successful learning phase. The learning process comprises training and validation. In the training phase, several pairs of inputs and outputs are iteratively presented to the network so that it can establish a mathematical relationship between them. The coefficients are updated in each iteration until the network converges to an optimal solution. The manner in which these coefficients are updated distinguishes one training algorithm from another (Hornik 1991, 1993; Hornik et al. 1989).

While ANN have been widely adopted in engineering applications (Elminir et al. 2007; Ghanbari et al. 2009; Reyen et al. 2008), their use in the field of materials science, especially for modeling electrical properties, remains rather limited (Haider et al. 2017; Kamran et al. 2016; Quan et al. 2016; Zhao et al. 2016). In this work, we explore the prediction of IV curves of a superconducting film with various geometries of antidots using three commonly used ANN architectures, trained by various learning algorithms. The predicted results are then compared with the actual measurements; this step is termed validation. Although slightly different from one another, the approximated results from each training algorithm achieve high accuracy, i.e., a small mean-squared error (MSE). Besides describing the approximation methodology, this work presents a comparison between three ANN architectures and three training algorithms in terms of prediction accuracy (given as MSE), training time, and the number of iterations taken to converge to an optimal solution.

The rest of the article is organized as follows: Section “Physical measurement system and readings” presents details of the experimental setup used to obtain the transport measurements, along with a brief commentary on their characteristics. In Section “Artificial neural network model”, we present the approximation methodology based on ANN, followed by the results and discussion in Section “Research methodology and simulation results”. We conclude the paper in Section “Conclusion”.

Physical measurement system and readings

Our experimental setup for transport measurements primarily comprises a physical properties measurement system (PPMS) from Quantum Design. An Nb film of 16 nm thickness is deposited on a \(\mathrm{SiO_2}\) substrate to obtain the desired arrays of antidots. The microbridges and the nanostructured arrays of antidots are fabricated by photo- and e-beam lithography, respectively, on a resist layer of polymethyl methacrylate (Mostako and Alika 2012; Shahid et al. 2016). Fabrication of the microbridges and arrays is followed by imaging of the samples with scanning electron microscopy (SEM) to verify the desired antidot patterns. We subsequently mount the sample in the PPMS for transport measurements, which are carried out by the four-probe method with the temperature stabilized to within ±3 mK and the external magnetic field applied perpendicular to the plane of the film. The bias current is swept from −8 to 8 mA at constant temperature. During the entire measurement process, which is always carried out in high vacuum, a small amount of liquid helium is kept inside the chamber to prevent overheating.

Fig. 1

SEM of rectangular (top left), square (top right), honeycomb (bottom left), and kagome (bottom right) arrays

Figure 1 presents SEM images of the four geometries that we investigate in this work, and Fig. 2 shows their respective IV curves measured at different temperatures. The top-left, top-right, bottom-left, and bottom-right panels in both figures correspond to the rectangular, square, honeycomb, and kagome arrays of antidots, respectively. These curves may be divided into three regions according to their slopes: in the first region, the voltage is almost zero as the current gradually increases; this is followed by a sudden jump in the voltage in the second region; and finally, in the third region, there is a linear relationship between the two variables. Two important observations that can be made from these figures are:

  1. The IV curves show a sudden jump at the critical current (\(I_c\)) in the second region, resembling Shapiro steps. These steps usually appear when the interstitial vortex lattice is formed; at high vortex velocities, an instability may occur, as a result of which the system exhibits a step.

  2. The sharpness of the curves varies significantly with geometry: arrays having a larger interstitial area can accommodate a larger number of vortices, leading to increased energy conservation in those geometries. The honeycomb and kagome arrays therefore exhibit flatter, smoother curves compared with the sharp steps of the rectangular and square arrays of antidots.

Fig. 2

Transport measurements at various temperatures and zero magnetic flux

After performing the transport measurements, we obtained a three-dimensional (H, T, I) data set comprising \([4\times 4\times 1600]\) values for each film, each film having a different geometry. Note that we removed one curve from this data set for each geometry and kept it isolated from the entire ANN modeling process. These four curves are used to cross-check our approach on genuinely unforeseen data values once the system has been completely designed, using MATLAB's ANN toolbox, on the modified data set (the one excluding the curves extracted for cross-checking). By default, the toolbox divides the provided data set into three subsets: the first is used for training, 50% of the remaining values are used for validation, and the rest are kept strictly isolated and used for testing. The modified data set was still copious enough to give us confidence in allocating a large subset exclusively for training. However, while performing the simulations, we found that increasing the size of the training set beyond 70% did not yield a considerable advantage in prediction accuracy. MATLAB's ANN toolbox also uses seventy percent of the data values for training by default, which further justifies our selection of training and testing data sets; an illustrative sketch of such a split is given below. In the next section, we elaborate on the ANN's operating principle, training algorithms, and architectures used in this work.
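For concreteness, the following minimal sketch reproduces such a 70/15/15 random division of samples into training, validation, and test subsets. It is written in Python/NumPy rather than the MATLAB toolbox actually used, and the array names are hypothetical.

```python
import numpy as np

def split_dataset(X, y, train_ratio=0.70, val_ratio=0.15, seed=0):
    """Randomly divide samples into training, validation, and test subsets.

    Mirrors the default 70/15/15 division described above;
    X has shape (n_samples, n_features), y has shape (n_samples,).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.permutation(n)
    n_train = int(train_ratio * n)
    n_val = int(val_ratio * n)
    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    test_idx = idx[n_train + n_val:]            # remaining ~15% kept for testing
    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))

# Toy usage with synthetic (T, I) -> V data of the same length as one measured curve
X = np.random.rand(1600, 2)     # e.g., temperature and current as inputs (hypothetical)
y = np.random.rand(1600)        # measured voltage (hypothetical)
train, val, test = split_dataset(X, y)
```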

Artificial neural network model

The structure and operation of a neuron, the basic building block of an ANN, have been described on several occasions (Guclu et al. 2015). Briefly, an ANN is a cascaded design, from an input layer to an output layer, in which functional blocks are sandwiched either between the input and hidden layers or between the hidden and output layers, and each layer may comprise a specific number of neurons. It is commonly held that the number of hidden layers in a network is directly related to the prediction accuracy, i.e., the greater the number of hidden layers, the more accurate the results, at the expense of complexity. However, problems similar to the one addressed in this work usually require at most two hidden layers for an acceptable accuracy level (Setti et al. 2014). The mapping between input and output layers, achieved through a few hidden layers, may follow one of the three most widely adopted architectures: feedforward, cascaded, and layer-recurrent neural nets. The differences between the three architectures are highlighted later; in what follows, we use the simplest one, feedforward, to describe the operation of an ANN. Figure 3 depicts a fully connected feed-forward neural net with a single hidden layer (\(\delta _\mathrm{H}\)). R inputs are connected to the input layer (\(\delta _\mathrm{I}\)) with S neurons, whereas the outputs generated by the input layer act as the source for the hidden layer, which has T neurons. Here, \(\delta \) denotes the activation or threshold function, which in effect quantizes the output of the network. The most commonly used threshold functions are the step, linear, and tan-sigmoid (hyperbolic tangent sigmoid) functions:

$$\begin{aligned} Y_k = \delta \left( \sum _{j=1}^{T}w_{kj}^{y}\, \delta _\mathrm{H}\left( \sum _{i=1}^{R}w_{ji}^{H}P_i\right) \right) . \end{aligned}$$
(1)
Fig. 3

Structure of fully connected feed-forward neural network with a single hidden layer
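As an illustration of Eq. 1, the following hedged sketch computes the output of a fully connected feed-forward network with one tan-sigmoid hidden layer and a linear output layer (hypothetical NumPy code; the weight shapes follow the notation of Fig. 3 and are not the toolbox implementation used in this work).

```python
import numpy as np

def forward(P, W_h, b_h, W_y, b_y):
    """Forward pass of a single-hidden-layer feed-forward net (cf. Eq. 1).

    P   : input vector of length R
    W_h : hidden-layer weights, shape (T, R);  b_h : hidden biases, length T
    W_y : output-layer weights, shape (K, T);  b_y : output biases, length K
    """
    hidden = np.tanh(W_h @ P + b_h)   # tan-sigmoid activation, delta_H
    return W_y @ hidden + b_y         # linear output activation, delta

# Tiny example: R = 2 inputs, T = 5 hidden neurons, K = 1 output
rng = np.random.default_rng(1)
P = np.array([0.3, -0.7])
W_h, b_h = rng.normal(size=(5, 2)), rng.normal(size=5)
W_y, b_y = rng.normal(size=(1, 5)), rng.normal(size=1)
print(forward(P, W_h, b_h, W_y, b_y))
```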

Selecting the number of hidden layers and the number of neurons per layer is a critical process with a high impact on the system's stability. Most of the available ANN training algorithms use the MSE (the difference between the expected and observed responses) as their objective function:

$$\begin{aligned} \phi =\frac{1}{2} \sum _{k=1}^{M}(y_k - d_k)^2=\frac{1}{2} \sum _{k=1}^{M}e_{k}^{2} \end{aligned}$$
(2)

where \(y_k\) is the kth output value calculated by the network and \(d_k\) is the corresponding expected value. To the best of our knowledge, all ANN architectures use the backpropagation technique to minimize their objective function: from the basic feed-forward neural network to widely adopted architectures such as the convolutional neural network (CNN), the training algorithms backpropagate the error in the form of sensitivities from the output layer to the input layer.
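As a minimal, hedged sketch of the cost in Eq. 2 (hypothetical NumPy code; array names are illustrative only):

```python
import numpy as np

def objective(y, d):
    """Sum-of-squared-errors cost of Eq. 2: phi = 0.5 * sum((y_k - d_k)^2)."""
    e = y - d
    return 0.5 * np.sum(e ** 2)

y = np.array([0.10, 0.42, 0.95])   # network outputs (hypothetical)
d = np.array([0.12, 0.40, 1.00])   # expected (target) values (hypothetical)
print(objective(y, d))             # 0.00165
```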

ANN architectures

Feedforward

The simplest architecture that an ANN model may follow is the feed-forward neural net, in which each layer is connected only to its immediate neighbors. The mapping from the input layer through the hidden layers to the output layer is therefore achieved in a strictly serial, forward-only manner.

Cascaded

Unlike the feed-forward nets, the output layer in a cascaded network is connected not only to its immediate predecessor (hidden) layer but also directly to the input layer. This allows the network to exploit the input information directly, in addition to that provided by the predecessor layer, facilitating convergence at the cost of added complexity.

Layer-recurrent

Unlike the feed-forward nets, the layer-recurrent nets are dynamic rather than purely feed-forward, i.e., they have a feedback loop in the hidden layers with additional tap delays. The latter prove especially helpful in analyzing time-series data, where the network is expected to have a dynamic response.

Parts (a), (b), and (c) in Fig. 4 depict feed-forward, cascaded, and layer-recurrent neural nets, respectively, where circles in (c) depict the additional tap delays.

Fig. 4

ANN architectures: a feedforward, b cascaded, and c layer recurrent
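The structural differences between the three architectures can also be expressed in code. The sketch below is a hypothetical NumPy illustration (with the recurrent case reduced to a single tap delay on the hidden state); it is not the toolbox implementation used in this work.

```python
import numpy as np

def feedforward(x, W_h, W_y):
    """Each layer is connected only to its immediate predecessor."""
    h = np.tanh(W_h @ x)
    return W_y @ h

def cascaded(x, W_h, W_y, W_xy):
    """The output layer additionally receives a direct connection from the input."""
    h = np.tanh(W_h @ x)
    return W_y @ h + W_xy @ x

def layer_recurrent(x_seq, W_h, W_y, W_hh):
    """The hidden layer feeds back its tap-delayed previous state at each time step."""
    h = np.zeros(W_h.shape[0])
    outputs = []
    for x in x_seq:
        h = np.tanh(W_h @ x + W_hh @ h)     # one-step tap delay on the hidden state
        outputs.append(W_y @ h)
    return np.array(outputs)

# Toy usage
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W_h, W_y = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))
W_xy, W_hh = rng.normal(size=(1, 3)), rng.normal(size=(4, 4))
print(feedforward(x, W_h, W_y), cascaded(x, W_h, W_y, W_xy))
print(layer_recurrent(rng.normal(size=(6, 3)), W_h, W_y, W_hh).shape)   # (6, 1)
```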

Benchmark backpropagation algorithms

The backpropagation technique is an iterative method that works in conjunction with the gradient descent algorithm (Reed et al. 1993). In each iteration, the network coefficients are updated and the gradient of the cost function is recomputed accordingly. The objective of this iterative method is to minimize the cost function in terms of the MSE:

$$\begin{aligned} \nabla \phi (w)=\frac{\partial \phi (w)}{\partial w_j}=0 \ \ \ \ \forall \ j \end{aligned}$$
(3)

The update rule is

$$\begin{aligned} w(k+1)=w(k)+ \Delta w(k) \end{aligned}$$

where

$$\begin{aligned} \Delta w(k)= -\alpha \frac{\partial \phi (k)}{\partial w(k)} \end{aligned}$$

and \(\alpha \) is the learning rate.
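As a concrete, hedged illustration of this update rule, the following sketch performs plain gradient-descent updates on the cost of Eq. 2 for a single linear neuron (hypothetical NumPy code; the gradient is averaged over samples only to keep the step size stable).

```python
import numpy as np

def gradient_descent_step(w, X, d, alpha=0.1):
    """One update w <- w + Delta w, with Delta w = -alpha * dphi/dw.

    A single linear neuron y = X @ w is fitted to targets d under the cost
    of Eq. 2; the gradient of 0.5 * sum(e_k^2) is X.T @ e, averaged here.
    """
    e = X @ w - d                 # e_k = y_k - d_k
    grad = X.T @ e / len(d)       # mean gradient for a stable step size
    return w - alpha * grad

# Toy usage: recover known weights from noiseless data
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
d = X @ w_true
w = np.zeros(3)
for _ in range(500):
    w = gradient_descent_step(w, X, d)
print(w)   # approaches w_true
```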

Because of its broad range of applications, several variants of the backpropagation algorithm have been proposed. Although each variant has its own pros and cons, in what follows we discuss only those that have proven efficient for problems such as the one addressed in this work (Haider et al. 2017).

Levenberg Marquardt framework

The Levenberg Marquardt (LM) algorithm is a pseudo-second-order training algorithm that works in conjunction with the steepest descent method. It has been reported to offer better stability and convergence speed (Levenberg et al. 1944).

Let us consider the output response of the feedforward neural network, calculated using Eq. 1, with the initial output response given as \(y_0 = r_k\). The network error is calculated using Eq. 2.

The network sensitivities are backpropagated through the network to update the learning rule (Demuth et al. 2014). Derived from Newton's method and the steepest descent method, the update rule for the LM algorithm is defined as

$$\begin{aligned} \Delta W = -\left( J_{we}^TJ_{we} + \delta _{r} I \right) ^{-1} J_{we}^T e \end{aligned}$$
(4)

or the above equation can be written as

$$\begin{aligned} \Delta x_k = -\left[ J_{we}^T(x_k)J_{we}(x_k) + \delta _{r} I \right] ^{-1} J_{we}^T(x_k)v(x_k) \end{aligned}$$
(5)

where \(J_{we}\) is the Jacobian matrix of dimensions (\(PQ~\times ~R\)), whose columns run over the network weights and biases, and the error vector has dimensions (\(PQ~\times ~1\)). The Jacobian matrix is defined by the relation:

$$\begin{aligned} J_{we} = \begin{bmatrix} \frac{\partial e_{11}}{\partial w_1}&\quad \frac{\partial e_{11}}{\partial w_2}&\quad \ldots&\quad \frac{\partial e_{11}}{\partial w_R}&\quad \frac{\partial e_{11}}{\partial b_1}\\ \frac{\partial e_{12}}{\partial w_1}&\quad \frac{\partial e_{12}}{\partial w_2}&\quad \ldots&\quad \frac{\partial e_{12}}{\partial w_R}&\quad \frac{\partial e_{12}}{\partial b_1}\\ \vdots&\quad \vdots&\quad \ddots&\quad \vdots&\quad \vdots \\ \frac{\partial e_{1Q}}{\partial w_1}&\quad \frac{\partial e_{1Q}}{\partial w_2}&\quad \ldots&\quad \frac{\partial e_{1Q}}{\partial w_R}&\quad \frac{\partial e_{1Q}}{\partial b_1}\\ \vdots&\quad \vdots&\quad \ddots&\quad \vdots&\quad \vdots \\ \frac{\partial e_{P1}}{\partial w_1}&\quad \frac{\partial e_{P1}}{\partial w_2}&\quad \ldots&\quad \frac{\partial e_{P1}}{\partial w_R}&\quad \frac{\partial e_{P1}}{\partial b_1}\\ \frac{\partial e_{P2}}{\partial w_1}&\quad \frac{\partial e_{P2}}{\partial w_2}&\quad \ldots&\quad \frac{\partial e_{P2}}{\partial w_R}&\quad \frac{\partial e_{P2}}{\partial b_1}\\ \vdots&\quad \vdots&\quad \ddots&\quad \vdots&\quad \vdots \\ \frac{\partial e_{PQ}}{\partial w_1}&\quad \frac{\partial e_{PQ}}{\partial w_2}&\quad \ldots&\quad \frac{\partial e_{PQ}}{\partial w_R}&\quad \frac{\partial e_{PQ}}{\partial b_1}\\ \end{bmatrix} \end{aligned}$$
(6)

where P is the number of training patterns, Q is the number of outputs, R is the number of weights, and e is calculated using Eq. 2. Conventionally, the Jacobian matrix J is computed first, and the subsequent computations for updating the weights and biases are performed on the stored values. With a small number of patterns, this method works smoothly and efficiently, whereas with large pattern sets the calculation of the Jacobian matrix runs into memory limitations. Consequently, the performance of the LM algorithm degrades for large training sets.

figure a
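As a complementary, hedged illustration of the LM update in Eq. 4, the sketch below applies one Levenberg-Marquardt step at a time to a small least-squares problem, estimating the Jacobian by forward differences (hypothetical NumPy code; the damping parameter mu plays the role of \(\delta _r\), and all names are illustrative rather than the toolbox implementation).

```python
import numpy as np

def lm_step(w, residual_fn, mu=1e-2, eps=1e-6):
    """One LM update: dw = -(J^T J + mu I)^{-1} J^T e  (cf. Eq. 4).

    residual_fn(w) returns the error vector e; the Jacobian J is estimated
    by forward differences, which is sufficient for a small illustration.
    """
    e = residual_fn(w)
    J = np.empty((e.size, w.size))
    for j in range(w.size):
        w_pert = w.copy()
        w_pert[j] += eps
        J[:, j] = (residual_fn(w_pert) - e) / eps
    dw = np.linalg.solve(J.T @ J + mu * np.eye(w.size), -J.T @ e)
    return w + dw

# Toy usage: fit y = a*x + b to slightly noisy data
rng = np.random.default_rng(3)
x = np.linspace(0, 1, 40)
y = 2.0 * x + 0.5 + 0.01 * rng.normal(size=x.size)
residual = lambda w: (w[0] * x + w[1]) - y
w = np.zeros(2)
for _ in range(20):
    w = lm_step(w, residual)
print(w)   # close to [2.0, 0.5]
```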

Conjugate gradient

The conjugate gradient (CG) algorithm is known for its fast convergence rate and has been employed for solving sparse linear systems on numerous occasions. Its variants include scaled CG (SCG) and Fletcher–Powell CG (CGF) (Naqvi et al. 2016; Johansson et al. 1991; Powell 1977).

Let us consider a set of search directions \(\{r_k\}\) that are mutually conjugate with respect to the positive definite Hessian matrix \(H_{wb}\), i.e., that satisfy the condition:

$$\begin{aligned} r_{k}^{T} H_{wb} r_{k} = 0. \end{aligned}$$
(7)

The quadratic function is minimized by searching along directions that are conjugate with respect to the Hessian matrix \(H_{wb}\). For the given quadratic function, the gradient and Hessian are

$$\begin{aligned} \bigtriangledown F(w) = H_{wb}r_k + \varpi \end{aligned}$$
(8)
$$\begin{aligned} \bigtriangledown ^2 F(w) = H_{wb}. \end{aligned}$$
(9)

For iteration \(k+1\), the change in the gradient can be calculated from the equation:

$$\begin{aligned} \triangle U_k = U_{k+1} - U_k = (H_{wb}r_{k+1} + \varpi ) - (H_{wb}r_k + \varpi ) = H_{wb}\triangle r_k \end{aligned}$$
(10)

where

$$\begin{aligned} \triangle r_k = (r_{k+1}- r_k) = \delta _R^k r_k \end{aligned}$$
(11)

where \(\delta _R\) is the learning rate, selected to minimize the function F(w) along the direction \(r_k\). The first search direction is arbitrary; a common choice is the negative of the initial gradient:

$$\begin{aligned} r_0= -U_0 \end{aligned}$$
(12)

where

$$\begin{aligned} U_k\equiv \bigtriangledown F(w)|_{w=w_k}. \end{aligned}$$
(13)

Gram–Schmidt orthogonalization (Messaoudi 1996) is used to construct \(r_k\) at each iteration, orthogonal to \( \{\triangle U_0, \triangle U_1, \ldots , \triangle U_{k-1}\}\), as

$$\begin{aligned} r_k = -U_k + \beta _k r_{k-1} \end{aligned}$$
(14)

where \(\beta _k\) is a scalar given as

$$\begin{aligned} \beta _k = \frac{\triangle U_{k-1}^{T} U_k}{U_{k-1}^{T} U_{k-1}} \end{aligned}$$
(15)

and \(\delta _{R}\) can be calculated using the relation:

$$\begin{aligned} \delta _R^k = \frac{-U_{k}^{T} U_k}{r_{k}^{T}H_{wb} r_k}. \end{aligned}$$
(16)
figure b
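As a hedged, self-contained illustration of the iteration above, the following sketch minimizes the quadratic \(F(w)=\frac{1}{2}w^T H_{wb} w + \varpi ^T w\) by conjugate directions, using the Polak-Ribiere form of Eq. 15 and exact line minimization for the step length (hypothetical NumPy code, not the toolbox's CGF implementation).

```python
import numpy as np

def conjugate_gradient(H, varpi, w0, n_iter=None):
    """Minimize F(w) = 0.5 * w.T H w + varpi.T w with conjugate directions.

    U_k is the gradient and r_k the search direction; beta follows the
    Polak-Ribiere form of Eq. 15, and the step length comes from exact
    line minimization on the quadratic.
    """
    w = w0.copy()
    U = H @ w + varpi                      # gradient at w0 (cf. Eq. 8)
    r = -U                                 # first search direction (Eq. 12)
    n_iter = n_iter or len(w0)
    for _ in range(n_iter):
        step = -(U @ r) / (r @ H @ r)      # exact line-search step length
        w = w + step * r
        U_new = H @ w + varpi
        beta = (U_new - U) @ U_new / (U @ U)   # Eq. 15 (Polak-Ribiere form)
        r = -U_new + beta * r                  # Eq. 14
        U = U_new
    return w

# Toy usage: the minimizer solves H w = -varpi
H = np.array([[4.0, 1.0], [1.0, 3.0]])
varpi = np.array([1.0, -2.0])
print(conjugate_gradient(H, varpi, np.zeros(2)))   # approx -np.linalg.solve(H, varpi)
```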

Bayesian regularization

The Bayesian regularization (BR) algorithm makes use of the LM algorithm (and is hence not so different from it) to search for the minimum using the Hessian matrix of the given objective function. Refer to MacKay (1992) for a detailed description.
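For completeness, and following the standard formulation in MacKay (1992) rather than anything stated explicitly above, the regularized objective used by BR augments the data error of Eq. 2 with a penalty on the weights:

$$\begin{aligned} F(w) = \beta \phi (w) + \alpha \phi _W(w), \qquad \phi _W(w) = \frac{1}{2}\sum _{j} w_j^2 \end{aligned}$$

where \(\phi \) is the data error of Eq. 2, \(\phi _W\) penalizes large weights, and the hyperparameters \(\alpha \) and \(\beta \) are re-estimated within the LM iterations.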

Research methodology and simulation results

Research methodology

Figure 5 concisely presents our data collection and analysis methodology. We used the PPMS to obtain a large set of IV curves. We divide this set into two parts: (1) training and testing samples and (2) samples for cross-checking, and use the first subset to train our 90 ANN models (3 architectures \(\times \) 10 configurations \(\times \) 3 training algorithms), where each configuration refers to a different number of neurons in the hidden layers; a schematic sketch of this experiment loop is given after Fig. 5. Once trained, the system reports the training time and the number of epochs taken by each ANN model to converge; we record these values. Following the successful training, validation, and testing phases (collectively called ANN learning), we predict the response for the second sample set and record the MSE between the predicted and measured responses. The same process is repeated for all ten configurations.

Fig. 5

Adopted research methodology
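The experiment loop of Fig. 5 can be summarized in the following hedged sketch. It is written in Python with a stub `train_and_evaluate` standing in for the MATLAB toolbox calls actually used; the architecture and algorithm names mirror the text, but the helper itself and the listed hidden-layer configurations are illustrative only.

```python
import time

# Experimental grid described in the text:
# 3 architectures x 10 configurations x 3 training algorithms = 90 ANN models
architectures = ["feedforward", "cascaded", "layer-recurrent"]
algorithms = ["LM", "CGF", "BR"]
configurations = [(5, 2), (8, 4), (11, 5), (18, 10)]   # illustrative subset of hidden-layer sizes

def train_and_evaluate(arch, algo, hidden_sizes):
    """Stand-in for the actual toolbox training and prediction call.

    Returns dummy (mse, epochs) so the loop runs; replace with real training."""
    return 0.0, 0

results = []
for arch in architectures:
    for algo in algorithms:
        for hidden_sizes in configurations:
            t0 = time.perf_counter()
            mse, epochs = train_and_evaluate(arch, algo, hidden_sizes)
            results.append({
                "architecture": arch,
                "algorithm": algo,
                "hidden_sizes": hidden_sizes,
                "mse": mse,                        # error on the held-out (cross-check) curve
                "epochs": epochs,                  # iterations to converge
                "train_time_s": time.perf_counter() - t0,
            })
```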

Objective statement

Let \(\Omega \subset \mathbb {R}^l\), with \(l = R \times D\) and \(R > D\), be a bounded experimental data set. Let \(\phi = \{\phi (i) \mid i \in \Omega \}\) be the selected feature set, where \(\phi _1(i), \ldots , \phi _n(i) \subset \phi \) are the n features associated with the training and testing of the ANN used to predict the output vector \(\tilde{\phi }_{pred}\). Formally, \(\phi \) is mapped to \(\tilde{\phi }_{pred}\): \(\phi \rightarrow \tilde{\phi }_{pred}\). The output vector \(\tilde{\phi }_{pred}\) is the predicted version of \(\phi \), defined as

$$\begin{aligned} \tilde{\phi }_{pred} \overset{\Delta }{=} \left\{ \left( \delta _{WB}^k, \delta _{NN}^2, \delta _R, \delta _S\right) \in \{-1 : 1 \},\; \left( \delta _{WB}^{L}, \delta ^{C}_{WB}, \delta ^{B}_{WB}\right) \in \mathbb {R}^l \right\} \end{aligned}$$

where, among the input parameters, \(\delta _{WB}^k\) is a vector of randomly initialized weights and biases, \(\delta _{NN}^2\) represents two hidden layers with different combinations of neurons, and \(\delta _R\) and \(\delta _S\) are the learning rate and step size, respectively. The output parameters comprise the optimized weight and bias vectors from the three training algorithms: \(\delta _{WB}^{L}\) (LM algorithm), \(\delta _{WB}^{C}\) (CG algorithm), and \(\delta _{WB}^{B}\) (BR algorithm). The performance parameters are chosen to be the MSE, the number of epochs, and the training time. The cost function is the MSE, calculated using Eq. 2.

Simulation results

The main purpose of this research is to highlight the ability of the proposed approach to predict the IV curves for values of temperature and magnetic flux that were not available while training the ANN model. As already stated in the “Physical measurement system and readings” section, we kept four IV curves (one for each antidot geometry) isolated from the modeling data, to be used for cross-checking the proposed approach on unforeseen data. Table 1 lists these IV curves. Figure 6 plots the predicted values against the physically measured ones. A thicker curve, for example that of the square array (shown in the top right corner), indicates a larger number of data points available in the transport measurements. The reason for choosing a different number of data points for each geometry was twofold. First, we deliberately wanted to evaluate the performance of each training algorithm in the absence of a sufficiently large data set, and thereby estimate the minimum number of data points needed for an acceptable MSE. Second, we wanted to showcase the applicability of our approach to various geometries with a varying number of data points in the transport measurements. It may clearly be observed that the prediction for each geometry results in a negligible error.

Table 1 Curves used for comparison
Fig. 6

Predicted IV curves: rectangular (top left), square (top right), honeycomb (bottom left), and kagome (bottom right)

Figures 7, 8, and 9, respectively, present the MSE, the number of iterations to converge, and the training time for each of the training algorithms. Note that the horizontal axis in each figure corresponds to different ANN models, each having a different number of neurons in the hidden layers; we call this the network configuration. In essence, each plot corresponds to a different geometry of antidots, trained by the three benchmark algorithms for thirty different configurations.

Considering that training an ANN model is a stochastic process, relying heavily on random numbers, it is difficult to ascertain the reason behind the diversity, and especially the sharp peaks, in the results. It is therefore not possible to advocate the use of one algorithm for all geometries and architectures; instead, we comment on the obtained results case by case. The BR algorithm outperforms the other two in terms of MSE for the square and honeycomb arrays, which have more data points than the remaining geometries (rectangular and kagome). However, this comes only at the cost of increased training time and number of iterations to converge. The increased MSE for the latter two geometries reflects the weakness of BR in approximating curves from scarce data points. For these two geometries, LM appears more promising, except for a few random peaks; see Fig. 7.

Fig. 7

MSE: rectangular (top left), square (top right), honeycomb (bottom left), and kagome (bottom right)

LM and CGF prove to be better options when fast convergence, in terms of both the number of iterations and the training time, is decisive; this is evident in Figs. 8 and 9. It is interesting to note that CGF, in contrast to BR, takes a large number of iterations to converge for the square and honeycomb arrays (those with large data sets), while its training time is minimal for the geometries with smaller data sets. This advocates its use in systems requiring real-time approximation, where accuracy may be slightly compromised. However, for the application presented in this work, CGF is not the best available option. LM, on the other hand, stands between the other two algorithms in terms of both prediction accuracy and convergence rate: it has better approximation accuracy for smaller data sets and tends to converge faster for the geometries having a large number of data points.

Fig. 8

Epochs: rectangular (top left), square (top right), honeycomb (bottom left), and kagome (bottom right)

Fig. 9

Time: rectangular (top left), square (top right), honeycomb (bottom left), and kagome (bottom right)

Table 2 presents the best results, in terms of minimum MSE, epochs, and training time, obtained from the prediction process for each geometry. Note that the number (No.) of neurons, expressed as [x, y], denotes x and y neurons in the first and second hidden layers, respectively. The table should be interpreted as follows: for the rectangular array, the layer-recurrent architecture with eleven and five neurons in its hidden layers, trained with the LM algorithm, achieves an MSE of \(3.3\times 10^{-7}\), which is better than any other pair of architecture and algorithm. Similarly, for the same geometry, BR converges in the smallest number of iterations (11) with the layer-recurrent architecture having 18 and 10 neurons in the hidden layers, while CGF trains the cascaded network with [5, 2] neurons in just 0.093 s, faster than all other options. It is evident that BR, if provided with a sufficiently large data set, can outperform the other algorithms in terms of MSE, whereas LM and CGF are good options for minimizing training time and epochs, even in the absence of large data sets. For the purpose of predicting IV curves in superconducting films, this work may serve as a benchmark, since it points out the best pairs of architecture and algorithm for the most widely adopted assessment parameters, namely MSE, epochs, and training time.

Table 2 Best results in terms of MSE, epochs, and training time

Conclusion

Motivated by the experience that transport measurements in superconducting films are tedious and cumbersome to obtain, a predictive model based on artificial neural networks has been proposed. The model takes a finite number of data points for each of four antidot geometries (rectangular, square, honeycomb, and kagome) and extrapolates the curves over a wide range of unforeseen values of temperature and magnetic flux. We have assessed three different artificial neural network architectures, trained by three well-known training algorithms, for the purpose of predicting these current–voltage curves. Our assessment is based on the mean-squared error, the number of iterations to converge, and the training time. Our simulations have highlighted the attributes of each architecture and algorithm, which should help follow-up work choose between the available options, giving this study significant importance in the field of computational physics.