INTRODUCTION

Since the turn of the century, manufacturing industries have been undergoing a vast transformation, fueled by digitalization, automation, and the tremendous amount of data collected during manufacturing processes. Also referred to as the Fourth Industrial Revolution (Industry 4.0 (1), or Pharma 4.0 in the pharmaceutical industry (2, 3)), this evolving digital technology encompasses data-driven manufacturing and the vision of smart factories, where interconnected systems communicate with each other and make autonomous decisions (4).

In the pharmaceutical industry, modernization is also promoted by the Quality by Design (QbD) (5) and the Process Analytical Technology (PAT) (6) frameworks. The QbD approach emphasizes the need for product and process understanding, i.e., the identification of the critical material attributes (CMAs) and critical process parameters (CPPs) that significantly influence the critical quality attributes (CQAs) of the product and process. This leads to establishing a design space, within which the quality is deemed acceptable. Consequently, the flexibility of production can be increased, as operating within a regulatory approved design space is not regarded as a change. The PAT initiative also promotes science- and risk-based production by emphasizing the need for real-time measurements of the CQAs and CPPs with in-process sensors, coupled with the corresponding data analysis methods and control strategy. As the QbD and PAT principles are implemented, reducing the labor-intensive and time-consuming quality control tests on the final products is also becoming a reality. Instead, the real-time release testing (RTRT) approach can be used; that is, product and process understanding, together with adequate real-time monitoring and control of the process, can serve as the quality assurance (7). Consequently, the QbD, PAT, and RTRT concepts, along with the advancements in data processing and automation, are indispensable for the envisioned agile and innovative manufacturing, and their implementation could eventually lead to smart and ultimately autonomous pharmaceutical factories (3).

To realize the aims of Industry/Pharma 4.0, artificial intelligence (AI) and machine learning (ML) techniques have emerged as versatile tools (2, 8) for tackling several arising tasks, e.g., the analysis of big data (9) or the development of digital twins (the digital counterparts of physical systems) (10, 11). Artificial intelligence mainly refers to computational methods that perform tasks typically associated with human-like thought processes, such as pattern recognition or decision-making. Within AI, machine learning techniques accomplish these tasks by learning from a provided dataset to produce a response without being explicitly programmed to do so (12). ML can adjust the model behavior to continuously improve its performance as the training dataset expands, which makes it especially suitable for data-driven manufacturing purposes. Medical regulatory agencies also show increasing openness toward AI/ML approaches. For example, the Danish Medicines Agency has recently published a list of questions to consider when developing and applying ML-based models in GxP-regulated areas (13). Furthermore, the US Food and Drug Administration has issued an action plan for establishing a “Good Machine Learning Practice” (14). Although it deals with ML-based medical devices, the approach (e.g., how to evaluate robustness, bias, and real-world performance) could potentially be generalized to other ML applications.

ML comprises several different mathematical approaches, such as artificial neural networks (ANNs), deep learning, support vector machines (SVMs), and decision trees (12, 15). This work primarily focuses on ANNs, which have gained tremendous attention due to their flexibility in describing complex linear or non-linear relationships for different purposes, such as pattern recognition, regression, or time-series forecasting. ML techniques have already found several applications in the different stages of pharmaceutical research and development, e.g., in target selection, clinical trials, quantitative structure–activity relationship studies, and formulation optimization (16,17,18,19,20,21).

However, in preparation for smart manufacturing, the real-time applicability of ML in pharmaceutical manufacturing processes also needs to be studied, which, to the best of the authors’ knowledge, has not been examined in detail in previous review papers. Therefore, this paper aims to explore the current state of ML techniques (mostly ANNs) within the PAT framework. The application of ML together with analytical sensor systems for process monitoring and control purposes is reviewed, considering the most common upstream and downstream manufacturing steps of small-molecule active pharmaceutical ingredients (APIs) and solid pharmaceutical formulations (e.g., tablets or capsules). This overview also aims to identify potential research directions, future challenges, and risks associated with implementing ANNs within PAT. Consequently, this review could facilitate the development of smart pharmaceutical manufacturing approaches and aid the digitalization efforts of the pharmaceutical industry.

ARTIFICIAL NEURAL NETWORKS

The development of ANNs was inspired by the information processing behavior of the human brain, as the calculation is based on interconnected information processing units, i.e., artificial neurons (also called nodes or perceptrons), which receive inputs and convert them to the desired outputs (Fig. 1a). This is achieved by first weighting and summing the inputs (together with a bias term) and then calculating the output using a predefined transfer function (see Fig. 1b). In such a way, the information is passed through numerous neurons to produce the final output (Fig. 1c).

Fig. 1 Representation of (a) a human neuron, (b) a single artificial neuron, and (c) a multi-layered feedforward backpropagation (FF-BP) neural network
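
The computation of a single neuron (cf. Fig. 1b) can be condensed into a few lines. The following minimal NumPy sketch uses arbitrary illustrative values, not data from any of the reviewed works:

```python
import numpy as np

def neuron(inputs, weights, bias, transfer=np.tanh):
    """Single artificial neuron: the inputs are weighted and summed
    together with a bias, and the result is passed through a transfer
    function to produce the output (cf. Fig. 1b)."""
    return transfer(np.dot(weights, inputs) + bias)

# Illustrative values only
x = np.array([0.5, -1.2, 0.3])   # inputs
w = np.array([0.8, 0.1, -0.4])   # weights
print(neuron(x, w, bias=0.2))    # tangent-sigmoid output in (-1, 1)
```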

Depending on the purpose of the model, arbitrary NN topologies can be built, e.g., by varying the number of neurons, their connections to each other, and the applied transfer functions. In most applications, nodes with the same tasks are organized into layers. A standard topology is the feedforward neural network (Fig. 1c), also called multilayer perceptron (MLP), where the information passes through the network without any loops back to previous layers. The nodes in the input layer receive the information from the outside world and pass it to further nodes. The actual calculations happen in the hidden layer(s), which have no direct connection to the outside world but pass the information to the output nodes. The output layer, where the final transformations happen, provides the network results. Loops are also possible within the network, creating feedback, or recurrent, neural networks (RNNs), where the information also travels back to previous processing units, resulting in a memory-like behavior. This is especially important for analyzing, e.g., sequential and time-series data. Several RNNs have been developed for different purposes, such as the Elman network, the layer recurrent neural network (LRNN), the non-linear autoregressive exogenous model (NARX), and the long short-term memory (LSTM) NN (22).

According to the universal approximation theorem (23), a network with one hidden layer containing a finite number of neurons can approximate any continuous function to arbitrary accuracy, given a suitable transfer function. However, the utilization of multiple hidden layers (called a deep neural network if it contains at least three hidden layers) can also be necessary or more effective for complex tasks, e.g., for processing unstructured data. The most famous deep networks include, e.g., RNNs (22) and convolutional neural networks (CNNs) (24). CNNs are mainly used in image analysis for segmentation, classification, and object detection. They contain several hidden layers with different purposes, such as convolutional layers for feature extraction, pooling layers for dimension reduction, and fully connected layers for classification or prediction. As training CNNs from scratch can have vast computational and data demands, several pre-trained CNN architectures are available, such as the AlexNet, GoogLeNet, or ResNet models, consisting of 8, 22, and 152 layers, respectively. These models can be utilized for transfer learning, i.e., used as the starting point (e.g., for feature extraction) for modeling a new problem and tuned to the required training data. For more information on deep learning, CNNs, and transfer learning, see (24, 25).
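
As an illustration of transfer learning, the following PyTorch/torchvision sketch reuses a pre-trained ResNet as a frozen feature extractor and retrains only a new output layer; the two-class head and the learning rate are illustrative assumptions, not details of the cited architectures:

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

# Load a ResNet pre-trained on ImageNet and freeze its layers
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer for a new (here: two-class) task
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the new head is trained on the (typically small) new dataset
optimizer = optim.Adam(model.fc.parameters(), lr=1e-3)
```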

Besides the network topology, the nodes’ transfer functions significantly determine the model’s behavior. Frequently used transfer functions include the linear function (often used in the output neurons for function fitting), the log-sigmoid function (providing an output between 0 and 1, used for discrete and binary outputs, e.g., in pattern recognition problems), and the tangent sigmoid function (resulting in an output from −1 to +1, often used for regression tasks). Another widely used transfer function for deep NNs is the rectified linear unit (ReLU), which does not change the input if it is positive while outputting zero otherwise. Radial basis functions (RBFs) can also be used, creating RBF networks, which have the advantages of good generalization and the ability to learn in real time.
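
For reference, the transfer functions mentioned above can be written as follows (a plain NumPy sketch; the Gaussian form of the RBF is one common choice, assumed here):

```python
import numpy as np

def linear(x):                      # identity; common in output neurons for regression
    return x

def log_sigmoid(x):                 # output in (0, 1); binary/pattern-recognition outputs
    return 1.0 / (1.0 + np.exp(-x))

def tan_sigmoid(x):                 # output in (-1, 1); often used for regression
    return np.tanh(x)

def relu(x):                        # passes positive inputs, outputs zero otherwise
    return np.maximum(0.0, x)

def rbf(x, center=0.0, width=1.0):  # Gaussian radial basis function
    return np.exp(-((x - center) ** 2) / (2.0 * width ** 2))
```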

ANNs need to be trained for a given task, which means adjusting the weights and biases of the neurons. In supervised learning, this is done using a training dataset consisting of known input–output pairs. Backpropagation is a widely used iterative training approach. First, the weights and biases of the neurons are initialized, either randomly or by following an initialization technique (26) to speed up the learning. Next, an error (cost function), e.g., the mean absolute error (MAE), the mean squared error (MSE), or the sum of squared errors (SSE), is calculated between the network’s output and the known target. Based on the obtained error, the training algorithm adjusts the weights and biases, controlled by either a fixed or an adaptive learning rate: the lower the rate, the smaller the corrective step, which lengthens training but can yield a more accurate result. Finally, the cost function is calculated again. One such iteration is called an epoch, and the process is repeated until a stopping criterion is reached.
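
This training cycle can be condensed into a short PyTorch sketch; the network size, synthetic data, learning rate, and fixed epoch count below are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Small feedforward network: 3 inputs -> 5 hidden (tanh) -> 1 linear output
net = nn.Sequential(nn.Linear(3, 5), nn.Tanh(), nn.Linear(5, 1))

X = torch.randn(100, 3)                 # placeholder training inputs
y = X.sum(dim=1, keepdim=True)          # placeholder known targets

loss_fn = nn.MSELoss()                             # cost function (here: MSE)
opt = torch.optim.SGD(net.parameters(), lr=0.01)   # fixed learning rate

for epoch in range(500):                # stopping criterion: fixed epoch count
    opt.zero_grad()
    loss = loss_fn(net(X), y)           # error between network output and target
    loss.backward()                     # backpropagate the error
    opt.step()                          # adjust weights and biases
```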

The chosen training algorithm can significantly affect the training time and the performance of the ANN. For example, the gradient descent training algorithm (27) is slow, as it seeks the steepest descent of the error by calculating the first derivatives of the cost function. A widely used method is the Levenberg–Marquardt algorithm (28), the fastest for medium-sized networks but only applicable when the cost function is in the form of a sum of squares. Furthermore, it is sensitive to the weight initialization and prone to overfitting. Bayesian regularization (29, 30) can tackle these problems by expanding the cost function to minimize a linear combination of the squared errors and the sum of squared weights. Consequently, the effective number of parameters used in the model for the given problem can also be optimized, leading to better generalization and robustness.
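
Following the standard formulation in (29, 30), the expanded cost function of Bayesian regularization can be written as (notation introduced here for illustration):

F = \beta \sum_{i=1}^{N} (t_i - a_i)^2 + \alpha \sum_{j=1}^{M} w_j^2

where t_i denotes the known targets, a_i the corresponding network outputs, w_j the network weights, and the hyperparameters \alpha and \beta are inferred within a Bayesian framework, balancing the quality of the fit against the magnitude of the weights.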

APPLICATION OF ANNs IN UPSTREAM PROCESSES

Synthesis

The synthesis of organic compounds is the first step in industrial pharmaceutical manufacturing, producing the API. Process monitoring and control can be utilized in continuous and batch processes to maintain a steady state, determine endpoints, or optimize operating conditions. ANNs have found applications in the optimization of process parameters to improve the outcome and efficiency of the reactions and in describing non-linear relationships between spectroscopic data and the desired parameters through black-box multivariate modeling.

The effect of the process parameters (i.e., time, temperature, enzyme amount, molar ratio) on the yield of an enzymatic synthesis of betulinic acid ester could be described by a feedforward (FF) ANN, using 21 training experiments (31). As learning algorithms, quick propagation, incremental backpropagation (BP), batch BP, and the Levenberg–Marquardt algorithm were compared, of which quick propagation proved the most robust.

Several studies have dealt with the optimization of synthesis using ANNs. For example, Valizadeh et al. applied an MLP (32) to optimize the preparation of glucosamine from chitin based on three inputs, i.e., the acid concentration, the acid solution to solid ratio, and the reaction time. The built network was compared to genetic algorithm (GA) and particle swarm optimization methods, which fitted the data better than the MLP model, but the ANN outperformed them during validation. Optimization of four two-component reactions was also performed by deep reinforcement learning, using an RNN (33). The method iteratively found the optimal flow rate, voltage, and pressure of the microdroplet reactions, using fewer steps than other black-box optimization algorithms. ANN-based optimization could also be performed in combination with a computational fluid dynamics (CFD) model serving as the source of the training data (34); in that work, several parameters, such as conversion, selectivity, and yield, were maximized in butadiene synthesis.

RNNs could replace a true plant model (35) or a state-space model (36) in control algorithms. This could be beneficial for predicting the process dynamics within a model predictive control (MPC) of a continuous pharmaceutical synthesis, as the computational demand is significantly decreased compared to mechanistic models. Moreover, the ANN can greatly benefit from the data-rich environment of PAT-supported manufacturing.
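
As a minimal sketch of this idea (with a toy linear "plant", synthetic data, and a one-step horizon, all of which are our assumptions rather than details of the cited studies), a trained network can stand in for the plant model inside the controller's optimization:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from sklearn.neural_network import MLPRegressor

# Train a surrogate on synthetic plant data: next state from (state, input)
rng = np.random.default_rng(0)
s = rng.uniform(0, 1, 1000)
u = rng.uniform(0, 1, 1000)
s_next = 0.8 * s + 0.2 * u              # placeholder "plant" dynamics
net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000)
net.fit(np.column_stack([s, u]), s_next)

def mpc_step(state, setpoint):
    """One-step-ahead MPC: choose the control input whose predicted next
    state is closest to the setpoint, using the NN instead of the plant."""
    cost = lambda ui: (net.predict([[state, ui]])[0] - setpoint) ** 2
    return minimize_scalar(cost, bounds=(0, 1), method="bounded").x

print(mpc_step(state=0.5, setpoint=0.7))
```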

Although the above examples show that API synthesis could greatly benefit from ANN modeling, the results are mainly based on historical instead of PAT data. Only a few examples were found in the literature where the evaluation of in-line or on-line PAT measurements by ANNs was presented. For example, in a fermentation, the glucose and glucuronic acid concentrations were determined from Fourier transform infrared (FT-IR) spectroscopic measurements (37) using a multilayer feedforward network with 15 calibration samples. The ANN outperformed the classical partial least squares (PLS) regression. Phenol and chlorophenols were also simultaneously quantified by an ANN model from UV-Vis spectra collected by an immersion probe (38). In this case, principal component scores were used to compress the spectra for the training of the networks.

Crystallization

Crystallization is crucial in connecting the API synthesis and the downstream formulation steps by providing solid crystalline API, which greatly impacts the final product’s yield, purity, further manufacturability, and even bioavailability. PAT sensors are often used: ATR-IR and UV probes can monitor the solute concentration, while focused beam reflectance measurement (FBRM) or in situ microscopic tools (e.g., particle vision and measurement, PVM) can indicate the crystal size and count (39).

ML can estimate the crystallization outcome based on historical process data. For example, Velásco-Mejía et al. developed ANN and GA models based on the records of 54 industrial batch crystallizations (40). They used nine descriptors and modeled the crystal density as the outcome, which resulted in identifying the most critical parameters and, after optimization, a substantial improvement in the product. In another work, the design space of a cocrystallization process could be explored based on 25 experimental runs and four input variables (41). Using the operating variables (such as temperature, supersaturation, agitation speed, seeding properties) as ANN inputs, a more accurate crystal growth rate could also be predicted than with multiple non-linear regression (42).

ANNs have been used for extracting information from data-rich PAT tools, such as in-line microscopic images. A ResNet CNN proved to be effective in classifying crystals detected in PVM images and was used for contamination classification with >98% accuracy (43). Such an in-line technique can contribute to identifying traces of undesired polymorphs and can therefore be used in feedback control to improve the product purity. Furthermore, the growth rate could also be predicted by measuring the particle size distribution using CNN-based in-line image analysis (44). FBRM measurements provide chord length distribution as particle size information, which, together with the solid concentration, could be used as input for a layer recurrent NN (45) to calculate the crystal size distribution (CSD). Szilágyi and Nagy (46) demonstrated the opposite approach: a direct and fast transformation of the two-dimensional CSD (of needle-shaped crystals) to chord length distribution and aspect ratio distribution was achieved by a neural network. This was necessary to enable FBRM and PVM to be used as quantitative direct feedback control tools in a population balance model (PBM)-based control, as the outputs of the PBMs and of the analytical sensors are otherwise not comparable. The presented approach resulted in a calculation six times faster than direct conversion, which could be essential in real-time applications.

ANNs are also receiving increasing attention in the control of crystallization. Possible approaches include the self-tuning of PID controllers by an ANN (e.g., by a diagonal RNN (47)) for temperature and level control or the determination of the optimal temperature profile to control the crystal size, e.g., to reduce fines. For the latter, Paengjuntuek et al. (48) generated data with a PBM for NN training and then predicted the solution concentration and crystal volume from the temperature and solution concentration data at the previous time points. The ANN was used as a state predictor in the optimization and provided better control performance than conventional methodologies. Furthermore, trained ANNs have a much lower computational cost than a first-principles model; therefore, they have great potential to be used in MPC as the predictive model. This has been demonstrated by simulation (49) and experimental studies, using different network types (e.g., feedforward, recurrent, and RBF networks) and batch and fed-batch crystallizations (50, 51). However, Öner et al. (52) highlighted that mostly historical data were used for model development. In their study, a fully automated laboratory crystallization system was developed, with temperature and FBRM sensors and an RBF network. The training was accomplished in real time, using a reference batch, in-line collected data, and an updated or growing data strategy; that is, the network was updated as new experimental data became available. Despite the limited data, the control strategy was robust to various disturbances, such as solvent impurity, seed size, or impeller speed. This approach is applicable even when limited historical data or process understanding is available.
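
The growing-data strategy can be illustrated with scikit-learn's incremental training interface. Note that this sketch substitutes an MLP for the RBF network of the cited study (scikit-learn offers no RBF network) and uses randomly generated placeholder data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=500)

# Train on a reference batch (placeholder inputs: e.g., temperature and an
# FBRM-derived statistic; placeholder target: a process state variable)
X_ref, y_ref = rng.random((50, 2)), rng.random(50)
net.fit(X_ref, y_ref)

# Growing-data strategy: update the network as new in-line data arrive
for _ in range(10):                   # simulated stream of new measurements
    X_new, y_new = rng.random((5, 2)), rng.random(5)
    net.partial_fit(X_new, y_new)     # incremental weight update
```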

APPLICATION OF ANNs IN DOWNSTREAM PROCESSES

Powder Blending

The proper execution of powder blending primarily ensures the homogeneous distribution of components in the manufacturing of solid dosage forms. ML techniques have been applied in a few cases to assist the real-time analysis of the API concentration during blending and to predict the powders’ behavior in various scenarios.

Since the 2000s, it has been shown that predicting the API concentration of powders by ANNs based on near-infrared (NIR) spectra (53, 54) is as effective as PLS regression. In addition, ANNs can predict the time required to achieve a homogeneous mixture. For example, Tewari et al. (55) utilized NIR spectroscopy, ANNs, and other multivariate data analysis methods for at-line blending endpoint detection.

Mujumdar et al. (56) created a discrete element method model of a sectorial container subjected to oscillations and then simulated the mixing of two particle fractions with different particle sizes under various operating parameters. The simulated results were used as a training dataset for a FF-BP ANN model (which has a much lower computational demand) to predict the mean mixing concentration, a parameter describing the effectiveness of the mixing process, based on the amplitude and frequency of the oscillations, the particle sizes of the smaller and larger fractions, and the number of cycles. It was concluded that the ANN is an excellent choice when several operating parameters have non-linear relationships. Furthermore, such techniques could be helpful in the future for control purposes. ANNs could also be applied to process data where the effects of certain factors appear after a time delay. For example, the composition of the blend leaving a continuous blender could be predicted by an RNN (Fig. 2), serving as the digital twin of the blender, based on the mass flow rates of the input material streams and the residence time distribution of the system (57). It was found that a non-linear autoregressive network with exogenous inputs can yield results comparable to those of a residence time distribution model.

Fig. 2 Concentration prediction from time-series data by ANN in continuous blending
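
A NARX-type model of this kind regresses the current outlet concentration on lagged inputs and lagged outputs. The sketch below illustrates the structure with synthetic data, using a moving average as a crude stand-in for the residence time behavior (both are our assumptions, not details of the cited work):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def narx_features(u, y, n_lags=5):
    """Build NARX regressors: lagged exogenous inputs u (e.g., API mass
    flow rate) and lagged outputs y (outlet API concentration)."""
    X, t = [], []
    for k in range(n_lags, len(y)):
        X.append(np.concatenate([u[k - n_lags:k], y[k - n_lags:k]]))
        t.append(y[k])
    return np.array(X), np.array(t)

# Placeholder process data: feeder flow rate and measured outlet concentration
u = np.random.rand(500)
y = np.convolve(u, np.ones(10) / 10, mode="same")  # crude residence-time smoothing

X, t = narx_features(u, y)
model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=1000).fit(X, t)
```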

Granulation

Granulation is a particle enlargement technique that is often essential to ensure further processability and can also greatly influence the quality (e.g., content uniformity and dissolution) of the final product. Granulation can be implemented as a wet or dry technique, e.g., in high-shear, fluidized bed, or roller compactor apparatuses, while the continuous alternative of twin-screw wet granulation (TSWG) is also emerging.

For more than 25 years, ANN models have been created to predict the quality of the product based on the process parameters of fluidized bed (58,59,60,61), high-shear wet (62), and dry granulation (63, 64). For example, Kesavan et al. (65) modeled both high-shear and fluidized bed granulation by ANNs to predict the particle size, flow rate, bulk density, and tap density. The inputs were the type and percentage of diluent, the type of granulation equipment, and the amount and addition method of the binder. The product CQAs could be predicted with good accuracy, and the ANN performed better than multilinear stepwise regression analysis. Process parameters have also been applied to predict the disintegration time of tablets compressed from the granules (66). Furthermore, the scale-up of wet and fluid bed granulation processes has also been facilitated by ANNs (67,68,69). Korteby et al. (70) demonstrated with a fluid hot-melt granulation process that the relative importance of the independent input variables of the ANN model can be determined when combined with the Garson equation (71). They identified that the particle size of the binder had the highest impact on the properties of the final granules, followed by the binder viscosity grade and binder content. In this way, the ANN combined the advantages of first-principles and data-driven modeling by providing information about the effect of factors, while its construction was significantly easier than that of a first-principles model. In the case of dry granulation, the granule size distribution obtained after milling (72), the ribbon friability (63), or the ribbon density (73) could also be predicted. Modeling the granule quality based on operating parameters by ANNs was also possible for continuous granulation (74), where the d10, d50, and d90 values of the granules were calculated from the liquid to solid ratio, screw speed, screw configuration, and material throughput. It has been suggested (74, 75) that such ANN models could be applied for the MPC of the process. Furthermore, ANNs can be combined with other data processing techniques, such as Kriging or finite volume schemes, to create hybrid models that ideally combine the benefits of both methods (76, 77); consequently, ANNs can be integrated into more complex systems, too.

AI can also process the data yielded by real-time sensors used as PAT tools in the granulation process, e.g., to monitor the API content or the residual moisture content. Zhao et al. (78) measured the concentration of three APIs with an off-line NIR spectrometer in sugar-free Yangwei granules manufactured on a commercial-scale apparatus. The BP ANN and other ML methods yielded similar results to the PLS regression. Rantanen et al. (79) created PLS and ANN models to predict the moisture content of granules based on NIR spectra. They found that the ANN had more predictive power for independent test samples. Gupta et al. (80) used a NIR and a microwave spectrometer to record spectra of a ribbon leaving a roll compactor. The pretreated spectra were processed by PLS and ANN models to predict the API content, moisture content, and density of the ribbons, where ANN and PLS performed similarly except for the moisture content.

The flexibility of ANNs allows virtually any kind of signal to be processed effectively. For instance, thermocouples can also serve as PAT tools. Korteby et al. (81) placed three sets of thermocouples inside a conical fluidized bed granulator and recorded the temperatures under different conditions, as the temperature distribution inside the granulator can influence the granule properties; thus, understanding its dependence on the manufacturing parameters can contribute to a more reliable process. The obtained data were used to train an ANN, which provided very accurate predictions for the test cases, and such a model could be the basis of a real-time quality control scheme. The acoustic emission of fluidized bed granulation can also be monitored, as demonstrated by Carter et al. (82). They placed piezoelectric microphones in different positions outside the apparatus and recorded their signals while intentionally blocking parts of the distributor plate. After extracting time- and frequency-domain feature vectors from the sound signal, an ANN was trained to recognize different blockage scenarios based on the emitted sound. Based on these results, AI can facilitate novel applications of acoustic signals, possibly leading to powerful new PAT tools.

Tableting, Coating

In most pharmaceutical manufacturing processes, tableting creates the individual units of the end product. Ensuring that each tablet the patient receives meets the strict quality requirements is essential. The advent of predictive modeling and PAT technologies offers great help in achieving this goal.

One of the first things to consider when developing a tableting process is how the compressed powder mixture behaves inside the tablet press. The flowability of the blend must be good enough that each time the die is filled, an almost identical mass of powder moves into it. Kachrimanis et al. (83) used an FF-BP ANN to predict the flow rate of various powders through a circular orifice. They used typical powder properties as input, such as bulk density, tapped density, particle diameter, aspect ratio, roundness, convexity, and true density. The obtained flow rate predictions were more accurate than those of the flow equation proposed by Jones and Pilpel. Powder properties can also be utilized to predict the compressibility of the material: CMAs, such as the type and particle size of the diluent, the type of glidant, the bulk density, Carr’s compressibility index, and the parameters of Kawakita’s equation (84,85,86), were used as inputs of various ML algorithms, based on the results of designs of experiments (DoE) consisting of 30–50 settings.

Capping, i.e., the premature detachment of the tablet’s top layers, poses a serious quality problem in further processing (e.g., film coating and packaging) and should be avoided. Therefore, Belič et al. (87) predicted the capping tendency with neural networks and fuzzy logic by accounting for the particle size of the tableted powder and the tablet press settings. They concluded that the technique makes formulation development significantly more effective than traditional trial-and-error approaches.

As a dosage form is developed, large datasets are created that enable the fitting of design spaces within the QbD approach using suitable mathematical tools. Zawbaa et al. (88) applied a combination of ANNs with variable selection algorithms to find which manufacturing parameters have the strongest influence on the tablets’ porosity and tensile strength. The variable selection results enabled the authors to identify compaction pressure as the dominant factor.

These studies showed that ANNs are suitable for determining the design space and predicting the processability of the powder and the quality of the tablets based on CMAs, but the tableting step still lacks PAT-based ANN model applications. The CQAs of the final tablets are influenced not only by the tableting step but also by the previous manufacturing steps and the material attributes of the raw materials. Therefore, the works tackling the characterization of the final tablets are detailed further in the next section.

CHARACTERIZATION OF THE FINAL PRODUCTS

Content Uniformity, Assay

The content uniformity (CU) of the final products or intermediates is one of the most frequently studied CQAs that must fall within certain limits. Spectroscopic PAT tools are widely used to quantify the API content in solid dosage forms to reach these goals. However, linear quantitative methods are not always feasible for evaluating multivariate data. In these cases, ANNs may provide a solution to reach a validated CU method.

Traditionally, UV-Vis spectroscopy is used for assay analysis, and ANNs have been applied several times to improve the quantification of numerous APIs (89,90,91), even in minor amounts. However, it is a destructive technique that is not compatible with the PAT concept.

In contrast, vibrational spectroscopy, e.g., Raman and NIR spectroscopy, can be helpful as an in-line, nondestructive method for the characterization of solid samples. However, only one study was found on the quantification of an API by Raman spectroscopy and ANNs (92), wherein commercial tablets and capsules containing diclofenac sodium were studied. PLS, principal component regression (PCR), and counter-propagation ANN (CP-ANN) methods were compared, the latter combining unsupervised and supervised learning. While PCR yielded consistently higher errors, PLS and CP-ANN showed comparable results for both tablets and capsules. A relative standard error of validation of 2.6–3.5% and 1.4–1.7% was reached for tablets and capsules, respectively, and a good correlation with the reference analysis was obtained for commercial formulations. NIR spectroscopy is a more widely used technique, despite the severe overlap between the signals of the components. Several APIs have been studied with ANNs, such as paracetamol, caffeine, ciprofloxacin, aspirin, and phenacetin (53, 93, 94). Different variable selection techniques have also been tested for improving the quantification by NNs. For example, variable selection by orthogonal projection to latent structures (O-PLS) combined with ANN (95), genetic algorithms (GA-ANN) (96), or wavelet transformation (WT) (97) could be applied to increase model accuracy. WT could also be used for the dimensional reduction of the original spectra (98), which is an essential step in ANN building to decrease the computational demand of the training.
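
As a schematic example of such a compression step, principal component scores can feed a small network; the sketch below uses random placeholder spectra and arbitrary model sizes (in practice, real calibration spectra and tuned hyperparameters would be required):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline

# Placeholder calibration set: 60 NIR spectra (1000 wavelengths), known API content
spectra = np.random.rand(60, 1000)
api_content = np.random.rand(60)

# Compress spectra to a few principal component scores, then regress with an ANN
model = make_pipeline(PCA(n_components=5),
                      MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000))
model.fit(spectra, api_content)
predicted = model.predict(spectra)
```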

It can be concluded that, for API content determination in solid samples, ANNs mostly improved the results compared to linear multivariate methods, e.g., PLS regression, with the same number of calibration samples. Another emerging application of ANNs might be predicting the amount of analyte from process data, even without spectroscopic measurements, which is a realization of the RTRT concept. For example, it was possible to estimate the ascorbic acid concentration in nutraceutical products from physicochemical properties, namely, pH, specific gravity, and viscosity (99). In this case, the ANN, which served as a soft sensor, provided a regression coefficient of 0.92 for the quantification.

Tensile Strength, Friability

The appropriate hardness is also a CQA of tablets, impacting further processability, e.g., coating and packaging, and is mainly characterized by the tensile strength (TS) or the friability (FR). However, these properties are not easily measurable by available PAT tools. Attempts have been made to monitor the TS by NIR spectroscopy, where, e.g., the change of the baseline can correlate with the tablet hardness, which could easily be turned into a real-time technique (100, 101). By creating the optimal WT-ANN architecture, the tablet hardness was approximated satisfactorily, exceeding the accuracy of the linear PLS regression model. In another study (102), the hardness of theophylline tablets was predicted similarly well by PLS and ANN at the lowest set point, but the ANN produced better results for harder tablets.

Another possible approach is modeling the TS and FR based on the CMAs and CPPs. An ANN was presented by Bourquin (103), where the weight ratios of four ingredients, the dwell time, and the compression force were used as inputs, and the TS and FR were predicted as outputs. The predicted TS correlated well with the observed values (R² = 0.753), but for the friability, the ANN model gave only a slight correlation (R² = 0.413). In this case, a tendency toward overfitting can be recognized, which could have been avoided by using a larger training set.

Similarly, an ensemble ANN was used to study the effect of the type and amount of the filler (e.g., microcrystalline cellulose, HPMC, crospovidone/PVP) and lubricant (magnesium stearate, sodium stearyl fumarate), with different APIs (104, 105). In (105), the crushing strength could be predicted with an error below 0.1 N in the range of 30–60 N.

Furthermore, tablet properties and tableting process parameters could also be incorporated into the NNs, such as diameter, compression force, weight, height, porosity, speed of sound in the radial direction, and tablet compression speed (106). In (107), the type of polymers and their concentration were varied to predict the tablets’ tensile strength, the total work of compression, the detachment work, and the ejection work with six different ML algorithms, including four ANN methods.

In Vitro Dissolution

In vitro dissolution testing is an important indicator of product quality and therefore plays a vital role in the research, development, and routine quality control of drug products. The tests, however, need to be carried out in standardized instruments and are labor- and time-intensive. Consequently, dissolution testing could greatly benefit from an RTRT approach, for which ANNs have also been studied.

However, most ANN studies connected to the prediction of dissolution deal with formulation optimization for the required dissolution properties. In this context, several different ANN structures have proved to be applicable, such as MLPs, Elman networks, and RNNs (108,109,110,111,112), and several process parameters have been modeled, such as the effect of the retarding polymer in the tablets (108), the tableting compression force (108, 113), and the crushing strength (114).

PAT tools can be used for predicting the dissolution if the effect of the CMAs/CPPs on the dissolution is detectable in the PAT data. For example, NIR spectra with PLS regression predicted the dissolution where the variation of the moisture content (115), compression force (116), mixing shear forces (117), or tablet composition (118) was the critical factor. Pawar et al. used at-line NIR spectroscopy in a continuous direct compression process, where the API content, compression force, feed frame speed, and blender speed simultaneously influenced the dissolution (117). The use of Raman chemical maps to non-destructively predict the dissolution has also been demonstrated recently (119). In this case, not only the chemical composition of the tablets but also the spatial distribution and CSD of the components could be derived from a chemical map. However, to use it as a PAT technique, the measurement time of chemical mapping still needs to be decreased further.

Applying a single PAT tool might not always be sufficient. ANNs can aid the data fusion of different PAT sources and process data for a surrogate dissolution model. Our group was the first to demonstrate the merging of Raman and NIR spectra of an extended-release tablet formulation by an ANN. The data-fused ANN models outperformed both the PLS models and the models built using only a single PAT sensor (120). ANNs could be developed not only using spectroscopic data but also by including additional process data (Fig. 3), such as the registered compression force (121) and CSD data (122). Furthermore, SVMs and an ensemble of regression trees were also tested, but ANNs provided the most accurate results. The concept can be generalized to arbitrary numbers and types of input data, which could significantly aid the implementation of predictive dissolution models in an RTRT framework.

Fig. 3 Prediction of in vitro dissolution by neural network from PAT data
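
At the level of the model inputs, such data fusion can be as simple as concatenating the (compressed) feature blocks before training. The sketch below uses hypothetical score matrices and an arbitrary dissolution target, not the data of the cited studies:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)

# Hypothetical compressed feature blocks for 40 tablets
raman_scores = rng.random((40, 5))       # e.g., principal component scores
nir_scores = rng.random((40, 5))
compression_force = rng.random((40, 1))  # additional process data

# Data fusion: concatenate all blocks into one input matrix
X = np.hstack([raman_scores, nir_scores, compression_force])
y = rng.random(40)                       # placeholder: dissolved fraction at a time point

model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000).fit(X, y)
```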

FUTURE PROSPECTS

By reviewing the existing ANN applications for the pharmaceutical manufacturing steps, we could identify two major groups of works: the utilization of ANNs (1) for non-linear regression in the evaluation of analytical sensor data and (2) for establishing relationships between arbitrary input and output parameters. Table I summarizes the works where the developed models were based on PAT data or where the input could be directly collected during a process.

Table I Application of Neural Networks for (Potential) PAT Purposes in the Process Steps of Pharmaceutical Manufacturing

As for the first group, mainly UV and (N)IR spectra were applied, but Raman spectroscopy can also be identified as a rapidly emerging tool. ANNs were consistently recognized as comparable and often superior to traditional PLS regression. Comparable results are expected when no significant non-linear relationship exists between the inputs and outputs, while ANNs could be superior when there is strong non-linearity. ANNs might also provide inferior results. One possible reason for this is the overfitting of the ANN model when there is not enough training data to adequately capture the input–output relationship. Furthermore, outlier training data can significantly diminish the predictive power of the model, as the network might then be fitted to an inadequate non-linear behavior. However, these problems can be mitigated by expanding the training dataset and applying outlier filtering techniques.

Therefore, it might be worth considering the application of ANNs for spectral data evaluation when PLS models cannot provide sufficient accuracy due to a possible non-linear effect. It is also worth noting that the development of the ANNs did not require significantly more calibration data than the PLS models, contrary to a common preconception about ANNs. Different types of ANNs have been used in this context with varying dimensional reduction methods, such as principal component scores or wavelet transformation. However, to the best of the authors’ knowledge, these techniques have not been thoroughly compared yet. Consequently, their systematic evaluation could significantly contribute to establishing good modeling practices and thus facilitate the application of ANNs. It is also noticeable that deep learning has rarely been applied to spectroscopic data. However, in (123), Zhang et al. demonstrated on four different IR datasets (corn, wheat, soil, and pharmaceutical tablets) that deep NNs provide improved accuracy compared to conventional quantitative analysis. The same conclusions were drawn with other deep NNs using agricultural and food IR data (124, 125).
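
Such a systematic comparison could follow a simple cross-validated protocol like the sketch below (placeholder data and untuned models; real spectra, preprocessing, and hyperparameter optimization would be needed in practice):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

# Placeholder spectral calibration data: 70 samples, 200 wavelengths
X = np.random.rand(70, 200)
y = np.random.rand(70)

pls = PLSRegression(n_components=5)
ann = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000)

# Compare by cross-validated RMSE; ANN gains are expected mainly when the
# input-output relationship is non-linear
for name, model in [("PLS", pls), ("ANN", ann)]:
    scores = cross_val_score(model, X, y,
                             scoring="neg_root_mean_squared_error", cv=5)
    print(name, -scores.mean())
```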

Although spectroscopies are probably the most common PAT tools, ANNs could be successfully applied to other types of PAT sensors, such as acoustic emission (82), FBRM, and image analysis (44, 45), or for fusing different PAT data (e.g., NIR and Raman) (120). The application of ML might be especially important for image analysis, for which deep learning has already proved its capacity, but mainly outside pharmaceutical manufacturing (e.g., in self-driving cars). Therefore, it would be worth further investigating NNs for machine vision in the pharmaceutical industry (126), e.g., to identify faulty tablets automatically.

Most of the reviewed works establish relationships between CMAs, CPPs, and the different CQAs of individual process steps. These studies demonstrate the capability of ANNs utilized within the QbD framework to fit, e.g., design spaces, primarily using a limited number of (off-line) designed experiments (usually 30–70 experiments) or historical data. However, the most significant shortcoming of this approach is that it does not realize the PAT initiative. Nevertheless, possibly the biggest prospect of ANNs lies in incorporating this approach into the PAT concept by utilizing these models in real time, using the actual material and process parameters and the in situ registered PAT measurements. Furthermore, as the previous sections demonstrate, the individual unit operations have already been explored by several researchers, but ANNs have rarely been incorporated into integrated processes yet. Roggo et al. (75) reported its first realization by examining a manufacturing line consisting of feeding, TSWG, fluid bed drying, sieving, and tableting. Seven CPPs were recorded with a frequency of 1 s, for a total of 148,000 data points, which were used to predict eight different CQAs of intermediate and end products. The developed deep NNs (three hidden layers) could learn from noisy PAT data and, consequently, be utilized for the real-time control of continuous systems. Further studies are still needed to examine the capabilities of machine learning in integrated manufacturing processes, which is the ultimate aim of commercializing the concept.

Digital Transformation

Following the Pharma 4.0 concept, digitalization is expected to spread significantly in the coming years, as it can considerably improve the transparency, flexibility, efficiency, productivity, and quality of manufacturing (127). The authors of (128), from Novartis Global Drug Development, a leader in the digitalization of the pharmaceutical industry, have expressed that historical operational data could be the goldmine representing a pharmaceutical company’s experience. However, this information is currently greatly fragmented, inconsistent, and time-consuming to access. Digitalization platforms, such as the “Nerve Live” platform of Novartis (128), could help collect, clean, and analyze this goldmine. For example, centralized, easily accessible databases (data lakes) of raw material attributes, the process parameters of each unit operation, and the different PAT measurements could be established for the manufacturing process, as illustrated in Fig. 4.

Fig. 4 Artificial intelligence models for PAT in the Pharma 4.0 concept

Digitalization imposes multiple challenges on pharmaceutical companies and initiates changes at the business, operational, and technological levels (129). First of all, the role of data scientists and information technology (IT) personnel grows significantly as new competencies and resources are required. For example, it is necessary to build cross-functional teams, cybersecurity needs to be addressed, and, in the long term, standardization will be essential to assure compatibility (127). Multiple handbook chapters deal with the digital transformation of laboratories (e.g., analytical, research, and solid-state labs) (130, 131), providing a knowledge base for the central concepts and guidance for their practical realization. For example, information management tools, e.g., the Electronic Laboratory Notebook (ELN), the Laboratory Information Management System (LIMS), and Enterprise Resource Planning (ERP), are introduced, and the principles of cybersecurity, communication protocols, data and modeling technologies, reporting, and creating FAIR (findable, accessible, interoperable, reusable) data are discussed.

Most of these concepts are extensible to the operation of the labs and manufacturing sites of pharmaceutical companies, too. In (132), a complex IT infrastructure is proposed explicitly for pharmaceutical development, involving data management tools, knowledge modeling, and information sharing guidelines to aid the management and interpretation of different sources of complex information. Industrial realizations of such IT systems are the “Nerve Live” platform of Novartis (128) and the commercially available cloud-based, open IoT operating system of Siemens, called MindSphere (129). MindSphere integrates all data sources and information management tools, connects different types of equipment in a cloud platform, and provides a high level of data protection and storage, which is crucial for pharmaceutical companies.

All publications related to digitalization recognize the central role of AI and ML in analyzing the data lake, with AI solutions standing above the whole data hierarchy (Fig. 4) and having access to each data management level (130). In this way, NN models can be built to monitor the CQAs of processes, develop digital twins, and realize model predictive control for intelligent decisions, self-optimization, predictive maintenance, or business decisions. Several software solutions aim to realize this; e.g., the Siemens Predictive Analysis (SiePA) tool is readily available to optimize a process based on historical data, analyze failure modes, and realize predictive maintenance (133). Nevertheless, based on the published scientific papers, we can conclude that the application of ANNs with real-time collected pharmaceutical manufacturing data and integrated multiple process steps is still scarce. However, it is worth noting that implementing such platforms means a substantial competitive advantage for pharmaceutical companies, which might hinder the publication of these results.

Nevertheless, we believe that further academic and industrial research would significantly facilitate the widespread realization of digital transformation and autonomous smart factories. For example, the adaptation and real-time training capabilities of NNs for continually growing datasets should be studied further, and the role of time-series ANNs could be investigated in more detail. Furthermore, it is vital that black-box AI models do not replace or diminish the scientific, chemical understanding of the research, development, and manufacturing of drug products, which could be ensured by integrating physical-chemical process knowledge into the automated platform, e.g., in the form of hybrid mathematical models.

CONCLUSIONS

ML techniques, such as ANNs, have emerged as essential data analysis tools for processing big data and realizing the Industry/Pharma 4.0 concepts. To assess the readiness of pharmaceutical manufacturing in this regard, this paper reviewed the application of ANNs in the context of PAT. It can be concluded that ANNs have already been tested for several purposes in the most common manufacturing steps, but their real-time application for PAT is still scarce. The possible future directions and research gaps have also been identified. In this way, ANNs could significantly contribute to realizing smart, autonomous pharmaceutical manufacturing lines in the future. This can enable faster, more cost-effective production and the reduction of waste, i.e., a lower environmental load, while automated systems could minimize human exposure to dangerous processes or drug substances, e.g., hormones or cytostatics.