Flow chemistry is increasing in popularity as synthetic chemists continue to discover the numerous advantages afforded by swapping their round-bottom flasks and condensers for pumps and tubes [1]. The rate of adoption of continuous flow chemistry continues to grow, as more enabling technologies are developed and more research groups build their own reactor platforms [2,3,4]. Whilst there are many instances of harnessing the capabilities of flow chemistry for synthesis over traditional batch methods [5,6,7], there are also groups that have used these setups to optimise chemical processes. Flow optimisation by Design of Experiments (DoE) is a particularly useful and efficient method, with reported examples spanning a variety of experimental criteria, whether this be yield, purity, E-factor etc. [8,9,10,11,12]. DoE as a technique can be used for a number of applications aside from chemical reactions, and has been reported often in the pedagogical literature [13,14,15,16]. The teaching of this technique provides students with a statistical basis for their experimentation, with the aim of solidifying transferable skills that ultimately lead them to becoming well-rounded scientists. However, this optimisation resource is still under-utilised in a lab setting, with one-factor-at-a-time (OFAT) approaches often substituting as a method for chemical process optimisation and understanding [17,18,19]. The aim of this paper is to provide a flow-chemistry-specific optimisation case study, where students can learn DoE and statistics by performing a continuous-flow-based practical experiment, as well as experiencing and overcoming challenges that they would not usually encounter if running a synthetic experiment in batch.

The OFAT approach is a common method of optimisation, especially in academia, in which experiments guided by scientific intuition are performed by fixing all process factors except one [20]. These factors are experimental conditions (such as temperature, reagent stoichiometry, reaction time etc.) which, when combined, make up a multi-dimensional space containing a large number of possible factor combinations, each of which constitutes one experiment - this is called the parameter space, and is constrained by the lower and upper limits of each factor (e.g. maximum and minimum temperature). After one factor has been optimised, another set of experiments is executed to optimise the next, until all factors have been addressed and the scientist believes that they have arrived at the optimal reaction conditions [21, 22]. However, this method gives an incomplete picture of the chemical process, as it disregards any synergistic effects between factors in the multi-dimensional and complex parameter space. This means that interactions between the experimental factors are not considered. For example, a reaction at high temperature may behave differently with a short reaction time than with a long one - if the relationship between these factors and the desired output is not linear, a change in temperature made after the reaction time has been optimised can lead to a suboptimal result [17, 23, 24].

As research laboratories diversify their equipment by incorporating flow and automation technologies, it is also necessary for chemists to evolve at the same pace, diversifying their skillsets to fully harness the capabilities and ways of working enabled by this new equipment. Synthetic chemists are embracing facets of process chemistry, chemical engineering, analytical chemistry, and programming, to name a few. Concurrently, more robust and more efficient reaction optimisation methods should be better understood and adopted in place of OFAT optimisations [25,26,27,28].

In this paper, the use of a structured experimental design integrated with a flow chemistry platform is reported. It is shown how this can be used as a teaching resource to introduce students to performing flow chemistry experiments and to better understand the type of data required for the optimisation of chemical processes. We herein report a demonstration of Design of Experiments for teaching the next generation of chemists in a practical lab setting, whereby a chemical process with a number of possible products is optimised for the highest yield of a particular product. The chemical process chosen was the SNAr reaction between 2,4-difluoronitrobenzene, 1, and pyrrolidine, 2, to form the desired ortho-substituted product, 3, and impurities 4 and 5, shown in Scheme 1.

Scheme 1

The SNAr reaction of interest, where the yield of the ortho-substituted product, 3, is to be optimised in a flow setup

Learning objectives

  • To set up a flow chemistry system to execute flow experiments.

  • To methodically plan flow experiments using DoE.

  • To statistically analyse DoE results and generate empirical models for an experimental data set.

  • To use DoE models to optimise an SNAr flow process.

Design of experiments

DoE is a statistical method of reaction optimisation that is often practised in industry [25], but is less commonly used in academia, where an OFAT approach is much more common. Although OFAT can give an idea as to how particular factors influence the yield of a reaction, the parameters of interest are explored less comprehensively, and no indication is obtained of how these factors affect each other, or of how their own influence changes at varying levels. When using a DoE approach, however, the entire parameter space is mapped in an efficient manner that explores the multidimensional space simultaneously, because multiple factors are changed at the same time in each sequential experiment. A comparison of the parameter space exploration by these two methods is shown in Fig. 1, for the three experimental factors used in the optimisation of the demonstration reaction: temperature (°C), residence time (min) and pyrrolidine equivalents. OFAT experiments typically explore individual planes of the parameter space, which makes it difficult to infer the behaviour of the space as a whole, whereas experimental designs can interpolate factor interactions much more effectively. This is true regardless of the number of experiments undertaken in either approach. The face-centred central composite (CCF) design shown in Fig. 1 splits each experimental factor into different levels. These levels (−1, 0, +1) are named by convention and correspond to the degree of the experimental factor, where −1 is the lower bound of the parameter space, +1 is the upper bound and 0 is the midpoint. For example, if the experimental bounds for residence time were between 1 and 5 minutes, the levels would be: 1 minute (−1), 3 minutes (0) and 5 minutes (+1). These levels are defined to ensure that all areas of parameter space can be explored, regardless of the range of the factors.
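For readers who prefer a programmatic view, the point layout of a three-factor CCF design can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original experimental procedure; the factor names and bounds match those used later in this study.

```python
from itertools import product

# Coded levels for a three-factor face-centred central composite (CCF) design:
# 8 factorial corners, 6 face-centre points and 1 centre point (the centre
# point is replicated in practice to assess reproducibility).
corners = list(product([-1, 1], repeat=3))
face_centres = [tuple(level if i == axis else 0 for i in range(3))
                for axis in range(3) for level in (-1, 1)]
coded_design = corners + face_centres + [(0, 0, 0)]  # 15 distinct settings

# Map coded levels onto real factor ranges; these bounds are the ones used
# later in this study (residence time / min, temperature / deg C, equivalents).
bounds = {"time": (0.5, 3.5), "temp": (30, 70), "equiv": (2, 10)}

def decode(point):
    """Convert a coded (-1, 0, +1) point into real experimental conditions."""
    return {name: lo + (level + 1) / 2 * (hi - lo)
            for level, (name, (lo, hi)) in zip(point, bounds.items())}

for point in coded_design:
    print(point, decode(point))
```

Decoding the centre point (0, 0, 0), for instance, returns the midpoint of each range: 2 minutes, 50 °C and 6 equivalents.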

Fig. 1

A comparison of the parameter space exploration when conducting an OFAT optimisation alongside a structured DoE design, where each point represents an experiment. Note also that an OFAT optimisation does not require a pre-determined number of experiments, and may or may not exceed the number of experiments in an experimental design

These structured experiments allow statistical models to be constructed from the experimental results that accurately describe how the responses change as the experimental factors change. If a DoE analysis tool such as MODDE (from Umetrics) or Design-Expert (from Stat-Ease) is used, the generation of these models is performed easily and intuitively. Empirical models, built from the experimental responses, can then be used to predict further experimental results based on how the model weights a particular input variable. These variables can simply be the experimental factors, but can also be interaction terms between different factors, or squared terms of a single factor. The interaction and squared terms indicate how experimental factors influence the reaction output when other factors are changed alongside them. In the case of our SNAr example, it may be insufficient to describe the experimental data by simply incorporating the model terms ‘residence time’, ‘temperature’ and ‘pyrrolidine equivalents’. It may be significant for the modelling of the data to include an interaction term between residence time and temperature, meaning in real terms that residence time has a greater influence at higher temperatures (and vice versa). Similarly, a squared temperature term could better describe larger effects of temperature changes when the temperature is generally higher, meaning that temperature has a non-linear effect.

These interaction considerations can give a better description of the experimental data, as the synergistic effects between the factors are also incorporated into the empirical model. In this case, an empirical model is a purely statistical representation of the experiments and their outcomes, as opposed to a physical model determined by the underlying chemistry. This model can then allow response surfaces to be plotted and optimum operating regions to be identified, by interpolating the areas between the equidistant experimental points.

In this paper, we describe the use of a CCF design in the MODDE software. Having already determined that the three factors of residence time, temperature and pyrrolidine equivalents are significant, the CCF optimisation design identifies all interaction and squared terms between these factors. The generated model is then used to portray the entire parameter space, and hence identify the optimum operating conditions for the highest yield of the ortho-substituted product, 3. There are also other experimental designs one can consider using depending on the outcomes that are desired, but these are not covered in this paper [29].

Necessary equipment

In order to run the experiment as described, it is recommended to have the following equipment and chemicals. A full list of recommended vendors is located in the ESI.

  • PTFE tubing, 1/16” internal diameter.

  • Tubing fittings.

  • A tubing cutter.

  • Two syringe pumps, or equivalent.

  • Three stirrer-hotplates to place water baths on.

  • Three water baths, 500 mL.

  • 2,4-Difluoronitrobenzene (CAS: 446-35-5).

  • Pyrrolidine (CAS: 123-75-1).

  • Triethylamine (CAS: 121-44-8).

  • Hydrochloric acid (CAS: 7647-01-0).

  • Common laboratory solvents: ethanol, water, isopropyl alcohol.

  • Access to HPLC, or an equivalent quantitative analytical technique.

  • MODDE Pro, or equivalent DoE software.

Experimental setup

The experimental bounds for each of the factors are: residence time (0.5 to 3.5 minutes), temperature (30 to 70 °C) and equivalents of pyrrolidine (2 to 10). The concentrations of 2,4-difluoronitrobenzene and triethylamine are kept constant. The rationale behind these pre-determined experimental bounds came from the kinetic understanding of the work reported by Hone et al. on the same reaction [30]. The HPLC peak areas are converted to relative concentration percentages for each of the species, which are reported as outputs for that particular experiment. The run order of the experiments was randomised to prevent any extraneous (uncontrolled) variables from affecting the results, as shown in Table 1.
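Run-order randomisation is straightforward to reproduce in software. The sketch below is a hypothetical illustration using Python's standard library rather than MODDE's built-in randomisation; a fixed seed keeps the shuffled order reproducible so it can be documented alongside the results.

```python
import random

# Three-level settings for each factor, matching the bounds above.
levels = {"time": [0.5, 2.0, 3.5],   # residence time / min
          "temp": [30, 50, 70],      # temperature / deg C
          "equiv": [2, 6, 10]}       # pyrrolidine equivalents

# For illustration, take the full set of level combinations and randomise
# the order in which they are run.
conditions = [(t, T, e) for t in levels["time"]
              for T in levels["temp"] for e in levels["equiv"]]
rng = random.Random(2024)  # arbitrary fixed seed for a reproducible order
run_order = conditions.copy()
rng.shuffle(run_order)
for i, (t, T, e) in enumerate(run_order, 1):
    print(f"Run {i:2d}: time={t} min, temp={T} C, equiv={e}")
```

The shuffle only permutes the list, so every planned condition is still run exactly once.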

Table 1 The experimental conditions run to perform the DoE study
Fig. 2

A schematic of the experimental flow setup used for the SNAr reaction. The pyrrolidine concentration is changed for varying equivalent experiments, and the reactor is moved by hand into different water baths corresponding to the temperature that the experiment requires

When running the experiments, undergraduate students can be placed into groups of 5 or 6. Recommended tasks within the group can be split into: making up stock solutions, preparing the tubing, connecting the tubing, running/timing the experiments, experimental sampling, running HPLC analysis etc. In our case the HPLC calibration was conducted in advance by a trained instructor, but this could be a task for the students as part of the experimental procedure. It is also recommended that students read introductions to DoE or seek advice from postgraduates or academic supervisors prior to experimentation. Key introductory reading could include the references reported by Krawczyk et al. [16] and Aggerwal et al. [20], as well as the book written by Antony [31], which are all useful resources.

The experimental flow setup is shown schematically in Fig. 2, and pictorially in Fig. 3. Four reservoirs were used: one containing 2,4-difluoronitrobenzene (0.1 M) and triethylamine (0.11 M) in ethanol, and three others containing triethylamine (0.11 M) and varied pyrrolidine concentrations (0.1 M, 0.5 M and 1 M) in ethanol. Each experimental setup contains the 2,4-difluoronitrobenzene solution in one syringe and one of the triethylamine/pyrrolidine solutions in the second syringe, depending on the low/medium/high pyrrolidine equivalents investigated in a particular run. Harvard syringe pumps are used in each experiment to pump the solutions into a length of PTFE tubing (1/16” internal diameter, 6.3 cm, equal to 1 mL volume), submerged in one of three water baths at 30 °C, 50 °C or 70 °C. Three water baths were set up so that there are no waiting times between experiments for the water baths to achieve the desired temperature. This is important, as lab time is the most crucial resource and the experiments must be executed in a specific order; running each block of temperature experiments together (for example, all 30 °C experiments at once) could introduce extraneous variables, and must be avoided.
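The pump settings for a given residence time follow directly from the reactor volume. The helper below is a hypothetical sketch of that arithmetic, assuming (as a simplification) a 1:1 flow ratio between the two streams, so each stream is diluted two-fold at the mixing point; the actual ratios used in a run may differ.

```python
def pump_flow_rates(reactor_volume_mL, residence_time_min):
    """Flow rates for two syringe pumps feeding the reactor at a 1:1 ratio.

    Total flow rate = reactor volume / residence time; with equal pump
    rates, each stream is diluted two-fold at the mixing point.
    """
    total = reactor_volume_mL / residence_time_min   # mL/min
    return total, total / 2.0                        # total, per pump

def pyrrolidine_equivalents(substrate_M, amine_M):
    """Equivalents of pyrrolidine at a 1:1 flow ratio, where both streams
    are diluted equally so the stock concentration ratio equals the molar
    ratio in the reactor."""
    return amine_M / substrate_M

total, per_pump = pump_flow_rates(1.0, 2.0)
print(f"{total:.2f} mL/min total, {per_pump:.3f} mL/min per pump")
print(f"{pyrrolidine_equivalents(0.1, 0.5):.0f} equivalents")
```

For the 1 mL reactor used here, a 2 minute residence time therefore needs 0.5 mL/min of total flow, i.e. 0.25 mL/min from each pump.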

Fig. 3

The flow setup used for the SNAr experimentation, where the tubular reactor is submerged in one of three different temperature water baths

Each experiment was allowed to reach steady state by equilibrating for 2 reactor volumes, meaning that for each flow experiment a wait time of two residence times is necessary before collection of material for analysis. For example, if the residence time for the reaction is 0.5 minutes, then 1 minute of reaction mixture is purged to waste before steady state is established. For each experiment, the desired temperature is reached by placing the tubular reactor in the water bath at the corresponding temperature. Samples can then be taken from the end of the reactor by immediately quenching a few droplets of material into a vial containing a drop of hydrochloric acid at the outlet of the flow system. This can then be diluted with methanol before transferring to analysis. These samples can then be analysed by HPLC, or by other analytical techniques such as GC, provided that quantitative yields of each of the species can be obtained - this is shown in Fig. 2. HPLC analysis was performed using an Ascentis Express C18 column (5 cm x 4.6 mm x 2.7 µm), using an isocratic method (51% water/49% acetonitrile, each reservoir containing 0.1% TFA) at a 1.5 mL min−1 flow rate over a 2 minute HPLC run time. It is beneficial to have short analytical methods to allow fast analysis and turnaround between different sets of experimental conditions.
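These steady-state wait times translate into a simple time budget per run. The sketch below assumes a one-minute sampling window and a hypothetical mix of residence times, purely to illustrate why the whole experiment set fits into a single lab session.

```python
def minutes_per_run(residence_time_min, reactor_volumes=2, sampling_min=1.0):
    """Purge for the stated number of reactor volumes before sampling; the
    one-minute sampling window is an assumed figure for illustration."""
    return residence_time_min * reactor_volumes + sampling_min

# A 0.5 min residence time needs a 1 min purge, then the sampling window.
assert minutes_per_run(0.5) == 2.0  # 1 min purge + 1 min sampling

# Rough pumping time for a 17-run CCF with a hypothetical mix of residence
# times (five of each level, plus two extra centre-point replicates).
residence_times = [0.5, 2.0, 3.5] * 5 + [2.0, 2.0]
total = sum(minutes_per_run(t) for t in residence_times)
print(f"~{total:.0f} min of pumping time across {len(residence_times)} runs")
```

Under these assumptions the pumping time comes to roughly 85 minutes, leaving time within a 2-3 hour session for setup, bath changes and analysis.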


Safety

Safety goggles and lab coats should be worn throughout the course of the experiment. All handling of organic solvents and preparation of solutions should be conducted inside the fume hoods. Special care should be taken when handling the concentrated hydrochloric acid used to quench the reaction in the HPLC vial. If any reagent is spilled on the body, wash the area with copious amounts of water for at least 15 minutes. Consult the MSDS for specific guidance on handling each of the chemicals. After experimentation, the tubing can either be washed by pumping isopropyl alcohol through the reactor for 10 reactor volumes if it is to be kept, or it can be discarded.

Analysis, results and discussion

The full CCF DoE (shown in Fig. 1) was run using the experimental setup described in Figs. 2 and 3, which consisted of running the experiments shown in Table 2. Three centre-point experiments were also run throughout the course of data acquisition, to monitor the reproducibility of the experiments as time passed. These repeated experiments, or replicates, ensure that any extraneous variables are identified (uncontrolled variables that are being changed unknowingly, e.g. stock solution contamination or degradation). The outputs are shown as molar percentages for the starting material 2,4-difluoronitrobenzene (1), the desired product (3), the para-substituted impurity (4), and the di-substituted impurity (5). We assumed that each of the materials has an equivalent HPLC response and did not run prior calibrations with standards, although this could be done with additional time. Molar percentages were calculated using internal normalisation for each of the species, where the HPLC peak area for the species of interest was divided by the sum of the HPLC peak areas of all species, then multiplied by 100. This is shown in the equation below, where (\(\boldsymbol{x}\)) is the HPLC area for the species of interest, and (1)/(3)/(4)/(5) are the HPLC areas of the species in this study:

$$Molar \ percentage= \frac{\left(\boldsymbol{x}\right)}{\left[\left(1\right)+\left(3\right)+\left(4\right)+\left(5\right)\right]}\times 100$$
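The normalisation in the equation above can be expressed as a short helper function; the peak areas below are hypothetical values for illustration only.

```python
def molar_percentages(areas):
    """Internal normalisation of HPLC peak areas, assuming equal detector
    response factors for every species (as assumed in this study)."""
    total = sum(areas.values())
    return {species: 100.0 * area / total for species, area in areas.items()}

# Hypothetical peak areas for species (1), (3), (4) and (5)
areas = {"(1)": 120.0, "(3)": 600.0, "(4)": 30.0, "(5)": 50.0}
percentages = molar_percentages(areas)
for species, pct in percentages.items():
    print(f"{species}: {pct:.2f}%")
```

By construction the four percentages sum to 100, so any drop in the desired product must appear as a rise in the starting material or the impurities.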
Table 2 The experimental dataset generated from the running of the DoE for the SNAr reaction

Using this dataset, MODDE can fit a model automatically using the ‘Analysis wizard’ tool. Full instructions can be found in the ESI. MODDE then fits a saturated model for each of the responses given. A saturated model is one in which all model terms, including all interactions and squared terms, are included. When a saturated model is initially generated, the R2 value is the largest it can be. R2 is a measure of how well a given model fits the data, usually represented as a number between 0 and 1. When a model uses all of the possible terms available to it, the variation in the experimental response is best described. This means that as R2 tends to 1, more of the variation is explained by terms in the model, as closer to 100% of the experimental variation can be attributed to specific terms. However, saturated models typically contain non-significant model terms that lead to a low Q2 value. The Q2 value is the proportion of the variation of the response predicted by the model using cross validation, also represented as a number between 0 and 1; simply put, Q2 tells you how well the model can predict new data. For a useful model, it is necessary to have a high R2 that explains the dataset well, as well as a high Q2 that can interpolate new data points accurately. To achieve this, the model for each response must be edited to remove any non-significant terms. Figure 4 shows the coefficients plots for a particular response, in this case the amount of the desired product, (3), which graphically indicate each model term (x axis) and its respective significance (y axis). Each of the model terms is ‘scaled and centered’, meaning factors with different units can be compared to determine the influence of model terms over the range of the factors studied.
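The distinction between R2 and Q2 can be made concrete with a small worked example: R2 is computed from the residuals of the full fit, while Q2 here uses leave-one-out cross-validation (one common choice; MODDE's exact cross-validation scheme may differ). The design matrix and response values below are hypothetical.

```python
import numpy as np

def r_squared(y, y_pred):
    """Fraction of the response variation explained by the fitted model."""
    return 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

def q_squared(X, y):
    """Leave-one-out cross-validated predictivity: refit the model with
    each run held out, predict the held-out run, and compare the
    prediction errors with the total response variation."""
    press = 0.0  # predicted residual sum of squares
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
        press += (y[i] - X[i] @ beta) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Hypothetical one-factor design matrix (intercept + coded factor) and response
X = np.column_stack([np.ones(6), [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"R2 = {r_squared(y, X @ beta):.3f}, Q2 = {q_squared(X, y):.3f}")
```

Because the cross-validated residuals are always at least as large as the ordinary residuals, Q2 is never higher than R2; a large gap between the two is the signature of an over-fitted (e.g. saturated) model.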

Fig. 4

The significance of model terms on the response for the desired product, (3). a The difference between significant and non-significant model terms. b The saturated model, R2 = 0.990, Q2 = 0.764. c The optimised model, R2 = 0.986, Q2 = 0.894. Time = residence time, Temp = temperature, Eq. = pyrrolidine equivalents

Each model term has a respective uncertainty (represented in the plot as an error bar), and if that uncertainty overlaps with y = 0, then that model term can be deemed statistically non-significant; Fig. 4a illustrates this point. This is because there is a probability that the relative effect of the model term could be zero. The saturated model for the response of (3) is shown in Fig. 4b, where there are several significant model terms and two non-significant terms: Temp² and Temp*Eq. The R2 and Q2 measures are shown alongside the coefficients plot as a green bar and a blue bar respectively. Upon removal of the two non-significant terms, shown in Fig. 4c, the Q2 value rises from 0.764 to 0.894, meaning that the predictability of the model is increased for an insignificant decrease in R2.

This process is then repeated for the other responses, those of compounds (4) and (5), shown in Fig. 5a/b and Fig. 6 respectively. For the response of (4), the saturated model (Fig. 5a) appears to describe the data well, as the R2 is high; however, there are many non-significant terms. Because of this, the Q2 is negative, meaning no acceptable degree of predictability can be obtained from the model. As these non-significant terms are removed, further terms become non-significant, until the only significant term remaining is temperature (Fig. 5b), but the R2 and Q2 values are still very low. This is because the response for (4) remained largely unchanged throughout our experimentation, making it difficult to model well, as no factors could be shown to have a strong effect on the outcome of this response. Conversely, the model for the response of (5) was found to be excellent without any need for further optimisation - the R2 and Q2 measures were both high, and the saturated model contained no non-significant model terms. This means that, using the same experimental data set, a secondary response can also be modelled and optimised for, and response surfaces for (5) can be predicted without any further experimentation. It is not possible to do the same for the response of (4) due to the low formation of this product, as no changes in the experimental conditions led to a significant amount of it being generated. This manifests itself in the uncertainty of the model terms, as most of the error bars for these model terms intersect y = 0 and their relative effects are therefore non-significant.

Fig. 5

The significance of model terms on the response for (4). a The saturated model, R2 = 0.864, Q2 = -0.200. b The optimised model, R2 = 0.246, Q2 = -0.047

Fig. 6

The significance of model terms on the response for (5), showing the saturated model with no non-significant terms, R2 = 0.997, Q2 = 0.944

As the models were optimised to have the highest R2 and Q2 possible, the optimum operating conditions for the production of the desired ortho-substituted product, (3), could then be identified. By selecting the ‘4D Contour’ option in MODDE, the response for (3) can be interpolated across the entire landscape of the parameter space, providing an insight into the chemistry that could not be achieved by other means such as OFAT. This contour plot is shown in Fig. 7, which indicates clearly the yield of (3) that would be achieved with varying experimental factors. Figure 8 shows a similar plot of how the yield of the di-substituted impurity, (5), also changes with these differing inputs. It is important to note that, for contour plots to be used sensibly, DoE model performance metrics such as R2 and Q2 must be good. This is also a significant point for student learning and can be adapted into leading questions such as: ‘Using the 4D contour plot, predict the yield of the major product at x, y and z experimental conditions’.

Fig. 7

The contour plot for the response of (3), showing how the yield of the ortho-substituted product changes with varying experimental conditions

Fig. 8

The contour plot for the response of (5), showing how the yield of the di-substituted product changes with varying experimental conditions

The optimum operating region for the highest yield of (3) has been identified using this DoE approach, whilst giving a full picture of the parameter space. The results show that high temperatures, long residence times and high pyrrolidine equivalents lead to the highest yield of the desired product (3), as well as the highest yield of the di-substituted impurity (5). There are still other aspects of DoE that can be explored, such as model validity and reproducibility, predicted kinetic plots, ‘Sweet Spot’ visualisations and ‘Optimizer’ usage in MODDE. These tools can use the same data set to give further process understanding, and the empirical model can be exported to further explore responses such as E-factor, space-time yield etc. The same data set can also be used to build further models on multiple responses, each of which can be refined to give further understanding and predictability. This could be warranted if there were additional experimental needs, such as productivity of material, highlighting areas where the highest yields are obtained in the shortest residence time by compromising higher yields for quicker product generation.

Upon completion of the experimental work, students were asked to prepare a report on their findings - this can be in a Word document or research article format. Conveying their ability to report on statistical models and find the optimum reaction conditions for the production of (3) serves as the main assessment criterion for this work, where > 90% of students were successful. Correct assignment of the optimum parameter regions indicates that they have performed the experiments correctly and should be considered when grading the report. Further questions can also be posed to the students, such as ‘what are the advantages of running this reaction in flow?’ and ‘why perform a DoE?’. These questions can enhance the student learning experience, as they are asked to reflect upon their work directly. Sample questions and full answers with suggested grading criteria are provided in the ESI.

Student feedback

This example has formed part of the EPSRC Dial-A-Molecule Summer School in 2018 and 2019, targeted at 1st year PhD students. The summer school was a lively and interactive event which, in addition to the experiment and analysis outlined here, also included a series of lectures from academic and industrial experts. Furthermore, practical sessions on 3D printing were conducted, alongside an evening session introducing Design of Experiments by designing and making paper helicopters and optimising the helicopter geometry, to solidify the concepts of DoE and their applications to various real-life scenarios. These exercises create an equal baseline of background knowledge that drives the use of the DoE methodology, forcing hypotheses to be made from an understanding of factor selection and level setting, rather than undisclosed assumptions based on prior experiences. The content was very well received, with 88% of feedback respondents rating the course as “Good” or “Very Good”, and the students particularly valued the combination of practical and theoretical examples detailed in this publication.


Conclusions

It has been shown that by using a simple continuous flow setup, consisting of syringe pumps, water baths and a method of quantitative analysis, alongside a methodical experimental technique such as DoE, multistep chemical processes can be optimised for a desired output. The effect of varying reaction conditions on the outcome of a chemical reaction is explored, allowing a better understanding of the reaction system than an OFAT approach. This particular experiment is run annually as part of the undergraduate chemistry course at the University of Leeds, but can be used as an exercise in teaching flow chemistry and optimisation to researchers at any level. Third year undergraduates who select this optional project learn the theory of DoE as part of the pre-laboratory preparation - these theory PowerPoint slides are provided in the ESI. Depending on the experience of the students, the experiment can be altered to constrain what each participant will conduct experimentally and what is provided for them. The experimental setup requires low-cost equipment alongside common laboratory analytical equipment, and the experimentation itself is suitable for undergraduates and upwards; all experimental results can be obtained in a 2-3 hour lab session.

This experiment demonstrates that the outlined statistical modelling methodologies provide a greater insight into process optimisation than can be achieved by an OFAT approach, and represent some of the most efficient and effective data analysis techniques to explain the chemistry and identify regions of interest. As many exercises in undergraduate courses are based around synthetic batch experiments, this continuous flow experiment can be incorporated into a course as a different approach to carrying out a synthetic reaction and obtaining reaction data, while simultaneously providing an opportunity to learn about statistics and optimisation techniques. This also enables the students to work as part of a group to design and perform the experiment, working towards a common goal, broadening their skills and encouraging new ways of thinking.

It is the hope of the authors that, as the skillset required of a chemist diversifies and expands in sync with the increasing capabilities and technologies of a ‘typical’ laboratory, so too will the teaching of both chemistry and optimisation techniques for process development. Laboratories worldwide are making positive and constructive strides towards housing scientists with a wide variety of skillsets, and this paper aims to serve as a guide to teaching a number of these key skills, i.e. continuous flow synthesis, statistical data analysis, experimental design and reaction optimisation. The evolution of curricula, the paradigm shift of academic labs and an overall increased awareness of other methodologies mean it is now a very exciting time to be in a chemistry setting: where being a chemist is more than just being a chemist.