1 Introduction

Imaging ultrafast molecular dynamics addresses fundamental questions about how chemical reactions proceed. Laser-induced electron diffraction (LIED) [1] is a powerful laser-based imaging method that can image the three-dimensional structure of a single gas-phase molecule with combined sub-atomic picometre and femtosecond spatiotemporal resolution [2]. Taking snapshots of molecular dynamics with LIED gives insight into how molecules react, change, break, or bend.

However, retrieving complex molecular structures from diffraction patterns is challenging. As the structural complexity increases, identifying the extremum with current retrieval algorithms becomes increasingly difficult, and these algorithms therefore remain limited to few-atom molecular systems [3].

A machine learning (ML) algorithm can overcome these difficulties because it considers multiple degrees of freedom simultaneously. Using an ML-LIED framework, we demonstrate the accurate retrieval of the three-dimensional (3D) structure of a large and complex molecule.

2 ML Algorithm

Our ML algorithm uses a convolutional neural network (CNN), which we train to find the relationship between a molecular structure and its molecular interference signal in the two-dimensional differential cross-section (2D-DCS) map. The DCS contains fingerprints of the internuclear distances in the molecule and serves as the algorithm’s input data. We use a CNN to take advantage of its strength in image recognition. First, convolving the 2D-DCS map with different filters enhances subtle features of the map and yields a collection of feature maps (Fig. 1a). The feature maps then pass through the fully connected neural network, where the inputs to each neuron are multiplied by the connection weights, to predict the atomic positions in the molecule (Fig. 1b).
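The two stages described above (filtering, then a fully connected layer) can be illustrated with a minimal NumPy sketch of a single forward pass. It is not the network used in this work: the map size, the number and size of the filters, the ReLU activation, the single dense layer, and the names conv2d_valid and cnn_forward are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a 2D kernel over a 2D image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def cnn_forward(dcs_map, filters, weights, bias):
    """Filters -> ReLU feature maps -> flatten -> dense layer -> (n_atoms, 3)."""
    # Stage (a): each filter produces one feature map.
    feature_maps = [np.maximum(conv2d_valid(dcs_map, k), 0.0) for k in filters]
    # Stage (b): flatten all feature maps into one 1D array.
    flat = np.concatenate([fm.ravel() for fm in feature_maps])
    # Dense layer: weighted sum of the flattened features per output coordinate.
    coords = weights @ flat + bias
    return coords.reshape(-1, 3)

# Hypothetical sizes, chosen only to keep the example small.
rng = np.random.default_rng(0)
dcs_map = rng.random((16, 16))             # placeholder for a 2D-DCS map
filters = rng.normal(size=(4, 3, 3))       # four 3x3 filters (untrained)
n_atoms = 7
n_features = 4 * 14 * 14                   # 4 maps of size (16-3+1)^2
weights = rng.normal(scale=0.01, size=(n_atoms * 3, n_features))
bias = np.zeros(n_atoms * 3)

predicted = cnn_forward(dcs_map, filters, weights, bias)
print(predicted.shape)  # (7, 3)
```

In training, the filters and weights would be adjusted iteratively to minimize a cost function; that optimization step is omitted here.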

Fig. 1
Flow diagram: the 2D-DCS map is convolved with different filters to produce feature maps (a), which are flattened and passed through the fully connected neural network to predict the 3D position of each atom (b). A cost function is then minimized to optimize all filters and weights.

(a) Subtle features of the 2D-DCS are enhanced by different filters. (b) The generated feature maps are flattened to a 1D array and passed through the fully connected neural network to predict the atomic positions in the molecule. (c) A schematic contour plot of the cost function for two neuron weights (ω_i and ω_{i+1}). The blue dots mark the cost function value, and the red arrows indicate the direction of the gradient of the cost function. The cost function is minimized over five iterations.

Fig. 2
Scatter plot of atom position versus atom number, with vertical lines for x, y, and z at atom number 1; the equilibrium and CNN data points fluctuate across the seven atoms. A schematic of the 3D molecular structure shows six areas of uncertainty, labeled 2 to 7.

3D Cartesian coordinates (x, y, z) for seven atoms in (+)-fenchone as predicted by the CNN (green circles). The ML-LIED-measured (+)-fenchone structure deviates only slightly from the equilibrium ground-state neutral molecular structure (red triangles). The schematic of the predicted 3D molecular structure is presented on the right. Here, the green circles indicate the area of uncertainty.

For training the ML algorithm, we first generated a database containing thousands of molecular structures spanning possible geometries. We calculated the corresponding 2D-DCS map for each structure by simulating the elastic scattering of electrons on the molecule using the independent atom model (IAM).
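As a rough illustration of how such a database could be assembled, the sketch below evaluates the orientation-averaged IAM molecular interference term, I_M(q) = Σ_{i≠j} f_i f_j sin(q R_ij)/(q R_ij), for randomly perturbed copies of a reference geometry. Several simplifications are assumptions of this sketch and not statements about the present work: the scattering amplitudes f_i are taken as real constants, the molecule is treated as randomly oriented, the signal is a 1D curve in momentum transfer q instead of a full 2D-DCS map, and the function name iam_interference, the diatomic reference geometry, and the perturbation size are hypothetical.

```python
import numpy as np

def iam_interference(q, positions, amplitudes):
    """Orientation-averaged IAM interference term for one structure.

    q          : momentum transfer values (1/Angstrom)
    positions  : (n_atoms, 3) Cartesian coordinates (Angstrom)
    amplitudes : (n_atoms,) constant real scattering amplitudes (assumption)
    """
    q = np.asarray(q, dtype=float)
    positions = np.asarray(positions, dtype=float)
    amplitudes = np.asarray(amplitudes, dtype=float)
    # Pairwise internuclear distances R_ij.
    diff = positions[:, None, :] - positions[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    signal = np.zeros_like(q)
    n = len(positions)
    for i in range(n):
        for j in range(n):
            if i != j:
                # np.sinc(x) = sin(pi x)/(pi x), so rescale the argument by pi.
                signal += amplitudes[i] * amplitudes[j] * np.sinc(q * r[i, j] / np.pi)
    return signal

# Hypothetical two-atom reference geometry with a 1.2 Angstrom bond.
q = np.linspace(0.5, 10.0, 200)
reference = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.2]])
signal = iam_interference(q, reference, [1.0, 1.0])

# Toy database: (structure, simulated signal) pairs for perturbed geometries.
rng = np.random.default_rng(0)
database = []
for _ in range(5):
    structure = reference + rng.normal(scale=0.05, size=reference.shape)
    database.append((structure, iam_interference(q, structure, [1.0, 1.0])))
```

Each database entry pairs a candidate structure with its simulated signal, which is the input-label relationship the network is trained on.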

The database is split into training, validation, and test sets to validate the ML model. To evaluate the model’s accuracy during training, we define the prediction error as the mean absolute error (MAE), that is, the mean of the absolute differences between the predicted and actual atom positions.
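The split and the error metric can be sketched in a few lines of NumPy. The 80/10/10 split fractions, the random seed, and the helper names split_indices and mean_absolute_error are assumptions chosen for illustration; the fractions used in this work are not stated here.

```python
import numpy as np

def split_indices(n_samples, frac_train=0.8, frac_val=0.1, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(frac_train * n_samples)
    n_val = int(frac_val * n_samples)
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]   # whatever remains
    return train, val, test

def mean_absolute_error(predicted, actual):
    """Mean of |predicted - actual| over all atoms and coordinates."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(predicted - actual)))

train_idx, val_idx, test_idx = split_indices(1000)

# One atom whose x and y coordinates are each off by 0.03 and z is exact.
mae = mean_absolute_error([[0.03, -0.03, 0.0]], [[0.0, 0.0, 0.0]])
```

Tracking the MAE separately on the training and validation indices is what allows overfitting to be detected: a validation error that stops following the training error signals that the model no longer generalizes.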

Once the ML model is satisfactorily trained, the experimental 2D-DCS map is used as an input to generate the predicted molecular structure that most likely contributes to the measured interference signal [4].

3 Extracting Molecular Structure

Before we use our ML framework to retrieve the molecular structure of a complex molecule, we first examine the accuracy of the ML model by revisiting published experimental LIED data for a small linear (1D) molecule, acetylene (C2H2) [5], and a planar (2D) molecule, carbon disulfide (CS2) [6]. Table 1 summarizes the ML-predicted structural parameters of the C2H2 and CS2 molecules. The predicted structures agree well with the previous publications, in which the structure was extracted from the LIED data by a standard fitting routine.

Table 1 Summary of C2H2 and CS2 structures predicted by machine learning (ML)

Next, we use our ML framework to study the configuration of a (+)-fenchone (C10H16O; 27 atoms) molecule, whose experimental 2D-DCS map was also measured with LIED. Retrieving the structure of such a complex molecule with a standard fitting routine would require an unrealistic calculation time. For example, a calculation time of 1.4 × 10⁹ h would be needed to calculate only five variations of its possible structures. ML has the decisive advantage of learning to interpolate between the coarse grids of precalculated molecular geometries. We can therefore build a reduced database that considers only the variation of four groups of atoms and a molecule-wide global change in structure, and let the machine itself interpolate the relationship between the molecular structures and the corresponding 2D-DCSs, which drastically reduces the computational time. Because the MAE converges to a constant value of ~0.02 for both the training and validation data sets, we conclude that the model is neither overfitting nor underfitting and is satisfactorily trained. Furthermore, the Pearson correlation coefficient is found to be 0.94, confirming the strong correlation between the experimental and predicted theoretical 2D-DCS.
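For reference, the Pearson correlation coefficient between two maps can be computed as in the sketch below, which treats each map as a flat list of pixel values. The function name pearson_correlation and the synthetic "theory" and "experiment" arrays are placeholders for illustration only; they are not the measured or simulated 2D-DCS data.

```python
import numpy as np

def pearson_correlation(map_a, map_b):
    """Pearson correlation coefficient between two equally shaped maps."""
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    # Remove the mean so that only the variations are compared.
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

# Synthetic stand-ins: a "theory" map and a noisy copy playing the experiment.
rng = np.random.default_rng(1)
theory = rng.random((32, 32))
experiment = theory + 0.1 * rng.normal(size=theory.shape)

r = pearson_correlation(experiment, theory)
```

A value of 1 means the two maps vary identically up to an offset and a positive scale factor, 0 means no linear relationship, and -1 means they vary in exactly opposite ways.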

Figure 2 shows the predicted 3D Cartesian coordinates (x, y, z) for seven atoms in (+)-fenchone, retrieved from the experimental data by the ML model (green circles). The error bars include the model prediction error as well as the experimental statistical error. The equilibrium ground-state 3D positions of neutral (+)-fenchone are shown as red triangles. The slight deviation between the ML-LIED-measured structure and the equilibrium ground-state molecular structure is caused unavoidably by the LIED laser field. The schematic of the predicted 3D (+)-fenchone molecule is also shown, where the green circles indicate the degree of uncertainty.

4 Summary

We implement ML-LIED to retrieve atomic positions of 1D, 2D, and complex 3D molecules with picometer and attosecond resolution. The ML-based framework achieves high-accuracy pattern matching in complex solution spaces while overcoming the scaling limitations of a standard fitting routine. The problem of unfavorable scaling is not unique to LIED; it also arises with other diffraction methods. Combining ML with LIED offers a new general solution to overcome long-standing problems and opens up new opportunities to image the structure of large, complex molecules.