1 Introduction

Time-resolved spectroscopy is widely used in photochemistry and photobiology for investigating the dynamic properties of complex systems [1,2,3,4]. The global and target analysis methodology has been developed to model multidimensional datasets from these systems [5,6,7,8,9]. Here, global refers to a simultaneous analysis of all measurements; whereas, target refers to the applicability of a particular target model [10]. Fred Brouwer inspired the early developments of global and target analysis [11,12,13]. Several tools for global and target analysis exist in the public domain [14,15,16,17,18,19,20,21]. While some of these tools provide, as pyglotaran, a modular and Python-based approach, they have different foci for the analysis. The aim of this paper is to introduce pyglotaran [22], a lego-like Python problem-solving environment [23] for global and target analysis of time-resolved spectra. Several new features that are now becoming publicly available will be demonstrated with the help of an in depth presentation of two recent transient absorption case studies [24, 25], namely the usage of guidance spectra and the parametric description of the “coherent artifact” (CA) [2, 26]. Vibrational wavepackets [27, 28] will be described with the help of damped oscillations [26, 29]. Finally, in a third case study, the time-resolved emission of whole photosynthetic cells can be described by the contributions of all the megacomplexes present, thus resolving the energy transfer pathways [30] with the help of spectral area constraints [31]. These case studies explore systems containing 1, 2, or 3, and ≈500 chromophores, with compartmental models [32] describing 5 or 6 spectrally distinct species or states. The temporal resolution ranges from ≈10 fs to ≈10 ps.

The paper is organized as follows: We assume the reader is familiar with the basics of global and target analysis [7], with some foundational information included in the Supplementary Information (SI) Jupyter notebooks [33]. In the methods section, we summarize the methodological advancements since 2004, which are then illustrated through the case studies in the results and discussion section. Next, we present the design of the pyglotaran problem-solving environment and discuss software engineering aspects and future developments. In the results and discussion section, we provide a concise description of the salient analysis results, with extended descriptions available in the SI Jupyter notebooks. Interested readers can download the Jupyter notebooks and pre-processed data to reproduce all results. Pyglotaran extends the FAIR data principles [34] by not only enabling users to share their data, pre-processing steps, and analysis results, but also by open-sourcing its software analysis tools and providing Jupyter notebooks for full reproducibility. To demonstrate this commitment to transparency and reproducibility, readers can download the Jupyter notebooks and pre-processed data associated with this paper to reproduce all the presented results, thus fostering a more collaborative and robust scientific community.

2 Methods

Crucial to virtually all global and target analysis is the superposition principle, expressing that the response of a complex system can be described by a linear superposition of the contributions from several components. We will first consider transient difference absorption spectroscopy, and then time-resolved emission spectroscopy. The superposition principle is schematically depicted in Fig. 1 for a single data set, that will be explained in detail in the first case study. The data matrix (first row) shows a complex spectral evolution (second row) with damped oscillations (third row) and a pronounced coherent artifact straddling time zero (fourth row, cf. the trace at 600.5 nm in the first row). The fifth row depicts the analysis of the remaining structure in the residual matrix of the fit.

Fig. 1
figure 1

Example of a time-resolved difference absorption Spectrum and fit thereof (row 1), with the help of a compartmental model (row 2), the damped oscillations (row 3), a “coherent artifact” (CA) (row 4) and the residual (row 5). Further explanation in text

In broadband absorption spectroscopy [35, 36], the evolution of the ground- and excited-state vibrational wave packets created by the short laser pulse is described with a superposition of damped oscillations. The amplitude of a damped oscillation \(\cos (\omega_{n} t)\exp ( - \gamma_{n} t)\) as a function of the detection wavelength constitutes a damped oscillation-associated spectrum (\(DOAS_{n} (\lambda )\)) with an accompanying wavelength-dependent phase \(\varphi_{n} (\lambda )\) [29] (cf. row three of Fig. 1). When the vibrational evolution can be considered independently from the electronic evolution (Born–Oppenheimer approximation), we arrive at a superposition of the electronic and vibrational contributions to the time-resolved spectrum (TRS):

$$\begin{gathered} TRS(t,\lambda ) = \sum\nolimits_{l = 1}^{Nstates} {c_{l}^{S} (t^{\prime},\theta )SADS_{l} (\lambda )} , \hfill \\ + \sum\nolimits_{n = 1}^{Nosc} {\cos (\omega_{n} t^{\prime} - } \varphi_{n} (\lambda ))\exp ( - \gamma_{n} t^{\prime})DOAS_{n} (\lambda ), \hfill \\ + \sum\nolimits_{m = 0}^{2} {i^{(m)} (t)IRFAS_{m} (\lambda )} + residual(t,\lambda ), \hfill \\ \end{gathered}$$

where \(N_{states}\) electronically excited states are present in the system, with populations \(c_{l}^{S} (t)\)(superscript S stands for species), and species’ spectral properties, the species associated difference spectra (\(SADS_{l} (\lambda )\)) (cf. row two of Fig. 1). The populations are determined by an unknown compartmental model [32], that depends upon the unknown kinetic parameters \(\theta\). In the target analysis constraints on the SADS are needed to estimate all parameters \(\theta\) and \(SADS_{l} (\lambda )\). \(t^{\prime}\) indicates that the actual model function still must consider the instrument response function (IRF) by means of convolution. The next term describes the coherent artifact, with a weighted sum of the zeroth, first and second derivative of the IRF \(i(t)\) [2] (cf. row four of Fig. 1). Finally, the residual represents the part of the data that is not described by the parameterized model (cf. row five of Fig. 1).

For every wavelength, the matrix formula for this superposition model is given by

$$TRS = C^{S} (\theta ,\mu ,\Delta ) \cdot SADS + {\text{Cos}} (\omega ,\gamma ,\mu ,\Delta ) \cdot A + {\text{Sin}} (\omega ,\gamma ,\mu ,\Delta ) \cdot B + IRF(\mu ,\Delta ^{\prime}) \cdot IRFAS,$$

where the matrix \(C^{S}\) consists of columns \(c_{l}^{S} (t)\). A Gaussian shaped IRF is used, with parameters μ for the time of the IRF maximum and Δ for the full width at half maximum (FWHM) of the IRF. The matrices \({\text{Cos}} (\omega ,\gamma ,\mu ,\Delta )\) and \(Sin(\omega ,\gamma ,\mu ,\Delta )\) contain the damped oscillations, and the matrices A and B comprise their amplitudes. To limit the number of free parameters, we assume wavelength independence of the eigenfrequency \(\omega_{n}\) and of the damping rate \(\gamma_{n}\).

The final term, which describes the coherent artifact, contains a matrix \(IRF(\mu ,\Delta ^{\prime})\) with as columns the \(i^{(m)} (t)\). The SADS and IRFAS and also the amplitudes A and B are unconstrained conditionally linear parameters, that can be implicitly solved for (per wavelength) using the variable projection algorithm [37, 38].

When the IRF width Δ is larger than ≈150 fs the damped oscillations will be virtually averaged out, and for every wavelength, the matrix formula for this superposition model reduces to [40]:

$$TRS = C^{S} (\theta ,\mu ,\Delta ) \cdot SADS^{{}} + IRF(\mu ,\Delta ) \cdot IRFAS^{{}} .$$

In time-resolved emission spectroscopy, the IRF is generally much wider than 150 fs, and the presence of a possible scatter component can be described by the IRF shape (i.e., the zeroth derivative). The species’ spectral properties are called the Species Associated Spectra (\(SAS_{l} (\lambda )\)) and we have for every wavelength (disregarding a possible scatter component)

$$TRS = C^{S} (\theta ,\mu ,\Delta ) \cdot SAS.$$

Since emission cannot be negative, the SAS are nonnegative conditionally linear parameters, that can be implicitly solved for (per wavelength) [39, 40].

In most experiments, the location of the maximum of the IRF is wavelength dependent. This so-called dispersion can well be described by a polynomial function of the wavelength [7] or of its reciprocal, the wavenumber [2]. This introduces, typically, 1–3 “nuisance” parameters. Moreover, the whole kinetic model must be recomputed for every wavelength, which greatly increases the computation time. An independent experiment that involves a coherent artifact with dispersion must also be modeled with the above formulas [26].

3 Results and discussion

3.1 Broadband absorption case study of intersystem crossing in 4-thiothymidine

In a recent study, it was demonstrated that coherent vibrational modes promote the ultrafast internal conversion and intersystem crossing in thiobases [25]. Here, we will present the target analysis of 4-thiothymidine (4TT). The structure and steady-state spectra of 4TT are shown in Figure S1. The IRF width of ≈35 fs FWHM enables the resolution of damped oscillations that can be assigned to vibrational modes. The coherent artifact is clearly visible in the trace at 600.5 nm in Fig. 2, with large contributions of the IRF derivatives (Fig. 3).

Fig. 2
figure 2

Overview of the data of 4TT in phosphate-buffered saline (PBS) after excitation at 330 nm (top) and selected time traces (bottom) (in mOD, red) and fit (black). Wavelength is indicated in the title of the panels. Note that the time axis of the time traces is linear until 1 ps (after the maximum of the IRF), and logarithmic thereafter. Rms error of the fit is 0.12 mOD

Fig. 3
figure 3

Coherent artifact of 4TT in PBS after excitation at 330 nm. (A) 0th, 1st, and 2nd derivative of the IRF (blue, orange, green) which possessed a FWHM Δ’ of 35 fs. (B) scaled IRFAS. Scaling of the IRFAS is such that the product of the IRFAS and the IRF derivative is the contribution to the fit. Thus, the blue IRFAS has the largest contribution to the fit

We employ a sequential kinetic scheme with four states which we tentatively name S2, S1, hot T1’, and T1, i.e., we assume the kinetic scheme S2 → S1 → hot T1’ → T1. The populations and estimated SADS are shown in Fig. 4. The kink at 1 ps in Fig. 4A results from the time axis being linear until 1 ps, and logarithmic thereafter [41].

Fig. 4
figure 4

Populations (A) of the kinetic scheme S2 → S1 → hot T1’ → T1 and estimated SADS (B, in mOD) resulting from the sequential kinetic analysis of 4TT in PBS. Note that the time axis in A is linear until 1 ps (after the maximum of the IRF), and logarithmic thereafter. Estimated lifetimes: 74 fs (black), 284 fs (red), 1.76 ps (green), and long lived (blue)

Four damped oscillations have been used, one of which was time-reversed and is attributed to the coherent artifact. The DOAS and phases of the other three damped oscillations are shown in Figure S2.

Although at first glance the fit looks satisfactory (Fig. 2, bottom) and the rms error of 0.12 mOD is small, the residual matrix of the fit still shows some structure, cf. the vertical lines in Fig. 5A. These lines can tentatively be attributed to pump laser intensity fluctuations. This suggests refining the analysis and correct for these laser intensity fluctuations by estimating whether the residual spectrum at a certain time is proportional to the data. If so, the data can, thus, be corrected. This is demonstrated in Figure S3. The rms error decreases to 0.065 mOD, and the residuals show virtually no more structure (Figure S4).

Fig. 5
figure 5

First left (B) and right (C) singular vectors resulting from the singular value decomposition (SVD) of the residual matrix (A). The black line in (A) represents the location of the maximum of the IRF, which is described by a third-order polynomial function of the wavenumber

It is difficult to interpret the S1 SADS, red in Fig. 4B, it still contains substantial stimulated emission around 430 nm, suggesting that it is a mixture of relaxed S2 and S1. Therefore, we adopted the kinetic scheme in Fig. 6 which contains five states, S2, relaxed S2’, S1, hot T1’ and T1. Associated with these states, there are five lifetimes. Several kinetic schemes have been tested, until we arrived at the scheme of Fig. 6 with well interpretable SADS that are in accordance with the theory [25]. The precision of the estimated parameters is reported in the result object (Figure S5). The relaxed S2’ SADS (green in Fig. 7) is free from triplet (blue in Fig. 7) features. The S1 SADS (red) shows two ESA bands and a small SE around 430 nm.

Fig. 6
figure 6

Kinetic scheme used for the target analysis of 4TT in PBS. All rate constants in ns−1. Key: lifetimes: 73 fs (S2(1ππ*), black), 168 fs (S2(1ππ*), green), 444 fs (S1(1nπ*), red), 2.4 ps (T1’(3ππ*), magenta) and long lived (T1(3ππ*), blue)

Fig. 7
figure 7

Populations (A) of the target kinetic scheme from Fig. 6 and estimated SADS (B, in mOD) resulting from the target analysis of 4TT in PBS. Note that the time axis in A is linear until 1 ps (after the maximum of the IRF), and logarithmic thereafter

The contribution of the damped oscillations to the fit and the quality of the fit are demonstrated for the 510 nm data in Fig. 8. Note that the main decay of the damped oscillations (turquoise) corresponds to the main decay of S2 (black), suggesting that these oscillations live on (relaxed) S2. In the refined target analysis four DOAS have been resolved, the vertical lines in Fig. 9 indicate nodes concomitant with a phase jump in the 458 cm−1 DOAS (red, at 415 and 460 nm), in the 213 cm−1 DOAS (blue, at 393, 429, and 479 nm), and in the 847 cm−1 DOAS (green, at 526 nm). These results are further discussed in [25].

Fig. 8
figure 8

Decomposition of the 4TT in PBS data at 510.4 nm (in mOD, orange, and the fit, gray). CA is the sum of the IRFAS and CA damped oscillations (decay rates > 50/ps), doas is the sum of the four damped oscillations with decay rates < 30/ps. Note that the time axis is linear until 1 ps (after the maximum of the IRF), and logarithmic thereafter

Fig. 9
figure 9

Overview of the estimated DOAS and phases. A Cosine oscillations with frequencies \(\overline{\nu }n\) (in cm−1) (where n is the DOAS number) and damping rates \(\gamma\) (in ps−1) written in the legend at the left, using the appropriate color. Scaling of the DOAS is such that the product of the DOAS and the damped oscillation is the contribution to the fit. B Estimated DOAS. (C) Estimated phase profiles of the DOAS. The vertical lines in (B, C) indicate nodes concomitant with a phase jump

3.2 Transient absorption case study of the chromophoric systems rc and rcg

The primary event in molecule-based light energy conversion systems is light harvesting. We studied perylene bisimide-calix[4]arene multichromophoric systems composed of two different types of perylene bisimide (PBI) chromophores, red (r), and green (g) PBIs (named after their colors as solids) connected by calix[4]arene (c) [24, 42]. Figure 10A depicts the chemical structure of the supramolecular system rcg, and the absorption and emission properties of the parent chromophores rc and gc are shown in Fig. 10B. Due to the excellent overlap of the rc emission and the gc absorption (dotted red and solid green lines in Fig. 10B) and the close proximity fast Förster excitation energy transfer (EET) is found after excitation of the r moiety [42]. However, the rc chromophore has inherent dynamic properties, which must be considered in a target analysis.

Fig. 10
figure 10

A Chemical structure of the supramolecular system rcg, figure adopted from [43]. B UV/Vis absorption (red, solid) and fluorescence emission spectra (red, dotted) of tetraphenoxy-substituted (red) PBI compound rc; UV/Vis absorption (green, solid) and fluorescence emission spectra (green, dotted) of dipyrrolidino-substituted (green) PBI compound gc. All spectra are taken in CH2Cl2

An important part of the data analysis is the pre-processing of the raw data. In the rc Jupyter notebook in the supplementary information, it is demonstrated how a global analysis is used to demonstrate the presence of a pre-zero baseline in the data (Figure S6, Figure S7), and how this baseline can then be estimated (Figure S8) and subtracted from the raw data. Throughout this manuscript, we refer to pre-processed data.

After 530 nm excitation of rc in CH2Cl2, the coherent artifact is well described by the wavelength-dependent \(IRF(\mu ,\Delta ^{\prime}) \cdot IRFAS\) (Fig. 11). Representative traces demonstrating the excellent quality of the fit are shown in Fig. 12.

Fig. 11
figure 11

Coherent artifact of rc in CH2Cl2. A 0th, 1st, and 2nd derivative of the IRF (blue, orange, green) which possessed a Full Width at Half Maximum (FWHM) of 119 fs. B scaled IRFAS. Scaling of the IRFAS is such that the product of the IRFAS and the IRF derivative is the contribution to the fit. Thus, the blue IRFAS has the largest contribution to the fit. It shows large amplitudes straddling 530 nm, the excitation wavelength. In addition, Raman scattering is visible as negative peaks at 580 and 636 nm, and positive peaks at 460 and 512 nm

Fig. 12
figure 12

Selected time traces of rc in CH2Cl2 after excitation at 530 nm data (in mOD, red) and fit (black). Wavelength is indicated in the title of the panels. Note that the time axis is linear until 1 ps (after the maximum of the IRF), and logarithmic thereafter. Rms error of the fit is 0.33 mOD

The extensive spectral evolution of rc can be described with four excited states r1 → r2 → r3 → r4 → ground state, resulting in four rate constants \(k_{r2,r1} ,k_{r3,r2} ,k_{r4,r3} ,k_{r4}\) (where we use the \(k_{to,from}\) convention, and \(k_{r4}\) denotes the decay rate to the ground state) and four rc-SADS. This sequential scheme (Fig. 13A) neglects the branching decay to the ground state of the first three states, since \(k_{r4}\) is much smaller than the other three rate constants. As a refinement one could add this decay channel, assuming a rate of decay to the ground state \(k_{r4}\) for all four states. The perylene red chromophore shows a strong spectral evolution in time, especially from 550 to 750 nm (Fig. 13B).

Fig. 13
figure 13

Populations A and SADS (B, in mOD) of the four sequential compartments of rc in CH2Cl2. Key: gray, orange, red, purple: successively relaxed r* states. Note that the time axis in (A) is linear until 1 ps (after the maximum of the IRF), and logarithmic thereafter

Fig. 14
figure 14

First left and right singular vectors resulting from the singular value decomposition (SVD) of the residual matrix of rcg in CH2Cl2 resulting from a target analysis using a kinetic scheme without (A, B) or with (C, D) the rcg radical pair state. Note that panels (A, B) show large trends, whereas panels (C, D) show only small trends. The rms error of the fit decreases from 0.27 to 0.23 mOD. Note that the time axis in (A, C) is linear until 1 ps (after the maximum of the IRF), and logarithmic thereafter

The first target analysis of rcg considers that the 530 nm also directly excites the g moiety (≈12% of the r absorption, Fig. 10B) and to the rc-kinetic scheme the four EET to g rate constants \(k_{g,r1} ,k_{g,r2} ,k_{g,r3} ,k_{g,r4}\) are added. By plotting the first left and right singular vectors resulting from the singular value decomposition (SVD) of the residual matrix, we can identify systematic patterns in the residuals, which may indicate potential issues with the model. Here, the fit using this model is unsatisfactory since the left and right singular vectors of the residual matrix (Fig. 14A, B) show large trends, especially during the first 10 ps around 590 and 780 nm.

Fig. 15
figure 15

Kinetic schemes used for the simultaneous target analysis of rc (A) and rcg (B) in CH2Cl2. All rate constants in ns−1. Key: gray, orange, red, purple: successively relaxed r1, r2, r3, r4; dark green: g; black, rcgRP: rcg radical pair. Vertical arrows indicate relaxation of an excited state, or decay. Horizontal arrows represent energy transfer to g

Therefore, a loss process is introduced: the formation of a radical pair state, called rcgRP, from r1 or r2, which requires two new rate constants \(k_{rcgRP,r1} ,k_{rcgRP,r2}\). Thus, the rcg system can be described by the four rc-SADS, the g-SADS and the rcgRP-SADS, and the twelve rate constants: \(k_{r2,r1} ,k_{r3,r2} ,k_{r4,r3} ,k_{r4}\);\(k_{{g,r_{1} }} ,k_{{g,r_{2} }} ,k_{{g,r_{3} }} ,k_{{g,r_{4} }}\);\(k_{rcgRP,r1} ,k_{rcgRP,r2}\) and the decay rates to the ground state \(k_{g} ,k_{{{\mathbf{rcgRP}}}}\). This kinetic scheme is schematically depicted in Fig. 15B and the differential equation is shown in Figure S10.

The main results from the rc in CH2Cl2 experiment are the four rate constants (Fig. 13A) and the four SADS (Fig. 13B). The four rate constants are used in the kinetic scheme of rcg (Fig. 15B). The estimated rc-SADS are used to guide the SADS of the r1, r2, r3, r4 species in the target analysis of rcg by adding the rc-SADS as data to be fitted using the rcg-SADS. This is demonstrated in Fig. 16, where the dashed lines indicate the fit. The formulas for the fitting of the guidance SADS are shown in Figure S10. The usage of the guidance spectra allows for some flexibility, to accommodate small differences in the experimental conditions or the wavelength calibration or the white light of the probe, when the experiments have been performed on different days. It circumvents the more complicated simultaneous analysis of multiple datasets by selectively adding only the relevant information, i.e., here the four rc-SADS and the g-SADS. Without the guidance SADS it would have been impossible to take the r* spectral evolution properly into account in the rcg target analysis, since, e.g., the population of r4 is very small (purple in Fig. 17A).

Fig. 16
figure 16

Fit of the guidance SADS used in the target analysis

Fig. 17
figure 17

Populations (A) of the target kinetic schemes from Fig. 15B (for rcg) and estimated SADS (B, in mOD) of the simultaneous refined target analysis of rcg (solid), rcgcr (dotted) and gcrcg (dashed) in CH2Cl2. Key: gray, orange, red, purple: successively relaxed r1, r2, r3, r4 states; dark green: g; black, rcgRP: rcg radical pair. Note that the time axis in (A) is linear until 1 ps (after the maximum of the IRF), and logarithmic thereafter

rcg shows the typical spectral evolution of the r chromophore, as well as EET to g (see the 730 nm bleach in the g SADS) and r−. formation, characterized by the 575 nm bleach and the 780 nm absorption (black SADS). The trichromophoric systems rcgcr and gcrcg (Figure S9) can be analyzed with slightly modified versions of this kinetic scheme resulting in the concentrations depicted with dotted and dashed lines in Fig. 17A. Note that in gcrcg which contains two accepting chromophores, the g population (dashed green) rises faster, but also that of the rcgRP (dashed black), cf. also the amplitude matrices in the gcrcg Jupyter notebook in the SI. These results are further discussed in [24].

3.3 Time-resolved emission case study of whole photosynthetic cells

The time-resolved emission spectrum of whole photosynthetic cells contains the contributions from all the pigment–protein complexes present. In cyanobacteria the phycobilisome (PB) is the light harvesting antenna, which contains phycocyanin (PC) and allophycocyanin (APC) pigments that absorb the light between 400 and 650 nm. The excitations of the antenna pigments are efficiently transferred to the chlorophyll-containing photosystems (PS) I and II [44]. Megacomplexes consisting of PB, PSI, and PSII have been demonstrated [45]. ΔPSI mutants which lack PSI [46] have been used as a model system to study the properties of the PB–PSII megacomplex [30]. PSII shows different properties when the reaction center (RC) is in the open or in the closed state [47]. To model the EET rates in a whole cell, we, thus, distinguish three basic types of megacomplexes (Figure S11): PB–PSII with PSII in the open or the closed state and non-transferring PB. Free PSII, not receiving PB input, cannot be distinguished from the PSII in a PB–PSII megacomplex. The minimal kinetic scheme of PB consists of ten compartments [48]: three core cylinders with a connected rod. Each rod consists of PC640 (cyan) and PC650 (blue) compartments, the top cylinder contains 24 APC660 pigments (magenta), the two basal cylinders consist of disks with only APC660 pigments (red) and disks with APC660 (orange) and APC680 pigments (black). APC680 is the terminal emitter that transfers the PB excitations to PSII. The biexponential decay of the PSII dimer emission is described by an equilibrium of a Chl a compartment (PSII open, green) with a radical pair (RP) compartment [47, 49]. Spectral equality constraints are employed linking the SAS of the PC640, PC650, APC660 compartments. Thus, together with APC680 and PSII there are only five different SAS. The RP SAS is by definition zero. Spectral area constraints [31] have been used to estimate the equilibria. The parameters of the PB model have been taken from [48].

To collect enough information four experiments have been done, preferentially exciting the PB (590 nm excitation) or the PSII Chla (400 nm excitation), with a shorter or longer time range (TR2, IRF ≈7 ps FWHM and TR4, IRF ≈18 ps FWHM). Representative traces demonstrating the excellent quality of the fit are shown in Fig. 19. From the simultaneous target analysis of the four experiments, the rate of energy transfer from PB to PSII can be estimated (Fig. 18), together with the SAS of the five different species (Fig. 20), and the fractions of the different complexes (Table 1). They are megacomplex scaling parameters of the model, cf. the dPSI Jupyter notebook in the SI.

Fig. 18
figure 18

Minimal kinetic scheme of the PB–PSII complex with RCs open at room temperature (RT). Key: PC640 (cyan), PC650 (blue), APC660 (red), APC680 (black), and PSII Chl a (green). Functional compartmental model, with a zoom out of a rod consisting of three lumped hexamers in the upper right. The magenta APC660 compartment represents the top cylinder. The red rectangle indicates the two basal cylinders. All microscopic rate constants are in ns−1. The common decay rate constant for excited PC and APC states of 0.78 ns−1 has been omitted for clarity

Fig. 19
figure 19

Selected time traces of the emission at 4 wavelengths (indicated in the title of the panels) after 400 or 590 nm excitation at RT. Key: 400 TR2 (gray), 400 TR4 (cyan), 590 TR2 (orange), 590 TR4 (green). Black, blue, red, and dark green lines indicate the simultaneous target analysis fit of the four data sets. Note that the time axis is linear until 100 ps and logarithmic thereafter. Note also that each panel is scaled to its maximum. Overall rms error of the fit was 8.15

Table 1 Estimated fractions of the different complexes in the experiments at RT in the four experiments (in acquisition order)

The PB–PSII megacomplex contains ≈500 chromophores of five different types PC640, PC650, APC660, APC680 and Chl a (in PSII). The properties of the estimated SAS (Fig. 20B, D) are in agreement with the literature [47, 50]. The main finding of this case study is the rate of EET from APC680 to PSII in vivo of 50 ns−1. Pyglotaran enables the combination of the kinetic models for PB [50] and for PSII [47, 49], cf. Figure 18. These results are further discussed in [30].

Fig. 20
figure 20

Target analysis of the PB–PSII complex at RT. Total concentrations and SAS estimated after 590 (A, B) or 400 (C, D) nm exc. Key: PC640 (cyan), PC650 (blue), APC660 (red), APC680 (black), and PSII Chl a (green). Note that the time axis is linear until 100 ps and logarithmic thereafter

3.4 Conclusion from the case studies

Since with these complex systems, it is unfeasible to directly fit the data by a theoretical simulation, our global and target analysis methodology provides a useful ‘middle ground’ where the theoretical description and the fit of the experimental data can meet (Fig. 1). In the first case study, we employed DOAS and IRFAS to describe the vibrational wavepackets. Theoretical chemistry computations then complemented the interpretation of the target analysis results [25]. In the second case study, computations [42] confirmed that the estimated energy transfer rates from the red to the green chromophore are in agreement with the Förster resonance energy transfer mechanism. Such computations are not possible with the PB–PSII megacomplex which contains ≈500 chromophores. Here, the functional compartmental model (Fig. 18) is the best possible theoretical description.

4 Design of the lego-like problem-solving environment pyglotaran

The design of pyglotaran is based upon the well-known cycle of scientific discovery model specification–parameter estimation–model validation [16]. This cycle is illustrated in Fig. 21 and summarized for the rc and rcg case study in Table 2.

Fig. 21
figure 21

At the left, the model specification–parameter estimation–model validation cycle of scientific discovery. The logo on the right-hand side symbolizes the crucial role of pyglotaran in this cycle

Table 2 The scientific discovery cycle of the rc and rcg case study

4.1 Model specification

The model specification in pyglotaran is designed to be lego-like, allowing for the easy declaration and reuse of building blocks. The core of pyglotaran is the modeling language which is a declarative domain-specific language (DSL) that is designed to describe the behavior of systems in terms of their states and how they interact with one another, in a modular and composable manner. A DSL enables a user unfamiliar with scientific modeling, computing, and programming to express the analysis of complex systems without detailed knowledge about the interiors of pyglotaran. Pyglotaran is a very general implementation of separable problems, which enables usage beyond the kinetic models presented here. The DSL is split up into two parts, the parameter definitions and the model definitions referencing parameters from the parameter definitions by their name. Using the DSL, pyglotaran functions as an engine that interprets the model and parameter definitions and applies them to fit the data. To reduce the mental load for users and simplify the translation between the kinetic model and its description, pyglotaran allows and encourages to use meaningful and verbose names both in the model and in the parameter definitions (Figure S12). The DSL is further detailed in the SI section Modeling language and illustrated in the Jupyter notebooks. The plain text model description feature allows for the use of version control software (e.g., git), ensuring that all changes are tracked and recorded.

4.2 Parameter estimation

The parameter estimation distinguishes nonlinear parameters (nlp) and conditionally linear parameters (clp). The clp can be either nonnegative (with SAS) or unconstrained (with SADS, A, B, IRFAS) [29]. In the model definition, this is specified using “residual_function: non_negative_least_squares” and “residual_function: variable_projection”, respectively. To link the clp across multiple datasets, the option “link_clp: True” can be used.

Spectral relations between the clp can be specified, constraints to zero, and penalties based upon the area of the SA(D)S. For a relation between the clp of two components one would add to the model specification, e.g.:

figure b

Here, the clp of s1 and s2 are related with a scaling parameter (rel.r1) over the interval from 0 to 1000.

For a zero-constraint, one would add to the model specification, e.g.:

figure c

Here, the clp for the s12 component are forced to be zero in the interval from 1 to 1000.

For penalties based upon the (difference) in area of the SA(D)S [31] one would add to the model specification, e.g.:

figure d

Here, the difference in the area of the clp (which are the SAS of components s11 (PS2) and s1 in this case) in the interval between 1 and 1000 is penalized, where the area of the s1 SAS is scaled with the parameter area.PS2. The penalty itself is scaled with a weight of 0.1 in this case before adding it to the residual vector that affects the minimization process.

The examples given above are all taken from the ΔPSI mutant emission case study where they can be studied in context. Since the clp constraints decrease the amount of the free clp parameters, they are very important in the target analysis [7, 31].

The starting values for the nonlinear parameters can be specified in two ways. Initially, with the help of a parameters.yml file analogous to model.yml file described in the SI. After optimization a csv file of the estimated nonlinear parameters can be written, which can then more easily be modified in model refinement, and subsequently be used as the new starting values for the nonlinear parameters.

The actual parameter estimation process employs the nonlinear least squares function scipy.optimize.least_squares [51], which is based upon an optional optimization algorithm, and is demonstrated in the Jupyter notebooks in the SI. After the fit, summary statistics are computed, most importantly the rms error of the fit and the t-values of the estimated nonlinear parameters (Figure S5).

4.3 Model validation

The model validation process in the pyglotaran framework is essential for ensuring that the generated models are accurately capturing the underlying dynamics of the molecular systems. This process involves a series of steps, which we outline below, along with relevant figures and supplementary information that can be found in the Jupyter notebooks provided in the SI.

  1. 1.

    Plotting overlays of data and fits: Visual inspection of the fitted model against the experimental data is the first step in assessing the quality of the model (Fig. 2, Fig. 12, Fig. 19). This comparison helps to identify any significant deviations between the model's predictions and the observed data.

  2. 2.

    Analyzing residuals: Examining the matrix of residuals for each dataset provides valuable insight into the model's performance. By plotting the first left and right singular vectors resulting from the singular value decomposition (SVD) of the residual matrix (Fig. 14), researchers can identify systematic patterns in the residuals, which may indicate potential issues with the model.

  3. 3.

    Inspecting t-values of estimated parameters: After validating the residuals, it is essential to check the t-values of the estimated parameters (Figure S5). Ideally, t-values should be larger than two, indicating that the parameters are statistically significant.

  4. 4.

    Assessing the scientific interpretability of nonlinear parameters (nlp) and conditionally linear parameters (clp): Once the fit's residuals and estimated parameters are deemed acceptable, researchers must evaluate whether the obtained nlp and clp are scientifically interpretable. This process often marks the beginning of a new round in the scientific discovery cycle (Fig. 21), where researchers refine their models and hypotheses based on the insights gained from the analysis.

4.4 Reporting and conclusion

An essential aspect of the scientific process, not fully covered by Fig. 21 is the effective reporting and communication of the results. Clear and concise reporting of the models, their parameters, and validation outcomes is crucial for the broader scientific community to understand, evaluate, and build upon the findings. In this context, the Jupyter notebook-style interface of pyglotaran proves to be highly advantageous.

The Jupyter notebook-style interface, typically implemented through Jupyter notebooks, allows researchers to combine code, output, visualizations, and descriptive text in a single, interactive document. This format greatly facilitates the reporting process by enabling researchers to:

  1. 1.

    Document their work in a transparent and reproducible manner: The Jupyter notebook interface makes it easy to share the complete analysis pipeline, from data pre-processing to model validation, with colleagues and collaborators.

  2. 2.

    Visualize and explain their results: Pyglotaran's integration with popular Python plotting libraries, such as matplotlib [52], allows for the creation of compelling and informative visualizations. Researchers can seamlessly incorporate these visualizations into the Jupyter notebook, alongside explanations of their significance and interpretation.

  3. 3.

    Collaborate and share their findings: Jupyter notebooks can be easily shared with collaborators, who can then review, modify, or extend the analysis. Moreover, the Jupyter notebook format is conducive to sharing research findings in online repositories or supplementary materials, allowing for greater visibility and accessibility of the results.

The Jupyter notebook-style interface plays a pivotal role in streamlining the reporting process, fostering collaboration, and promoting transparency in the scientific discovery process. By enabling researchers to effectively communicate their findings, pyglotaran not only contributes to the advancement of time-resolved spectroscopy analysis but also supports the broader scientific community in uncovering new insights and understanding of complex molecular systems.

4.5 Software engineering: the pyglotaran ecosystem

The development of pyglotaran [22] is standing on the shoulders of giants in multiple ways. On the one hand, it benefits from decades of knowledge and lessons learned by interacting with users of predecessor software TIM [23], TIMP [16], and Glotaran [18]. On the other hand, it relies on battle-proven Python scientific libraries like scipy [51], numpy [53], numba [54] and xarray [55], instead of trying to reinvent the wheel.

To ensure efficient development and high-quality code, several key practices have been implemented in the development of the pyglotaran [22] ecosystem (see also the SI section Software development):

  1. 1.

    Version control: Managed through Git and GitHub, using the GitHub flow model and branch protection to manage changes and ensure code quality. This enables multiple developers to collaborate on the codebase simultaneously while maintaining version history and control over changes.

  2. 2.

    Organization: All development happens within the Glotaran organization on GitHub, which, besides the new Python projects, also contains the legacy projects Glotaran [18] and TIMP [16]. These legacy projects are still maintained but not further developed. The most notable components of the pyglotaran ecosystem include pyglotaran extras, pyglotaran examples, and pyglotaran validation, which together form the basis for the pyglotaran validation framework.

  3. 3.

    Code structure: Code is organized using packages and modules, making it easier to navigate and manage the codebase.

  4. 4.

    Quality assurance: Linters, formatters, and type checkers are used to catch errors and enforce consistency in the code.

  5. 5.

    Continuous Integration and Delivery (CI/CD): These processes automate the building and testing of the software, ensuring that the code is always in a working state (Figure S15, Figure S16, Figure S17, Figure S18).

  6. 6.

    Documentation: Automated documentation, both generated and manually curated, is provided to facilitate understanding of the codebase and help new users get up to speed quickly.

  7. 7.

    Dependency management: Automated dependency updates ensure that the software remains up-to-date with the latest libraries/frameworks and potential problems are discovered early.

  8. 8.

    Deployment: Handled through PyPI and Conda-forge, making it easier for users to install and use the software.

Overall, these practices ensure that pyglotaran is a high-quality, well-maintained software package that is efficient to develop, test, and use.

4.6 Glotaran versus pyglotaran

The pyglotaran project was developed based on the lessons learned with glotaran + TIMP and its support. While the learning curve of pyglotaran is steeper it provides a lot more capabilities, extendability and customizability. Due to its text-based nature, it can in principle be integrated into a GUI. Table 3 contains a feature comparison of glotaran + TIMP and pyglotaran.

Table 3 Feature comparison of glotaran + TIMP and pyglotaran