Abstract
The dynamics of molecular systems can be studied with time-resolved spectroscopy combined with model-based analysis. A Python framework for global and target analysis of time-resolved spectra is introduced with the help of three case studies. The first study, concerning broadband absorption of intersystem crossing in 4-thiothymidine, demonstrates the framework's ability to resolve vibrational wavepackets with a time resolution of ≈10 fs using damped oscillations and their associated spectra and phases. Thereby, a parametric description of the “coherent artifact” is crucial. The second study addresses multichromophoric systems composed of two perylene bisimide chromophores. Here, pyglotaran's guidance spectra and lego-like model composition enable the integration of spectral and kinetic properties of the parent chromophores, revealing a loss process, the undesired production of a radical pair, that reduces the light harvesting efficiency. In the third, time-resolved emission case study of whole photosynthetic cells, a megacomplex containing ≈500 chromophores of five different types is described by a combination of the kinetic models for its elements. As direct fitting of the data by theoretical simulation is unfeasible, our global and target analysis methodology provides a useful ‘middle ground’ where the theoretical description and the fit of the experimental data can meet. The pyglotaran framework enables the lego-like creation of kinetic models through its modular design and seamless integration with the rich Python ecosystem, particularly Jupyter notebooks. With extensive documentation and a robust validation framework, pyglotaran ensures accessibility and reliability for researchers, serving as an invaluable tool for understanding complex molecular systems.
1 Introduction
Time-resolved spectroscopy is widely used in photochemistry and photobiology for investigating the dynamic properties of complex systems [1,2,3,4]. The global and target analysis methodology has been developed to model multidimensional datasets from these systems [5,6,7,8,9]. Here, global refers to a simultaneous analysis of all measurements, whereas target refers to the applicability of a particular target model [10]. Fred Brouwer inspired the early developments of global and target analysis [11,12,13]. Several tools for global and target analysis exist in the public domain [14,15,16,17,18,19,20,21]. While some of these tools provide, like pyglotaran, a modular and Python-based approach, they have different foci for the analysis. The aim of this paper is to introduce pyglotaran [22], a lego-like Python problem-solving environment [23] for global and target analysis of time-resolved spectra. Several new features that are now becoming publicly available will be demonstrated with the help of an in-depth presentation of two recent transient absorption case studies [24, 25], namely the usage of guidance spectra and the parametric description of the “coherent artifact” (CA) [2, 26]. Vibrational wavepackets [27, 28] will be described with the help of damped oscillations [26, 29]. Finally, in a third case study, the time-resolved emission of whole photosynthetic cells is described by the contributions of all the megacomplexes present, thus resolving the energy transfer pathways [30] with the help of spectral area constraints [31]. These case studies explore systems containing 1, 2 or 3, and ≈500 chromophores, with compartmental models [32] describing 5 or 6 spectrally distinct species or states. The temporal resolution ranges from ≈10 fs to ≈10 ps.
The paper is organized as follows. We assume the reader is familiar with the basics of global and target analysis [7], with some foundational information included in the Supplementary Information (SI) Jupyter notebooks [33]. In the methods section, we summarize the methodological advancements since 2004, which are then illustrated through the case studies in the results and discussion section. There, we provide a concise description of the salient analysis results, with extended descriptions available in the SI Jupyter notebooks. Next, we present the design of the pyglotaran problem-solving environment and discuss software engineering aspects and future developments. Pyglotaran extends the FAIR data principles [34] by not only enabling users to share their data, pre-processing steps, and analysis results, but also by open-sourcing its software analysis tools and providing Jupyter notebooks for full reproducibility. To demonstrate this commitment to transparency and reproducibility, readers can download the Jupyter notebooks and pre-processed data associated with this paper to reproduce all the presented results, thus fostering a more collaborative and robust scientific community.
2 Methods
Crucial to virtually all global and target analysis is the superposition principle, expressing that the response of a complex system can be described by a linear superposition of the contributions from several components. We will first consider transient difference absorption spectroscopy, and then time-resolved emission spectroscopy. The superposition principle is schematically depicted in Fig. 1 for a single data set, that will be explained in detail in the first case study. The data matrix (first row) shows a complex spectral evolution (second row) with damped oscillations (third row) and a pronounced coherent artifact straddling time zero (fourth row, cf. the trace at 600.5 nm in the first row). The fifth row depicts the analysis of the remaining structure in the residual matrix of the fit.
In broadband absorption spectroscopy [35, 36], the evolution of the ground- and excited-state vibrational wave packets created by the short laser pulse is described with a superposition of damped oscillations. The amplitude of a damped oscillation \(\cos (\omega_{n} t)\exp ( - \gamma_{n} t)\) as a function of the detection wavelength constitutes a damped oscillation-associated spectrum (\(DOAS_{n} (\lambda )\)) with an accompanying wavelength-dependent phase \(\varphi_{n} (\lambda )\) [29] (cf. row three of Fig. 1). When the vibrational evolution can be considered independently from the electronic evolution (Born–Oppenheimer approximation), we arrive at a superposition of the electronic and vibrational contributions to the time-resolved spectrum (TRS):
where \(N_{states}\) electronically excited states are present in the system, with populations \(c_{l}^{S} (t)\) (superscript S stands for species) and species’ spectral properties, the species-associated difference spectra (\(SADS_{l} (\lambda )\)) (cf. row two of Fig. 1). The populations are determined by an unknown compartmental model [32] that depends upon the unknown kinetic parameters \(\theta\). In the target analysis, constraints on the SADS are needed to estimate all parameters \(\theta\) and \(SADS_{l} (\lambda )\). The notation \(t^{\prime}\) indicates that the actual model function still must consider the instrument response function (IRF) by means of convolution. The next term describes the coherent artifact with a weighted sum of the zeroth, first and second derivative of the IRF \(i(t)\) [2] (cf. row four of Fig. 1). Finally, the residual represents the part of the data that is not described by the parameterized model (cf. row five of Fig. 1).
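Written out, the superposition described above can be sketched as follows. This is a reconstruction from the term definitions in the text, not a verbatim formula: the symbols \(N_{osc}\) (number of damped oscillations) and \(IRFAS_{m} (\lambda )\) (amplitude spectrum of the m-th IRF derivative), as well as the sign convention of the phase, are assumptions.

\[
TRS(t,\lambda ) = \sum\limits_{l = 1}^{N_{states}} c_{l}^{S} (t^{\prime},\theta )\,SADS_{l} (\lambda ) + \sum\limits_{n = 1}^{N_{osc}} DOAS_{n} (\lambda )\cos (\omega_{n} t^{\prime} - \varphi_{n} (\lambda ))\exp ( - \gamma_{n} t^{\prime}) + \sum\limits_{m = 0}^{2} i^{(m)} (t)\,IRFAS_{m} (\lambda ) + residual(t,\lambda )
\]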
For every wavelength, the matrix formula for this superposition model is given by
where the matrix \(C^{S}\) consists of columns \(c_{l}^{S} (t)\). A Gaussian shaped IRF is used, with parameters μ for the time of the IRF maximum and Δ for the full width at half maximum (FWHM) of the IRF. The matrices \({\text{Cos}} (\omega ,\gamma ,\mu ,\Delta )\) and \(Sin(\omega ,\gamma ,\mu ,\Delta )\) contain the damped oscillations, and the matrices A and B comprise their amplitudes. To limit the number of free parameters, we assume wavelength independence of the eigenfrequency \(\omega_{n}\) and of the damping rate \(\gamma_{n}\).
The final term, which describes the coherent artifact, contains a matrix \(IRF(\mu ,\Delta ^{\prime})\) whose columns are the \(i^{(m)} (t)\). The SADS, the IRFAS, and the amplitudes A and B are unconstrained conditionally linear parameters that can be implicitly solved for (per wavelength) using the variable projection algorithm [37, 38].
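Collecting the matrices named above, the per-wavelength matrix formula can be sketched as follows. This is a reconstruction from the definitions in the text; the argument lists of \(C^{S}\), \(A\), \(B\), and the residual term are assumed notation.

\[
TRS(\lambda ) = C^{S} (\theta ,\mu ,\Delta ) \cdot SADS(\lambda ) + {\text{Cos}} (\omega ,\gamma ,\mu ,\Delta ) \cdot A(\lambda ) + Sin(\omega ,\gamma ,\mu ,\Delta ) \cdot B(\lambda ) + IRF(\mu ,\Delta ^{\prime}) \cdot IRFAS(\lambda ) + residual(\lambda )
\]

Under the phase convention \(\cos (\omega_{n} t - \varphi_{n} )\), the identity \(\cos (\omega t - \varphi ) = \cos \varphi \cos \omega t + \sin \varphi \sin \omega t\) gives \(A_{n} = DOAS_{n} \cos \varphi_{n}\) and \(B_{n} = DOAS_{n} \sin \varphi_{n}\), so the cosine and sine amplitudes together carry the same information as one DOAS with its phase.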
When the IRF width Δ is larger than ≈150 fs the damped oscillations will be virtually averaged out, and for every wavelength, the matrix formula for this superposition model reduces to [40]:
In time-resolved emission spectroscopy, the IRF is generally much wider than 150 fs, and the presence of a possible scatter component can be described by the IRF shape (i.e., the zeroth derivative). The species’ spectral properties are called the Species Associated Spectra (\(SAS_{l} (\lambda )\)) and we have for every wavelength (disregarding a possible scatter component)
Since emission cannot be negative, the SAS are nonnegative conditionally linear parameters, that can be implicitly solved for (per wavelength) [39, 40].
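The implicit solution of the conditionally linear parameters can be illustrated with a minimal sketch. It is not pyglotaran's implementation: the concentration matrix consists of plain exponential decays (IRF convolution omitted), and the function names and the synthetic rates and spectra are hypothetical. For a given set of nonlinear parameters, the spectra are solved for per wavelength, either unconstrained (SADS) or nonnegative (SAS), and only the residual is passed on to the nonlinear optimizer.

```python
import numpy as np
from scipy.optimize import nnls


def concentration_matrix(times, rates):
    """Columns are mono-exponential decays exp(-k t); IRF convolution is omitted."""
    return np.exp(-np.outer(times, rates))


def solve_clp(c_matrix, data, nonnegative=False):
    """Solve data[:, j] ~ c_matrix @ spectra[:, j] for every wavelength j."""
    if nonnegative:
        # Emission: one nonnegative least squares problem per wavelength.
        columns = [nnls(c_matrix, data[:, j])[0] for j in range(data.shape[1])]
        return np.column_stack(columns)
    # Difference absorption: unconstrained least squares for all wavelengths at once.
    return np.linalg.lstsq(c_matrix, data, rcond=None)[0]


def projected_residual(rates, times, data, nonnegative=False):
    """Residual that depends on the nonlinear parameters only."""
    c_matrix = concentration_matrix(times, rates)
    spectra = solve_clp(c_matrix, data, nonnegative)
    return (data - c_matrix @ spectra).ravel()


# Synthetic, noise-free example with hypothetical rates and spectra.
times = np.linspace(0.0, 10.0, 200)
true_rates = np.array([2.0, 0.3])
true_spectra = np.array([[1.0, 0.5, 0.0], [0.2, 0.8, 1.0]])
data = concentration_matrix(times, true_rates) @ true_spectra

sas = solve_clp(concentration_matrix(times, true_rates), data, nonnegative=True)
residual_norm = np.linalg.norm(projected_residual(true_rates, times, data))
```

In a fit, `projected_residual` would be the function handed to the nonlinear least squares routine, so that the optimizer only sees the rates.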
In most experiments, the location of the maximum of the IRF is wavelength dependent. This so-called dispersion can well be described by a polynomial function of the wavelength [7] or of its reciprocal, the wavenumber [2]. This introduces, typically, 1–3 “nuisance” parameters. Moreover, the whole kinetic model must be recomputed for every wavelength, which greatly increases the computation time. An independent experiment that involves a coherent artifact with dispersion must also be modeled with the above formulas [26].
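A polynomial dispersion of the IRF location can be sketched as below. The parameterization (a polynomial in the distance to a center wavelength, without further scaling) and all names and numbers are assumptions for illustration; the text equally allows a polynomial in the wavenumber.

```python
import numpy as np


def irf_center(wavelength, mu_center, coefficients, center_wavelength):
    """Location of the IRF maximum as a polynomial in (wavelength - center_wavelength)."""
    x = np.asarray(wavelength, dtype=float) - center_wavelength
    shift = sum(c * x ** (i + 1) for i, c in enumerate(coefficients))
    return mu_center + shift


def gaussian_irf(t, mu, fwhm):
    """Area-normalized Gaussian IRF with maximum at mu and the given FWHM."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# Hypothetical values: two dispersion ("nuisance") parameters, times in ps.
wavelengths = np.array([500.0, 600.0, 700.0])
mu = irf_center(wavelengths, mu_center=0.5, coefficients=[0.002, 1e-6],
                center_wavelength=600.0)

t = np.linspace(-2.0, 3.0, 5001)
irf_at_500 = gaussian_irf(t, mu[0], fwhm=0.1)
```

Because `mu` differs per wavelength, the IRF, and with it the convolved kinetic model, has to be evaluated once per wavelength, which is the source of the extra computation time mentioned above.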
3 Results and discussion
3.1 Broadband absorption case study of intersystem crossing in 4-thiothymidine
In a recent study, it was demonstrated that coherent vibrational modes promote the ultrafast internal conversion and intersystem crossing in thiobases [25]. Here, we will present the target analysis of 4-thiothymidine (4TT). The structure and steady-state spectra of 4TT are shown in Figure S1. The IRF width of ≈35 fs FWHM enables the resolution of damped oscillations that can be assigned to vibrational modes. The coherent artifact is clearly visible in the trace at 600.5 nm in Fig. 2, with large contributions of the IRF derivatives (Fig. 3).
We employ a sequential kinetic scheme with four states which we tentatively name S2, S1, hot T1’, and T1, i.e., we assume the kinetic scheme S2 → S1 → hot T1’ → T1. The populations and estimated SADS are shown in Fig. 4. The kink at 1 ps in Fig. 4A results from the time axis being linear until 1 ps, and logarithmic thereafter [41].
Four damped oscillations have been used, one of which was time-reversed and is attributed to the coherent artifact. The DOAS and phases of the other three damped oscillations are shown in Figure S2.
Although at first glance the fit looks satisfactory (Fig. 2, bottom) and the rms error of 0.12 mOD is small, the residual matrix of the fit still shows some structure, cf. the vertical lines in Fig. 5A. These lines can tentatively be attributed to pump laser intensity fluctuations. This suggests refining the analysis by correcting for these laser intensity fluctuations: one estimates whether the residual spectrum at a certain time is proportional to the data and, if so, corrects the data accordingly. This is demonstrated in Figure S3. The rms error decreases to 0.065 mOD, and the residuals show virtually no more structure (Figure S4).
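One way to implement such a correction is sketched below. It is an illustration under stated assumptions, not the procedure of the SI notebooks: for every time point, a scale factor is estimated by least squares such that the residual spectrum is proportional to the data spectrum, and the data are rescaled accordingly. The function name and the synthetic data are hypothetical, and every time point is assumed to have a nonzero data spectrum.

```python
import numpy as np


def intensity_correction(data, fitted):
    """Rescale each time point of data (n_times x n_wavelengths).

    For every time t, the scale a_t minimizes |residual_t - a_t * data_t|^2,
    and the corrected spectrum is (1 - a_t) * data_t.
    """
    residual = data - fitted
    scale = np.sum(residual * data, axis=1) / np.sum(data * data, axis=1)
    corrected = data * (1.0 - scale)[:, None]
    return corrected, scale


# Synthetic example: a hypothetical multiplicative fluctuation per time point.
times = np.linspace(0.0, 5.0, 50)
spectrum = np.array([1.0, -0.5, 0.25])
true_signal = np.outer(np.exp(-times), spectrum)
fluctuation = 0.05 * np.sin(7.0 * times)
data = (1.0 + fluctuation)[:, None] * true_signal

corrected, scale = intensity_correction(data, true_signal)
```

In this idealized example the fit equals the true signal, so a single pass recovers it. With real data the fit is itself affected by the fluctuations, so the correction and the fit would presumably be repeated.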
The S1 SADS (red in Fig. 4B) is difficult to interpret: it still contains substantial stimulated emission around 430 nm, suggesting that it is a mixture of relaxed S2 and S1. Therefore, we adopted the kinetic scheme in Fig. 6, which contains five states: S2, relaxed S2’, S1, hot T1’ and T1. Associated with these states, there are five lifetimes. Several kinetic schemes have been tested until we arrived at the scheme of Fig. 6, with well interpretable SADS that are in accordance with the theory [25]. The precision of the estimated parameters is reported in the result object (Figure S5). The relaxed S2’ SADS (green in Fig. 7) is free from triplet (blue in Fig. 7) features. The S1 SADS (red) shows two ESA bands and a small SE around 430 nm.
The contribution of the damped oscillations to the fit and the quality of the fit are demonstrated for the 510 nm data in Fig. 8. Note that the main decay of the damped oscillations (turquoise) corresponds to the main decay of S2 (black), suggesting that these oscillations live on (relaxed) S2. In the refined target analysis, four DOAS have been resolved. The vertical lines in Fig. 9 indicate nodes concomitant with a phase jump in the 458 cm−1 DOAS (red, at 415 and 460 nm), in the 213 cm−1 DOAS (blue, at 393, 429, and 479 nm), and in the 847 cm−1 DOAS (green, at 526 nm). These results are further discussed in [25].
3.2 Transient absorption case study of the chromophoric systems rc and rcg
The primary event in molecule-based light energy conversion systems is light harvesting. We studied perylene bisimide-calix[4]arene multichromophoric systems composed of two different types of perylene bisimide (PBI) chromophores, red (r) and green (g) PBIs (named after their colors as solids), connected by calix[4]arene (c) [24, 42]. Figure 10A depicts the chemical structure of the supramolecular system rcg, and the absorption and emission properties of the parent chromophores rc and gc are shown in Fig. 10B. Due to the excellent overlap of the rc emission and the gc absorption (dotted red and solid green lines in Fig. 10B) and the close proximity of the chromophores, fast Förster excitation energy transfer (EET) is found after excitation of the r moiety [42]. However, the rc chromophore has inherent dynamic properties, which must be considered in a target analysis.
An important part of the data analysis is the pre-processing of the raw data. The rc Jupyter notebook in the supplementary information shows how a global analysis reveals the presence of a pre-zero baseline in the data (Figure S6, Figure S7), and how this baseline can then be estimated (Figure S8) and subtracted from the raw data. Throughout this manuscript, we refer to pre-processed data.
After 530 nm excitation of rc in CH2Cl2, the coherent artifact is well described by the wavelength-dependent \(IRF(\mu ,\Delta ^{\prime}) \cdot IRFAS\) (Fig. 11). Representative traces demonstrating the excellent quality of the fit are shown in Fig. 12.
The extensive spectral evolution of rc can be described with four excited states r1 → r2 → r3 → r4 → ground state, resulting in four rate constants \(k_{r2,r1} ,k_{r3,r2} ,k_{r4,r3} ,k_{r4}\) (where we use the \(k_{to,from}\) convention, and \(k_{r4}\) denotes the decay rate to the ground state) and four rc-SADS. This sequential scheme (Fig. 13A) neglects the branching decay to the ground state of the first three states, since \(k_{r4}\) is much smaller than the other three rate constants. As a refinement one could add this decay channel, assuming a rate of decay to the ground state \(k_{r4}\) for all four states. The perylene red chromophore shows a strong spectral evolution in time, especially from 550 to 750 nm (Fig. 13B).
The first target analysis of rcg takes into account that the 530 nm excitation also directly excites the g moiety (≈12% of the r absorption, Fig. 10B), and the four rate constants for EET to g, \(k_{g,r1} ,k_{g,r2} ,k_{g,r3} ,k_{g,r4}\), are added to the rc kinetic scheme. By plotting the first left and right singular vectors resulting from the singular value decomposition (SVD) of the residual matrix, we can identify systematic patterns in the residuals, which may indicate potential issues with the model. Here, the fit using this model is unsatisfactory, since the left and right singular vectors of the residual matrix (Fig. 14A, B) show large trends, especially during the first 10 ps around 590 and 780 nm.
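The residual diagnostic described above can be sketched with NumPy. The residual matrix here is synthetic (a hypothetical rank-one trend plus noise), and the function name is an assumption; the point is only that a systematic misfit shows up as a structured first singular vector pair with a dominant singular value.

```python
import numpy as np


def first_singular_vectors(residual):
    """Return the first left singular vector, all singular values,
    and the first right singular vector of a (times x wavelengths) residual matrix."""
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    return u[:, 0], s, vt[0, :]


# Synthetic residual: hypothetical trend in time and wavelength, plus white noise.
rng = np.random.default_rng(0)
times = np.linspace(0.0, 10.0, 100)
wavelengths = np.linspace(500.0, 800.0, 50)
time_trend = np.exp(-times / 2.0)
spectral_trend = np.exp(-0.5 * ((wavelengths - 590.0) / 20.0) ** 2)
residual = np.outer(time_trend, spectral_trend)
residual += 0.01 * rng.standard_normal(residual.shape)

lsv, singular_values, rsv = first_singular_vectors(residual)
```

For a residual that is pure noise, the singular values decay gradually and the singular vectors show no trend; here the first singular value stands out and `lsv` follows the time trend up to sign.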
Therefore, a loss process is introduced: the formation of a radical pair state, called rcgRP, from r1 or r2, which requires two new rate constants \(k_{rcgRP,r1} ,k_{rcgRP,r2}\). Thus, the rcg system can be described by the four rc-SADS, the g-SADS and the rcgRP-SADS, and the twelve rate constants: \(k_{r2,r1} ,k_{r3,r2} ,k_{r4,r3} ,k_{r4}\); \(k_{g,r1} ,k_{g,r2} ,k_{g,r3} ,k_{g,r4}\); \(k_{rcgRP,r1} ,k_{rcgRP,r2}\); and the decay rates to the ground state \(k_{g} ,k_{rcgRP}\). This kinetic scheme is schematically depicted in Fig. 15B and the differential equation is shown in Figure S10.
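The compartmental scheme with its twelve rate constants can be turned into populations with a short sketch. All numerical rates and the initial populations below are hypothetical placeholders, not the estimated values, and the helper names are assumptions; the IRF convolution is omitted.

```python
import numpy as np
from scipy.linalg import expm


def build_k_matrix(compartments, transfer_rates, decay_rates):
    """Assemble the transfer matrix K of dc/dt = K c.

    transfer_rates maps (to, from) -> rate, following the k_{to,from} convention;
    decay_rates maps compartment -> rate of decay to the ground state.
    """
    index = {name: i for i, name in enumerate(compartments)}
    k_matrix = np.zeros((len(compartments), len(compartments)))
    for (to, source), rate in transfer_rates.items():
        k_matrix[index[to], index[source]] += rate      # gain of the receiving compartment
        k_matrix[index[source], index[source]] -= rate  # loss of the donating compartment
    for name, rate in decay_rates.items():
        k_matrix[index[name], index[name]] -= rate      # loss to the ground state
    return k_matrix


def populations(k_matrix, initial, times):
    """Populations c(t) = expm(K t) c(0), without IRF convolution."""
    return np.array([expm(k_matrix * t) @ initial for t in times])


compartments = ["r1", "r2", "r3", "r4", "g", "rcgRP"]
# Hypothetical rates in 1/ps, for illustration only.
transfer_rates = {
    ("r2", "r1"): 2.0, ("r3", "r2"): 0.5, ("r4", "r3"): 0.05,
    ("g", "r1"): 1.0, ("g", "r2"): 1.0, ("g", "r3"): 1.0, ("g", "r4"): 1.0,
    ("rcgRP", "r1"): 0.3, ("rcgRP", "r2"): 0.3,
}
decay_rates = {"r4": 0.0002, "g": 0.0002, "rcgRP": 0.01}

k_matrix = build_k_matrix(compartments, transfer_rates, decay_rates)
initial = np.array([1.0, 0.0, 0.0, 0.0, 0.12, 0.0])  # assumed excitation fractions
times = np.linspace(0.0, 50.0, 101)
conc = populations(k_matrix, initial, times)
```

Each column of `k_matrix` sums to minus the ground-state decay rate of that compartment, which makes it easy to check that every transfer removes from the donor exactly what it adds to the acceptor.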
The main results from the rc in CH2Cl2 experiment are the four rate constants (Fig. 13A) and the four SADS (Fig. 13B). The four rate constants are used in the kinetic scheme of rcg (Fig. 15B). The estimated rc-SADS are used to guide the SADS of the r1, r2, r3, r4 species in the target analysis of rcg by adding the rc-SADS as data to be fitted using the rcg-SADS. This is demonstrated in Fig. 16, where the dashed lines indicate the fit. The formulas for the fitting of the guidance SADS are shown in Figure S10. The usage of the guidance spectra allows for some flexibility, to accommodate small differences in the experimental conditions or the wavelength calibration or the white light of the probe, when the experiments have been performed on different days. It circumvents the more complicated simultaneous analysis of multiple datasets by selectively adding only the relevant information, i.e., here the four rc-SADS and the g-SADS. Without the guidance SADS it would have been impossible to take the r* spectral evolution properly into account in the rcg target analysis, since, e.g., the population of r4 is very small (purple in Fig. 17A).
rcg shows the typical spectral evolution of the r chromophore, as well as EET to g (see the 730 nm bleach in the g SADS) and r−• formation, characterized by the 575 nm bleach and the 780 nm absorption (black SADS). The trichromophoric systems rcgcr and gcrcg (Figure S9) can be analyzed with slightly modified versions of this kinetic scheme resulting in the concentrations depicted with dotted and dashed lines in Fig. 17A. Note that in gcrcg, which contains two accepting chromophores, the g population (dashed green) rises faster, but also that of the rcgRP (dashed black), cf. also the amplitude matrices in the gcrcg Jupyter notebook in the SI. These results are further discussed in [24].
3.3 Time-resolved emission case study of whole photosynthetic cells
The time-resolved emission spectrum of whole photosynthetic cells contains the contributions from all the pigment–protein complexes present. In cyanobacteria the phycobilisome (PB) is the light harvesting antenna, which contains phycocyanin (PC) and allophycocyanin (APC) pigments that absorb the light between 400 and 650 nm. The excitations of the antenna pigments are efficiently transferred to the chlorophyll-containing photosystems (PS) I and II [44]. Megacomplexes consisting of PB, PSI, and PSII have been demonstrated [45]. ΔPSI mutants which lack PSI [46] have been used as a model system to study the properties of the PB–PSII megacomplex [30]. PSII shows different properties when the reaction center (RC) is in the open or in the closed state [47]. To model the EET rates in a whole cell, we, thus, distinguish three basic types of megacomplexes (Figure S11): PB–PSII with PSII in the open or the closed state and non-transferring PB. Free PSII, not receiving PB input, cannot be distinguished from the PSII in a PB–PSII megacomplex. The minimal kinetic scheme of PB consists of ten compartments [48]: three core cylinders with a connected rod. Each rod consists of PC640 (cyan) and PC650 (blue) compartments, the top cylinder contains 24 APC660 pigments (magenta), the two basal cylinders consist of disks with only APC660 pigments (red) and disks with APC660 (orange) and APC680 pigments (black). APC680 is the terminal emitter that transfers the PB excitations to PSII. The biexponential decay of the PSII dimer emission is described by an equilibrium of a Chl a compartment (PSII open, green) with a radical pair (RP) compartment [47, 49]. Spectral equality constraints are employed linking the SAS of the PC640, PC650, APC660 compartments. Thus, together with APC680 and PSII there are only five different SAS. The RP SAS is by definition zero. Spectral area constraints [31] have been used to estimate the equilibria. The parameters of the PB model have been taken from [48].
To collect enough information, four experiments have been done, preferentially exciting the PB (590 nm excitation) or the PSII Chl a (400 nm excitation), with a shorter or longer time range (TR2, IRF ≈7 ps FWHM and TR4, IRF ≈18 ps FWHM). Representative traces demonstrating the excellent quality of the fit are shown in Fig. 19. From the simultaneous target analysis of the four experiments, the rate of energy transfer from PB to PSII can be estimated (Fig. 18), together with the SAS of the five different species (Fig. 20), and the fractions of the different complexes (Table 1). These fractions are megacomplex scaling parameters of the model, cf. the dPSI Jupyter notebook in the SI.
The PB–PSII megacomplex contains ≈500 chromophores of five different types: PC640, PC650, APC660, APC680 and Chl a (in PSII). The properties of the estimated SAS (Fig. 20B, D) are in agreement with the literature [47, 50]. The main finding of this case study is the rate of EET from APC680 to PSII in vivo of 50 ns−1. Pyglotaran enables the combination of the kinetic models for PB [50] and for PSII [47, 49], cf. Fig. 18. These results are further discussed in [30].
3.4 Conclusion from the case studies
Since it is unfeasible to fit the data of these complex systems directly by a theoretical simulation, our global and target analysis methodology provides a useful ‘middle ground’ where the theoretical description and the fit of the experimental data can meet (Fig. 1). In the first case study, we employed DOAS and IRFAS to describe the vibrational wavepackets and the coherent artifact. Theoretical chemistry computations then complemented the interpretation of the target analysis results [25]. In the second case study, computations [42] confirmed that the estimated energy transfer rates from the red to the green chromophore are in agreement with the Förster resonance energy transfer mechanism. Such computations are not possible with the PB–PSII megacomplex, which contains ≈500 chromophores. Here, the functional compartmental model (Fig. 18) is the best possible theoretical description.
4 Design of the lego-like problem-solving environment pyglotaran
The design of pyglotaran is based upon the well-known cycle of scientific discovery: model specification, parameter estimation, and model validation [16]. This cycle is illustrated in Fig. 21 and summarized for the rc and rcg case study in Table 2.
4.1 Model specification
The model specification in pyglotaran is designed to be lego-like, allowing for the easy declaration and reuse of building blocks. The core of pyglotaran is the modeling language, a declarative domain-specific language (DSL) that is designed to describe the behavior of systems in terms of their states and how they interact with one another, in a modular and composable manner. A DSL enables a user unfamiliar with scientific modeling, computing, and programming to express the analysis of complex systems without detailed knowledge about the internals of pyglotaran. Pyglotaran is a very general implementation of separable problems, which enables usage beyond the kinetic models presented here. The DSL is split up into two parts: the parameter definitions, and the model definitions, which reference parameters from the parameter definitions by their name. Using the DSL, pyglotaran functions as an engine that interprets the model and parameter definitions and applies them to fit the data. To reduce the mental load for users and simplify the translation between the kinetic model and its description, pyglotaran allows and encourages the use of meaningful and verbose names both in the model and in the parameter definitions (Figure S12). The DSL is further detailed in the SI section Modeling language and illustrated in the Jupyter notebooks. Because the model description is plain text, version control software (e.g., git) can be used, ensuring that all changes are tracked and recorded.
4.2 Parameter estimation
The parameter estimation distinguishes nonlinear parameters (nlp) and conditionally linear parameters (clp). The clp can be either nonnegative (with SAS) or unconstrained (with SADS, A, B, IRFAS) [29]. In the model definition, this is specified using “residual_function: non_negative_least_squares” and “residual_function: variable_projection”, respectively. To link the clp across multiple datasets, the option “link_clp: True” can be used.
Relations between the clp, constraints to zero, and penalties based upon the area of the SA(D)S can be specified. For a relation between the clp of two components one would add to the model specification, e.g.:
Here, the clp of s1 and s2 are related with a scaling parameter (rel.r1) over the interval from 0 to 1000.
For a zero-constraint, one would add to the model specification, e.g.:
Here, the clp for the s12 component are forced to be zero in the interval from 1 to 1000.
For penalties based upon the (difference in) area of the SA(D)S [31] one would add to the model specification, e.g.:
Here, the difference in the area of the clp (which are the SAS of components s11 (PS2) and s1 in this case) in the interval between 1 and 1000 is penalized, where the area of the s1 SAS is scaled with the parameter area.PS2. The penalty itself is scaled with a weight of 0.1 in this case before adding it to the residual vector that affects the minimization process.
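The arithmetic of such an area penalty can be sketched as follows. This is an illustration of the description above, not pyglotaran's code: the function name, the use of a plain sum as the area, and the toy spectra are assumptions.

```python
import numpy as np


def area_penalty(axis, source_clp, target_clp, parameter, weight, interval):
    """Weighted difference between the source area and the scaled target area.

    axis: spectral axis; source_clp, target_clp: spectra on that axis;
    parameter scales the target area; interval = (low, high) on the spectral axis.
    """
    low, high = interval
    mask = (axis >= low) & (axis <= high)
    source_area = np.sum(source_clp[mask])
    target_area = np.sum(target_clp[mask])
    return weight * np.abs(source_area - parameter * target_area)


# Toy spectra standing in for the SAS of s11 (source) and s1 (target).
axis = np.arange(1.0, 6.0)
sas_s1 = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
sas_s11 = 2.0 * sas_s1

penalty_matched = area_penalty(axis, sas_s11, sas_s1, parameter=2.0,
                               weight=0.1, interval=(1.0, 1000.0))
penalty_unmatched = area_penalty(axis, sas_s11, sas_s1, parameter=1.0,
                                 weight=0.1, interval=(1.0, 1000.0))

# The penalty is appended to the residual vector seen by the optimizer.
residual = np.zeros(10)
augmented_residual = np.append(residual, penalty_unmatched)
```

When the area ratio equals the scaling parameter, the penalty vanishes; otherwise it adds one extra element to the residual vector, whose influence is set by the weight.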
The examples given above are all taken from the ΔPSI mutant emission case study where they can be studied in context. Since the clp constraints decrease the amount of the free clp parameters, they are very important in the target analysis [7, 31].
The starting values for the nonlinear parameters can be specified in two ways. Initially, they are specified with the help of a parameters.yml file, analogous to the model.yml file described in the SI. After optimization, a csv file of the estimated nonlinear parameters can be written, which can then more easily be modified in model refinement and subsequently be used as the new starting values for the nonlinear parameters.
The actual parameter estimation process employs the nonlinear least squares function scipy.optimize.least_squares [51], which offers a choice of optimization algorithms, and is demonstrated in the Jupyter notebooks in the SI. After the fit, summary statistics are computed, most importantly the rms error of the fit and the t-values of the estimated nonlinear parameters (Figure S5).
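How such summary statistics follow from a least squares fit can be illustrated with a toy problem. The sketch below calls scipy.optimize.least_squares directly on a single exponential decay. The model, the noise level, and the definitions of the rms error (with degrees-of-freedom correction) and of the t-values (estimate divided by its standard error from the linearized covariance) are assumptions for illustration, not pyglotaran's internals.

```python
import numpy as np
from scipy.optimize import least_squares


def residual(params, times, data):
    """Residual of a mono-exponential model amplitude * exp(-rate * t)."""
    amplitude, rate = params
    return amplitude * np.exp(-rate * times) - data


# Synthetic data with hypothetical parameters and noise level.
rng = np.random.default_rng(0)
times = np.linspace(0.0, 5.0, 100)
data = 1.0 * np.exp(-0.8 * times) + 0.01 * rng.standard_normal(times.size)

# method may be "trf", "dogbox" or "lm".
result = least_squares(residual, x0=[0.5, 0.3], args=(times, data), method="trf")

degrees_of_freedom = result.fun.size - result.x.size
rms_error = np.sqrt(np.sum(result.fun ** 2) / degrees_of_freedom)

# Linearized covariance of the estimates from the Jacobian at the solution.
covariance = rms_error ** 2 * np.linalg.inv(result.jac.T @ result.jac)
standard_errors = np.sqrt(np.diag(covariance))
t_values = np.abs(result.x) / standard_errors
```

A t-value well above two indicates that the parameter is estimated with a standard error much smaller than its value.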
4.3 Model validation
The model validation process in the pyglotaran framework is essential for ensuring that the generated models are accurately capturing the underlying dynamics of the molecular systems. This process involves a series of steps, which we outline below, along with relevant figures and supplementary information that can be found in the Jupyter notebooks provided in the SI.
1. Plotting overlays of data and fits: Visual inspection of the fitted model against the experimental data is the first step in assessing the quality of the model (Fig. 2, Fig. 12, Fig. 19). This comparison helps to identify any significant deviations between the model's predictions and the observed data.
2. Analyzing residuals: Examining the matrix of residuals for each dataset provides valuable insight into the model's performance. By plotting the first left and right singular vectors resulting from the singular value decomposition (SVD) of the residual matrix (Fig. 14), researchers can identify systematic patterns in the residuals, which may indicate potential issues with the model.
3. Inspecting t-values of estimated parameters: After validating the residuals, it is essential to check the t-values of the estimated parameters (Figure S5). Ideally, t-values should be larger than two, indicating that the parameters are statistically significant.
4. Assessing the scientific interpretability of nonlinear parameters (nlp) and conditionally linear parameters (clp): Once the fit's residuals and estimated parameters are deemed acceptable, researchers must evaluate whether the obtained nlp and clp are scientifically interpretable. This process often marks the beginning of a new round in the scientific discovery cycle (Fig. 21), where researchers refine their models and hypotheses based on the insights gained from the analysis.
4.4 Reporting and conclusion
An essential aspect of the scientific process, not fully covered by Fig. 21, is the effective reporting and communication of the results. Clear and concise reporting of the models, their parameters, and validation outcomes is crucial for the broader scientific community to understand, evaluate, and build upon the findings. In this context, the Jupyter notebook interface of pyglotaran proves to be highly advantageous.
Jupyter notebooks allow researchers to combine code, output, visualizations, and descriptive text in a single, interactive document. This format greatly facilitates the reporting process by enabling researchers to:
1. Document their work in a transparent and reproducible manner: The Jupyter notebook interface makes it easy to share the complete analysis pipeline, from data pre-processing to model validation, with colleagues and collaborators.
2. Visualize and explain their results: Pyglotaran's integration with popular Python plotting libraries, such as matplotlib [52], allows for the creation of compelling and informative visualizations. Researchers can seamlessly incorporate these visualizations into the Jupyter notebook, alongside explanations of their significance and interpretation.
3. Collaborate and share their findings: Jupyter notebooks can be easily shared with collaborators, who can then review, modify, or extend the analysis. Moreover, the Jupyter notebook format is conducive to sharing research findings in online repositories or supplementary materials, allowing for greater visibility and accessibility of the results.
The Jupyter notebook-style interface plays a pivotal role in streamlining the reporting process, fostering collaboration, and promoting transparency in the scientific discovery process. By enabling researchers to effectively communicate their findings, pyglotaran not only contributes to the advancement of time-resolved spectroscopy analysis but also supports the broader scientific community in uncovering new insights and understanding of complex molecular systems.
4.5 Software engineering: the pyglotaran ecosystem
The development of pyglotaran [22] is standing on the shoulders of giants in multiple ways. On the one hand, it benefits from decades of knowledge and lessons learned by interacting with users of predecessor software TIM [23], TIMP [16], and Glotaran [18]. On the other hand, it relies on battle-proven Python scientific libraries like scipy [51], numpy [53], numba [54] and xarray [55], instead of trying to reinvent the wheel.
To ensure efficient development and high-quality code, several key practices have been implemented in the development of the pyglotaran [22] ecosystem (see also the SI section Software development):
1. Version control: Managed through Git and GitHub, using the GitHub flow model and branch protection to manage changes and ensure code quality. This enables multiple developers to collaborate on the codebase simultaneously while maintaining version history and control over changes.
2. Organization: All development happens within the Glotaran organization on GitHub, which, besides the new Python projects, also contains the legacy projects Glotaran [18] and TIMP [16]. These legacy projects are still maintained but not further developed. The most notable components of the pyglotaran ecosystem include pyglotaran extras, pyglotaran examples, and pyglotaran validation, which together form the basis for the pyglotaran validation framework.
3. Code structure: Code is organized using packages and modules, making it easier to navigate and manage the codebase.
4. Quality assurance: Linters, formatters, and type checkers are used to catch errors and enforce consistency in the code.
5. Continuous Integration and Delivery (CI/CD): These processes automate the building and testing of the software, ensuring that the code is always in a working state (Figure S15, Figure S16, Figure S17, Figure S18).
6. Documentation: Automated documentation, both generated and manually curated, is provided to facilitate understanding of the codebase and help new users get up to speed quickly.
7. Dependency management: Automated dependency updates ensure that the software remains up-to-date with the latest libraries/frameworks and potential problems are discovered early.
8. Deployment: Handled through PyPI and Conda-forge, making it easier for users to install and use the software.
Overall, these practices ensure that pyglotaran is a high-quality, well-maintained software package that is efficient to develop, test, and use.
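As a minimal sketch of what the deployment item means for a user, the snippet below queries the installed version of a distribution using only the Python standard library. It assumes that the distribution is published under the name pyglotaran; the helper name installed_version is hypothetical and not part of pyglotaran itself.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: "pyglotaran" is the distribution name used on PyPI and Conda-forge.
    found = installed_version("pyglotaran")
    if found is None:
        print("pyglotaran is not installed in this environment")
    else:
        print(f"pyglotaran {found} is installed")
```

Recording the reported version alongside an analysis notebook is one simple way to make the software environment of a result traceable.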
4.6 Glotaran versus pyglotaran
The pyglotaran project builds on the lessons learned from developing and supporting Glotaran + TIMP. Although pyglotaran has a steeper learning curve, it offers considerably more capabilities, extensibility, and customizability. Due to its text-based nature, it can in principle be integrated into a GUI. Table 3 contains a feature comparison of Glotaran + TIMP and pyglotaran.
Data availability statement
The Jupyter notebooks and the preprocessed data can be downloaded from https://github.com/glotaran/pyglotaran-releasepaper-supplementary-information/releases, so that the reader can reproduce all results.
References
Holzwarth, A. R. (1995). Time-resolved fluorescence spectroscopy, in Methods in Enzymology. Academic Press, 246, 334–362.
Kovalenko, S. A., Dobryakov, A. L., Ruthmann, J., & Ernsting, N. P. (1999). Femtosecond spectroscopy of condensed phases with chirped supercontinuum probing. Physical Review A, 59, 2369–2384.
vandeVen, M., Ameloot, M., Valeur, B., & Boens, N. (2005). Pitfalls and Their Remedies in Time-Resolved Fluorescence Spectroscopy and Microscopy. Journal of Fluorescence, 15, 377–413.
Berera, R., van Grondelle, R., & Kennis, J. T. M. (2009). Ultrafast transient absorption spectroscopy: Principles and application to photosynthetic systems. Photosynthesis Research, 101, 105–118.
Beechem, J. M., Ameloot, M., & Brand, L. (1985). Global and Target Analysis of Complex Decay Phenomena. Instrumentation Science & Technology, 14, 379–402.
Holzwarth, A. (1996). Data analysis of time-resolved measurements, in Biophysical Techniques in Photosynthesis, eds. J. Amesz and A. Hoff, Kluwer Academic Press, Dordrecht, pp. 75–92.
van Stokkum, I. H. M., Larsen, D. S., & van Grondelle, R. (2004). Global and target analysis of time-resolved spectra. Biochimica Et Biophysica Acta, 1657, 82–104.
van Stokkum, I. H. M., Larsen, D. S., & van Grondelle, R. (2004). Erratum to “Global and target analysis of time-resolved spectra.” Biochimica Et Biophysica Acta, 1658, 262–262.
Ruckebusch, C., Sliwa, M., Pernot, P., de Juan, A., & Tauler, R. (2012). Comprehensive data analysis of femtosecond transient absorption spectra: A review. Journal of Photochemistry and Photobiology C: Photochemistry Reviews, 13, 1–27.
Arcioni, A., & Zannoni, C. (1984). Intensity deconvolution in fluorescence depolarization studies of liquids, liquid crystals and membranes. Chemical Physics, 88, 113–128.
van Stokkum, I. H. M., Brouwer, A. M., van Ramesdonk, H. J., & Scherer, T. (1993). Multiresponse parameter estimation and compartmental analysis of time resolved fluorescence spectra: Application to conformational dynamics of charge-separated species in solution. Proc. Kon. Ned. Akad. v. Wetensch., 96, 43–68.
Hoff, W. D., Van Stokkum, I. H. M., Van Ramesdonk, H. J., Van Brederode, M. E., Brouwer, A. M., Fitch, J. C., . . . Hellingwerf, K. J. (1994). Measurement and Global Analysis of the Absorbency Changes in the Photocycle of the Photoactive Yellow Protein from Ectothiorhodospira-Halophila. Biophysical Journal, 67, 1691–1705.
van Stokkum, I. H. M., Scherer, T., Brouwer, A. M., & Verhoeven, J. W. (1994). Conformational dynamics of flexibly and semirigidly bridged electron donor-acceptor systems as revealed by spectrotemporal parametrization of fluorescence. Journal of Physical Chemistry, 98, 852–866.
Beechem, J. M. (1989). A second generation global analysis program for the recovery of complex inhomogeneous fluorescence decay kinetics. Chemistry and Physics of Lipids, 50, 237–251.
Dioumaev, A. K. (1997). Evaluation of intrinsic chemical kinetics and transient product spectra from time-resolved spectroscopic data. Biophysical Chemistry, 67, 1–25.
Mullen, K. M., & van Stokkum, I. H. M. (2007). TIMP: An R Package for Modeling Multi-way Spectroscopic Measurements. Journal of Statistical Software, 18, 1–46.
van Wilderen, L. J. G. W., Lincoln, C. N., & van Thor, J. J. (2011). Modelling Multi-Pulse Population Dynamics from Ultrafast Spectroscopy. PLoS ONE, 6, e17373.
Snellenburg, J. J., Laptenok, S. P., Seger, R., Mullen, K. M., & van Stokkum, I. H. M. (2012). Glotaran: A Java-based Graphical User Interface for the R-package TIMP. Journal of Statistical Software, 49, 1–22.
Slavov, C., Hartmann, H., & Wachtveitl, J. (2015). Implementation and Evaluation of Data Analysis Strategies for Time-Resolved Optical Spectroscopy. Analytical Chemistry, 87, 2328–2336.
Müller, C., Pascher, T., Eriksson, A., Chabera, P., & Uhlig, J. (2022). KiMoPack: A python Package for Kinetic Modeling of the Chemical Mechanism. The Journal of Physical Chemistry A, 126, 4087–4099.
Uhlig, J. (2022). KiMoPack - Open source tool for the analysis of transient spectral data. https://doi.org/10.5281/zenodo.6049186
Weißenborn, J., Snellenburg, J. J., Weigand, S., & van Stokkum, I. H. M. (2022). pyglotaran: a Python library for global and target analysis. https://doi.org/10.5281/zenodo.4534043
van Stokkum, I. H. M., & Bal, H. E. (2006). A Problem Solving Environment for interactive modelling of multiway data. Concurrency and computation: Practice and experience, 18, 263–269.
van Stokkum, I. H. M., Wohlmuth, C., Würthner, F., & Williams, R. M. (2022). Energy transfer in supramolecular calix[4]arene—Perylene bisimide dye light harvesting building blocks: Resolving loss processes with simultaneous target analysis. Journal of Photochemistry and Photobiology, 12, 100154.
Teles-Ferreira, D. C., van Stokkum, I. H. M., Conti, I., Ganzer, L., Manzoni, C., Garavelli, M., . . . de Paula, A. M. (2022). Coherent vibrational modes promote the ultrafast internal conversion and intersystem crossing in thiobases, Physical Chemistry Chemical Physics, 24, 21750–21758.
van Stokkum, I. H. M., Kloz, M., Polli, D., Viola, D., Weißenborn, J., Peerbooms, E., . . . Kennis, J. T. M. (2021). Vibronic dynamics resolved by global and target analysis of ultrafast transient absorption spectra, The Journal of Chemical Physics, 155, 114113.
Dobryakov, A. L., Kovalenko, S. A., & Ernsting, N. P. (2003). Electronic and vibrational coherence effects in broadband transient absorption spectroscopy with chirped supercontinuum probing. The Journal of Chemical Physics, 119, 988–1002.
Dobryakov, A. L., Kovalenko, S. A., & Ernsting, N. P. (2005). Coherent and sequential contributions to femtosecond transient absorption spectra of a rhodamine dye in solution. The Journal of Chemical Physics, 123, 044502.
van Stokkum, I. H. M., Jumper, C. C., Snellenburg, J. J., Scholes, G. D., van Grondelle, R., & Malý, P. (2016). Estimation of damped oscillation associated spectra from ultrafast transient absorption spectra. The Journal of Chemical Physics, 145, 174201.
Acuña, A. M., Van Alphen, P., Van Grondelle, R., & Van Stokkum, I. H. M. (2018). The phycobilisome terminal emitter transfers its energy with a rate of (20 ps)–1 to photosystem II. Photosynthetica, 56, 265–274.
Snellenburg, J. J., Dekker, J. P., van Grondelle, R., & van Stokkum, I. H. M. (2013). Functional Compartmental Modeling of the Photosystems in the Thylakoid Membrane at 77 K. The Journal of Physical Chemistry B, 117, 11363–11371.
Godfrey, K. (1983). Compartmental models and their application. Academic Press.
Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B., Bussonnier, M., Frederic, J., . . . Willing, C. (2016). Jupyter Notebooks - a publishing format for reproducible computational workflows, in Positioning and Power in Academic Publishing: Players, Agents and Agendas, eds. F. Loizides and B. Schmidt, IOS Press, pp. 87–90.
Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., . . . Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data, 3, 160018.
Liebel, M., & Kukura, P. (2013). Broad-Band Impulsive Vibrational Spectroscopy of Excited Electronic States in the Time Domain. The Journal of Physical Chemistry Letters, 4, 1358–1364.
Liebel, M., Schnedermann, C., Wende, T., & Kukura, P. (2015). Principles and Applications of Broadband Impulsive Vibrational Spectroscopy. The Journal of Physical Chemistry A, 119, 9506–9517.
Golub, G. H. & LeVeque, R. J. (1979). Extensions and uses of the variable projection algorithm for solving nonlinear least squares problems, Proc. of the 1979 Army Numerical Analysis and Comp. Conf., ARO Report 79-3, pp. 1–12.
Nagle, J. F. (1991). Solving complex photocycle kinetics - theory and direct method. Biophysical Journal, 59, 476–487.
Lawson, C. L., & Hanson, R. J. (1974). Solving Least Squares Problems. Prentice Hall.
Mullen, K. M., & van Stokkum, I. H. M. (2009). The variable projection algorithm in time-resolved spectroscopy, microscopy and mass spectrometry applications. Numerical Algorithms, 51, 319–340.
Satzger, H., & Zinth, W. (2003). Visualization of transient absorption dynamics – towards a qualitative view of complex reaction kinetics. Chemical Physics, 295, 287–295.
Hippius, C., van Stokkum, I. H. M., Gsanger, M., Groeneveld, M. M., Williams, R. M., & Würthner, F. (2008). Sequential FRET processes in calix[4]arene-linked orange-red-green perylene bisimide dye zigzag arrays. Journal of Physical Chemistry C, 112, 2476–2486.
Hippius, C. (2007). Multichromophoric Arrays of Perylene Bisimide Dyes - Synthesis and Optical Properties; Multichromophore Perylenbisimidkaskaden - Synthese und optische Eigenschaften, PhD Thesis, Universität Würzburg, Fakultät für Chemie und Pharmazie, 2007.
Tian, L., van Stokkum, I. H. M., Koehorst, R. B. M., Jongerius, A., Kirilovsky, D. & van Amerongen, H. (2011). Site, Rate, and Mechanism of Photoprotective Quenching in Cyanobacteria, Journal of the American Chemical Society, 133, 18304–18311.
Liu, H., Zhang, H., Niedzwiedzki, D. M., Prado, M., He, G., Gross, M. L., & Blankenship, R. E. (2013). Phycobilisomes Supply Excitations to Both Photosystems in a Megacomplex in Cyanobacteria. Science, 342, 1104–1107.
Shen, G., Boussiba, S., & Vermaas, W. F. (1993). Synechocystis sp PCC 6803 strains lacking photosystem I and phycobilisome function. The Plant Cell, 5, 1853–1863.
Tian, L., Farooq, S., & van Amerongen, H. (2013). Probing the picosecond kinetics of the photosystem II core complex in vivo. Physical Chemistry Chemical Physics, 15, 3146–3154.
van Stokkum, I. H. M., Gwizdala, M., Tian, L., Snellenburg, J. J., van Grondelle, R., van Amerongen, H., & Berera, R. (2018). A functional compartmental model of the Synechocystis PCC 6803 phycobilisome. Photosynthesis Research, 135, 87–102. https://doi.org/10.1007/s11120-017-0424-5
van Stokkum, I. (2018). Systems biophysics: Global and target analysis of light harvesting and photochemical quenching in vivo, in Light Harvesting in Photosynthesis, eds. R. Croce, R. van Grondelle, H. van Amerongen and I. van Stokkum, CRC Press, Boca Raton, ch. 20, pp. 467–482.
van Stokkum, I. H. M., Gwizdala, M., Tian, L., Snellenburg, J. J., van Grondelle, R., van Amerongen, H., & Berera, R. (2018). A functional compartmental model of the Synechocystis PCC 6803 phycobilisome. Photosynthesis Research, 135, 87–102.
Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., et al. (2020). SciPy 1.0: fundamental algorithms for scientific computing in Python, Nature Methods, 17, 261–272.
Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science and Engineering, 9, 90–95.
Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., et al. (2020). Array programming with NumPy, Nature, 585, 357–362.
Lam, S. K., Pitrou, A. & Seibert, S. (2015). Numba: a LLVM-based Python JIT compiler, presented in part at the Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, Austin, Texas.
Hoyer, S., & Hamman, J. (2017). xarray: N-D labeled Arrays and Datasets in Python. Journal of Open Research Software. https://doi.org/10.5334/jors.148
Acknowledgements
We thank Sergey Laptenok for help with designing the more modern logo features, critical reading, and helpful discussion. We thank Artur Nenov for helpful discussion.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
To be submitted to a special collection of scientific papers in the "Photochemical and Photobiological Sciences" in honor of Fred Brouwer https://www.springer.com/journal/43630.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
van Stokkum, I.H.M., Weißenborn, J., Weigand, S. et al. Pyglotaran: a lego-like Python framework for global and target analysis of time-resolved spectra. Photochem Photobiol Sci 22, 2413–2431 (2023). https://doi.org/10.1007/s43630-023-00460-y
DOI: https://doi.org/10.1007/s43630-023-00460-y