Selection of the most sensitive configuration of strip array detectors for x-ray beam monitoring in radiotherapy of cancer utilizing singular value decomposition

We propose a concise mathematical framework in order to compare detector configurations efficiently for x-ray beam monitoring in radiotherapy of cancer. This framework consists of the singular value decomposition (SVD) of the system matrix and the definition of an effective information threshold based on the relative error inequality utilizing the condition number of a matrix. The goal of this paper is to present the mathematical argument as well as to demonstrate its use for modeling the best detector configuration for monitoring x-ray beams in external beam therapy. This analysis depends neither on specific measurements of a given set of x-ray beams, nor does it depend in specific reconstruction algorithms of the beam shape, and therefore represents a configuration meta-analysis. In the results section, we compare three possible detector designs, each leading to a highly underdetermined system, and are able to determine their effective information content relative to each other. Furthermore, by changing design parameters, such as the geometric detector configuration, number of detectors, detector pixel size, and the x-ray beam blur, deeper insight in this challenging inverse problem is achieved and the most sensitive monitoring scheme is determined. Graphical Abstract Illustration of the general approach for performing configuration meta–analysis.

thin ( 2-5 mm 8 cm 20 cm) tungsten leaves shaping the treatment beam with submillimeter precision. The leaves are moving in and out orthogonally to the x-ray beam blocking it in some regions and allowing radiation to pass in another region as the linac irradiates tumor volume. Tumor and normal organs, which have to be spared from radiation as much as possible, have different sizes ( 0.5-40 cm) and shapes and therefore the motion of MLC leaves and the resulting x-ray dose can be complex. In this process, realtime monitoring and verification of MLC performance are crucial [9,15,27]. Development of beam monitoring detectors capable of determination of MLC motions and dose rate delivered to cancer patient in real time is of great interest to radiotherapy.
Due to technological challenges in the development and practical use to radiation detectors, several types of detector types and detection schemes have been considered: ionization chambers or diode arrays [1,8], integral dose ionization chamber [20], flat panel detector array (TFT) [21], or high energy current (HEC) detectors [3]. During treatment, some of the x-rays interact with these radiation detectors before reaching the patient and give rise to a signal, which is converted to information about dose rate and/or MLC positions via various algorithms. The signal of the detectors arises from x-ray energy deposited inside the detector and converted to charge carriers, which are collected by a system of electrodes. The algorithms that derive clinically relevant information (MLC positions, dose rate inside detector, patient dose) depend on which features are actually measured by the specific detector and this in turn depends on the data acquisition scheme of the detector.
The question of which acquisition configuration gives rise to the most accurate estimation of dosimetric parameters in general is essential for the development of new detectors. In this study, the configuration of the detector is defined by the number of strip detector arrays [11], their orientation (angle) with respect to the x-ray beamline of the MLC collimator and with respect to the other arrays. Throughout this paper, we use the term detector configuration when we speak of our particular MLC monitoring device and system configuration when we refer to the more general mathematical framework.
In this paper, we propose and utilize a higher level mathematical configuration meta-analysis of different detection schemes for a specific detector type. The reason to call it a configuration meta-analysis is that it relies not on a specific set of shapes of x-ray fields collimated by the MLC, but on the entirety of all possible MLC apertures, and therefore, allows the evaluation of the system/detector configuration itself. The method we present is employing the concept of singular value decomposition (SVD), which offers a more general framework of evaluation of hypothetical detector performance for various configurations. This means, in this investigation, we are not concerned with the actual reconstruction of specific x-ray field parameters (MLC shapes) from the measured detector signals, and therefore do not rely on any specific reconstruction algorithm, but want to investigate the quality of the detector configuration itself for arbitrary x-ray fields, which is summarized in the system matrix. The system matrix does not depend on specific measurements but is a general property of the detector configuration.
System configuration comparison based directly on SVD has been applied in a few specific fields, e.g., in fluorescence diffuse optical tomography [4,6,13] and near-infrared tomography [26] under the name singular value analysis of those observation systems. It showed to be a very efficient and simple measure to utilize the number of singular values above a certain noise threshold, which are most useful for performing the tomographic reconstruction. There were also critical discussions [14], which showed that although singular value analysis is a very good overall measure for tomography setups, there are specific imaging applications for which other concepts are more helpful.
Unfortunately, the mathematical argumentation in these publications was lacking and/or the authors failed to connect the numerical results for the singular values with the general singular value properties, such as the relative error inequalities employing the condition number of the system matrix. The goal of this paper is therefore to employ the relative error inequality for a concise mathematical argument suitable for configuration meta-analysis of alternative detection schemes in beam monitoring of radiotherapy x-ray fields. The mathematical framework we propose gains deeper insight into challenging detector configuration problems which rely on the current clinical practice.
Although SVD is highly popular in many application fields, especially in data analysis combined with principal component analysis, for reconstruction algorithms based on the pseudo-inverse [24] and also for direct system analysis [5,12], to the best of our knowledge, the full mathematical argument based on SVD and the relative error bound has not been presented for direct system configuration comparisons. We believe that this concise mathematical argumentation and the demonstration for the present challenging problem can be extended to other applications where different configurations have to be compared to each other to determine the most suitable design.

Methods
In the following, we must distinguish between the specific radiation device and its application to radiotherapy from a more general mathematical framework that we propose for the investigation of potential design configurations of radiation devices in general. We first present a novel linear formulation of a mathematical model for a specific detector for radiotherapy beam monitoring with multiple strip detector arrays (SDA) [11]. Second, we present the essential summary of linear algebra relevant for the general singular value analysis, which allows the comparison of different linear system configurations in general (e.g., see [23] for example as a standard reference). Third, we present numerical results for the application of the general mathematical framework to the specific radiotherapy beam monitoring device (SDA). Finally, we provide a critical discussion of the proposed mathematical method and its most salient features as well as possible other uses of the method to radiotherapy problems requiring comparison of different design configurations of instrumentation.

Application and detector
During cancer patient treatment, a medical linac radiates a high-energy megavoltage photon beam via a delivery system composed of a rotating linac gantry with a rotating dynamic multileaf collimator (MLC). The patient is lying on a robotic treatment table. The MLC of a medical linac is composed of typically 120 thin leaves and is able to shape this photon beam right at the exit of the gantry head. Based on the clinical target and organs at risk, a radiation plan is generated which exactly defines the motion of the rotating medical linac as well as all the MLC positions at each time point. Furthermore, the beam intensity is also modulated by the monitor units (MUs) for each MLC segment. All treatment parameters are used in the treatment planing system (TPS) to compute the dose in the patient and are exported to the linac for delivery of the treatment.
The clinical parameters (e.g., the MLC leaf positions and the MU) in the TPS constitute the ideal or the reference for the actual delivery. Thus, a detector that monitors the actual treatment delivery measures signals from which the clinical parameters are derived and provides deviations from the ideal TPS parameters. The deviations between these MLC/MU parameters lead to dose differences in the patient anatomy. In the present paper, we limit the work to MLC leaf positions and MU fraction per segment as clinical parameters (i.e., typically 121 free parameters) which are hypothetically determined by a specific detector that has a number of possible embodiments (configurations). The purpose in this paper is not to determine the actual values of particular clinical beams (as for example performed in [11]) but the efficacy of each detector configuration to provide a certain level of accuracy in determination of the clinical parameters for arbitrary clinical beams.
The detector configuration is presented and discussed in the subsequent Section 2.1.6. The detector is mounted on the linac below the MLC. The MLC is composed of two leaf banks (left and right), and each bank typically consists of 60 parallel leaves aligned along x-axis (Fig. 1). The leaves move and form MLC segments that are delivered with a certain amount of radiation controlled by monitor unit fraction.
The detector configuration considered here is the following: The detector consists of parallel strip detectors and each strip detector measures dose independently as an integral dose along the whole length of the strip. The strip array covers the whole irradiation region plus extra margins. Therefore, the strip detector array measures projected/cumulative doses. The strip detector array can be oriented a priori at any angle with respect to the MLC. However, in this paper, we consider only 3 possible angles, which have certain motivation behind: 0 (along xaxis), 90 (along y-axis), 45 /135 (diagonal). A similar detector design was studied in reference [2] using a different mathematical framework.

X-ray generated treatment fields
For each point in time, there is a well-defined MLC shape (aperture) collimated by typically 60 leaves on the left and 60 leaves on the right bank, leading to typically 120 unknown parameters. For each MLC segment, there is a certain dose delivery at constant rate, referred to as MU fraction. Based on the known MLC geometry as well as the given MUs, the medical linac will generate a 2D flux or fluence field, where is defined to be parallel to the leaf pair motions, is defined to be orthogonal, and is time. The flux depends on the leaf pair positions, which we will denote as and for the left and right MLC leaves, and the MU, which we will denote as , leading to the explicit description of dependencies . We assume that based on a beam model we can sufficiently accurately calculate the 2D fluence field, given the leaf pair positions as well as the MUs. For the investigations at hand, we will discuss different dose models from very unrealisitic to very realistic ones in order to investigate the dependence of the inverse problem with respect to the modelling (see Fig. 1 for examples of the 2D fluence field).
At first, we realize that the original fluence field is a nonlinear function with regard to the leaf position, but it is linear with regard to MU. This can be directly seen by the violation of the scalar multiplication (for a given ): since the field intensity is not uniformly changed only if a subset of MLC leave positions is locally scaled. For this investigation of the optimal geometry, we try to find an approximate linear presentation of all possible 2D fluence fields that can be shaped by the original field parameters .

X-ray beam model
The portion of the flux that is stopped by the detector gives rise to detector signal which can be computed by convolution of the MLC aperture and Gaussian dose deposition kernels [16][17][18]. Since each MLC segment is composed of small rectangular shapes corresponding to each open MLC leaf pair, this gives rise to a sum of beamlets. Such a beamlet representation of dose can also be used for computation of dose in patients and mathematically it represents superposition-convolution dose models when the heterogeneities associated with the patient anatomy are properly accounted for in the model. In our case, the detector is homogeneous and thin and thus the Gaussian convolution model is sufficient [22]. A useful representation in order to describe all possible field shapes is the pencil beam approach: All fluence fields can be approximated by the weighted sum of pencil beams. We utilize here singular pencil beams of 1-mm width and the corresponding leaf pair height. Since the blur of the flux can be well approximated by a Gaussian convolution model, this is not affecting the linearity of the fluence field. Such a beam model can be extended to slightly different blur magnitudes for different layers in overlapping detector geometries. It is pointed out that this would not change the theory or application of the detector comparison approach in this paper.
Utilizing the pencil beams for all leaf pairs (e.g., with a practically useful number of 45 leaf pairs) and 1-mm gap (e.g., 150 pencil beams per leaf pair, corresponding to a 15 cm maximum leaf pair opening), we get in total 43 150 6750 individual pencil beams ( 1 ). Then we can approximate the true fluence field by 1 when still neglecting the time dependency of the field shape. We get parameters which are the weights of the individual pencil beams that approximate the real the fluence field shape. It is pointed out that the clinical relevant parameters can only indirectly be determined by the parameters . Obviously, this is only an approximation to real fluence fields, since 1. due to the discretization to 1-mm gaps, we cannot perfectly generate submillimeter MLC leaf positions (this can actually be resolved by utilizing smaller gap sizes and will be done in the results section), and 2. with this linear representation, we can generate many more fluence fields than are actually possible. For example, in reality, MLC leaf pairs are open or not -in this representation, we could generate every millimeter another totally arbitrary intensity which corresponds to the huge amount of degrees of freedom.
Despite of these differences between the true fluence fields and this linear representation, we can still summarize: All real field shapes can be described with minor differences with such a pencil beam approximation.

The forward problem of measurement
For the proposed general setup of this paper, we assume to have one or several 1D detectors, which are directly mounted to the MLC frame. This means that even if the MLC is rotating around itself, the detector will rotate accordingly. This means for the detector, the -anddirections are always the same as the -and -directions of the MLC. Furthermore, we consider narrow strip line detectors, i.e., each of those detectors is providing a 1D signal (we call projections). The strip detectors are thin along the beam line and narrow/long across the beam line. Essentially, each detector pixel collects all the intensities of the 2D fluence field along a line integral. The line detector is at a fixed angle relative to the fluence field. Furthermore, let us say, the line detector has a 1D pixel resolution of and pixels. Then for pixel the collecting line (the path over which the integration occurs) can be written as ( ) (see [11]) cos sin 2 sin cos .
We assume that the strip detector array is oriented at some fixed angle with respect to the MLC coordinate system. The detector signal at pixel can then be calculated as (see [11]) 1 2 d .
For example, see Fig. 2 for an illustration of the integral paths of one strip array detector and the resulting detector signal.

Derivation of the system matrix based on the forward problem
When inserting the 2D fluence field decomposition to the forward problem, we get for every detector pixel with the projecting matrix and vector of the degrees of freedom of the field . Please note, in every column of this matrix , the detector signal of the corresponding single pencil beam is stored.
If we not only have one detector, but many, say , detectors then this matrix equation still holds, but for the corresponding line integral paths described by we need to show the dependency on detector orientation and rewrite this to . This leads to the system matrix equation  with the left hand side vector being element of as the vector of all measured detector signals, the system matrix of the detector signal generation and the degrees of freedom that describe all possible fluence field shapes.

Specific detector configurations
In order to demonstrate the potential of the system configuration comparison, we will utilize three different detector configurations shown in Fig. 3, with two configurations with two detectors (setup A and B) and one configuration with three detectors (setup C).

The general singular value analysis process
We now introduce the general mathematical framework for performing the configuration meta-analysis based on singular value analysis applied to linear observation equations. We use the more general term system configuration in order to highlight the generality of this approach.

SVD of an observational system matrix
In a detection scenario, we assume to have a linear or approximately linear relationship between parameters we want to learn about and the actually observed signals : with the system matrix of the observation , such as in the presented mathematical formulation of the MLC QA measurements with strip detector arrays. Such equations appear, e.g., directly in static observations (e.g., see the explanatory example in Appendix 1), as an observation equation in stochastic dynamic Kalman filters [23], or as in our case: as a linearization of a nonlinear observation equation, see [7] for an example in control theory. In many practical purposes, we typically have and a system matrix which has full column rank. In this situation, one can reconstruct the parameters directly from observations utilizing the pseudo-inverse or similar methods.
In this investigation, we are not concerned with the actual reconstruction of but want to investigate the quality of the measurement configuration itself which is summarized in the system matrix (without measurements). Therefore, we can focus also on completely non-sufficient observation scenarios. Indeed, in the application, we will present a case with . When applying the singular value decomposition (SVD) to , as a partial result, we get the sorted list of the singular values 1 2 and the residual list of zeros 1 min 0. In this nomenclature, represents the rank of matrix . A more detailed description of the SVD is presented in the Appendix 2. The singular values contain relevant information about , e.g., the scaling of coordinates that a vector undergoes when applied to .

Condition number of the system matrix vs. measurement noise
In general, the condition number of a matrix is defined as 1 with 1 the largest and the smallest singular value greater than 0. A very well-known fact of numerical linear algebra is: If the measurement has an additive error , maybe due to measurement noise or data precision, we get a relative error bound of the solution [23] 1 which can be read in the following way: If there is a given relative measurement error (which can be directly related to the signal-to-noise ratio (SNR) of the measurement) and we still want to have a worst case relative error in the result of then -according to this upper bound -we would be allowed to have a condition number eff for the sharpest bound ( eff 1). Although it is not necessary in our investigation, one can always modify the real system matrix in order to meet this condition number approximately, i.e., eff , by utilizing the corresponding low rank approximation [23].
Clearly, this is an upper estimation for the maximum relative error estimating for all possible and . In real measurement scenarios, typically this limit is not achieved since the vectors and would need to point in very specific directions (see the explanatory example in Appendix 1 and the Appendix 2 for a detailed discussion on this).
To give a numerical example, a relative measurement error 10 2 and a worst case relative solution error of 10 1 (which means a relative error of 10%) would simply allow for a condition number of eff 10 for the sharpest bound. With respect to the practically small probability of the worst case in this upper error bound, in a very pragmatic way, one could even risk to allow a maximum relative error of 1 (i.e., at worst relative errors of the solution up to 100%), which would then lead to eff 100.

Effective information threshold
We have on the one side, the system matrix and its singular values, on the other side, we get (by looking at the maximum allowed relative measurement and solution errors) a desired condition number eff to fulfill the relative error bound sharply.
At this point, it is important to conclude, independently to which one sets these relative errors and what the original of the system matrix is, there is an effective information threshold for all singular values of , which must fulfill 1 1 eff 1 in order to meet these criteria sharply. The number of singular values that fulfill this effective information threshold represents the effectively usable number of linearily independent equations in the system matrix (i.e., an effective rank with respect to the magnitude of the measurement noise). For the practical extraction of these corresponding equations, one needs to apply the reduced rank approximation of where all singular values not fulfilling this bound are set to zero.
By working with 1 eff as a threshold, we essentially are focussing on thresholding normalized singular value distributions 1 ( 1 ) for comparison of different system configurations and not on thresholding the absolute (not normalized) singular values [4,6,13,26].
What does this tell us about comparing different system/ detector configurations and their information content? Quite straightforwardly, the system with more singular values above this effective information threshold does contain more information about than the system that has a smaller number of above-threshold values. Based on this number, different system configurations can be directly compared and optimized. This comparison can be done quickly in an overview plot of the normalized singular values with the effective information threshold as in Fig. 4. The further the intersection of the normalized singular values with the effective information threshold is to the right, the more effective information is contained in the system configuration.

Summarizing the singular value analysis process
This framework can be summarized to the following algorithm: 1. Model the specific observation/detection scheme in a specific application by expressing it in terms of linear equations where represent the measurements and are the unknown parameters. 2. Define the expected level of relative measurement error by assessing the measurement device. This can be the typical or maximal error magnitudes, depending on the perspective in the application. 3. Define the desired or target relative solution error for the application (keeping in mind that this could a small or a large number, even 1 in some applications, see the discussion in the Appendix 2). 6. Repeating steps 1-5 for several competing system configurations, one can directly compare them. The system with the highest number of singular values above the effective information threshold overall contains the most information about . In order to judge the differences between different configurations, one can directly set the number of singular values above this threshold relative to the number of unknown parameters (i.e., the number of entries of ). As an extreme case, if this ratio achieves 1 (i.e., 100%), then one can completely reconstruct based on inverting the linear system of equations even with the presence of a relative measurement error.
We highly recommend the reader to the explanatory example in the Appendix 1 in order to gain a straightforward understanding of this approach for a simple reconstruction problem.

Comparison of specific detection configurations
When looking at the linear system of equations, one realizes quickly, that for example for 3 detectors with a pixel resolution of 400 pixels per detector (corresponding to a pixel width of 0.75 mm) and 6750 pencil beams we would get an observational system matrix of shape 1200 6750, which is an underdetermined system. If we would want to solve the inverse problem with this equation, we definitely would need to incorporate additional information, such as that not every millimeter a totally different intensity could occur. For this specific investigation of comparing different detector geometries, this is actually not very important. Because the system matrix itself tells us how much information we gather by given detector geometries relative to each other as described in the general theory section.
The system matrices for the three detector configurations (setup A and B: 800 6750, setup C: 1200 6750) are shown as maps (blue: nonzero value, white: zero) in Fig. 5. Each column of each system matrix consists of the detection signal for each pencil beam. This means, for setup A and B, we have two signal spots in each column, since we have two detectors stacked under each other, and for setup C, we have three signals. Going the columns of the system matrix to the right, we change the individual pencil beam, until all possible pencil beam signals are covered by the system matrix. The order of how we go through the pencil beams generates the characteristic patterns in Fig. 5. In Rows that are always zero correspond to detector pixels outside the MLC aperture area covered by the pencil beams consequence, the system matrix can be viewed as the pencil beam representation in the detector configuration. Based on these system matrices, the singular value distribution is present in Fig. 6. Essentially one needs to focus on the singular values with logarithmic scaling and at which point the values drop under the effective information threshold. The number of singular values above this threshold tells how many useful constraints (information) is present in Matrix about the values .
A range of scenarios is presented which help interpreting the results. In this plot also the effective information threshold is shown. For example, if we assume that the amplitude of the measurement signal of each detector pixel is a 100 times higher than the noise (an already good signal in practice), this corresponds to a SNR 20 log 10 signal noise dB 20 log 10 100 dB 40 dB. If we further allow in the worst case for a 100% error in (although it typically is much better), then the effective condition number is eff 100. In the case, that this estimation seems too risky, also a lower effective condition number, such as eff 10, can be chosen, without changing the result of system comparison. The simulations show (most effectively looking at the logarithmic graph at the bottom of Fig. 6): 1. If one uses more detectors, say 6 and arrange them equiangularly around the MLC this generates truly new information (green line) which helps reconstructing the representation with logarithmic -scaling. The additional simulation scenarios include as follows: using 6 detectors equiangularly arranged around the MLC (leading to a 2400 6750 system matrix) (green line), smaller pencil beam step sizes of 0.5 mm which doubles the pencil beams (leading to 800 13500 and 1200 13500 system matrices) (dashed lines), reducing the detector pixel size from 0.75 to 0.375 mm (leading to 1600 6750 and 2400 6750 system matrices) (fat dotted lines), and doubling the blur in the fluence signal from 0.5 mm to 1 mm (which is not changing the system matrix size relative to the start scenario) (dash-dotted lines) true fluence field. This shows the way of optimal signal reconstruction: if one increases the number of detectors, one can reconstruct the original signal better and better. 2. If one uses smaller pencil beams (0.5-mm steps instead of 1-mm steps, i.e., doubling the number of pencil beams) in order to approximate true fluence fields even better (dashed lines), this essentially does not change information content in these system matrices (although a slight improvement for the setup A is visible). 3. If one reduces the pixel size of the detector (i.e., doubling each detector signal), we see that for setup A, there is no improvement at all and for setup B and C, there is improvement, but at a neglectable information level since this new information is way below the effective information threshold corresponding to the condition number 100. 4. If there is twice as much blur in the fluence field, then the total information content for all detector configurations is equivalently reduced.
Based on this overview plot, it is clearly favorable to use setup B compared to setup A, since much more constraints and information are generated by this detector configuration. The introduction of another horizontal projection in setup C compared to setup B shows a rather minor improvement concerning new information introduced by this additional detector.

Determing optimal detection geometry
Since we have a number which allows the comparison of different detector configurations, we can formally optimize detector configurations. Before starting optimization, certain constraints need to be defined, such as number of detectors, degrees of freedom (e.g., angular position of each detector), and an inital detector configuration which will be modified by the degrees of freedom. Furthermore, an effective information threshold must be set which should be adjusted to the measurement quality of the observation at hand.
After this, an optimization scheme (e.g., see Fig. 7) can be applied. It must be stated that this optimization can be very time consuming, since possibly very large system matrices need to be calculated in a loop.

Discussion
The singular value decomposition (SVD) and the condition number of a matrix are a mathematical standard toolset in order to judge about the practical/numerical difficulty of inverting a linear equation. Applied to an inversion problem, it summarizes information about how challenging this problem actually is, and not how well a specific inversion algorithm performs. If, e.g., the condition number of a matrix is very large with respect to the involved uncertainties (numerical precision, measurement noise, etc.) of the linear equation, every inversion/reconstruction algorithm will be confronted with this challenging problem. The SVD-based analysis of a system allows measuring how difficult an inversion/reconstruction problem is at its core, and demonstrates a very useful tool for (linearized) inverse problems in general.
In this paper, we utilized the relation between SVD and the condition number in the following way: For competing configurations of an inverse problem, we state that if the difficulty of inversion/reconstruction is less for one configuration than for another, then more useful information is actually acquired and it is preferable. This corresponds to maximizing the number of singular values greater than an effective information threshold. The practical difficulty of this approach lies in the modeling and mathematical formulation of a given measurement scenario and an at least approximate linear formulation of the observation equation, as presented. The strength of the approach is, once this modeling worked out, the simple steps in Section 2.2.4 for comparing different observation scenarios can be directly applied without any inverse computations.
It is important to notice that this investigation with SVD of the system matrix essentially only focuses on how much new information is generated by alternative detector configurations compared to each other. There still is a point to be made that redundant information has its own merit, since this typically increases robustness and reduces dependencies on measurement noise. Furthermore, even if in a specific application it is unclear how to determine the effective information threshold exactly, system comparison can be performed well, by using several reasonable thresholds in a given range and compare these systems for all these thresholds simultaneously.
We are not aware of any other method that would perform a similar type of high-or general-level analysis that could be used for the determination of detector/system configurations. That is performing a meta-analysis without the need to use specific radiotherapy beam shapes for specific patient anatomies and specific radiotherapy beam reconstruction algorithms.
The concrete application of this methodology is to compare detector geometries for multileaf collimator measurements in radiotherapy and it demonstrates that this method can even be applied to not obvious cases and still allows for a solid judgment. In this specific example, the linearization also comes at a cost: In our linear model, we allow for much more 2D fluence fields than would be practically possible, which enormously increases the degrees of freedom from 2 45 1 91 degrees of Fig. 7 Optimization scheme, which includes the presented system performance estimation and allows for algorithmic maximization of the information contained in the system configuration Fig. 8 Illustration of the general approach for performing configuration meta-analysis freedom (each leaf position MU) to 6750 degrees of freedom. Still, with this approach, all real 2D fluence fields can be approximated well, and therefore, this investigation is practically useful. It is pointed out that the clinical use of setup C investigated with this singular value analysis is additionally demonstrated for several patient treatment plans for a specific reconstruction approach as presented in [11]. In the schematic Fig. 8 we are summarizing this general modeling approach directly, which may be applied also to different technical applications in the future.
In radiotherapy, other applications of this general approach might be beneficial such as in external beam treatment planing optimization which is not strictly speaking a detection or measurement scenario. However, it can still be expressed in similar terms as follows: clinical parameters of interest (MLC apertures), the dose metric prescribed by the physician (which corresponds to the measurements in the detection design), system matrix (dose calculation model), and configurations (on which the dose calculation model depends). In this application, the modeling, optimization, and analysis of an overall dose delivery to the patient are typically done by a superposition (i.e., linearization) of pencil beams. In this example, optimal beam angles, number of MLC segments per beam angle, or even comparisons between different dose delivery techniques, such as conformal treatment, IMRT, and VMAT, could be investigated as clinically relevant parameters.
Finally, we want to point out that the linearization method presented in this paper may not necessarily be the best way for reconstructing the clinical parameters , but it provides a quality figure of the system configuration. In specific cases, it is possible (e.g., if more observations are made than there are degrees of freedom ) to directly invert the observation equation with a low rank approximation of the system matrix , leading, e.g., to the pseudo-inverse [23]. In our specific application, we demonstrate that this investigation via SVD is still possible and useful, even if such an inversion is not possible, since we focus on a highly underdetermined system.

Conclusion
We have demonstrated a general approach based on the singular values of a system matrix, which allows to compare different system configurations regarding their effective information content. This presentation contains an extension to previous argumentation found in the literature with respect to the definition, meaning, and utility of the information content, cp. [4,6,13,25,26]. In order to demonstrate the use of this method for a challenging nonlinear example, we compared possible detection configurations for multileaf collimator control measurements in external beam therapy in radiotherapy. Based on this, the application of this method is demonstrated, including the linearization of the nonlinear model, and the best system configuration is determined.

Appendix 1. Explanatory example
In order to present the strength of this approach, we are briefly presenting the following measurement setup taken from reference [10], presented in Fig. 9.
In this example, we want to measure the 2D point coordinates 1 2 . Furthermore, we have a rotating detector which measures the parallel beam projection of the 2D point coordinate in the detector coordinate . First, there is a horizontal measurement performed and 1 is measured, then the detector is rotated by and the second detector point 2 is measured. The configuration parameter simply corresponds to different detector configurations. It is clear, if there is no noise in the measurements 1 and 2 and we Fig. 9 Explanatory example: The 2D point coordinates 1 2 need to be reconstructed with a given detector configuration of two detectors which are rotated relatively to each other by , leading to detection signals 1 and 2 . Different angles correspond to different detector configurations use 0 the coordinates 1 2 can be reconstructed directly by just inverting this linear system of equations. On the other side, it is clear that using 0 will be worse than 90 if there is measurement noise. In order to quantify this, we want to apply the proposed process: Step 1: This leads to the linear observation/detection model Step 2: We define the expected level of relative measurement error, e.g. 10 2 .
Step 3: We define a target relative solution error, e.g., 10 1 Step 4: This leads to the condition number of the sharp bound eff 10.
Step 5: We calculate the singular values of , which are 1 1 cos 2 1 cos In this example, the qualitatively most important question is simply, whether the second measurement contains enough information for the given relative errors in order to invert the linear system of equations or not. This means, we need the second singular value to be above the effective information threshold Step 6: The comparison of different detector configurations then is simply: if is in this calculated interval, then in general, we have enough information to invert the linear system of equations with the defined target relative solution error. This is a qualitative step. Inside this interval for one could further distinguish between all configurations, which leads to the optimum of 90 corresponding to the highest possible ratio 2 1 1. Since we compare system configurations with a large number of singular values in practice, we typically end up with a different number of singular values above the effective information threshold, leading to a well-defined quantity for comparison. Only if two configurations have roughly the same number, it makes sense to take the shape of the distribution of the singular values into account.
Furthermore, we are demonstrating in the Appendix 2 that the relative error inequality bound is sharp, i.e., this means for specific measurement and noise configurations , exactly an equality occurs. We want to illustrate this in this explanatory example. If 0, we can directly solve this equation system with If we find a configuration of and leading to the equality of left and right hand side, than we see this bound is sharp and cannot be improved.
Utilizing the left singular vectors of (see the Appendix 2 for the general case), we get these vectors pointing to their worst directions with 1 1 and 1 1 , with arbitrary constants . By inserting them into the inequality, we get on both sides 1 cos 1 cos , i.e., the equality holds and the bound is sharp. On the other hand, we realize that a very specific configuration is necessary that the equality holds, i.e., 1 2 and 1 2 (remember, the latter is random noise). In total, this demonstrates that the relative error bound is sharp, but in practice, equality is probably never reached. In cases with much higher dimensionality of and , such as in the application presented next, the likelihood of achieving equality decreases further since the configurations for the worst case are getting more specific.

Appendix 2. Worst case discussion of the relative error bound
In order to actually fulfill the equality of the relative error upper bound, and would need to simultaneously point in their worst directions. In order to see this, we start with the full linear system including the errors: . By applying the SVD (here in its compact form, i.e., representing the rank of , , and , and and semi-unitary, i.e., ), we get . with containing the first left singular vectors as columns, is the diagonal matrix with decreasing order of the singular values and containing the first right singular vectors as columns. Due to the semi-unitarity of we can rewrite this to . If we specifically choose to be a scaling of the first left singular vector, i.e., 1 , and to be a scaling of the th left singular vector, i.e., ( ), we get 0 . . . An obvious solution to this is 1 1 with the right singular vectors 1 and . When inserting these , , , and into the relative error upper bound, one gets the worst case with equality. In this solution, one can also directly interpret the worst case interaction between measurement quality and the system configuration of the relative error bound: If we have a SNR of the measurement, say 100, this can be interpreted as 100, then in order to allow for a maximum error to be in the range of , one can only allow for 1 100 . For lower condition numbers , we get 1 , leading to higher quality solutions in this worst case.
From a practical perspective, in order to actually run into this worst case region, and simultaneously would need to be linear combinations of very specific directions, such for in the span of vectors with the highest singular values span 1 , and in the span of vectors with the lowest singular values span . This means, if is a random detector measurement error, expected to be independently distributed for each coordinate, this could be assigned to a certain low probability. Therefore, practically in many cases, one can expect much better results than what the relative error upper bound would suggest. See also the explanatory example, where this argument is directly demonstrated.
Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.