1 Med Biol Eng Comput (2006) 44(4):257–274

Due to a processing error the presentation of Section 6 was incorrect.

The correct version of Section 6 is given below.

2 Design of the cost functions

2.1 Approach 1: maximization of the sum of the covariance matrix elements

According to the principle of isoelectric focusing, for a gel array in which the pH gradient is created along the vertical axis (corresponding to the index i in image matrix notation), similar intensity values should be obtained in a lane image for each fixed coordinate i. This means that if we interpret each column \(P_j,\ j\in \{ 1,2,\ldots,n\},\) of the lane L as a vector of m observations of a random variable \(X_j\), then in the ideal case all random variables (having the same probability distribution) in the random vector \(\mathbf{X} = [X_1,X_2,\ldots,X_n]\) should be maximally correlated.

For a shift vector \(\mathbf{k} = [k_1,k_2,\ldots,k_n]\), we can introduce another random vector

$$ \tilde{\mathbf{X}}^\mathbf{k}=\left[\tilde{X}_1^{k_1},\tilde{X}_2^{k_2},\ldots,\tilde{X}_n^{k_n}\right],$$
(16)

where \(\tilde{X}_j^{k_j}\) corresponds to the shifted column profile \(\tilde{P}_j^{k_j}.\) Each such random vector is characterized by the covariance matrix defined as

$$ \hbox{cov}(\tilde{\mathbf{X}}^{\mathbf{k}})=\left[ \begin{array}{ccc} \hbox{cov}(\tilde{X}_1^{k_1}, \tilde{X}_1^{k_1})&\ldots&\hbox{cov}(\tilde{X}_1^{k_1}, \tilde{X}_n^{k_n})\\ \vdots&\ddots&\vdots\\ \hbox{cov}(\tilde{X}_n^{k_n}, \tilde{X}_1^{k_1})&\ldots&\hbox{cov}(\tilde{X}_n^{k_n}, \tilde{X}_n^{k_n}) \end{array} \right].$$
(17)

For each shift vector k and the given lane image L, we define a measure of the overall covariance among all shifted columns as the sum of components of the covariance matrix

$$M_1(\tilde{L}^{\mathbf{k}})=\sum\hbox{cov}(\tilde{\mathbf{X}}^{\mathbf{k}})=\sum_{i,j=1}^n\hbox{cov}(\tilde{X}_i^{k_i},\tilde{X}_j^{k_j}),$$
(18)

where the sum runs only over the index pairs (i, j) for which the inequality i ≤ j holds, since any covariance matrix is symmetric.

We propose to use the measure \(M_1\) as the first possible cost function to be tested for controlling the optimization of the band straightening process. The optimum shift vector is then the one for which the value of \(M_1\) reaches its maximum.

The computational complexity of the measure \(M_1\) can be evaluated as follows. One evaluation of \(M_1\) consists of \((n^2+n)/2\) calculations of cov(X,Y), each requiring 2m accesses to the data. Therefore the measure \(M_1\) belongs to the asymptotic complexity class \(O(mn^2)\). The total computational time of Stage 1 for \(M_1\) belongs to \(O(mn^3)\), while the total complexity of Stage 2 is \(O(mn^4)\).
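The evaluation of this first measure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `shift_column`, `cov` and `m1` are hypothetical, the shift is assumed to be a cyclic shift of each column by \(k_j\) rows (the paper's own shift definition lies outside this section), and the covariance is assumed to be normalized by m.

```python
def shift_column(column, k):
    """Cyclically shift a column profile down by k rows.

    Assumption: a cyclic shift stands in for the paper's shift operator.
    """
    column = list(column)
    m = len(column)
    k %= m
    return column[m - k:] + column[:m - k]


def cov(x, y):
    """Covariance of two equal-length sequences (assumed 1/m normalization)."""
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / m


def m1(columns, shifts):
    """Sum of covariances over all column pairs (i, j) with i <= j."""
    shifted = [shift_column(c, k) for c, k in zip(columns, shifts)]
    n = len(shifted)
    # (n^2 + n) / 2 covariance evaluations, each touching 2m values.
    return sum(cov(shifted[i], shifted[j])
               for i in range(n) for j in range(i, n))


# Toy lane with n = 3 columns of m = 4 rows; the band in the middle
# column sits one row lower than in its neighbours.
columns = [[0, 9, 0, 0], [0, 0, 9, 0], [0, 9, 0, 0]]
print(m1(columns, [0, 0, 0]))   # distorted lane
print(m1(columns, [0, -1, 0]))  # middle column shifted up by one row
```

In this toy example the shift vector that realigns the middle column yields the larger value, which is the behaviour the maximization relies on.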

2.2 Approach 2: maximization of the sum of the correlation matrix elements

Instead of the covariance matrix, the correlation matrix cor(X), with entries equal to the correlation coefficients \(\hbox{cor}(X_i,X_j)\) for i,j = 1,2,...,n, can be used to measure the correlations between the lane columns. With the shift vector \(\mathbf{k} = [k_1,k_2,\ldots,k_n]\) we can transform the given column vectors \(\{P_j\}\) of observations into the vectors \(\{\tilde{P}_j^{k_j}\}\) of “artificially” shifted observations. Thus, we define the cost function \(M_2(\tilde{L}^{\mathbf{k}})\) for a shift vector k:

$$M_2(\tilde{L}^{\mathbf{k}})=\sum\hbox{cor}(\tilde{\mathbf{X}}^{\mathbf{k}})=\sum_{i,j=1}^n\hbox{cor}(\tilde{X}_i^{k_i},\tilde{X}_j^{k_j}),$$
(19)

where the sum again runs only over the index pairs (i, j) for which the inequality i ≤ j holds.

As in the case of \(M_1\), the measure \(M_2\) needs \((n^2+n)/2\) calculations of the function cor(X,Y), each of which requires 2m accesses to the data but more numerical operations by a constant factor. For that reason the measure \(M_2\) is slower to evaluate than \(M_1\) by a constant factor, but it still belongs to the complexity class \(O(mn^2)\). The total computational time of Stage 1 for this measure belongs to the class \(O(mn^3)\). The total complexity of Stage 2 is \(O(mn^4)\).
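A correlation-based measure of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: `shift_column`, `cor` and `m2` are hypothetical names, the shift is assumed to be cyclic, and a column with zero variance is assumed to contribute a correlation of 0.

```python
import math


def shift_column(column, k):
    """Cyclically shift a column profile down by k rows (assumed shift model)."""
    column = list(column)
    m = len(column)
    k %= m
    return column[m - k:] + column[:m - k]


def cor(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: constant columns carry no correlation
    return sxy / math.sqrt(sxx * syy)


def m2(columns, shifts):
    """Sum of correlation coefficients over column pairs (i, j) with i <= j."""
    shifted = [shift_column(c, k) for c, k in zip(columns, shifts)]
    n = len(shifted)
    return sum(cor(shifted[i], shifted[j])
               for i in range(n) for j in range(i, n))


# Columns with different band intensities; the middle band is displaced.
columns = [[0, 9, 0, 0], [0, 0, 3, 0], [0, 18, 0, 0]]
print(m2(columns, [0, 0, 0]))   # distorted lane
print(m2(columns, [0, -1, 0]))  # middle column shifted up by one row
```

Because each coefficient is normalized by the column variances, columns of different overall intensity contribute equally here, whereas in a covariance sum the brighter columns dominate.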

2.3 Approach 3: maximization of the negative sum of the row standard deviations

Another reasonable approach uses the information contained in the row profiles instead of the column profiles. Looking at a particular row profile, we can notice that its shape is much less deformed for a straightened image than for a distorted one. By finding the shift vector k that minimizes the total deformation of the row profiles, we can produce a straightened lane image. There are several ways to quantify the deformation of a particular row profile. The first possibility is to introduce a random vector for the lane image rows

$$ \mathbf{Y}=\left[Y_1,Y_2,\ldots,Y_m\right]^\prime.$$
(20)

Thus, the row profile \(R_i\) can be treated as a realization of the random variable \(Y_i\). For the given shift vector k we can derive the shifted random vector

$$ \tilde{\mathbf{Y}}^{\mathbf{k}} = \left[\tilde{Y}_1^{\mathbf{k}}, \tilde{Y}_2^{\mathbf{k}}, \ldots, \tilde{Y}_m^{\mathbf{k}}\right]^\prime,$$
(21)

where \(\tilde{Y}_i^{\mathbf{k}}\) corresponds to the shifted row profile \(\tilde{R}_i^{\mathbf{k}}\) (see formula 7). Then we can consider the standard deviation \(\hbox{std}(\tilde{Y}_i^{\mathbf{k}})\) to be a convenient measure of the deformation of the individual row profile. The standard deviation of the whole random vector \(\tilde{\mathbf{Y}}^{\mathbf{k}}\) is defined as a vector of standard deviations of the individual random variables \(\tilde{Y}_i^{\mathbf{k}}:\)

$$ \hbox{std}(\tilde{\mathbf{Y}}^{\mathbf{k}}) = \left[ \begin{array}{c} \hbox{std}(\tilde{Y}_1^{\mathbf{k}})\\ \hbox{std}(\tilde{Y}_2^{\mathbf{k}})\\ \vdots\\ \hbox{std}(\tilde{Y}_m^{\mathbf{k}})\\ \end{array} \right].$$
(22)

Given the shifted lane image \(\tilde{L}^{\mathbf{k}},\) we can introduce a total measure of the lane deformation as the negative sum of row standard deviations:

$$ M_3(\tilde{L}^{\mathbf{k}}) = -\sum\hbox{std}(\tilde{\mathbf{Y}}^{\mathbf{k}}) = -\sum_{i=1}^m\hbox{std}(\tilde{Y}_i^{\mathbf{k}}).$$
(23)

We propose to use the measure \(M_3\) as the next possible cost function to be tested for controlling the optimization of the band straightening process.

The computational complexity of the measure \(M_3\) depends on the number and complexity of the standard deviation estimates. One calculation of the measure \(M_3\) consists of m calculations of the function std(X), each of which requires a constant multiple of n data accesses. Accordingly, the measure \(M_3\) belongs to the asymptotic complexity class \(O(mn)\). The total computational time of Stage 1 for this measure belongs to the class \(O(mn^2)\). The total complexity of Stage 2 simplifies to the class \(O(mn^3)\).
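The row-based measure can be sketched in a few lines of Python. The sketch rests on assumptions: `shift_lane` and `m3` are hypothetical names, the lane is stored as a list of m rows of n values, columns are shifted cyclically, and the population standard deviation (`statistics.pstdev`) is used, since the text does not say which estimator is intended.

```python
from statistics import pstdev


def shift_lane(lane, shifts):
    """Cyclically shift column j of an m-by-n lane down by shifts[j] rows.

    Assumption: a cyclic shift stands in for the paper's shift operator.
    """
    m = len(lane)
    n = len(lane[0])
    return [[lane[(i - shifts[j]) % m][j] for j in range(n)]
            for i in range(m)]


def m3(lane, shifts):
    """Negative sum of the row standard deviations of the shifted lane."""
    # m standard deviations, each over the n values of one row: O(mn).
    return -sum(pstdev(row) for row in shift_lane(lane, shifts))


# Toy lane (m = 4 rows, n = 3 columns); the band in the middle column
# sits one row lower than in its neighbours.
lane = [
    [0, 0, 0],
    [9, 0, 9],
    [0, 9, 0],
    [0, 0, 0],
]
print(m3(lane, [0, 0, 0]))   # distorted lane: negative value
print(m3(lane, [0, -1, 0]))  # straightened lane: every row is constant
```

For the straightened toy lane every row is constant, so each row standard deviation vanishes and the measure reaches its upper bound of zero.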

2.4 Approach 4: maximization of the negative sum of the row tensions

The deformation of a particular row profile can also be treated on the basis of a geometric interpretation of the row profiles. Each such profile can be understood as a curve in the 2D plane and can therefore be characterized by so-called energy terms such as curve tension, curve rigidity, etc. In accordance with the objective of the band straightening task, we propose to use the curve tension as a measure of the row profile deformation.

The tension of the curve \(X(s) = [x(s),y(s)],\ s \in \langle 0,1\rangle,\) is usually defined by the following integral [9]:

$$ \hbox{tens}(X)=\int\limits_0^1 |X'(s)|^2{\rm d}s.$$
(24)

In the discrete case, the integral and the first derivatives are replaced by the sum operator and differences, respectively. Given the shifted row profile \(\tilde{R}_{i}^{\mathbf{k}} = \left[\tilde{a}_{i,1},\tilde{a}_{i,2},\ldots, \tilde{a}_{i, n}\right],\) the tension of the row can be characterized by the following formula:

$$ \hbox{tens}(\tilde{R}_i^{\mathbf{k}}) = \sum_{j=2}^n \left(\tilde{a}_{i,j}-\tilde{a}_{i,(j-1)}\right)^2.$$
(25)

Thereafter, the total measure of the lane deformation can be defined as the negative sum over all row tensions in the shifted lane image:

$$ M_4(\tilde{L}^{\mathbf{k}}) = -\sum_{i=1}^m\hbox{tens}(\tilde{R}_i^{\mathbf{k}}).$$
(26)

By finding an optimum shift vector k that maximizes the measure \(M_4\), an increase in band straightness is achieved. We include the measure \(M_4\) as the last cost function to be tested for controlling the optimization of the band straightening process.

One computation of the function \(\hbox{tens}(\tilde{R}_i^{\mathbf{k}})\) requires a constant multiple of n data accesses. The overall calculation of the measure \(M_4\) consists of m row tension estimates, and thus \(M_4\) belongs to the asymptotic complexity class \(O(mn)\). The total computational time of Stage 1 for this measure belongs to the class \(O(mn^2)\). The total complexity of Stage 2 belongs to the class \(O(mn^3)\).
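The discrete row tension and the resulting measure translate almost directly into code. The following Python sketch is only an illustration: `shift_lane`, `tension` and `m4` are hypothetical names, the lane is assumed to be a list of m rows of n values, and a cyclic column shift is assumed in place of the paper's shift operator.

```python
def shift_lane(lane, shifts):
    """Cyclically shift column j of an m-by-n lane down by shifts[j] rows.

    Assumption: a cyclic shift stands in for the paper's shift operator.
    """
    m = len(lane)
    n = len(lane[0])
    return [[lane[(i - shifts[j]) % m][j] for j in range(n)]
            for i in range(m)]


def tension(row):
    """Discrete tension: sum of squared differences of neighbouring values."""
    return sum((b - a) ** 2 for a, b in zip(row, row[1:]))


def m4(lane, shifts):
    """Negative sum of the row tensions of the shifted lane."""
    # m rows, n - 1 differences per row: O(mn) per evaluation.
    return -sum(tension(row) for row in shift_lane(lane, shifts))


# Toy lane (m = 4 rows, n = 3 columns); the band in the middle column
# sits one row lower than in its neighbours.
lane = [
    [0, 0, 0],
    [9, 0, 9],
    [0, 9, 0],
    [0, 0, 0],
]
print(m4(lane, [0, 0, 0]))   # distorted lane
print(m4(lane, [0, -1, 0]))  # straightened lane
```

Unlike the standard deviation, the tension depends on the order of the values within a row, so it penalizes only jumps between neighbouring columns.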