We tested the method proposed above on the separation of overlapping texts in recto–verso documents and on the extraction of the underwriting from multispectral images of palimpsests. The derived algorithms are very fast, since the data model can be inverted in a single step.
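To illustrate what this single-step inversion amounts to, the following is a minimal Python sketch. It assumes a non-stationary linear see-through model of the form \(x_r(t)=s_r(t)+q_v(t)\,s_v(t)\), \(x_v(t)=q_r(t)\,s_r(t)+s_v(t)\); the exact model and the estimation of the interference maps are given in the preceding sections, and the function and variable names here are ours.

```python
import numpy as np

def invert_nonstationary_linear(x_r, x_v, q_r, q_v, eps=1e-6):
    """Single-step, per-pixel inversion of an assumed non-stationary
    linear see-through model:
        x_r(t) = s_r(t) + q_v(t) * s_v(t)
        x_v(t) = q_r(t) * s_r(t) + s_v(t)
    x_r, x_v: observed recto/verso images (float arrays in [0, 1]).
    q_r, q_v: space-variant interference level maps (same shape).
    Returns the estimated ideal sides s_r, s_v."""
    det = 1.0 - q_r * q_v                        # per-pixel 2x2 determinant
    det = np.where(np.abs(det) < eps, eps, det)  # guard near-singular pixels
    s_r = (x_r - q_v * x_v) / det
    s_v = (x_v - q_r * x_r) / det
    return np.clip(s_r, 0.0, 1.0), np.clip(s_v, 0.0, 1.0)
```

Since the 2×2 system is solved independently at each pixel, the cost is linear in the number of pixels, which is what makes the algorithm so fast.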
For the see-through problem, we tested the method on several real grayscale recto–verso pairs affected by show-through, bleed-through, or both. In the following, we analyze in detail the results of some of these experiments, also providing comparisons with the results of a stationary linear model, of the stationary nonlinear model in [15], and of other state-of-the-art methods.
Figure 1a, e shows the original degraded recto and verso images. It is not easy to decide whether the degradation is caused by show-through, bleed-through, or both. As is apparent, the interfering patterns are highly non-stationary. The results of applying ICA, improved by histogram clipping, are shown in Fig. 1b, f, whereas Fig. 1c, g shows the results obtained with the stationary nonlinear model of [15]. The results of the method proposed in this paper are shown in Fig. 1d, h. The stationary linear model produces lower ink density in the occlusion areas and leaves some residual interference as well. The results of the stationary nonlinear model, in turn, show that a single value of the interference level is not sufficient to remove all the interference, although, for this image, “holes” no longer appear in the occlusions. The superior performance of the method proposed here is apparent. In other words, adopting a stationary data model, even a nonlinear and convolutional one, necessarily entails a compromise between see-through removal and preservation of the occlusions; making the model non-stationary, whether linear or nonlinear, allows both requirements to be met.
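For reference, the stationary ICA baseline of Fig. 1b, f can be sketched as follows. This is a hypothetical reconstruction of that baseline, not the code used in the experiments: a single global unmixing matrix is estimated for the whole image (exactly the stationarity criticized above), and the output histograms are then clipped.

```python
import numpy as np
from sklearn.decomposition import FastICA

def ica_baseline(x_r, x_v, clip_pct=1.0):
    """Stationary linear baseline: separate a recto-verso pair with ICA,
    then clip and rescale the output histograms (an illustrative stand-in
    for the histogram clipping mentioned in the text)."""
    h, w = x_r.shape
    X = np.stack([x_r.ravel(), x_v.ravel()], axis=1)  # (pixels, 2 channels)
    S = FastICA(n_components=2, random_state=0).fit_transform(X)
    outputs = []
    for s in S.T:
        lo, hi = np.percentile(s, [clip_pct, 100.0 - clip_pct])
        s = (np.clip(s, lo, hi) - lo) / (hi - lo)     # clip tails, map to [0, 1]
        outputs.append(s.reshape(h, w))
    return outputs
```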
Figure 2 shows the maps of the estimated interference levels \(q_r\) and \(q_v\) and of the estimated occlusion pixels. Darker pixels in the interference level maps indicate higher values of \(q_r\) and \(q_v\). In the third map, black pixels represent occlusion areas.
Recently, a database of high-resolution grayscale images of ancient documents affected by bleed-through has been published online [23]. It comprises 25 registered recto–verso grayscale image pairs, taken from larger high-resolution manuscript images, with varying degrees of bleed-through. In addition, for each image, a binary ground truth mask of the foreground text is provided. Although these ground truth images are synthetic, i.e., created manually, they can be useful for a quantitative analysis of the results. The database is described in detail in [24].
In Fig. 3, we present the results obtained on one of these images, a manuscript belonging to the Allan and Maria Myers Academic Centre, University of Melbourne (Fig. 3a, d). The results of the stationary nonlinear model of [15] are shown in Fig. 3b, e, whereas Fig. 3c, f shows the much better results obtained with the non-stationary linear model proposed in this paper. Visually, the higher quality of the reconstructions can be appreciated especially on the recto side: the lower ink density in the occlusion areas is clearly visible in Fig. 3b, whereas it does not affect the corresponding image in Fig. 3c.
In Fig. 4, the maps of the estimated interference levels \(q_r\) and \(q_v\) and of the estimated occlusion pixels are shown.
Finally, Fig. 5 shows, for a detail of the images of Fig. 3a, d, comparisons with recent state-of-the-art methods. The recto sides are shown on the left and the verso sides on the right. From top to bottom, we see the original degraded images, the results of the methods in [5, 10, 18], and [6], respectively, and finally the results of the method proposed herein. The results obtained with our method are qualitatively fully comparable to the best ones, produced by the method in [6].
In another experiment, we processed a second pair from the same dataset [23]. The manuscript belongs to the James Hardiman Library, National University of Ireland, Galway. The original recto–verso pair and the corresponding reconstructions obtained with the method proposed in [6], which was the best performing in the previous experiment, and with our method are shown in Fig. 6. Again, our results are of very similar quality.
For a quantitative analysis, exploiting the available binary ground truth images, we binarized the reconstructions with the adaptive Sauvola algorithm [25] and then computed three quality indices: \(FgError\), the probability that a pixel of the foreground text is classified as background; \(BgError\), the probability that a background or bleed-through pixel is classified as foreground; and \(WTotError\), the weighted mean of \(FgError\) and \(BgError\), with weights given by the numbers of foreground and background pixels in the corresponding ground truth image, i.e., the probability that any pixel in the image is misclassified. According to [24], these quality indices are defined as:
$$\begin{aligned}&FgError=\frac{1}{N_{Fg}}\sum \limits _{t\in GT(Fg)}|GT(t)-B(t)| \nonumber \\&BgError=\frac{1}{N_{Bg}}\sum \limits _{t\in GT(Bg)}|GT(t)-B(t)| \nonumber \\&WTotError=\frac{N_{Fg}\,FgError+N_{Bg}\,BgError}{N} \end{aligned}$$
(10)
where \(GT\) is the ground truth, \(B\) is the binarized restoration result, \(GT(Fg)\) is the foreground region of the ground truth image, consisting of \(N_{Fg}\) pixels, \(GT(Bg)\) is the complementary background region, consisting of \(N_{Bg}\) pixels, and \(N=N_{Fg}+N_{Bg}\) is the total number of pixels in the image. The values of these indices for the two experiments above are summarized in Table 1.
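As a concrete rendering of this evaluation protocol, the following Python sketch binarizes a restored image with scikit-image's Sauvola implementation and computes the three indices of Eq. (10). The window size and \(k\) are illustrative defaults, not necessarily the parameters used in our experiments:

```python
import numpy as np
from skimage.filters import threshold_sauvola

def quality_indices(restored, gt_fg, window_size=25, k=0.2):
    """restored: restored grayscale image, dark text on light background.
    gt_fg: binary ground truth mask, True at foreground-text pixels."""
    # Adaptive Sauvola binarization: a pixel is foreground if it is darker
    # than the local threshold.
    B = restored < threshold_sauvola(restored, window_size=window_size, k=k)
    fg, bg = gt_fg, ~gt_fg
    N_fg, N_bg = fg.sum(), bg.sum()
    fg_error = np.mean(~B[fg])       # foreground pixels lost to background
    bg_error = np.mean(B[bg])        # background/bleed-through kept as text
    w_tot_error = (N_fg * fg_error + N_bg * bg_error) / (N_fg + N_bg)
    return fg_error, bg_error, w_tot_error
```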
Table 1 Quality indices for the results in Figs. 3c, f, and 6e, f
The authors of [6] quantified the performance of their method and of the methods in [5, 18] and [10] by binarizing the reconstructions with Gatos binarization [26] and then computing the mean quality indices of Eq. (10) over the entire dataset. Their method was the best, with \(FgError=0.0696\), \(BgError=0.0085\), and \(WTotError=0.0196\). Although no image-by-image comparison can be made, we observe that the quality indices for the two images shown above are below those mean values. We also computed the quality indices on the whole dataset. The measured errors are summarized in Table 2, where the mean errors are complemented with the standard deviations and the best and worst quality indices. The mean errors reported in [6] are shown in the last column.
Table 2 Quality indices for the entire dataset
From Table 2, it appears that, on average, our results have lower \(FgError\) and \(WTotError\), and higher \(BgError\), than those of [6]. Inspecting some of our worst results, we observed that the misclassified background pixels are usually confined to the character boundaries. This means that the binarization algorithm underestimates the threshold between foreground and background. In general, we expect different values of the quality indices on the same image when different binarization algorithms are used; thus, for a fair quantitative comparison based on this peculiar kind of ground truth images, the same binarization algorithm, with the same parameters, should be used. Finally, it is worth highlighting again the simplicity of our method, which leads to a very fast algorithm. Indeed, non-optimized Matlab code takes 0.77 s to restore the \(1{,}745\times 1{,}070\) images of Fig. 3a, d on an Intel Core i7 3 GHz CPU.
In the case of palimpsests, we show the potential of our model by presenting the results obtained on a real–fake palimpsest, i.e., a document created by printing a first, vertical text in a light color and then overprinting it with a darker, horizontal text. Scans of the document were then used for processing. With this choice of colors, the underwriting (the vertical text) almost disappears in the red channel, is slightly visible in the green channel, and is well visible in the blue channel. We chose the blue channel as the observation at wavelength \(\lambda _1\) and the green channel as the observation at wavelength \(\lambda _2\). Note that, had we chosen the red channel as the second observation, we could have set \(q^u(t)=0\), \(\forall t\), reducing the data system to a single equation. Note also that no standard thresholding technique applied to the blue channel alone can separate the two texts. Figure 7a, d shows the green and blue channels of the original RGB scan. Figure 7b, e shows the results of applying ICA to the density data images, with the aim of showing that global coefficients in the linear model do not allow the two texts to be separated. The promising results of our method are shown in Fig. 7c, f.
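The setup just described can be sketched as follows; the filename is hypothetical, and the mapping \(d=-\log x\) is the usual reflectance-to-density conversion, assumed here for the “density data images” mentioned above:

```python
import numpy as np
from skimage.io import imread

# Channel selection for the real-fake palimpsest: blue as the observation
# at lambda_1, green as the observation at lambda_2.
rgb = imread("palimpsest_scan.png").astype(np.float64) / 255.0  # hypothetical file
x_lambda1 = rgb[..., 2]          # blue channel: underwriting well visible
x_lambda2 = rgb[..., 1]          # green channel: underwriting slightly visible

# Reflectance -> optical density, guarding against log(0).
to_density = lambda x: -np.log(np.clip(x, 1e-3, 1.0))
d1, d2 = to_density(x_lambda1), to_density(x_lambda2)

# The stationary baseline of Fig. 7b, e applies ICA to (d1, d2), e.g. with
# the ica_baseline sketch given earlier; the proposed method instead inverts
# the non-stationary model pixel by pixel, as in the first sketch.
```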