Window memoization: toward high-performance image processing software


In this paper, we present window memoization, a new performance improvement technique for software implementations of local image processing algorithms. Window memoization combines the memoization techniques proposed in software and hardware with the data redundancy found in images. It reduces the number of redundant computations performed on an image by identifying similar neighborhoods of pixels and skipping the computations that are not necessary, which improves performance in software. We have developed an optimized architecture for window memoization in software and applied it to six image processing algorithms. We have also developed a performance model to predict the speedups obtained by window memoization in software. The typical (average) speedups range from 1.2x to 7.9x, while the overall average speedup across the different algorithms, input images, and processors is 3.95x.
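The core idea can be illustrated with a minimal sketch. It is not the optimized architecture described in the paper: the function names, the dictionary used as a reuse table, the fixed 3x3 window, and the `drop_bits` parameter (which discards low-order bits so that similar, not only identical, windows share a key) are all assumptions made for illustration.

```python
def window_memoize(image, local_op, drop_bits=0):
    """Apply local_op to every 3x3 window, reusing results for windows
    that map to the same key. Hypothetical sketch, not the paper's design."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]  # border pixels are left at 0
    table = {}                               # reuse table: window key -> result
    hits = 0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            window = tuple(image[i + di][j + dj]
                           for di in (-1, 0, 1) for dj in (-1, 0, 1))
            # Assumption: dropping low-order bits makes similar windows collide.
            # drop_bits=0 matches only identical windows, so results stay exact.
            key = tuple(p >> drop_bits for p in window)
            if key in table:
                hits += 1                    # redundant computation skipped
            else:
                table[key] = local_op(window)
            out[i][j] = table[key]
    return out, hits


def gradient(window):
    """Example local operation: morphological gradient (max minus min)."""
    return max(window) - min(window)


flat = [[7] * 6 for _ in range(5)]           # uniform 5x6 test image
result, hits = window_memoize(flat, gradient)
```

On the uniform 5x6 image above, the 12 interior windows share one key, so one window is computed and 11 are reused. The sketch shows only the lookup logic; building a key this way costs about as much as a cheap operation such as `gradient`, so any real speedup depends on an inexpensive key and an operation that is costly relative to the lookup.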




Notes

  1. This method of fast symbol generation, which benefits from overlapping windows in the image, is similar to Huang's [10] method for a fast median filter.

  2. The error in an image (Img) with respect to a reference image (\(R_{Img}\)) is usually measured by the signal-to-noise ratio (SNR) as [8]: \(\mathrm{SNR} = 20\log_{10}\left(\frac{A_{\mathrm{signal}}}{A_{\mathrm{noise}}}\right)\), where A is the RMS (root mean squared) amplitude. \(A^{2}_{\mathrm{noise}}\) is defined as \(A^{2}_{\mathrm{noise}} = \frac{1}{rc} \sum\nolimits_{i=0}^{r-1}\sum\nolimits_{j=0}^{c-1}\left(Img(i,j)-R_{Img}(i,j)\right)^2\), where \(r \times c\) is the size of Img and \(R_{Img}\).

  3. \(c_a\): average number of CPU cycles for arithmetic operations

  4. \(c_l\): average number of CPU cycles for logical operations

  5. \(c_{mul}\): average number of CPU cycles for multiplication operations

  6. \(c_m\): average number of CPU cycles for memory operations

  7. Area overlap for two sets A and B is calculated as \(\frac{|A \cap B|}{|A \cup B|}\).
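The two accuracy measures defined in notes 2 and 7 can be sketched as follows. This is an illustrative sketch, not the authors' evaluation code: the function names are hypothetical, and because the notes do not define \(A_{\mathrm{signal}}\), the code assumes it is the RMS amplitude of the reference image.

```python
import math


def rms(values):
    """Root mean squared amplitude of a flat sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def snr_db(img, ref):
    """SNR in dB of img with respect to ref (both lists of equal-length rows).
    Assumption: A_signal is taken as the RMS of the reference image."""
    flat_img = [p for row in img for p in row]
    flat_ref = [p for row in ref for p in row]
    noise = rms([a - b for a, b in zip(flat_img, flat_ref)])
    if noise == 0:
        return math.inf                      # identical images: no error
    return 20 * math.log10(rms(flat_ref) / noise)


def area_overlap(a, b):
    """|A intersect B| / |A union B| for two collections of hashable items."""
    a, b = set(a), set(b)
    union = a | b
    # Assumption: two empty sets are treated as fully overlapping.
    return len(a & b) / len(union) if union else 1.0


ref = [[10, 10], [10, 10]]
img = [[11, 9], [11, 9]]
snr = snr_db(img, ref)                       # signal RMS 10, noise RMS 1
overlap = area_overlap({1, 2, 3}, {2, 3, 4})  # 2 shared of 4 distinct items
```

For feature-based outputs, such as the pixel coordinates reported by an edge or corner detector, `area_overlap` could be applied to the sets of detected coordinates, while `snr_db` suits outputs that are themselves images.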


References

  1. Alvarez, C., Corbal, J., Salami, E., Valero, M.: On the potential of tolerant region reuse for multimedia applications. In: International Conference on Supercomputing, pp. 218–228 (2001)

  2. Alvarez, C., Corbal, J., Valero, M.: Fuzzy memoization for floating-point multimedia applications. IEEE Trans. Comput. 54(7), 922–927 (2005)

  3. Bird, R.S.: Tabulation techniques for recursive programs. ACM Comput. Surv. 12(4), 403–417 (1980)

  4. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. The MIT Press, Cambridge, MA (2001)

  5. Semacode Corporation: Accessed 29 Feb 2012

  6. Ding, Y., Li, Z.: Operation reuse on handheld devices. In: Languages and Compilers for Parallel Computing (LCPC-03), vol. 2958/2004, pp. 273–287. Springer, Berlin (2003)

  7. Egmont-Petersen, M., de Ridder, D., Handels, H.: Image processing with neural networks, a review. Pattern Recognit. 35, 2279–2301 (2002)

  8. Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Prentice Hall, Upper Saddle River (2008)

  9. Hennessy, J.L., Patterson, D.A.: Computer Architecture: A Quantitative Approach. Morgan Kaufmann, Boston, MA (2003)

  10. Huang, T.S., Yang, G.J., Tang, G.Y.: A fast two-dimensional median filtering algorithm. IEEE Trans. Acoust. Speech Signal Process. ASSP 27(1), 13–18 (1979)

  11. Hughes, J.: Lazy memo-functions. In: A Conference on Functional Programming Languages and Computer Architecture, pp. 129–146. Springer, New York (1985)

  12. Philips Breast Images: Accessed 29 Feb 2012

  13. Intel: IA-32 Intel Architecture Optimization (2004)

  14. Jain, A.K.: Image data compression: a review. Proc. IEEE 69, 349–389 (1981)

  15. Khalvati, F., Aagaard, M.D.: Window memoization: an efficient hardware architecture for high-performance image processing. J. Real-Time Image Process. doi:10.1007/s11554-009-0128-y (2009)

  16. Khalvati, F., Aagaard, M.D., Tizhoosh, H.R.: Accelerating image processing algorithms based on the reuse of spatial patterns. In: Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 172–175 (2007)

  17. Khalvati, F., Tizhoosh, H.R., Aagaard, M.D.: Opposition-based window memoization for morphological algorithms. In: IEEE Symposium on Computational Intelligence in Image and Signal Processing (CIISP), pp. 425–430 (2007)

  18. Kirsch, R.A.: Computer determination of the constituent structure of biological images. Comput. Biomed. Res. 4, 315–328 (1971)

  19. Robarts Imaging Research Laboratories: Accessed 29 Feb 2012

  20. Lipasti, M.H., Wilkerson, C.B., Shen, J.P.: Value locality and load value prediction. In: International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII), pp. 138–147 (1996)

  21. Mayfield, J., Finin, T., Hall, M.: Using automatic memoization as a software engineering tool in real-world AI systems. In: The 11th Conference on Artificial Intelligence for Applications (CAIA-95), pp. 87–93 (1995)

  22. Michie, D.: Memo functions and machine learning. Nature 218, 19–22 (1968)

  23. Pugh, W.: An improved replacement strategy for function caching. In: The 1988 ACM Conference on LISP and Functional Programming (LFP-88), pp. 269–276. ACM (1988)

  24. Pugh, W., Teitelbaum, T.: Incremental computation via function caching. In: The 16th Annual ACM Symposium on Principles of Programming Languages, pp. 315–328 (1989)

  25. Richardson, S.E.: Exploiting trivial and redundant computation. In: IEEE Symposium on Computer Arithmetics, pp. 220–227 (1993)

  26. Sezgin, M., Sankur, B.: Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging 13(1), 146 (2004)

  27. Shen, J.P., Lipasti, M.H.: Modern Processor Design. McGraw-Hill, New York (2004)

  28. Sonka, M., Hlavac, V., Boyle, R.: Image Processing, Analysis, and Machine Vision. PWS, Pacific Grove, CA (1999)

  29. Trajković, M., Hedley, M.: Fast corner detection. Image Vis. Comput. 16, 75–87 (1998)

  30. Tuytelaars, T., Mikolajczyk, K.: Survey on local invariant features. FnT Comput. Graph. Vis. 1(1), 1–94 (2008)

  31. Wang, W., Raghunathan, A., Jha, N.K.: Profiling driven computation reuse: An embedded software synthesis technique for energy and performance optimization. In: IEEE International Conference on VLSI Design (VLSID-04 Design), p. 267 (2004)


Author information



Corresponding author

Correspondence to Farzad Khalvati.



In this section, we present the numerical speedups and the accuracy of the results for window memoization in software. For natural images, we also show the original results for a sample image alongside the window memoization results for all six case study algorithms used in this paper: Canny edge detector (Canny), morphological gradient (Morpho), Kirsch edge detector (Kirsch), corner detector (Corner), median filter (Median), and local variance calculator (Variance) (Tables 11, 12, 13, 14; Figs. 11, 12).

Fig. 11 Results for a sample natural image. Top to bottom: Canny, morphological, and Kirsch edge detectors. Left: original results; right: window memoization results

Fig. 12 Results for a sample natural image. Top to bottom: corner detection, median filter, and local variance. Left: original results; right: window memoization results

Table 11 Speedups (average) for processor 1 (high-end)
Table 12 Speedups (average) for processor 2 (mid-range)
Table 13 Speedups (average) for processor 3 (low-end)
Table 14 Accuracy (average) of the results

About this article

Cite this article

Khalvati, F., Aagaard, M.D. & Tizhoosh, H.R. Window memoization: toward high-performance image processing software. J Real-Time Image Proc 10, 5–25 (2015).


  • Computational redundancy
  • Memoization
  • Reuse
  • High-performance real-time image processing