A methodology for speeding up edge and line detection algorithms focusing on memory architecture utilization


This paper presents a new methodology for speeding up edge and line detection algorithms. On both general-purpose and embedded processors, it outperforms the state-of-the-art software library OpenCV (speedups from 1.35 to 2.22) and other conventional implementations by reducing the number of load/store and arithmetic instructions, the number of data cache accesses and data cache misses across the memory hierarchy, and the memory size of the algorithm. These gains come from fully exploiting the combination of software and hardware parameters, which are treated simultaneously as one problem rather than separately. Furthermore, the edge and line detection algorithms have been simplified for a computer vision application (detection and measurement of flow fronts in a microfluidic device) running on a Xilinx Virtex-5 FPGA with a MicroBlaze soft processor; this implementation achieves a speedup of up to 660 times over conventional software implementations.
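As a rough illustration of what reducing load instructions through data reuse can look like, the sketch below computes a horizontal Sobel gradient in two ways. It is not the authors' implementation or their methodology; the function names `sobel_x_naive` and `sobel_x_reuse`, the row-major 8-bit image layout, and the choice to leave border pixels unwritten are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Baseline: every output pixel reloads all six contributing input pixels. */
void sobel_x_naive(const uint8_t *in, int16_t *out, int w, int h)
{
    assert(w >= 3 && h >= 3);
    for (int y = 1; y < h - 1; y++) {
        for (int x = 1; x < w - 1; x++) {
            const uint8_t *p = in + y * w + x;
            out[y * w + x] = (int16_t)(
                (p[-w + 1] + 2 * p[1] + p[w + 1]) -
                (p[-w - 1] + 2 * p[-1] + p[w - 1]));
        }
    }
}

/* Reuse variant: the weighted column sum (1, 2, 1) of each input column is
 * computed once and kept in a local variable while the window slides right,
 * so each output needs three new pixel loads instead of six. */
void sobel_x_reuse(const uint8_t *in, int16_t *out, int w, int h)
{
    assert(w >= 3 && h >= 3);
    for (int y = 1; y < h - 1; y++) {
        const uint8_t *r0 = in + (y - 1) * w;  /* row above   */
        const uint8_t *r1 = in + y * w;        /* current row */
        const uint8_t *r2 = in + (y + 1) * w;  /* row below   */
        int left = r0[0] + 2 * r1[0] + r2[0];  /* column x - 1 */
        int mid  = r0[1] + 2 * r1[1] + r2[1];  /* column x     */
        for (int x = 1; x < w - 1; x++) {
            int right = r0[x + 1] + 2 * r1[x + 1] + r2[x + 1];
            out[y * w + x] = (int16_t)(right - left);
            left = mid;   /* slide the window one column to the right */
            mid = right;
        }
    }
}
```

Both functions write the same interior values and leave the one-pixel border of `out` untouched. Whether the second form is actually faster depends on the compiler, the register file, and the cache configuration, which is the kind of hardware and software interplay the abstract refers to.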




  1. Shen J (1995) Multi-edge detection by isotropical 2-D ISEF cascade. Pattern Recogn 28(12):1871–1885. doi:10.1016/0031-3203(95)00056-9
  2. Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell PAMI-8(6):679–698
  3. Smith TG Jr, Marks WB, Lange GD, Sheriff WH Jr, Neale EA (1988) Edge detection in images using Marr–Hildreth filtering techniques. J Neurosci Methods 26:75–81
  4. Kanopoulos N, Vasanthavada N, Baker R (1988) Design of an image edge detection filter using the Sobel operator. IEEE J Solid-State Circuits 23:358–367
  5. Kirsch RA (1970) Computer determination of the constituent structure of biological images. Comput Biomed Res 315–328
  6. Gunn S (1998) Edge detection error in the discrete Laplacian of Gaussian. In: ICIP 98 Proceedings of international conference on image processing, vol 2, pp 515–519
  7. Roberts LG (1963) Machine perception of three-dimensional solids. Outstanding Dissertations in the Computer Sciences. Garland Publishing, New York
  8. Prewitt JMS (1970) Object enhancement and extraction. Academic Press, New York
  9. Shrivakshan GT, Chandrasekar C (2012) A comparison of various edge detection techniques used in image processing. IJCSI Int J Comput Sci 9(1). http://www.ijcsi.org/articles/A-survey-of-edge-detection-techniques-used-in-image-processing-with-a-case-study.php
  10. Sharifi M, Fathy M, Mahmoudi MT (2002) A classified and comparative study of edge detection algorithms. In: ITCC. IEEE Computer Society, New York, pp 117–120. http://dblp.uni-trier.de/db/conf/itcc/itcc2002.html
  11. Shin MC, Goldgof DB, Bowyer KW, Nikiforou S (2001) Comparison of edge detection algorithms using a structure from motion task. IEEE Trans Syst Man Cybern Part B 31(4):589–601. http://dblp.uni-trier.de/db/journals/tsmc/tsmcb31.html
  12. Bin L, Yeganeh MS (2012) Comparison for image edge detection algorithms. IOSR J Comput Eng (IOSRJCE) 2:1–4
  13. Raman M, Aggarwal H (2009) Study and comparison of various image edge detection techniques. Int J Image Process (IJIP) 3:1–11
  14. Duda RO, Hart PE (1972) Use of the Hough transformation to detect lines and curves in pictures. Commun ACM 15:11–15. doi:10.1145/361237.361242
  15. Guru DS, Shekar BH, Nagabhushan P (2004) A simple and robust line detection algorithm based on small eigenvalue analysis. Pattern Recogn Lett 25(1):1–13. http://dblp.uni-trier.de/db/journals/prl/prl25.html
  16. Zheng Y, Li H, Doermann D (2005) A parallel-line detection algorithm based on HMM decoding. IEEE Trans Pattern Anal Mach Intell 27(5):777–792. doi:10.1109/TPAMI.2005.89
  17. Mattavelli M, Noel V, Amaldi E (1999) A new approach for fast line detection based on combinatorial optimization. In: ICIAP. IEEE Computer Society, New York, pp 168–173. http://dblp.uni-trier.de/db/conf/iciap/iciap1999.html
  18. Sun J, Zhou F, Zhou J (2006) A new fast line detection algorithm. In: 1st International symposium on systems and control in aerospace and astronautics, 19–21 January 2006, ISSCAA 2006, pp 831–833
  19. Bradski G (2000) The OpenCV library. Dr. Dobb’s J Softw Tools 25(1):122–125
  20. Antoine CW, Petitet A, Dongarra JJ (2000) Automated empirical optimization of software and the ATLAS project. Parallel Comput 27:2001
  21. Puschel M, Moura JMF, Johnson J, Padua D, Veloso M, Singer B, Xiong J, Franchetti F, Gacic A, Voronenko Y, Chen K, Johnson RW, Rizzolo N (2005) SPIRAL: code generation for DSP transforms. Proceedings of the IEEE, special issue on “Program Generation, Optimization, and Adaptation”, vol 93, no 2, pp 232–275
  22. Frigo M, Johnson SG (1997) The fastest Fourier transform in the west. Technical report, Cambridge, MA, USA
  23. Pinter SS (1996) Register allocation with instruction scheduling: a new approach. J Prog Lang 4(1):21–38. http://compscinet.dcs.kcl.ac.uk/JP/jp040102.abs.html
  24. Shobaki G, Shawabkeh M, Rmaileh NEA (2008) Preallocation instruction scheduling with register pressure minimization using a combinatorial optimization approach. ACM Trans Archit Code Optim 10(3):14:1–14:31. doi:10.1145/2512432
  25. Bacon DF, Graham SL, Sharp OJ (1994) Compiler transformations for high-performance computing. ACM Comput Surv 26:345–420
  26. Granston E, Holler A (2001) Automatic recommendation of compiler options. In: Proceedings of the workshop on feedback-directed and dynamic optimization (FDDO)
  27. Triantafyllis S, Vachharajani M, Vachharajani N, August DI (2003) Compiler optimization-space exploration. In: Proceedings of the international symposium on code generation and optimization: feedback-directed and runtime optimization, CGO ’03. IEEE Computer Society, Washington, DC, pp 204–215. http://dl.acm.org/citation.cfm?id=776261.776284
  28. Cooper KD, Subramanian D, Torczon L (2001) Adaptive optimizing compilers for the 21st century. J Supercomput 23:2002
  29. Kisuki T, Knijnenburg PMW, O’Boyle MFP, Bodin F, Wijshoff HAG (1999) A feasibility study in iterative compilation. In: Proceedings of the second international symposium on high performance computing, ISHPC ’99. Springer, London, pp 121–132. http://dl.acm.org/citation.cfm?id=646347.690219
  30. Kulkarni PA, Whalley DB, Tyson GS, Davidson JW (2009) Practical exhaustive optimization phase order exploration and evaluation. TACO 6(1):1–36
  31. Kulkarni P, Hines S, Hiser J, Whalley D, Davidson J, Jones D (2004) Fast searches for effective optimization phase sequences. SIGPLAN Notices 39(6):171–182. doi:10.1145/996893.996863
  32. Park E, Kulkarni S, Cavazos J (2011) An evaluation of different modeling techniques for iterative compilation. In: Proceedings of the 14th international conference on compilers, architectures and synthesis for embedded systems, CASES ’11. ACM, New York, pp 65–74. doi:10.1145/2038698.2038711
  33. Monsifrot A, Bodin F, Quiniou R (2002) A machine learning approach to automatic production of compiler heuristics. In: Proceedings of the 10th international conference on artificial intelligence: methodology, systems and applications, AIMSA ’02. Springer, London, pp 41–50. http://dl.acm.org/citation.cfm?id=646053.677574
  34. Stephenson M, Amarasinghe S, Martin M, O’Reilly UM (2003) Meta optimization: improving compiler heuristics with machine learning. SIGPLAN Notices 38(5):77–90. doi:10.1145/780822.781141
  35. Tartara M, Crespi Reghizzi S (2013) Continuous learning of compiler heuristics. ACM Trans Archit Code Optim 9(4):46:1–46:25. doi:10.1145/2400682.2400705
  36. Agakov F, Bonilla E, Cavazos J, Franke B, Fursin G, O’Boyle MFP, Thomson J, Toussaint M, Williams CKI (2006) Using machine learning to focus iterative optimization. In: Proceedings of the international symposium on code generation and optimization, CGO ’06. IEEE Computer Society, Washington, DC, pp 295–305. doi:10.1109/CGO.2006.37
  37. Kelefouras VI, Athanasiou G, Alachiotis N, Michail HE, Kritikakou A, Goutis CE (2011) A methodology for speeding up fast Fourier transform focusing on memory architecture utilization. IEEE Trans Signal Process 59(12):6217–6226
  38. Kelefouras VI, Kritikakou AS, Siourounis K, Goutis CE (2013) A methodology for speeding up MVM for regular, Toeplitz and bisymmetric Toeplitz matrices. J Signal Process Syst. doi:10.1007/s11265-013-0812-9
  39. Xilinx (2012) Virtex-5 FPGA ML507 evaluation platform. http://www.xilinx.com/products/boards-and-kits/HW-V5-ML507-UNI-G.htm
  40. Austin T, Larson E, Ernst D (2002) SimpleScalar: an infrastructure for computer system modeling. Computer 35:59–67. doi:10.1109/2.982917. http://dl.acm.org/citation.cfm?id=619072.621910
  41. Intel image processing library (IPL) (2000). http://downloadcenter.intel.com
  42. van den Braak GJ, Mesman B, Corporaal H (2010) Compile-time GPU memory access optimizations. In: ICSAMOS, pp 200–207
  43. Duraiswami R (2007) Canny edge detection on NVIDIA CUDA. In: Computer vision and pattern recognition
  44. Ogawa K, Ito Y, Nakano K (2010) Efficient Canny edge detection using a GPU. In: International conference on natural computation, pp 279–280. doi:10.1109/IC-NC.2010.13
  45. Palomar R, Palomares JM, Castillo JM, Olivares J, Gómez-Luna J (2010) Parallelizing and optimizing LIP-Canny using NVIDIA CUDA. In: Proceedings of the 23rd international conference on industrial engineering and other applications of applied intelligent systems, vol Part III, IEA/AIE’10. Springer, Berlin, pp 389–398. http://portal.acm.org/citation.cfm?id=1945955.1946001
  46. Ogawa K, Ito Y, Nakano K (2010) Efficient Canny edge detection using a GPU. In: Proceedings of the 2010 first international conference on networking and computing, ICNC ’10. IEEE Computer Society, Washington, DC, pp 279–280. doi:10.1109/IC-NC.2010.13
  47. Jovanovic R, Tuba M, Simian D (2012) Parallelization of the local threshold and boolean function based edge detection algorithm using CUDA. In: Proceedings of the 5th WSEAS congress on applied computing conference, and Proceedings of the 1st international conference on biologically inspired Computation, BICA’12. World Scientific and Engineering Academy and Society (WSEAS), Stevens Point, pp 157–161. http://dl.acm.org/citation.cfm?id=2230596.2230625
  48. Cheikh T, Beltrame G, Nicolescu G, Cheriet F, Tahar S (2012) Parallelization strategies of the Canny edge detector for multi-core CPUs and many-core GPUs. In: New circuits and systems conference (NEWCAS), 2012 IEEE 10th International, pp 49–52. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6328953
  49. Park SI, Ponce SP, Huang J, Cao Y, Quek FKH (2008) Low-cost, high-speed computer vision using NVIDIA’s CUDA architecture. In: Applied imagery pattern recognition workshop, pp 1–7. doi:10.1109/AIPR.2008.4906458
  50. Zhang X, Dykes SG, Deng H (1997) Distributed edge detection: issues and implementations. In: IEEE computational science and engineering, Spring Issue, pp 72–82
  51. Sanduja V, Patial R (2012) Sobel edge detection using parallel architecture based on FPGA. Int J Appl Inf Syst 3(4):20–24. Foundation of Computer Science, New York
  52. Khalid NEA, Ahmad SA, Noor NM, Fadzil AFA, Taib MN (2011) Analysis of parallel multicore performance on Sobel edge detector. In: Proceedings of the 15th WSEAS international conference on Computers, pp 313–318. World Scientific and Engineering Academy and Society (WSEAS), Stevens Point. http://dl.acm.org/citation.cfm?id=2028299.2028359
  53. Moore C, Devos H, Stroobandt D (2009) Optimizing the FPGA memory design for a Sobel edge detector. In: ERSA, pp 299–300
  54. Osman M, Elhassan Z, Hussin Azmadi F, Ali Z, Basheer N (2010) Hardware implementation of an optimized processor architecture for Sobel image edge detection operator. In: International conference on intelligent and advanced systems (ICIAS 2010)
  55. Osman ZEM, Hussin FA, Ali NBZ (2010) Optimization of processor architecture for image edge detection filter. In: Proceedings of the 2010 12th international conference on computer modelling and simulation, UKSIM ’10, pp 648–652. IEEE Computer Society, Washington, DC
  56. Yasri I, Hamid N, Yap V (2010) Real-time video edge detection with the memory access improvement. In: International conference in intelligent advanced systems (ICIAS)
  57. Kornaros G (2010) A soft multi-core architecture for edge detection and data analysis of microarray images. J Syst Archit 56:48–62
  58. Lacassagne L, Lohier F, Garda P, Pierre U, Bâtiment MC (1998) Real time execution of optimal edge detectors on RISC and DSP processors
  59. Austin U, Atiquzzaman M, John O (1999) Performance of the Hough transform on a distributed memory multiprocessor. Elsevier Microprocess Microsyst 22(7):355–362
  60. Chen YK, Li W, Li J, Wang T (2008) Novel parallel Hough transform on multi-core processors. In: ICASSP, pp 1457–1460. IEEE, New York
  61. Li W, Chen YK (2008) Parallelization, performance analysis, and algorithm consideration of Hough transform on chip multiprocessors. SIGARCH Comput Archit News 36:10–17. doi:10.1145/1399972.1399977
  62. Mattavelli M, Noel V, Amaldi E (1999) A new approach for fast line detection based on combinatorial optimization. In: Proceedings of the 10th international conference on image analysis and processing, ICIAP ’99. IEEE Computer Society, Washington, DC. http://portal.acm.org/citation.cfm?id=839281.840739
  63. Kim D, Jin SH, Thuy NT, Kim KH, Jeon JW (2008) A real-time finite line detection system based on FPGA. In: 6th IEEE international conference on industrial informatics
  64. Khan MUK, Bais A, Yahya KM, Hassan GM, Arshad R (2009) A swift and memory efficient Hough transform for systems with limited fast memory. In: Image analysis and recognition. Springer, Berlin
  65. Intel Core 2 Duo processor E6550 (2012) http://ark.intel.com/Product.aspx?id=30783
  66. Price J (1995) MIPS IV instruction set, revision 3.1. Technical report, MIPS Technologies Inc., Mountain View
  67. Thoziyoor DTS, Tarjan D, Thoziyoor S (2006) CACTI 4.0. Technical report
  68. Documentation for Ubuntu 10.04 LTS (2012) https://help.ubuntu.com/10.04/index.html
  69. Ubuntu manuals (2012) http://manpages.ubuntu.com/manpages/lucid/man1/time.1.html
  70. Nethercote N, Seward J (2007) Valgrind: a framework for heavyweight dynamic binary instrumentation. SIGPLAN Notices 42(6):89–100. doi:10.1145/1273442.1250746
  71. OpenCV manual (2012) http://www.cs.unc.edu/Research/stc/FAQs/OpenCV/OpenCVReferenceManual.pdf
  72. Demiris A, Blionas S (2011) Integrated system for the visual control, quantitative and qualitative flow measurement in microfluidics



This research has been co-financed by the European Union (European Social Fund ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF)–Research Funding Program: Heracleitus II. Investing in knowledge society through the European Social Fund. The results were co-financed by Hellenic Funds and the European Regional Development Fund (ERDF) under ESPA 2007–2013 (MICRO2-SE-G). The machine vision algorithm and the SW model were introduced/patented by Micro2gen [72]. Part of this research has been supported by the Public Welfare Foundation ‘Propondis’ research funds.

Author information



Corresponding author

Correspondence to Vasilios Kelefouras.


About this article

Cite this article

Kelefouras, V., Kritikakou, A. & Goutis, C. A methodology for speeding up edge and line detection algorithms focusing on memory architecture utilization. J Supercomput 68, 459–487 (2014). https://doi.org/10.1007/s11227-013-1049-x



  • Data reuse
  • Data cache
  • Associativity
  • FPGA
  • Memory management
  • Tiling
  • Canny
  • Hough