Detecting image edges in IOT nodes without FPU

Parsons, Lyle; Deng, Guang; Ross, Robert

doi:10.1007/s11042-023-16672-4

Detecting image edges in IOT nodes without FPU

Open access
Published: 13 September 2023

Volume 83, pages 31161–31175, (2024)
Cite this article

Download PDF

You have full access to this open access article

Multimedia Tools and Applications Aims and scope Submit manuscript

Detecting image edges in IOT nodes without FPU

Download PDF

539 Accesses
Explore all metrics

Abstract

Internet of Things (IoT) applications continue to expand into new applications with a growing need for image processing on the edge. Many edge devices are resource limited microcontrollers, which significantly prohibits many of the mature image processing algorithms. This paper proposes an approach optimized, for resource constrained processors removing the need for computationally expensive floating point arithmetic by using a framework based on unsigned integer arithmetic for image processing. The proposed framework (OptInt) is demonstrated using edge detection algorithms evaluated on two typical low-power IoT-ready micro-controllers and for comparison a more powerful Raspberry Pi. Results indicate that the OptInt approach for basic image processing in resource constrained devices reduces the computation time as well as the memory requirements, thereby allowing for more edge computing capabilities in these devices. Furthermore, the images produced using OptInt produce results of similar quality to mature edge detection algorithms.

An Edge Computing Architecture for Object Detection

Energy-Efficient Edge Intelligence: A Comparative Analysis of AIoT Technologies

Article 17 March 2023

Profiling and improving the duty-cycling performance of Linux-based IoT devices

Article 16 January 2019

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

The Internet of Things (IoT) is one of the major enablers driving the fourth Industrial Revolution by providing an ecosystem where processes can be remotely monitored and controlled. A typical IoT ecosystem would consist of a number of edge nodes sending sensor information to a gateway device, or directly to a cloud server for analysis [1]. In low bandwidth applications resource constrained edge devices with limited computing capabilities can be used. In these cases, all the computing is done on the cloud server or on a more powerful gateway node (fog computing [2]).

There are a vast array of applications for image analysis from fields as broad as medical diagnostics, ecology research, fire detection systems, vehicle tracking and traffic management [3,4,5,6,7,8,9].

In situations where increasing the bandwidth capability of the IoT system is not feasible (e.g in LoRa [10] communications infrastructures), the images cannot be sent to the cloud or gateway device for analysis. Image processing would therefore need to be performed on the edge device, requiring a device with more computing capabilities – leading to higher cost and power consumption, or by using optimizing algorithms to use computational power more effectively. As the trend towards edge computing over cloud computing grows [11], it is fitting to find ways of increasing the amount of data analysis that can be done locally on existing IoT infrastructures. The edge devices targeted in this paper are the low cost ones which come with limited processing power and on-board memory (i.e resource constrained devices).

In microcontrollers with no dedicated hardware floating point unit (FPU), the execution time of an algorithm using floating point arithmetic is much longer than in a device with an FPU [12]. It has also been shown that in older architectures, integer arithmetic is significantly faster than floating point arithmetic [13]. In off-grid IoT edge devices where battery life needs to be conserved, increasing the speed of algorithms reduces the amount of time that the device needs to be on. In applications where image processing is required, performing local image processing on the edge device means that only the processed information needs to be sent to the cloud server or gateway device, instead of the entire image – further conserving battery power. It also reduces the need for high bandwidth (and high power) wireless communication infrastructures such as WiFi to be used in IoT ecosystems which require image analysis.

The advantage of floating point arithmetic lies in its precision. However using integer scaling floating point arithmetic may be used whilst only using integer data types [14]. The accuracy of the result is dependant on the size of the integer used, with 8-bit integers being the least accurate. This means that integer scaling is not as effective for floating point arithmetic when used in 8-bit processors. In terms of storing the floating point numbers, most processors use the IEEE 754 standard [15]. The memory requirements coupled with the slower computational speeds associated with floating point arithmetic therefore make it difficult to perform this arithmetic on resource constrained devices (such as low cost IoT nodes).

One of the fundamental steps in a typical image processing pipeline is detecting edges in the image. This step allows the image to be reduced in complexity, with only the key features (edges) of the image being used further down the pipeline. There are a number of widely used edge detection algorithms available [16]. Two popular ones are the Sobel and Laplacian edge detectors [17]. Both of these are gradient-based edge detection methods, which require the use of floating point arithmetic for computation.

The primary motivation in this paper is to improve and enable image processing on the edge for low-power resource limited processors. This paper proposes a method of detecting edges in an image using unsigned integer arithmetic (OptInt framework), and storing the edge image using 8-bit unsigned integer memory. This OptInt implementation eliminates the need for floating point computation, while producing similar results to the popular Sobel and Laplacian gradient based edge detectors. This method of edge detection is capable of detecting strong edges in a noisy environment. This is significant when low resolution images are used (as is the case for resource constrained microcontrollers).

The proposed method is an adaptation of the Laplacian edge detection method where pixels in a rectangular window patch are subtracted from the centre pixel in the window to compute the edge gradients, in a sliding window approach [17]. However, unlike the Laplacian method which uses floating point arithmetic, the OptInt method uses unsigned integer arithmetic for computation of the edges. A Rectified Linear Unit operation (ReLu) [18] is applied at each subtraction stage, thereby eliminating the need for signed memory storage. A scaling factor (using bit shifting) was also applied after each ReLu operation, to prevent the output from saturating. This has the effect of reducing noise in the edge image.

Hence the contributions of this paper are as follows:

To demonstrate how edge detection algorithms can be optimised using integer only arithmetic.
To benchmark implementations of optimised algorithms on resource limited CPU’s.

2 Related work

In [19], the authors implemented a 2D Gaussian filter using fixed point arithmetic in an FPGA. Gaussian filtering is another common image processing technique, and inherently involves convolutions using floating point arithmetic. The authors found that by using fixed point arithmetic in the FPGA, their algorithm was much faster than the conventional approach to Gaussian filtering. More recently an integer based approach was used in FPGAs for neural networks giving an efficiency improvement between 1.7x to 7.3x [20]. Integer-only Gaussian filtering was also implemented in [21], as an approximation to floating point based Gaussian filtering. Fixed point arithmetic has also been used to simplify deep learning architectures. In [22], the weights of a deep learning network were constrained to the integer values +1, 0, and -1 – resulting in a model which required less memory to store the weights. A similar approach was implemented in [23], where the quantization of the weights allowed for integer only arithmetic for inference.

In [13], benchmarking tests were performed to compare the computational speeds of floating point and integer arithmetic on different CPU architectures. The tests indicated that floating point arithmetic is slower than integer arithmetic for similar data sizes. Similarly, for image segmentation using deep learning an integer based approach provides near full precision with a significant improvement to processing requirements [24]. Most modern architectures come with one or more dedicated hardware floating point units (FPU) to speed up the floating point arithmetic to a point where there is a negligible difference in computational speed between floating point and integer arithmetic [25, 26]. However, despite the inclusion of these FPU’s in some embedded devices, there is still a significant gain in computational speed associated with using integer arithmetic over floating point [13].

Two common metrics used to benchmark microprocessors are the MIPS (Million Instructions per Second), and the MFLOPS (Million Floating Point Instructions per Second). The MIPS metric represents the number of integer instructions which can be executed by the microprocessor per second, while the MFLOPS metric represents the number of integer instructions which can be executed by the microprocessor per second [25]. The MIPS metric will be orders of magnitude higher than the MFLOPS metric for resource constrained devices which lack a FPU [13]. For benchmarking integer operations, the Dhrystone test [27] can be used, while floating point capability can be benchmarked using the Linpack test [28].

3 Unsigned Integer arithmetic

Using the unsigned 8-bit integer (uint8) data type requires only 1 byte of memory per image pixel for storage with a range of 0-255. In contrast the floating point data type can represent fractions and negative numbers, at the expense of 4 bytes per data sample. For greater precision, the 8-byte double data type can be used. This section defines the different arithmetic operations using the uint8 data type. Let a, b, and c represent three uint8 numbers:

The addition operation is denoted as and is defined as:

(1)

As shown in (1), the result of uint8 addition between two numbers is the normal addition of those two numbers, with clipping at 255.^{Footnote 1}

The subtraction operation is denoted as and is defined as:

(2)

The operation effectively performs a ReLu operation, where negative values are clamped to 0. Algorithms 1 and 2 were implemented in C to achieve the operation with 8-bit memory.

The multiplication operation is denoted as and is defined as:

(3)

Algorithm 3 was implemented in C to achieve the operation with 8-bit memory.

4 Applications in edge detection

As an example of efficient image processing using unsigned integer arithmetic, OptInt is applied to find edges in an image. The OptInt implementation is similar to the Laplacian gradient based edge detection method, with the major difference being that uint8 variables are used, whereas the Laplacian method uses floating point variables. For a Grayscale image I stored in uint8 memory, let $x_0$ be the center pixel in a square patch of size $N \times N$. The pixels surrounding $x_0$ within the patch can be denoted by $x_k$, where $k=1:N^2-1$. An example of a $3 \times 3$ patch is shown below.

Within each patch, difference $d_k$ between each pixel and the centre pixel can be computed as:

(4)

where M is a scaling factor defined as:

$$\begin{aligned} M=2^n\qquad \qquad \qquad n=0,1,2..7 \end{aligned}$$

(5)

By performing uint8 subtraction, negative gradients (where $x_k$ is greater than $x_0$) are neglected. However, as will be shown in the Results section, OptInt produces similar edge images to the conventional methods without the need for floating point arithmetic. The constraint of making M a power of 2 is so that the uint8 division can be performed by bit shifting right n times. All $d_k$ within the patch can be summed (using a series of uint8 additions) to produce an edge intensity pixel $E_p$ related to that patch, as shown below:

$$\begin{aligned} E_p = \max (255,\sum _{k=1}^{N^2-1} d_k) \end{aligned}$$

(6)

The purpose of applying a scaling factor is to prevent $E_p$ from saturating which reduces noise in the image. After obtaining $E_p$ the patch is moved one pixel to the right and apply (4) and (6) to the pixels under the new patch. The final edge image is then made up of all edge intensity pixels $E_p$.

4.1 The 8-bit unsigned integer edge detection algorithm

The algorithm for creating the edge image using 8-bit unsigned integer arithmetic is shown in Algorithm 4. The inputs to the algorithm are the Grayscale image D (for which the edges are to be found), and the patch size p. For a $3 \times 3$ patch, $p = 1$. Similarly, $p = 2$ for a $5 \times 5$ patch.

4.2 Implementing laplacian and sobel edge detection

As a way of benchmarking the unsigned integer arithmetic algorithm, Laplacian and Sobel edge detectors were implemented using floating point arithmetic to find the edges of a test image [17, 29]. The Laplace kernel, L used is shown in (7) and the two Sobel kernels, $S_x$ and $S_y$ used are shown in (8) [17].

$$\begin{aligned} L = \left[ \begin{array}{ccc} -1&{}-1&{}-1 \\ -1&{}8&{}-1 \\ -1&{}-1&{}-1 \\ \end{array}\right] \end{aligned}$$

(7)

$$\begin{aligned} S_x = \left[ \begin{array}{ccc} -1&{}0&{}1 \\ -2&{}0&{}2 \\ -1&{}0&{}1 \\ \end{array}\right] \quad S_y = \left[ \begin{array}{ccc} 1&{}2&{}1 \\ 0&{}0&{}0 \\ -1&{}-2&{}-1 \\ \end{array}\right] \end{aligned}$$

(8)

The algorithm for the Laplacian Edge detection is shown in Algorithm 5. The inputs to the algorithm are the Grayscale image D (for which the edges are to be found), and output of the algorithm is the Edge Intensity image E. To save the Edge Intensity image as a uint8 array, the maximum absolute intensity value must first be calculated and subsequently used to scale the intensity values to within the uint8 range (0 to 255). This means that the edge intensity calculation must be computed twice. To reduce the computational time, the intensity pixels can be stored in a floating point array, however this limits the image size in devices with low RAM.

The algorithm for the Sobel edge detection is shown in Algorithm 6. The algorithm functions similarly to the Laplacian edge detection, with the key difference being that two kernels are used with the Sobel edge detector. This means the Sobel operation will be computationally more expensive than the Laplace or uint8 operations.

5 Optimisation considerations

To improve the speed of the unsigned integer edge detection algorithm, three areas were explored:

1.
Unrolling the for loops.
2.
Optimizing the uint8 division process.
3.
Compiler Optimisations.

From Algorithm 4, a number of nested for loops can be observed. When $p=1$ the algorithm works on a $3 \times 3$ window patch. Similarly when $p=2$ the algorithm works on a $5 \times 5$ patch. In the case of edge detection, any larger window size will produce ringing artifacts in the Edge Intensity Image E [17]. The speed of the algorithm can be optimized by limiting p to specific values (either $p=1$ or $p=2$), and unroll the two innermost for loops. For example, when $p=1$, Algorithm 4 can be rewritten as shown in 7. This optimisation results in a speed increase of approximately 2.4 times.^{Footnote 2}

The uint8 division process performs an integer division on the operands, yielding only the quotient of the division. Division is computationally expensive when divisors are a power of 2 can be replaced by right bit-shifting [30]. This results in a significant increase in speed.

The edge detection algorithm was written in C and subsequently compiled into machine code using the GCC compiler. There are optimisations which can be specified for this compiler, with each optimisation giving various improvements in memory and speed. As will be shown in the Results and Discussion section, changing the compiler optimisation flags has a significant effect on the overall algorithm execution time. For resource constrained devices, the emphasis is on speeding up the execution time while staying within the memory constraints of the device.

6 Results and discussion

This section presents the results of tests performed with the unsigned integer arithmetic edge detection algorithm. The results obtained were first implemented MATLAB, so as to verify its ability to accurately detect edges. This is followed by performance statistics achieved from implementing the algorithms in IoT-ready devices, namely the ESP32 [31], ESP8266 [32], and the Raspberry Pi [33]. Following this the results of implementing modified Laplace and Sobel edge detection algorithms are presented for each of the above mentioned IoT-ready devices for comparison purposes.

6.1 Implementing the edge detection algorithms in MATLAB

To verify that unsigned integer arithmetic can be used to detect edges in a grayscale image, the OptInt Algorithms 4 and 7 were implemented in MATLAB, and compared their output with the Laplace and Sobel edge detection algorithms.

Figure 1 shows the output of the Uint8 edge detector when applied to a 256x256 Grayscale image. The different figures demonstrate that a larger patch size, (p), results in more pixels being classified as edge pixels. Also, comparing Fig. 1e to d we can see that the normalization factor can be used to reduce noise in the edge image.

Figure 2 shows a comparison of the uint8, Sobel, and Laplace edge detector. The uint8 edge detector produces an output comparable to the Sobel edge detector. The advantage of using the unsigned integer edge detection method is more apparent in resource constrained devices, as will be shown in the sub-sections to follow.

6.2 Implementing the edge detection algorithms in IoT-ready devices

To quantify the performance of using unsigned integer arithmetic for image processing in IoT devices, the OptInt implementation was tested on three commonly used IoT devices. These devices are shown with relevant specifications in Table 1. Due to the processor speed and memory constraints of the NodeMCU devices, limits were imposed due to the image size (to extract edges) as well as the time taken to perform the edge detection.

6.2.1 ESP8266 NodeMCU implementation

The ESP8266 uses a single 32-bit microprocessor for both networking and application specific computations. It is assumed that the IoT device will read raw pixel data from a camera which has an on-board buffer. To emulate this, MATLAB was used to extract the raw grayscale pixel values of an image at various sizes which was then saved an SD card. The ESP8266 NodeMCU reads these files from the SD card and passes the data to the edge detection algorithms. The output arrays from the algorithms are then saved to the SD card. MATLAB was again used to display the edge intensity image for the purposes of this paper. The following edge detection algorithms were applied to the input images:

unsigned integer (8-bit), using algorithm 4; $p = 1, M = 2$
unsigned integer (8-bit), using algorithm 7; $p = 1, M = 2$
unsigned integer (8-bit), using algorithm 7; $p = 2, M = 2$
Laplace, using algorithm 5
Sobel, using algorithm 6
Sobel, using algorithm 6 and an approximation to the square root operation ref.
unsigned integer (32-bit), using algorithm 4 adapted for 32-bit architecture; $p \!=\! 1, M \!=\! 2$
unsigned integer (32-bit), using algorithm 7 adapted for 32-bit architecture; $p \!=\! 1, M \!=\! 2$
unsigned integer (32-bit), using algorithm 7 adapted for 32-bit architecture; $p \!=\! 2, M \!=\! 2$

Note that adapting the uint8 edge detection algorithms to uint32 only requires the use of a two 32-bit variables in which the additions, subtractions, and bit-shifting will be performed. After the arithmetic has taken place, the variable is cast back to uint8. This means that a negligible amount of additional RAM is needed for uint32 arithmetic, with the benefit of increased execution time outweighing the extra RAM requirement.

Table 1 Specifications of the IoT devices used in testing

Full size table

To ensure consistent results, each edge detection algorithm was run 100 times on the input image, and the average execution time was recorded. The simulations indicate that the fastest algorithm was the unsigned integer 32-bit, with the unrolled inner for loops. Figure 3 shows the execution times for each of the above-mentioned algorithms applied to images with dimensions ranging from 32 x 32 pixels ($\approx $1 Kilopixel) up to 150 x 150 pixels (22.5 Kilopixels).

The maximum image size capable of being processed in the ESP8266 NodeMCU was 150 x 150 pixels. This means that QQVGA images (i.e 160 x 120 pixel images) would be able to be processed in this device which is highlighted as a vertical line. Since this IoT-ready device does not have a built-in FPU, floating point arithmetic will take much longer than integer arithmetic. This is evident when comparing the execution times for the modified Laplace algorithm with that of the unrolled uint32 algorithm (Fig. 3b). The QQVGA image edges are computed almost 6 times faster using the unrolled uint32 algorithm.

Compiler optimisations also have a significant influence on the execution times of each of the algorithms. Table 2 shows the execution times for each algorithm at various compiler optimisation levels, using a QQVGA resolution input image. An interesting observation made from this experiment was that the advantage of using uint32 over uint8 diminishes as the optimisation becomes more execution-speed oriented (02 and 03). At all optimisation levels, unsigned integer edge detection algorithms executed faster than the modified Laplace or Sobel algorithms. The results presented in Fig. 3 were obtained using the Os optimisation level.

Table 2 ESP8266 NodeMCU edge detection execution speeds at different compiler optimisation levels for a QQVGA image

Full size table

6.2.2 ESP32 NodeMCU implementation

The ESP32 has two 32-bit microprocessors at its core, with one dedicated to networking and the other used for application specific computations. The same edge detection algorithms tested in the ESP8266 NodeMCU implementation were tested in this device. Figure 4 shows the execution times for each of the above-mentioned algorithms applied to images with dimensions ranging from 32 x 32 pixels ($\approx $1 Kilopixel) up to 200 x 200 pixels (40 Kilopixels). Results for the unsigned integer 32 bit algorithm with the unrolled for loops is also shown in each figure so as to highlight the execution speed gain when using this algorithm compared to the others. Each algorithm was run 100 times and the average execution times were taken to ensure consistency in the results. The maximum image size capable of being processed in the ESP32 NodeMCU was 200 x 200 pixels, meaning that it is able to process images in QQVGA and QCIF (176 x 144 pixels) resolution formats.

Table 3 ESP32 NodeMCU edge detection execution speeds at different compiler optimisation levels for a QQVGA image

Full size table

Figure 4 shows the uint32 algorithm with the unrolled for loops took the least amount of time to generate the edge intensity images. The addition of the FPU was observed to significantly reduce the time taken for the modified Sobel and Laplace edge detection algorithms (approximately 4 times faster in the case of the Laplace edge detection). Some of this execution time gain over the ESP8266 is due to the faster 240MHz CPU in the ESP32 device. However, even with the FPU present, the unrolled uint32 algorithm was almost 4 times faster for the QQVGA image. This shows that there is still an advantage gained when using unsigned integer arithmetic for image processing in microprocessors with an on-board FPU.

As with the ESP8266, varying the compiler optimisation level resulted in reduction in the execution times for each of the algorithms (Table 3). The effect of unrolling the for loops is reduced since some compiler optimisation levels will automatically perform this step. From the execution times shown in Table 3 the uint32 algorithm is the quickest, with an execution time of 4.6ms for QQVGA.

6.2.3 Raspberry Pi implementation

This is the most powerful IoT ready device of the three presented in this paper with processing capabilities significantly more powerful than the ESP32 and ESP8266. An operating system can be installed on this IoT device, and image processing packages such as OpenCV can run on it. Experimentation was also performed on a Raspberry Pi to show the results of OptInt algorithms in a less constrained IoT device. The nine edge detection algorithms implemented in the ESP8266 NodeMCU and ESP32 NodeMCU were also implemented in the Raspberry Pi. However, since the operating system on the Raspberry Pi was 64-bit, the uint32 algorithms were adapted to uint64 instead (by changing the variable data types from 32-bit unsigned integer to 64-bit unsigned integer for the arithmetic).

Figure 5 shows the execution times for each of the above-mentioned algorithms applied to images with dimensions ranging from 32 x 32 pixels ($\approx $1 Kilopixel) up to 800 x 600 pixels (480 Kilopixels). Results for the uint64 algorithm are included for comparison. Each algorithm was run 100 times and the average execution times were taken to ensure consistency in the results. The results of these tests show that the speed gain in using unsigned integer arithmetic for image processing is much less in the Raspberry Pi than in the ESP32 and ESP8266. This is because of the four cores in the Raspberry Pi processor, as well as the on-board FPU. Also, there is no advantage in manually unrolling the for loops with this device, as evident in the fact that the fastest unsigned integer algorithm was the uint64 with the nested for loops. Figure 5 shows the uint64 edge detection algorithm executes faster than the modified Sobel and Sobel approximation algorithms, but is slightly slower than the modified Laplace edge detection algorithm.

7 Conclusion

In this paper, the use of unsigned integer arithmetic for image processing computations in resource constrained devices was demonstrated. A framework of governing equations were introduced for using 8-bit unsigned integer arithmetic for addition, subtraction and multiplication; with the intention of keeping the result within the ranges of an 8-bit unsigned integer. A gradient-based edge detection algorithm using this arithmetic, as well as modified versions of two common edge detection algorithms which use floating point arithmetic, namely the Sobel and Laplace edge detectors were implemented. The purpose of the modification was so that the edge images could be saved using 8-bit unsigned integer memory. These algorithms were run on three IoT-ready (resource constrained) devices, namely the ESP8266 NodeMCU, the ESP32 NodeMCU and the Raspberry Pi 3. Various optimisation methods for OptInt algorithms were investigated, with the focus on reducing the overall execution time.

The results show that by implementing the unsigned integer algorithms for edge detection in resource constrained devices significantly reduces the computation time, while producing an edge image of similar quality to the popular Sobel and Laplace edge detectors. This is even more apparent in devices which do not contain an FPU. The algorithm execution speed is reduced further when the unsigned integer arithmetic is adjusted to the base architecture of the device (for example, using 32-bit unsigned integers for computation in a 32-bit device). Another benefit of using the OptInt implementation was its demonstrated ability to filter out noisy pixels in the edge image. This is achieved by the ReLu operation inherent in the unsigned integer addition and subtraction equations, coupled with the scaling factor used in the algorithm. Images therefore do not need to be smoothed prior to computing the edges.

Future work may include implementation and bench-marking of a range of other image processing and edge detection algorithms. Furthermore, the system could be integrated into some resource limited applications over a low-bandwidth communications link to demonstrate practical applications of the system.

Data Availability

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study

Notes

Note that this is the way uint8 addition is performed in MATLAB, however in the ANSI C standard values above 255 and below 0 overflow.
Tests were run in MATLAB on a 256x256 image using a $3 \times 3$ patch.

References

Khan JY, Yuce MR (2019) Internet of things (IoT): systems and applications. CRC Press
Bonomi, F, Milito, R, Zhu, J, Addepalli, S.: Fog computing and its role in the internet of things. In: Proceedings of the first edition of the MCC workshop on mobile cloud computing, pp 13–16 (2012)
Bakkouri I, Afdel K (2020) Computer-aided diagnosis (cad) system based on multi-layer feature fusion network for skin lesion recognition in dermoscopy images. Multimedia Tools and Applications 79(29–30):20483–20518
Article Google Scholar
Bakkouri I, Afdel K (2022) Mlca2f: Multi-level context attentional feature fusion for covid-19 lesion segmentation from ct scans. Signal, Image and Video Processing, pp 1–8
Mrozek D, Koczur A, Małysiak-Mrozek B (2020) Fall detection in older adults with mobile iot devices and machine learning in the cloud and on the edge. Inf Sci 537:132–147
Punyavathi G, Neeladri M, Singh MK (2022) Vehicle tracking and detection techniques using iot. Materials Today: Proceedings 51:909–913
Google Scholar
Ross R, Parsons L, Thai BS, Hall R, Kaushik M (2020) An iot smart rodent bait station system utilizing computer vision. Sensors 20(17):4670
Article ADS PubMed PubMed Central Google Scholar
Sharma A, Singh PK, Kumar Y (2020) An integrated fire detection system using iot and image processing technique for smart cities. Sustainable Cities and Society 61:102332
Article Google Scholar
Alsaawy Y, Alkhodre A, Abi Sen A, Alshanqiti A, Bhat WA, Bahbouh NM (2022) A comprehensive and effective framework for traffic congestion problem based on the integration of iot and data analytics. Appl Sci 12(4):2043
Article Google Scholar
Adelantado F, Vilajosana X, Tuset-Peiro P, Martinez B, Melia-Segui J, Watteyne T (2017) Understanding the limits of lorawan. IEEE Commun Mag 55(9):34–40
Article Google Scholar
Yu W, Liang F, He X, Hatcher WG, Lu C, Lin J, Yang X (2017) A survey on the edge computing for the internet of things. IEEE access 6:6900–6919
Article Google Scholar
Ramakrishnan, A, Conrad, JM (2011) Analysis of floating point operations in microcontrollers. In: 2011 Proceedings of IEEE Southeastcon, pp 97–100. https://doi.org/10.1109/SECON.2011.5752913
Limare, N (2014) Integer and floating point arithmetic speed vs precision. http://nicolas.limare.net/pro/notes/2014/12/16_math_speed/
Yates R (2009) Fixed-point arithmetic: An introduction. Digital Signal Labs 81(83):198
Google Scholar
Kahan W (1996) Ieee standard 754 for binary floating-point arithmetic. Lecture Notes on the Status of IEEE 754(94720–1776):11
Google Scholar
Shrivakshan G, Chandrasekar C (2012) A comparison of various edge detection techniques used in image processing. International Journal of Computer Science Issues (IJCSI) 9(5):269
Google Scholar
Gonzalez, RC, Woods, RE (2017) Digital image processing. Pearson
Agarap, AF (2018) Deep learning using rectified linear units (relu). arXiv preprint arXiv:1803.08375
Cabello, F, Leon, J, Iano, Y, Arthur, R (2015) Implementation of a fixed-point 2d gaussian filter for image processing based on fpga. In: 2015 Signal Processing: algorithms, architectures, arrangements, and applications (SPA), pp 28–33. https://doi.org/10.1109/SPA.2015.7365108
Hu X, Li X, Huang H, Zheng X, Xiong X (2022) Tinna: A tiny accelerator for neural networks with efficient dsp optimization. IEEE Transactions on Circuits and Systems II: Express Briefs 69(4):2301–2305
Google Scholar
Lampert, CH, Wirjadi, O (2006) Anisotropic gaussian filtering using fixed point arithmetic. In: 2006 International conference on image processing, pp 1565–1568. https://doi.org/10.1109/ICIP.2006.312606
Hwang, K, Sung, W (2014) Fixed-point feedforward deep neural network design using weights +1, 0, and -1. In: 2014 IEEE Workshop on Signal Processing Systems (SiPS), pp 1–6. https://doi.org/10.1109/SiPS.2014.6986082
Jacob, B, Kligys, S, Chen, B, Zhu, M, Tang, M, Howard, A, Adam, H, Kalenichenko, D (2018) Quantization and training of neural networks for efficient integer-arithmetic-only inference. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)
Safonov I, Kornilov A, Makienko D (2022) An approach for matrix multiplication of 32-bit fixed point numbers by means of 16-bit simd instructions on dsp. Electronics 12(1):78
Article Google Scholar
Ram, B (2001) Adv microprocessors interfacing. McGraw-Hill Education
Lakshmia, PY, Lavanyab, Y, Babuc, BJ, Sridevid, B (2023) Comparative analysis of approximate integer and floating-point multiplier. In: Recent developments in electronics and communication systems: proceedings of the first international conference on recent developments in electronics and communication systems (RDECS-2022), vol 32, pp 45. IOS Press
York, R (2002) Benchmarking in context: Dhrystone. ARM, March
Dongarra JJ, Luszczek P, Petitet A (2003) The linpack benchmark: past, present and future. Concurrency and Computation: practice and experience 15(9):803–820
Article Google Scholar
Shrivakshan G, Chandrasekar C (2012) A comparison of various edge detection techniques used in image processing. International Journal of Computer Science Issues (IJCSI) 9(5):269
Google Scholar
Al-Kofahi, M.M, Al-Shorman, MY, Al-Kofahi, OM (2019) Toward energy efficient microcontrollers and internet-of-things systems. Computers & Electrical Engineering 79106457
ESP32 Wi-Fi and Bluetooth MCU (2020). https://www.espressif.com/en/products/socs/esp32/overview
ESP8266 Wi-Fi Modules (2020). https://www.espressif.com/en/products/modules/esp8266
Raspberry Pi 3 Model B (2020). https://www.raspberrypi.org/products/raspberry-pi-3-model-b
Ivkovic, J, Ivkovic, J (2017) Analysis of the performance of the new generation of 32-bit microcontrollers for iot and big data application. In: Proceedings of the international conference on information society and technology (ICIST), Kopaonik, Serbia, pp 12–15
Longbottom, R (2018) Raspberry Pi 3B+ 32 bit and 64 bit Benchmarks and Stress Tests. https://doi.org/10.13140/RG.2.2.31859.58403

Download references

Funding

Open Access funding enabled and organized by CAUL and its Member Institutions.

Author information

Authors and Affiliations

Department of Engineering, La Trobe University, Science Drive, Bundoora, 3086, Victoria, Australia
Lyle Parsons, Guang Deng & Robert Ross

Authors

Lyle Parsons
View author publications
You can also search for this author in PubMed Google Scholar
Guang Deng
View author publications
You can also search for this author in PubMed Google Scholar
Robert Ross
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Robert Ross.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Parsons, L., Deng, G. & Ross, R. Detecting image edges in IOT nodes without FPU. Multimed Tools Appl 83, 31161–31175 (2024). https://doi.org/10.1007/s11042-023-16672-4

Download citation

Received: 21 December 2022
Revised: 16 June 2023
Accepted: 18 August 2023
Published: 13 September 2023
Issue Date: March 2024
DOI: https://doi.org/10.1007/s11042-023-16672-4

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Detecting image edges in IOT nodes without FPU

Abstract

Similar content being viewed by others

An Edge Computing Architecture for Object Detection

Energy-Efficient Edge Intelligence: A Comparative Analysis of AIoT Technologies

Profiling and improving the duty-cycling performance of Linux-based IoT devices

1 Introduction

2 Related work

3 Unsigned Integer arithmetic

4 Applications in edge detection

4.1 The 8-bit unsigned integer edge detection algorithm

4.2 Implementing laplacian and sobel edge detection

5 Optimisation considerations

6 Results and discussion

6.1 Implementing the edge detection algorithms in MATLAB