A nine-input 1.25 mW, 34 ns CMOS analog median filter for image processing in real time

In this paper an analog voltage-mode median filter, which operates on a 3 × 3 kernel, is presented. The filter is implemented in a 0.35 μm CMOS technology. The proposed solution is based on voltage comparators and a bubble sort configuration. As a result, a fast (34 ns) time response with low power consumption (1.25 mW at 3.3 V) is achieved. The key advantage of the configuration is the relatively high accuracy of signal processing, which allows the calculation of the median of signals that differ in amplitude by as little as 10 mV. This feature allows the application of the filter to vision systems with up to 7 bit equivalent resolution. The analytical and statistical analysis of the filter resolution and an analysis of its speed limitations are presented and compared to measurement results. Based on the achieved results, a set of guidelines for the filter design and optimisation is presented.


Introduction
Median filtering is a non-linear operation frequently used in early vision image processing [1]. This kind of filtering removes high-frequency impulsive noise while preserving sharp edges in the original image. For this reason, median filtering greatly facilitates further image processing such as edge detection or segmentation. In integrated circuits, median filters are implemented using both analog [2][3][4][5][6][7][8][9][10][11][12][13][14][15][16] and digital [17][18][19][20] techniques, depending on the design requirements. Typically, the greatest reduction of consumed power and required chip area is achievable using analog median filters, because they do not require analog-to-digital converters (ADCs) for conversion of the video signals generated by photo sensors. Digital filters, in contrast, require a dedicated ADC for each analog video signal, which for a 3 × 3 kernel means nine additional converters. Alternatively, a single fast converter multiplexed between nine inputs can be used, but such an approach reduces the overall image processing speed. Because the number of ADCs increases with the square of the kernel size, the digital approach leads to excessively large circuits that consume relatively high power. Analog implementations of median filters are simpler in construction, consume less power, and are much faster, but have limited accuracy of signal processing. Therefore, if a median filter is targeted at a video system with low or moderate dynamic range, it is advantageous to use an analog filter. However, particular attention should be paid to optimising the accuracy of such filters.
One of the methods used to obtain the median value, in both digital and analog realizations, is bubble sorting. This method leads to a simple and intuitive filter structure [2,5]. On the other hand, the filter complexity increases significantly with the number of inputs. In this paper an area-efficient implementation of a precise nine-input analog median filter using the bubble sort is proposed.
Section 2 presents the configuration and the principle of operation of the proposed analog median filter, together with a circuit solution and an analysis of its limitations and nonidealities. Section 3 is devoted to details of the filter implementation in a 0.35 μm CMOS technology, and to the results of simulations and measurements. The final conclusions are presented in the last section.

Median filter

General configuration and principle of operation
The bubble sort configuration, shown in Fig. 1, is adopted for implementation of the median filter. The circuit consists of 19 MAXMIN selector circuits and an output buffer. Each MAXMIN circuit has two inputs, V_in1 and V_in2, and two outputs, V_max and V_min. The output signals depend on the instantaneous values of the inputs according to the following relationships

$$V_{max} = \max(V_{in1}, V_{in2}), \qquad V_{min} = \min(V_{in1}, V_{in2}) \tag{1}$$

As (1) shows, MAXMIN selects the maximum input value and transmits it to the output labelled V_max. Analogously, the minimum input value is directed to the output V_min. As a result, the 19 MAXMIN selectors realize the bubble sort algorithm, with the output voltage V_out being the median of all nine inputs V_i1 … V_i9

$$V_{out} = \mathrm{median}(V_{i1}, V_{i2}, \ldots, V_{i9}) \tag{2}$$

The exemplary decisions (the node voltages) of the comparators, labelled in Fig. 1, have been deliberately marked incorrectly, to show the influence of the comparators' input offset voltage on the final result. This issue is explained in detail in Sect. 2.2.
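The selection principle can be illustrated in software. The following is a minimal sketch, not the wiring of Fig. 1, which is not reproduced in the text: it assumes one well-known arrangement of 19 compare-exchange cells that finds the median of nine values. The names `NETWORK`, `maxmin`, `median9`, `network_depth` and `is_median_for_all_binary_inputs` are hypothetical.

```python
import itertools

# Assumed arrangement of 19 compare-exchange cells (not taken from Fig. 1).
# Each pair (lo, hi) means: after the cell, x[lo] holds the minimum and
# x[hi] holds the maximum of the two values.
NETWORK = (
    (1, 2), (4, 5), (7, 8),
    (0, 1), (3, 4), (6, 7),
    (1, 2), (4, 5), (7, 8),   # each group of three inputs is now sorted
    (0, 3), (5, 8), (4, 7),
    (3, 6), (1, 4), (2, 5),
    (4, 7), (4, 2), (6, 4),
    (4, 2),
)


def maxmin(v_in1, v_in2):
    """Ideal MAXMIN cell: returns (V_max, V_min)."""
    return max(v_in1, v_in2), min(v_in1, v_in2)


def median9(inputs):
    """Median of nine values using only MAXMIN cells."""
    x = list(inputs)
    for lo, hi in NETWORK:
        x[hi], x[lo] = maxmin(x[lo], x[hi])
    return x[4]


def network_depth():
    """Largest number of cells crossed on any input-to-output path."""
    depth = [0] * 9
    for lo, hi in NETWORK:
        depth[lo] = depth[hi] = max(depth[lo], depth[hi]) + 1
    return max(depth)


def is_median_for_all_binary_inputs():
    """Zero-one check: correct on all 512 binary inputs implies correct in general."""
    return all(
        median9(bits) == sorted(bits)[4]
        for bits in itertools.product((0, 1), repeat=9)
    )


if __name__ == "__main__":
    sample_mv = [610, 605, 600, 595, 400, 395, 195, 190, 185]
    print(median9(sample_mv), len(NETWORK), network_depth())
```

For the sample input set used in the corner-error discussion, `median9` returns 400. In this assumed arrangement the longest path crosses nine cells.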
The topology shown in Fig. 1 has several advantages in comparison to other median filters presented in the literature [2,4,5,11,13]. First of all, the proposed median filter is devoid of negative feedback loops, which guarantees stability and an aperiodic time response. Secondly, the circuit implementation of MAXMIN is simple, and can be based on a voltage comparator and an analog multiplexer. Finally, because the configuration in Fig. 1 has only forward paths, its time response is very fast. The filter structure in Fig. 1 can be easily expanded to a general rank order filter by adding eight comparators.
The MAXMIN circuit is designed using a differential two-stage voltage comparator, which controls an analog multiplexer composed of four MOS switches. Figure 2 shows details of the circuit. The differential pair M1-M2, together with the active cross-coupled load M3-M6, forms the input stage of the comparator. Such a configuration ensures good stabilization of the common-mode output voltages and a relatively high differential gain. The output stages, built from transistors M7-M8 and M9-M10, generate balanced signals, which directly control the switches M12-M15. It is worth noting that the series resistance of those switches does not affect the function (1) of the circuit, because in a steady-state condition there is no current flow between the filter inputs V_i1 … V_i9 and the output V_out, and consequently there is no voltage drop. In other words, the output V_out in a steady state is an exact replica of one of the input signals.

Corner errors
The analog implementation of the filter in Fig. 1 has a limited resolution in distinguishing values of the input signals V_i1 … V_i9. This limitation is called corner error [9,12], and is the main cause of erroneous calculation of the median value at the filter output. A unique feature of the proposed circuit solution (Fig. 2) is that it does not cause any distortion of the processed video signals. Nonidealities of the MAXMIN circuits may lead to an incorrect choice from the set of input signals; however, the output value is always exactly equal to one of the inputs. To explain this feature, consider the sample input sequence shown in Fig. 1, where the instantaneous input values are 610, 605, 600, 595, 400, 395, 195, 190, and 185 mV. In this input set there are two values, namely 395 and 400 mV, which are close to each other and also close to the correct median value of 400 mV. The remaining values are close to one another but are far from the median. In the ideal case, the output of the filter should be equal to V_i5 = 400 mV. However, in a real circuit, in which the comparators may have an input offset voltage V_OS, several incorrect decisions will appear. An example of such a situation is illustrated in Fig. 1, where the actual output value is V_i6 = 395 mV, assuming that V_OS = ±10 mV is randomly assigned to individual comparators. It can be demonstrated that, regardless of the input sequence order, the filter output signal reaches one of the two values V_i6 or V_i5. This example shows that the total resolution of the filter is limited by the greatest input offset voltage of all the comparators used.
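The corner-error behaviour can be reproduced with a short behavioural model. This is only a sketch under stated assumptions: the 19-cell arrangement in `NETWORK` is an assumed wiring (Fig. 1 is not reproduced in the text), each comparator is modelled as an ideal comparator with an added offset of +10 mV or -10 mV, and all names are hypothetical.

```python
import random

# Assumed arrangement of 19 compare-exchange cells (not taken from Fig. 1).
NETWORK = (
    (1, 2), (4, 5), (7, 8), (0, 1), (3, 4), (6, 7),
    (1, 2), (4, 5), (7, 8), (0, 3), (5, 8), (4, 7),
    (3, 6), (1, 4), (2, 5), (4, 7), (4, 2), (6, 4),
    (4, 2),
)

# Sample input set from the text, in millivolts.
INPUTS_MV = [610, 605, 600, 595, 400, 395, 195, 190, 185]


def maxmin_with_offset(v1, v2, vos):
    """Return (V_max, V_min) as decided by a comparator with input offset vos.

    The multiplexer only routes the inputs, so both outputs are exact copies
    of v1 and v2 even when the decision is wrong.
    """
    return (v1, v2) if v1 - v2 + vos > 0 else (v2, v1)


def median9_with_offsets(inputs, offsets):
    """Run the network with one offset value per MAXMIN cell."""
    x = list(inputs)
    for (lo, hi), vos in zip(NETWORK, offsets):
        x[hi], x[lo] = maxmin_with_offset(x[hi], x[lo], vos)
    return x[4]


def observed_outputs(trials=2000, vos_mv=10, seed=0):
    """Collect the outputs seen over random input orders and random offsets."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(trials):
        order = rng.sample(INPUTS_MV, k=len(INPUTS_MV))
        offsets = [rng.choice((-vos_mv, vos_mv)) for _ in NETWORK]
        seen.add(median9_with_offsets(order, offsets))
    return seen


if __name__ == "__main__":
    print(sorted(observed_outputs()))
```

In this model a wrong decision can only swap two inputs that lie within the offset of each other, so the output stays within the pair 395 mV and 400 mV and is always an exact copy of one input.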
In order to achieve high accuracy of signal processing, the limitations and nonidealities resulting from the MAXMIN circuit must be analysed and carefully minimised. As explained, the only cause of errors in the topology shown in Fig. 1 is incorrect classification of signals by the voltage comparators. There are two main sources of errors in the comparator: an input offset voltage and a transmission delay. The input offset voltage can be evaluated by adding the two components due to the differential pair (M1-M2) and the cross-coupled load (M3-M6). The voltage offsets caused by the output stages composed of M7-M8 and M9-M10 are omitted because of their small relative values: when referred to the comparator input, the output stage offsets are divided by the large gain of the first stage. The input offset voltage of the differential pair results mainly from the threshold voltage mismatch [21,22], which is

$$V_{OS1,2} = (\Delta V_T)_{1,2} \tag{3}$$

where (ΔV_T)_1,2 denotes the mismatch between the threshold voltages of M1 and M2. In (3) the transconductance parameter mismatch is omitted due to its relatively low impact [21] on the total input offset voltage. The contribution of the load to the offset voltage, referred to the input, is determined by the current mismatches (ΔI_DS)_5,6 and (ΔI_DS)_3,4 between the two pairs M5-M6 and M3-M4, divided by the voltage-current gain g_m1,2 of the differential pair, as given by (4). Assuming the square-law voltage-current characteristic for the transistors, the total input offset V_OS of the comparator can be expressed as in (5), where (ΔV_T)_5,6 and (ΔV_T)_3,4 represent the mismatches of the threshold voltages between M5-M6 and M3-M4. The threshold voltage mismatch can be expressed in terms of the standard deviation σ by means of Pelgrom's formula [23,21]

$$\sigma(\Delta V_T) = \frac{A_{V_T}}{\sqrt{W L}} \tag{6}$$

where A_VT is a parameter specific for a selected technology, and W and L are the channel width and length of the matched transistors.
Assuming that V_OS1,2 and V_OS3-6 are statistically independent and that (ΔV_T)_5,6 = (ΔV_T)_3,4, the input offset voltage of the comparator in Fig. 2 can be expressed as in (7). As (7) shows, the reduction of V_OS requires M1 and M2 with a channel width as large as possible, and M3-M6 with the maximum possible channel length. To demonstrate the importance of a proper selection of the transistor width, let us consider three cases: (a) (W/L)_1,2 = 2/1, (b) (W/L)_1,2 = 10/1, (c) (W/L)_1,2 = 50/1, where (W/L)_3,4(5,6) = 2/1. The offset voltage is evaluated using (7) and technology parameters specific for the AMS 0.35 μm CMOS process, namely A_VT,pMOS = 10.3 mV·μm and A_VT,nMOS = 6.7 mV·μm. The plots clearly show that the input differential pair has a dominant influence on the total input offset voltage. This voltage can be reduced by using wide-channel transistors and designing the differential pair in the common-centroid manner.
The comparison of the input offset voltage is presented in Fig. 4 for two cases: (a) M1 and M2 designed as separate devices, and (b) M1 and M2 designed in a common-centroid layout. As the histograms show, the common-centroid layout allows over 30 % reduction of the input offset voltage. The minimization of the input offset voltage by increasing the area of M1 and M2 degrades the comparator speed, because it increases the input capacitance C_in.
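The effect of the input-pair width can be illustrated with Pelgrom's formula alone. The sketch below evaluates only the threshold-mismatch standard deviation of a single matched pair for the three widths considered; it does not reproduce (7), which also includes the load contribution. The text does not state whether M1-M2 are nMOS or pMOS devices, so both quoted A_VT values are evaluated. The function and variable names are hypothetical.

```python
import math

# Matching coefficients quoted in the text for the AMS 0.35 um process, in mV*um.
A_VT_MV_UM = {"pMOS": 10.3, "nMOS": 6.7}


def sigma_delta_vt_mv(a_vt_mv_um, width_um, length_um):
    """Pelgrom's formula: standard deviation of the threshold mismatch in mV."""
    return a_vt_mv_um / math.sqrt(width_um * length_um)


if __name__ == "__main__":
    for width_um in (2, 10, 50):  # cases (a), (b), (c), with L = 1 um
        row = {
            kind: round(sigma_delta_vt_mv(a_vt, width_um, 1.0), 2)
            for kind, a_vt in A_VT_MV_UM.items()
        }
        print(f"W = {width_um} um: sigma(dVT) in mV = {row}")
```

Because the standard deviation scales with the inverse square root of the gate area, widening the pair from 2 μm to 50 μm at constant length reduces this contribution by a factor of five.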

Propagation delay
In the filter in Fig. 1 the input signals propagate in parallel through the layers of the structure. The total propagation time of each signal depends on the path it takes. In the worst case, this time is equal to nine delays of a single MAXMIN circuit. The propagation delay of MAXMIN consists of the delay introduced by the comparator and the delay of the analog multiplexer. The propagation delay time of the comparator is estimated under the assumption that the input differential signal is large enough to fully switch on one of the transistors M1 or M2 (Fig. 2a), causing the biasing current I_bias to flow through one of them. The circuit configuration relevant to this case is shown in Fig. 2(b). The propagation time can be estimated as a sum of two components: (i) the delay t_delay,A between the input and the node labelled A, and (ii) the delay t_delay,B between the nodes labelled A and B (the comparator output)

$$t_{delay} = t_{delay,A} + t_{delay,B} = \frac{\Delta V_A}{SR_A} + \frac{\Delta V_B}{SR_B} \tag{8}$$

where ΔV_A and ΔV_B denote the voltage swings at nodes A and B, and SR_A and SR_B are the slew rates at the same nodes. SR_A depends on the biasing current I_bias and the total capacitance of the node A, which results in (9), where A_7 = g_m7/(g_ds7 + g_ds8) is the voltage gain of M7. The positive and negative slew rates at the node B are determined respectively by the drain currents of M8 and M7 and by the total capacitance of this node, which leads to (10a) and (10b). The expressions (10a) and (10b) show that the slew rate for a rising edge is four times smaller than the slew rate for a falling edge. The voltage swing ΔV_A can be calculated bearing in mind that the full biasing current I_bias flows through M1, and as a result only M3 is active in the load circuit.
In (11), μ_0, C_OX, and (W/L) have their usual meanings. The worst case of the total propagation delay can be calculated using (9), (10a), (11), and (8). Assuming that (W/L)_1,2 = 10 μm/1 μm, (W/L)_3-6 = 2 μm/1 μm, I_bias = 10 μA, and n = 2, the following results were obtained: ΔV_A = 0.34 V, SR_A = 61.7 V/μs, and SR_B^+ = 2270 V/μs. The propagation delays for this case are t_delay,A = 5.54 ns and t_delay,B = 0.72 ns. Finally, the analytically calculated value of the total propagation delay is t_delay = 6.26 ns, whereas the delay determined from the circuit simulation is 4.42 ns.
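The quoted delays can be cross-checked with the slew-limited estimate, in which each delay is a voltage swing divided by a slew rate. The sketch below uses only the numbers given in the text. The swing at node B is not quoted, so it is back-calculated from t_delay,B and SR_B^+ as an illustration. The function name is hypothetical.

```python
def slew_limited_delay_ns(swing_v, slew_rate_v_per_us):
    """Delay of a slew-limited transition: swing divided by slew rate, in ns."""
    return swing_v / slew_rate_v_per_us * 1e3


# Values quoted in the text.
DELTA_V_A = 0.34        # V
SR_A = 61.7             # V/us
SR_B_RISING = 2270.0    # V/us
T_DELAY_A_NS = 5.54     # quoted delay to node A
T_DELAY_B_NS = 0.72     # quoted delay from node A to node B

# Recomputed from the rounded swing and slew rate; close to the quoted 5.54 ns.
t_delay_a_ns = slew_limited_delay_ns(DELTA_V_A, SR_A)

# Swing at node B implied by the quoted delay and slew rate (back-calculated).
implied_delta_v_b = T_DELAY_B_NS * 1e-3 * SR_B_RISING

total_ns = T_DELAY_A_NS + T_DELAY_B_NS

if __name__ == "__main__":
    print(f"t_delay,A = {t_delay_a_ns:.2f} ns")
    print(f"implied swing at B = {implied_delta_v_b:.2f} V")
    print(f"t_delay = {total_ns:.2f} ns")
```

The recomputed t_delay,A differs from the quoted value only in the second decimal place, which is consistent with rounding of the intermediate quantities.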
The delay time of the analog multiplexer can be determined from the time constant associated with the resistance r_ds12-15 of the turned-on switches M12-M15 and the input capacitance C_in of the comparator. Assuming that (W/L)_12-15 = 1 μm/0.35 μm, the worst case of r_ds12-15 is about 7-8 kΩ. The estimated input capacitance of the comparator for (W/L)_1,2 = 10 μm/1 μm and (W/L)_3-6 = 2 μm/1 μm is about C_in ≈ 120 fF. The time constant for this case is about 0.95 ns, which means that the delay of the multiplexer is much smaller than the delay of the comparator.
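The multiplexer time constant follows directly from the quoted switch resistance and input capacitance. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def rc_time_constant_ns(resistance_kohm, capacitance_ff):
    """Time constant tau = R * C, returned in nanoseconds."""
    return resistance_kohm * 1e3 * capacitance_ff * 1e-15 * 1e9


C_IN_FF = 120.0  # estimated comparator input capacitance from the text

tau_low_ns = rc_time_constant_ns(7.0, C_IN_FF)   # lower end of the quoted 7-8 kOhm
tau_high_ns = rc_time_constant_ns(8.0, C_IN_FF)  # upper end of the quoted range

if __name__ == "__main__":
    print(f"tau between {tau_low_ns:.2f} ns and {tau_high_ns:.2f} ns")
```

The quoted value of about 0.95 ns lies at the upper end of this range, well below the comparator delay of several nanoseconds.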
A unique feature of the presented filter is that the comparator settling time has little influence on the video signal delay, and therefore the median filter is relatively fast. This is due to the fact that the outputs of the comparators control the switches and do not directly affect the video signals.

Implementation and measurements of a median filter

Filter implementation
The median filter is designed to be implemented in AMS 0.35 μm CMOS technology. Based on the guidelines given in the previous section, the MAXMIN circuits are optimised to reach a trade-off between accuracy and speed. To achieve a resolution equivalent to 7 bits for a video signal with 1.8 V amplitude, the filter resolution must be at least 14 mV. According to the histograms presented in Fig. 3(a), such a requirement can be satisfied if the width of the input transistors is greater than or equal to 10 μm, in the case of M1 and M2 designed as separate devices. A better result is achievable if the common-centroid technique is applied. In this case, as Fig. 3(b) shows, the width of the transistors can be reduced to less than 10 μm. In the final circuit implementation, the transistors M1 and M2 were divided into four parts and arranged in a common-centroid layout. The selected dimensions of the transistors are as follows: (W/L)_1-2 = 10/1, (W/L)_3-6 = 2/1, (W/L)_7,9 = 4/1, (W/L)_8,10 = 2/1, (W/L)_11 = 4/1, (W/L)_12-15 = 1/0.35. The biasing currents are set to 10 μA for M11, and 5 μA for M8 and M10. The Monte Carlo analysis reveals that, for the worst case, the voltage gain of a single comparator is 3,800 V/V, the input common-mode range is better than 2 V, and the input offset voltage is below 5 mV. The summary of the main electrical parameters of the MAXMIN circuit is given in Table 1. The layout of the MAXMIN circuit has dimensions of 27 μm × 27 μm, and the layout of the overall median filter, consisting of 19 MAXMIN circuits, occupies an area of 0.014 mm² (Fig. 5).
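The resolution requirement and the quoted area can be checked with simple arithmetic. The sketch below assumes that the 7 bit requirement means one least significant bit of a 1.8 V range, and that the filter area is dominated by the 19 MAXMIN cells. Variable names are hypothetical.

```python
FULL_SCALE_MV = 1800.0   # video signal amplitude from the text
BITS = 7                 # target equivalent resolution

# One least significant bit of a 7 bit converter over the 1.8 V range.
lsb_mv = FULL_SCALE_MV / 2 ** BITS

CELL_SIDE_UM = 27.0      # MAXMIN layout is 27 um x 27 um
CELL_COUNT = 19

# Total area of the 19 cells, converted from um^2 to mm^2.
cells_area_mm2 = CELL_COUNT * CELL_SIDE_UM ** 2 * 1e-6

if __name__ == "__main__":
    print(f"1 LSB = {lsb_mv:.2f} mV")
    print(f"area of 19 cells = {cells_area_mm2:.4f} mm^2")
```

One LSB evaluates to about 14.06 mV, which matches the stated 14 mV requirement, and the cell area sums to about 0.0139 mm², consistent with the reported 0.014 mm².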

Measurements of electrical parameters
The operation of the median filter has been verified by means of simulations and measurements of a prototype integrated circuit. The simulated time response of the filter for a test signal is presented in Fig. 6. A set of nine triangle waveform signals (thin lines) is applied to the filter inputs. The successive triangle signals are delayed by 1 μs and have the same amplitude, equal to 1.8 V. In this case, the filter is driven by many possible combinations of the inputs. The solid line in Fig. 6 shows the output voltage of the filter, which correctly determines the median value according to the formula (2).
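A test of this kind can be mimicked numerically with an ideal median. The sketch below is only an illustration: the triangle period is not given in the text and is assumed to be 20 μs, the waveform is assumed to swing between 0 and 1.8 V, and the function names are hypothetical.

```python
import statistics


def triangle(t_us, period_us, amplitude_v):
    """Triangle wave between 0 and amplitude_v with the given period."""
    phase = (t_us % period_us) / period_us
    return amplitude_v * (2 * phase if phase < 0.5 else 2 * (1 - phase))


def filter_response(t_us, period_us=20.0, amplitude_v=1.8, delay_us=1.0):
    """Nine successively delayed triangles and their ideal median at time t_us.

    period_us is an assumed value; the text specifies only the delay and amplitude.
    """
    inputs = [
        triangle(t_us - k * delay_us, period_us, amplitude_v) for k in range(9)
    ]
    return inputs, statistics.median(inputs)


if __name__ == "__main__":
    for step in range(0, 41, 5):
        t_us = step * 0.5
        _, v_out = filter_response(t_us)
        print(f"t = {t_us:4.1f} us, V_out = {v_out:.3f} V")
```

At every time instant the ideal output coincides with one of the nine inputs, which is the property the solid line in the simulated response is expected to show.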
The final filter resolution was also determined using a triangular waveform with a 500 mV peak-to-peak value and two DC voltages (200 and 500 mV) applied to three inputs of the filter. Of the remaining six inputs, three were connected to 0 V and three to 1.8 V. Figure 7 shows the input and output waveforms for this case. The DC signals are omitted for clarity. Figure 7(a) shows a general view, whereas Fig. 7(b) presents a magnified portion of the waveforms that shows the moment of the output switching. The plot in Fig. 7(b) clearly shows that the input offset voltage of the comparator is about 10 mV, which makes the relative resolution equal to 0.55 % (7 bits) for the input voltage range of 0-1.8 V.
The measured dynamic properties of the filter were determined for the worst case. As Fig. 1 shows, the total delay time depends on the combination of input signals, and it reaches its maximum when the input signals propagate through 9 MAXMIN circuits. The time response of the filter was measured for that case. Figure 8 presents the square wave input signal V_i5 of 8.33 MHz frequency, and the output V_out of the filter. For the considered case, the total delay is about 34 ns. The delay is defined as the time interval between 1 % and 99 % of the initial and final values of the signal during its transition. This means that the achievable speed of image reconstruction is 8.33 MP/s (Table 1).
Fig. 6 The output response of the median filter for triangle waveforms
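With three inputs tied to 0 V and three to 1.8 V, an ideal nine-input median reduces to the median of the three remaining signals, so the expected output is the triangle limited to the range between the two DC levels. The sketch below checks this with an ideal median and also evaluates the quoted relative resolution. The DC offset of the triangle is not given in the text, so the input is simply swept over the full range. The function name is hypothetical.

```python
import math
import statistics


def ideal_output_mv(triangle_mv):
    """Ideal nine-input median for the resolution test, all values in mV."""
    inputs = [0, 0, 0, triangle_mv, 200, 500, 1800, 1800, 1800]
    return statistics.median(inputs)


# Relative resolution for a 10 mV offset over the 0-1.8 V input range.
relative_resolution_percent = 10.0 / 1800.0 * 100.0
equivalent_bits = math.floor(math.log2(1800.0 / 10.0))

if __name__ == "__main__":
    for v_mv in (100, 200, 350, 500, 700):
        print(v_mv, ideal_output_mv(v_mv))
    print(f"{relative_resolution_percent:.2f} %, {equivalent_bits} bits")
```

In a real filter the switching between the triangle and a DC level is displaced by the comparator offset, which is what the magnified view of the output switching is used to measure.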

Image filtering example
The presented median filter has been developed for use in a prototype system with parallel processing of video images in real time. The system consists of a 32 × 32 photopixel matrix and 32 analog convolution filters (Fig. 9a), which process 2,000 frames per second. Due to the limited area of the prototype integrated circuit, only a single median filter is implemented. The video signals from the matrix are sequentially directed to the median filter using a signal multiplexer, as shown in Fig. 9(b). For this reason, the maximum speed of median image processing is relatively low, about 30 frames per second. Based on simulation, the maximum speed of the median filtering is estimated to be about 1,600 frames per second if parallel processing is used [24]. The raw images generated by a CMOS photo matrix are of low quality, due to the fixed pattern noise (FPN), random noise (RN), photo-response non-uniformity, and so-called 'dead' pixels (DP) [25]. An example of such an image is presented in Fig. 10(a). Without proper processing, this kind of image has little practical use. Two of the most frequently used methods for image enhancement are low-pass convolution filtering and median filtering. Sample results of processing, using the prototype image system and both types of filtering, are depicted in Fig. 10. As Fig. 10(b) shows, the convolution filtering reduces FPN and RN, but also causes image blur, which in turn deteriorates the overall image quality. Significantly better results can be achieved with the median filtering (Fig. 10c), where after the removal of FPN, RN, and DP, the sharp edges of the object are preserved.
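The difference between the two kinds of filtering can be illustrated in software on a toy image. The sketch below is not the processing chain of the prototype: it assumes a 3 × 3 mean as the low-pass convolution kernel, copies border pixels unchanged, and uses a synthetic step edge with one dead pixel. All names and pixel values are hypothetical.

```python
import statistics


def filter3x3(image, reduce_fn):
    """Apply reduce_fn to every interior 3x3 window; border pixels are copied."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [
                image[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            ]
            out[r][c] = reduce_fn(window)
    return out


# Synthetic image in mV: dark region (200), bright region (1000), sharp edge.
image = [[200, 200, 200, 1000, 1000, 1000] for _ in range(5)]
image[2][4] = 0  # one dead pixel inside the bright region

median_out = filter3x3(image, statistics.median)
mean_out = filter3x3(image, statistics.fmean)  # assumed low-pass kernel

if __name__ == "__main__":
    print("median row:", median_out[2])
    print("mean row:  ", [round(v) for v in mean_out[2]])
```

In this toy case the median removes the dead pixel and leaves the step edge at its original position, whereas the mean spreads both the dead pixel and the edge over neighbouring pixels.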

Comparison
The comparison of the proposed median filter to other analog solutions is summarized in Table 2. All the compared filters operate in continuous time, but they differ in the applied circuit technique and number of inputs. For example, in [4] analog delay cells and comparators are used to implement a nine-input median filter. The references [2] and [5] present filters which perform the bubble sort algorithm using current mirrors and current comparators. Another technique is used in [11] and [13], where three-input filters are designed using differential amplifiers and feedback. An interesting solution is presented in [14], where the authors propose a high-precision four-input rank order filter based on gain-boosting amplifiers and voltage followers. A complete comparison of the filters is difficult, because in most of the published works the authors do not specify a full set of filter parameters. Nevertheless, in terms of power consumption, accuracy, speed and area, our median filter is competitive with other solutions. Our circuit, based on the MAXMIN selectors and the bubble sort topology, is a bit more complicated than the other solutions, although the area needed for its implementation is smaller. It is worth noting that the design procedure of our filter is easier, due to the fact that its main parameters, such as accuracy and speed, are easier to predict based on the comparator parameters. Also, our median filter is absolutely stable, and has no DC offset problem in the output signal.
Fig. 9 Vision chip: (a) system architecture, (b) arrangement of a median filter and a pixel matrix

Conclusions
A low-power, high-speed, compact nine-input analog median filter based on a bubble sort network is presented. The operation of the designed filter has been carefully tested by means of simulations and measurements, performed with the use of a test vision system containing an integrated prototype vision chip. The presented filter has advantages such as high processing speed and low power consumption. Its resolution is better than 0.55 %, the delay time is below 34 ns, and the total power consumption is 1.25 mW. The filter has a very compact layout and occupies a relatively small area equal to 0.014 mm². Due to the small area and low power consumption, the filter is attractive for implementation of large, parallel, real-time image processing systems with a high computation rate.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
Fig. 10 Results of an exemplary image processing: (a) raw image, (b) low-pass convolution filtering, (c) median filtering