1 Introduction

The development of reconfigurable analog circuits, especially field-programmable analog arrays (FPAAs), is nowadays one of the most topical challenges in the VLSI branch of science, which follows the trends of miniaturisation and automation [11, 14, 16, 26, 47]. Solutions for implementing large-scale programmable circuit prototypes with parameters comparable to those of dedicated analog circuits have appeared in the last few years [1, 3, 22, 24, 30]. Noteworthy examples of adding reconfigurability to hybrid techniques that use memristors have been introduced [23]. Other works consider the problem of simulating analog structures and synthesising them into array resources [37]. The literature also presents new possibilities for adopting analog reconfigurable architectures in the field of neural computing [29]. FPAAs are also the subject of industrial implementations and patents [45]. All of these examples were designed as voltage-mode arrays.

Implementing reprogrammable voltage-mode circuits based on amplifiers is becoming impossible in modern nanometre technologies, which are characterised by low supply voltages. Therefore, the literature also contains FPAA implementations working in the current mode. One of the first proposals used a current conveyor working in the continuous-time mode [8, 32]. The main benefit of such an approach is the high operating frequency of the circuit. However, its biggest disadvantages remain the data processing accuracy and the nonlinear relation between the resistance of a tunable resistor, the capacitances of tunable capacitors and the input frequency. Limited functionality and a strong dependence on the dispersion of transistor parameters create additional limitations. The continuous-time mode prevents implementing advanced structures. Another FPAA proposal is based on a digitally controlled balanced-output transconductor [25]. However, it does not work fully in the current mode, and its implementation requires a gain amplifier. Implementing filters using such structures requires capacitors; therefore, the FPAA contains an array of switched capacitances. The limited performance of existing FPAA implementations is the main reason for their low popularity. More generally, the difficulty of implementing analog structures (because of many parasitic effects, which require expertise to deal with), the high sensitivity to mismatch and the lack of full debugging capability (design-for-testing strategy [19]) are reasons for the low popularity of analog circuits.

Taking into account the above implementations and their limitations, the author decided to propose a fully analogue solution with a digital interface, which can be implemented in a standard digital CMOS technology and makes it possible to implement advanced structures with high data processing accuracy. The author decided to use the current mode because it allows the solution to be implemented in modern nanometre CMOS technologies. The presented reconfigurable architecture can be implemented as a standalone chip or as an IPcore of a larger mixed-signal system. The proposed FPAA structure features flexibility in selecting resources and a high versatility of applications. The solution is especially dedicated to applications in sensor technology, in which it is worth performing the initial data preprocessing in an analogue circuit [46]. IoT sensing devices require a reconfigurable, low-power structure that can serve a variety of sensing tasks. Analogue preprocessors and accelerators are used in vision systems for compressing analogue signals coming from image sensors [39]. The reconfigurability of a preprocessor is particularly important because it makes it possible to implement different compression algorithms and to select the right degree of compression. The work analyses the processing accuracy of the designed FPAA structure with respect to its parameter dispersion. A method for eliminating the common-mode component, present in balanced structures working in the current mode, has also been proposed.

Due to the mentioned potential areas of application, the developed FPAA architecture has been equipped with modules that make it possible to perform compression and image processing tasks. The whole current-mode FPAA architecture, the reconfigurable modules and their routing are presented in Sect. 2. The programming method based on the redundancy feature is briefly described in Sect. 3. Section 4 presents examples of circuit structures implemented with the FPAA. Results of the implementation are discussed in the conclusion section.

2 FPAA Architecture

This section describes a current-mode field-programmable analog array architecture. The architecture described in this work was inspired by an idea explained in [9], whose authors created an analogue circuit simulator working in a balanced structure. Admittedly, at the interface level, that structure behaves like an analogue circuit. However, it is a digital circuit, and the proposed modification consists only in designing an interface of DAC and ADC converters. Such an implementation lacks the benefits brought by analogue solutions, such as low power consumption and a small area of the integrated circuit. Existing modules were used in that project for implementing the example circuits. These modules included an FPGA circuit and 16-bit converters (ADS8412, AD5546). Implementing converters with such high resolution in modern technologies is non-trivial. Thus, the solution is not very practical. The author of the current work decided to propose a fully analogue, balanced structure which, on the one hand, provides high data processing accuracy and, on the other, is easier to implement. An analysis of the influence of dispersion is presented at the end of Sect. 2.1.

The structure of the proposed FPAA is based on the CPLD [7] concept, in which dedicated modules are attached to the routing core. The following subsections describe the analog current-mode modules used in the FPAA and the proposed routing implementation.

2.1 Configurable Current Mirror

One of the most common cells used in circuits working in the current mode is a current mirror implementing the multiplication operation. In the traditional ASIC implementation, scaling factors in mirrors are determined by choosing the relation between transistor sizes at their input and output stages [12]. In a programmable array, such an implementation has to be configurable with digital words. The structure of the proposed reconfigurable current mirror (RCM) is shown in Fig. 1.

Fig. 1 Reconfigurable current mirror (RCM) output stage implementation

In fact, the figure presents a single output stage implementation of a multi-output RCM. Directions of currents are marked with arrows in the figure. The proposed FPAA works in the balanced mode, and therefore, input signals in the reconfigurable mirror fulfil Eq. 1.

$$\begin{aligned} {\textit{IN}}_{1}=-{\textit{IN}}_{2} \end{aligned}$$
(1)

The \({\textit{CM}}1_{1}\) and \({\textit{CM}}1_{2}\) cells are multi-output mirrors with a scaling factor equal to 1. They act as separating modules and duplicators of the input signals. The number of their outputs is equal to the number of outputs in the RCM. The outputs of the CM1 mirrors are attached to the inputs of digital-to-analog converters (DACs). Their structure is described at the end of the current subsection. At this stage, let us just notice that both converters are controlled with a common word of length \(n_{1}+n_{2}\) which defines factor A. The DAC input signals have the values given by Eq. 2.

$$\begin{aligned} {\textit{CA}}_{1}[in]=-IN_{1}, {\textit{CA}}_{2}[in]=-IN_{2} \end{aligned}$$
(2)

The DAC output signals are obtained by multiplying the input signals by factor A. In fact, this factor implements the scaling factor of a single output stage in the RCM. DAC converters may be sources of errors in the processing path. Moreover, as discussed in [38] and [33], a common-mode component, quantified by the common-mode rejection ratio (CMRR) [10], may appear in a balanced current-mode structure. Compensating the common-mode component by optimising the current structure is impossible because its value depends only to a certain extent on the choice of MOS transistor parameters. Both of the mentioned phenomena are sources of a nonlinear error e in the current signals. Therefore, a CMRR elimination module [41] was added to the circuit in Fig. 1. The input signals of the CMRR module have the values given by Eq. 3.

$$\begin{aligned} \begin{array}{c} {\textit{CM}}2_{1}[in]=-({\textit{IN}}_{1}\cdot A+e), \\ {\textit{CM}}2_{2}[in]=-({\textit{IN}}_{2}\cdot A+e) \end{array} \end{aligned}$$
(3)

The CM2 mirrors are used to duplicate signals. One pair of the duplicated signals is summed at the input node of CM3. The current at this input can be described using Eq. 4 and its simplified version in Eq. 5.

$$\begin{aligned} {\textit{CM}}_{3}[in]= & {} -[-({\textit{IN}}_{1}\cdot A+e)]-[-({\textit{IN}}_{2}\cdot A+e)] \end{aligned}$$
(4)
$$\begin{aligned} {\textit{CM}}_{3}[in]= & {} A({\textit{IN}}_{1}+{\textit{IN}}_{2})+2e=2e \end{aligned}$$
(5)

The CM3 mirror produces the error signal e by dividing its input current by 2. The output of the CMRR module is obtained by subtracting e from the CM2 output signals (Eq. 6).

$$\begin{aligned} {\textit{OUT}}_1={\textit{IN}}_{2}\cdot A, {\textit{OUT}}_{2}={\textit{IN}}_{1}\cdot A \end{aligned}$$
(6)

Taking Eq. 1, output RCM signals can be written in the form of Eq. 7.

$$\begin{aligned} {\textit{OUT}}_{1}=-{\textit{IN}}_{1}\cdot A, {\textit{OUT}}_{2}=-{\textit{IN}}_{2}\cdot A \end{aligned}$$
(7)
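
The cancellation described by Eqs. 3 to 7 can be checked numerically. The Python sketch below is only an idealised behavioural model, not the author's circuit: the function name `rcm_output_stage` and the test values are illustrative assumptions, all mirrors are assumed ideal, and the error e is treated as a plain number common to both paths.

```python
def rcm_output_stage(in1, a, e):
    """Idealised behavioural model of one RCM output stage (Eqs. 1-7)."""
    in2 = -in1                      # balanced inputs, Eq. 1
    # Inputs of the CM2 mirrors: scaled signals plus the common error, Eq. 3
    cm2_1 = -(in1 * a + e)
    cm2_2 = -(in2 * a + e)
    # Summing node at the CM3 input, Eqs. 4-5: only 2e remains
    cm3_in = -cm2_1 - cm2_2
    e_est = cm3_in / 2              # CM3 divides by 2 to recover e
    # Subtract the recovered error from the mirrored CM2 signals, Eq. 6
    out1 = -cm2_2 - e_est           # equals IN2 * A
    out2 = -cm2_1 - e_est           # equals IN1 * A
    return out1, out2


# Hypothetical values: 2 uA input, factor 3, 0.4 uA common error.
out1, out2 = rcm_output_stage(in1=2e-6, a=3.0, e=0.4e-6)
print(out1, out2)
```

In this model the outputs depend only on the input current and on A, whatever value of e is passed in, which is the property expressed by Eq. 7.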

Removing the e component from the signal processing path increases the accuracy of calculations and makes it possible to implement more complex structures. Let us notice that, at the interface level, the analysed circuit has the functionality of a current mirror configured by a digital word. The structure of the circuit is fully symmetrical, which minimises the current offset and guarantees a common delay on both paths. The whole structure of the RCM stage is based on current mirrors; therefore, the DAC converters were also implemented using the current mirror concept. In this case, a single mirror implementation is insufficient because of the low resolution and the necessity of using extremely long channels for implementing small factors. For this reason, the structure in Fig. 2 was proposed.

Fig. 2 DAC converter data flow

It consists of two stages. The first one is controlled with the \(B_{1}\) word and implements scaling factor \(\alpha \). Its output signal is driven to the second stage, which is controlled with \(B_{2}\) word and implements factor \(\beta \). It means that the A factor of the whole converter can be written in the form of Eq. 8.

$$\begin{aligned} A=\alpha \cdot \beta \end{aligned}$$
(8)

Each stage is composed of a diode-connected transistor pair, (Mp1, Mn1) or (Mp2, Mn2), and inverter-connected transistor pairs (INV11, INV12, \(\ldots \), \(INV1n_{1}\)) or (INV21, INV22, \(\ldots \), \(INV2n_{2}\)) controlled with CMOS switches. A single diode, in combination with an inverter, works as a current mirror with a scaling factor dependent on the choice of transistor sizes. Transistor sizes in the diodes and inverters are chosen so that the scaling factor at each inverter output i is two times bigger than at output \(i-1\). Assuming the smallest factors in both stages as \(\alpha _{11}\) and \(\beta _{21}\), the factors of the stages can be represented by Eq. 9 and the final DAC converter factor by Eq. 10.

$$\begin{aligned} \alpha= & {} \alpha _{11}\sum \limits _{i=0}^{n_{1}-1}2^i B_{1}[i], \quad \beta =\beta _{21}\sum \limits _{j=0}^{n_{2}-1}2^j B_{2}[j] \end{aligned}$$
(9)
$$\begin{aligned} A= & {} \alpha _{11}\cdot \beta _{21}\cdot \sum \limits _{i=0}^{n_{1}-1} \sum \limits _{j=0}^{n_{2}-1}2^{i+j}B_{1}[i]\cdot B_{2}[j] \end{aligned}$$
(10)

Sizes of bit words (\(n_{1}\) and \(n_{2}\)) are marked in Fig. 1. The whole converter has a \(n_{1}+n_{2}\) length input bit word.
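
As a worked illustration of Eq. 10, the following Python sketch evaluates the factor A for given control words. It is a minimal model under stated assumptions: the function names, the unit factors (0.05) and the word lengths (6 bits per stage) are hypothetical values chosen for illustration, and bit 0 is taken as the least significant bit.

```python
def stage_factor(unit, word, n_bits):
    """Factor of one stage: unit * sum(2**i * bit_i), with bit 0 as the LSB."""
    if not 0 <= word < 2 ** n_bits:
        raise ValueError("control word does not fit in n_bits")
    return unit * sum((1 << i) * ((word >> i) & 1) for i in range(n_bits))


def dac_factor(alpha11, beta21, b1, b2, n1, n2):
    """Overall factor A = alpha * beta (Eq. 8), expanded as in Eq. 10."""
    return stage_factor(alpha11, b1, n1) * stage_factor(beta21, b2, n2)


# Illustrative values only: unit factors and word lengths are assumptions.
print(dac_factor(alpha11=0.05, beta21=0.05, b1=0b000101, b2=0b000011, n1=6, n2=6))
```

Because A is a product of the two stage factors, several different word pairs can produce the same or nearly the same factor, which is the redundancy exploited later in the programming method.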

Taking into consideration the physical parameters of the proposed DAC structure, special attention should be paid to choosing channel lengths in the transistors which are parts of the input stage and inverters. The power consumption of the converter depends on the \(I_{Dp}\) currents flowing through the PMOS transistors of the mentioned circuits. Let us consider the power consumption of a single stage of a converter. It can be defined as:

$$\begin{aligned} P={\textit{VDD}}\cdot \sum I_{Dp} \end{aligned}$$
(11)

Taking into account that each output stage consists of a pair of inverters, the equation can be written as:

$$\begin{aligned} P={\textit{VDD}}\cdot \left[ I_{Mp1}+\sum \limits _{i=1}^{n_{1}}(I_{\textit{DINV}1i} + I_{\textit{DI}})\right] \end{aligned}$$
(12)

This article assumes that the PMOS and NMOS transistors in a diode or an inverter have a common length. Moreover, all NMOS transistors in diodes and inverters have a common width. Therefore, the scaling factors depend on the relations between transistor lengths, and the PMOS transistor widths are chosen to ensure a symmetrical response to positive and negative currents at the input of the selected stage. Assuming that the transistors work in the saturation region, Eq. 12 can be written in the form:

$$\begin{aligned} P\approx \varphi \cdot W_{avg}\cdot \left[ \sum \limits _{i=0}^{n_{1}}\frac{(V_{SG} +V_{Tp})^2}{L_{i}}+\sum \limits _{i=1}^{n_{1}}I_{\textit{DI}}\right] , \end{aligned}$$
(13)

where

$$\begin{aligned} \varphi =\frac{\mu C_{ox}{\textit{VDD}}}{2} \end{aligned}$$
(14)

The \(W_{avg}\) parameter denotes the average width of the transistor channels. With this strategy it differs only slightly between the PMOS transistors, and its choice depends on the common width of the NMOS transistors. The above equation therefore shows that the parameters which make it possible to decrease power consumption are the \(L_{i}\) lengths of the input and output stages, as well as the transistor sizes used for implementing the remaining inverters. From the functional point of view of the FPAA, the sizes of the inverters which drive the switches are insignificant. However, the choice of the \(L_{i}\) lengths is limited by the time constant of the circuit and influences its operating speed. The time constant of the whole circuit from Fig. 2 can be written as Eq. 15, assuming that bit 0 is the least significant bit. This means that INV11 implements the lowest scaling factor and consists of the longest transistors, according to Eq. 16 [12]:

$$\begin{aligned} \tau= & {} \frac{4\cdot {\textit{VDD}}\cdot {C'}_{ox}\cdot L_{11}\cdot scale}{I_{on}} \end{aligned}$$
(15)
$$\begin{aligned} \alpha _{11}= & {} \frac{L_{Mp_{1},Mn_{1}}}{L_{11}}, \end{aligned}$$
(16)

where \({C'}_{ox}\) is the capacitance per unit gate area, VDD is the supply voltage, and \(I_{on}\) is the so-called ON current when VDS \(\,=\,\) VGS \(\,=\,\) VDD. The parameter \(scale\) (in \(\upmu \hbox {m}\)) is a generic scale factor used by the GDS (Calma stream format) layout file, provided the topography is drawn with reference to the minimum device dimension [20].

For choosing transistor sizes in the diodes and inverters, two example methods can be suggested. The first one was described in [35] and is based on choosing solutions from a previously generated technology grid. The second method is optimisation using the Hooke–Jeeves algorithm [18] described in [28, 42]. The first approach is faster when a technology grid is already available; however, the process of calculating the grid is time-consuming. The optimisation method gives solutions with a smaller error in mapping the factor.

Fig. 3 Current mirror parameter dispersion: (a) scaling factor 1, (b) scaling factor 3. Asterisk: classic mirror (a transistor in diode connection and a transistor with a common gate); circle: RCM; solid line: ideal value; triangle and inverted triangle: Monte Carlo analysis. TT: typical transistors; SS: slow NMOS, slow PMOS; FF: fast NMOS, fast PMOS; SF: slow NMOS, fast PMOS; FS: fast NMOS, slow PMOS; MC: Monte Carlo. The analysis was performed using models provided by the Taiwan Semiconductor Manufacturing Company

The accuracy of mapping scaling factors also depends on their susceptibility to parameter dispersion of the silicon structure. Figure 3 presents the results of an analysis of the influence of transistor parameter dispersion on mapping the scaling factor. The research was conducted using threshold voltage mismatch modelling and Monte Carlo analysis. Two circuits were compared: the circuit in Fig. 1, programmed to implement a scaling circuit with a coefficient equal to 1.0 and 3.0, and a classic mirror (composed of a transistor in a diode connection and a transistor with a common gate) designed in the same technology, using piecewise cubic Hermite [34] and cubic spline [27] interpolation, and implementing the same scaling factors. The analysis was done with 20 trials. The classic mirror was designed with transistor sizes comparable to the ones in the RCM. It is worth noticing that using longer channels makes it possible to design mirrors with lower dispersion (during the MC analysis) [5]. However, according to Eq. 15, the maximal operating frequency is then also lowered. As presented in Fig. 3, in the dedicated circuit, the discrepancy between the expected and the actual current mirror multiplier coefficient, assuming a fixed sampling time of circuits with switched currents, may reach tens of percent. Applying the approach described in Sect. 3 of this article makes it possible to compensate these phenomena in the full range of input signal changes while maintaining full symmetry of operation of the RCM module.

Finally, it is worth mentioning the sizes of transistors in the DAC circuit from Fig. 2, designed in the 180-nm technology. With the above-described strategy of selecting transistor sizes, the transistor length in a diode connection equals \(0.5\,\upmu \hbox {m}\), whereas in the output stages it varies from 0.42 to \(15.26\,\upmu \hbox {m}\). The lengths were chosen as a compromise between power consumption and the maximum operating frequency.

2.2 Routing of Array

This subsection presents the problem of configurable routing of current-mode modules. The RCM described in the previous section is one of the possible modules working in the continuous-time (CT) domain. Let us notice that in programmable systems, sequential circuits are usually preferred; therefore, an implementation of switched-current (SI) modules can be suggested. Unfortunately, the SI technique is characterised by low data processing precision, which stems mainly from limited CMRR, mismatch between modules and mismatch between the transistors used to build modules which work in the balanced mode.

Taking into account the mentioned phenomena, the reconfigurable routing of modules was designed to ensure the symmetry of the final array. Its structure is shown in Fig. 4.

Fig. 4 Configurable routing of modules

Let us analyse the routing method on the basis of ROW1 from the figure. The CMRR cells were moved to the centre of the routing. Output current signals from the SI/CT modules are attached directly to the CMRR modules. A set of switches S controlled with the \(W_{S}\) word is used to choose the further path of the current flow. The current can flow to the \(v_{1} \ldots v_{x/2}\) or \(v_{1+x/2} \ldots v_{x}\) nodes and is connected to the suitable node with the block of switches \(\textit{SWITCHES}_{13}\) or \(\textit{SWITCHES}_{14}\), respectively. The blocks are controlled using the \(W_{sp}\) and \(W_{sm}\) words. The blocks \(\textit{SWITCHES}_{11}\) and \(\textit{SWITCHES}_{12}\) are used to connect inputs to nodes. A single switch is composed of a CMOS pair of transistors with short channels (to minimise switch resistance) and a width of less than twice the minimum (to minimise parasitic capacitance in the routing node). Let us notice that the red arrows in the figure represent buses of currents, while the node lines correspond to single currents. Nodes are common to all of the switch blocks. Any of the nodes can be used as an input or an output port controlled with the \(W_{out}\) word. The whole structure is fully symmetrical, and the symmetry of the implemented circuits depends on the method of assigning nodes. Finally, let us mention that such an architecture can be easily divided into subcircuits, marked in the figure with ROW rectangles. Thanks to this feature, the proposed FPAA concept can be used for ASIC or IPcore devices assembled from smaller modules, depending on the required resources.

Fig. 5 Layout of an example FPAA IPcore with 4 pairs of balanced mode CT mirrors, 6 SI integrators and 16 SI memories

2.3 Layout of IPcore

This subsection presents the layout of an example FPAA IPcore designed in the 180-nm technology. The proposed architecture consists of the rows shown in Fig. 4. It is highly versatile because it can be easily modified to meet requirements by adding rows with the necessary CT/SI cells. The example IPcore shown in Fig. 5 consists of 15 ROWs: 4 with an 8-output RCM pair, 3 with an SI integrator [36] pair and 8 with SI memory cells [17] with delay elements. Let us notice that routing is a part of the ROWs, which means that there is no routing between ROWs. Moreover, the metal1 layer is used to draw signal nets in the routing regions. As a result, the use of vias has been largely reduced, as have the parasitic effects coming from routing. The layout measures \(748 \times 1492\,\upmu \hbox {m}\), and it was used to implement the example filter described in Sect. 4.2 and the image processors described in Sect. 4.3.

3 Programming

The proposed FPAA architecture has a very beneficial property, redundancy, which can be seen in the modular structure, in the flexibility of choosing scaling factors in RCMs and in the routing methods. The \(B_{1}\), \(B_{2}\) words controlling the DAC converter shown in Fig. 2 do not correspond directly to the converted current value: scaling factor A from Eq. 10 depends on the product of the two words. Calculating the scaling factor in hardware would require a large digital decoder. The author therefore proposes a different programming method, based on choosing solutions from a previously generated grid. The method for its generation is described in Sect. 4.1. There are many benefits of such an approach:

  1. No need to use a hardware decoder or a ROM memory for storing B words.

  2. The grid of solutions can be generated at any design stage: the schematic stage, the layout stage (with parasitics) and even the physical chip stage. This leads to higher data processing precision, because the actual properties of the circuits are taken into account.

  3. In contrast to the analytic method, there are no discrepancies between the given factors and the obtained ones caused by parasitics. The only differences stem from the resolution of the grid.

  4. Many methods for modelling the mismatch phenomenon with a specific probability have been proposed in the literature [4]. The approach proposed in the current work makes it possible to synthesise analog circuits taking into account the actual mismatch of the layout.
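
The grid-based programming idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's C++ tool: the function names `build_grid` and `lookup` are hypothetical, the grid is filled with ideal factors \(\alpha _{11}B_{1}\cdot \beta _{21}B_{2}\) instead of simulated or measured values, and 6-bit words with unit factors of 1.0 are chosen arbitrarily.

```python
from bisect import bisect_left


def build_grid(alpha11, beta21, n1, n2):
    """Return a sorted list of (factor, b1, b2) for every control word pair.

    Assumption: ideal factors A = alpha11*b1 * beta21*b2 stand in for the
    values that would come from simulation or measurement.
    """
    grid = [(alpha11 * b1 * beta21 * b2, b1, b2)
            for b1 in range(2 ** n1) for b2 in range(2 ** n2)]
    grid.sort()
    return grid


def lookup(grid, target):
    """Pick the grid entry whose factor is closest to the target factor."""
    factors = [entry[0] for entry in grid]
    pos = bisect_left(factors, target)
    # Only the neighbours around the insertion point can be the closest.
    candidates = grid[max(pos - 1, 0):pos + 1]
    return min(candidates, key=lambda entry: abs(entry[0] - target))


grid = build_grid(alpha11=1.0, beta21=1.0, n1=6, n2=6)
factor, b1, b2 = lookup(grid, target=1000.0)
print(factor, b1, b2)
```

With a grid taken from post-layout simulation or from a measured chip, the same lookup would return words that already account for parasitics and mismatch, so no decoder logic is needed on the chip itself.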

Another benefit of such an approach is that transistor parameters in the DAC or in the whole RCM module do not have to be calculated with high precision, which means that the restriction in Eq. 9 is not crucial. To support this claim, the worst-case scenario is analysed in the next section: stage 1 and stage 2 of the DAC from Fig. 2 have equal smallest factors \(\alpha _{11}\) and \(\beta _{21}\) and equal B word lengths (Eq. 17).

$$\begin{aligned} \alpha _{11}=\beta _{21}, n_{1}=n_{2}=n_{1,2} \end{aligned}$$
(17)

In such a case, using an analytic method, the number of unique solutions \(N_{US}\) would be strongly reduced and could be expressed using Eq. 18. In the best scenario, by comparison, it reaches its maximum \(N_{MAX}\) given by Eq. 19.

$$\begin{aligned} N_{US}= & {} 2^{2n_{1,2}}-\sum \limits _{i=1}^{2^{n_{1,2}}-1}i \end{aligned}$$
(18)
$$\begin{aligned} N_{MAX}= & {} 2^{n_{1}+n_{2}} \end{aligned}$$
(19)
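
Equation 18 can be read as the number of unordered pairs of control words: with identical stages, swapping \(B_{1}\) and \(B_{2}\) gives the same factor. The Python sketch below checks this reading by brute force. The function names are illustrative, and n = 6 is an assumption corresponding to a 12-bit word split into two equal halves; coincidental equal products of different pairs (such as 1·4 and 2·2) are not counted here, just as in Eq. 18.

```python
def unique_pair_count(n):
    """Count unordered pairs {B1, B2} of n-bit words (identical stages)."""
    words = range(2 ** n)
    return len({(min(b1, b2), max(b1, b2)) for b1 in words for b2 in words})


def n_us(n):
    """Eq. 18 with n1 = n2 = n."""
    return 2 ** (2 * n) - sum(range(1, 2 ** n))


def n_max(n1, n2):
    """Eq. 19."""
    return 2 ** (n1 + n2)


n = 6
print(n_us(n), unique_pair_count(n), n_max(n, n))  # 2080 2080 4096
```

For n = 6 the worst case therefore keeps roughly half of the 4096 possible word pairs as distinct solutions.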

In practice, the worst case is the easiest one as far as design complexity is concerned, because it means that the DAC is built from two identical stages. The next section shows that no optimisation effort is necessary when implementing FPAA ROWs, which makes the solution easy to use. Section 4.1 presents the efficiency of the redundancy feature in RCM programming, and Sect. 4.2 shows the precision of the proposed FPAA implementation. Both address the question of how many resources are needed to ensure proper data processing accuracy.

To sum up the current section, a few words must be said about the memory size required for configuring an FPAA. It depends on the dimensions of the RCM: the number of RCM outputs (\(k_{1}\)) and the digital word size (\(n_{1}, n_{2}\)) of the DAC. The dimensions of the routing are also important: the routing row height (\(k_{2}\)), which depends on the interfaces of the routed modules, the number of nodes (x), the size of \(W_{S}\) in each ROW, the number of rows (R) and finally the size of the port switches (\(N_{p}\)), which can be equal to the number of nodes (x). The memory size can be calculated using Eq. 20.

$$\begin{aligned} \begin{array}{c} M_{size}=R_{\textit{RCM}}[(n_{1}+n_{2})k_{1}+(1+k_{1})(x+W_{S})] \\ + R_{\textit{INT},\textit{MEM}}(N_{in}+N_{out})(x+W_{S})+N_{p} \end{array} \end{aligned}$$
(20)

The layout shown in Fig. 5 is built from 8-output RCMs with a 12-bit DAC and integrators with 2 inputs and 2 outputs, and has 32 nodes. All 4 rows with RCMs are programmed with 1572 bits, 3 rows with integrators are routed with 396 bits, 8 rows with memories are routed with 1056 bits, and ports are controlled with 32 bits, which gives 3056 bits of resources needed for configuring the IPcore. Using C++, the author developed computer tools for the design process, integrated with the presented architecture. They automate the process of generating a grid of solutions and configure the FPAA memory on the basis of a VHDL-AMS description of the synthesised analogue circuit architecture. The next section presents details concerning the implementation of circuits of different classes, which were included in the developed system.
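
The bit counts quoted above can be reproduced from Eq. 20 with a short Python sketch. Several values are assumptions inferred so that the quoted totals come out, and are not stated explicitly in the text: the first term is read as \((n_{1}+n_{2})k_{1}\), \(W_{S}\) is taken as 1 bit, the 12-bit DAC word is split as \(n_{1}=n_{2}=6\), and the memory rows are given 2 inputs and 2 outputs like the integrator rows. The function name is hypothetical.

```python
def fpaa_memory_bits(r_rcm, n1, n2, k1, x, w_s, r_int_mem, n_in, n_out, n_p):
    """Configuration memory size following Eq. 20 (all sizes in bits)."""
    # One RCM row: DAC words for k1 outputs plus routing of 1 input and k1 outputs
    rcm_row = (n1 + n2) * k1 + (1 + k1) * (x + w_s)
    # One integrator or memory row: routing of its inputs and outputs only
    int_mem_row = (n_in + n_out) * (x + w_s)
    return r_rcm * rcm_row + r_int_mem * int_mem_row + n_p


# 4 RCM rows, 3 integrator rows + 8 memory rows, 32 nodes and 32 port bits.
bits = fpaa_memory_bits(r_rcm=4, n1=6, n2=6, k1=8, x=32, w_s=1,
                        r_int_mem=3 + 8, n_in=2, n_out=2, n_p=32)
print(bits)  # 3056
```

Under these assumptions each RCM row needs 393 bits and each integrator or memory row needs 132 bits, which matches the 1572, 396 and 1056 bits given for the three groups of rows.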

4 Example Implementations

This section describes four examples of circuits implemented with the FPAA. The first example, which shows the potential of an RCM, is a DAC converter. The second example is an elliptic filter implemented using RCMs and SI integrator modules, and the third and fourth examples are image processors implemented using RCMs and SI memories.

4.1 10-Bit DAC Converter

As the first example, a 10-bit digital-to-analog converter is analysed to show the efficiency of the redundancy feature. The converter was implemented using only one RCM module. As mentioned in the previous section, the RCM was designed for the worst case (Eq. 17) to demonstrate the low sensitivity of the implemented examples to array parameters. The grid of solutions obtained from post-layout simulations of the single RCM module gives a set of assignments of scaling factors A (Eq. 8) to concatenations \(B_{1}B_{2}\). Let us notice that the number of possible solutions N which make it possible to implement a sought factor S (the searching factor) with an acceptable mismatch e varies across the whole set. Moreover, this number is inversely proportional to the value represented by the product \(B_{1}\cdot B_{2}\) (Eq. 21).

$$\begin{aligned} N_{S+e}\sim \frac{1}{B_{1}\cdot B_{2}} \end{aligned}$$
(21)

In other words, when the DAC from Fig. 2 is designed for the set of possible factors \(\langle S_{min},S_{max}\rangle \), there are many solutions which implement a factor approximately equal to \(S_{min}\) and only one solution which implements a factor approximately equal to \(S_{max}\). This single solution corresponds to the concatenation \(B_{1MAX}B_{2MAX}\). Because of these properties, the mismatch e depends directly on the product of the B words (Eq. 22).

$$\begin{aligned} e\sim B_{1}\cdot B_{2} \end{aligned}$$
(22)

It means that mismatch may depend on the searching factor value S (Eq. 23).

$$\begin{aligned} e\sim S \end{aligned}$$
(23)
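
The uneven distribution of redundancy can be illustrated by counting, for an idealised grid, how many word pairs land close to a given target factor. The Python sketch below is a simplified model and not the post-layout grid of Table 1: it assumes ideal unit factors (\(\alpha _{11}=\beta _{21}=1\)), 6-bit words, and an arbitrary absolute tolerance of 2; the function name and the target values are illustrative.

```python
def count_solutions(target, tol, n1=6, n2=6):
    """Number of (B1, B2) pairs whose product lies within tol of target.

    Assumption: ideal unit factors (alpha11 = beta21 = 1), so the scaling
    factor is simply the product of the two control words.
    """
    return sum(1 for b1 in range(2 ** n1) for b2 in range(2 ** n2)
               if abs(b1 * b2 - target) <= tol)


s_max = 63 * 63  # largest factor of the idealised grid
for target in (12, 400, 2000, s_max):
    print(target, count_solutions(target, tol=2))
```

In this idealised grid only the pair (63, 63) reaches the neighbourhood of the largest factor, while small factors can be reached by many pairs, which is the qualitative behaviour described by Eqs. 21 to 23.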

The above dependencies raise the question of which resources are necessary to obtain satisfactory parameters of circuits implemented with the FPAA. Table 1 shows the parameters of a grid of solutions obtained for the RCM designed with the specifications from the previous section. The DAC stages were designed for given limits of the set of scaling factors. The maximum error in mapping a factor is equal to 2.24\(\%\) of its value.

Table 1 RCM and grid parameters from post-layout simulation

Fig. 6 10-bit DAC converter realised using a single RCM

The architecture of the proposed 10-bit DAC converter is shown in Fig. 6. It was built using seven outputs of a single RCM. The first output is controlled using 12 bits. The next two outputs are switched together with a single bit, and the next four outputs with another bit. Outputs 2 and 3 act as sources of the current which corresponds to the 9th DAC bit and are configured by word B9. Outputs 4–7 act as sources of the current which corresponds to the 10th DAC bit and are configured by word B10. A digital RAM or a digital decoder is used to translate the 10-bit word of the converter into the 14 bits which control the circuit.

Decoding is done using the previously generated grids of solutions. The main problem is to find the B9 and B10 words which give the minimal mismatch in the output current. Using just one grid (the same as for configuring output 1) does not make it possible to obtain satisfactory parameters of the converter, because the factors in the RCM could in this case be chosen only from solutions approximately equal to \(i_{ref}\cdot S_{MAX}\), which have the maximal mismatch. Equations (24) present the ranges of possible solutions for implementing the first output current \(i_{out1}\) and the next output currents \(i_{{out}j}\).

$$\begin{aligned} \begin{array}{l} i_{out1} \in \langle 0, MAX_{1}\rangle , \\ i_{{out}j} \in \langle 0, j\cdot MAX_{1}\rangle , MAX_{1}=i_{ref}\cdot S_{MAX} \end{array} \end{aligned}$$
(24)

The parameters of the converter can be improved provided that each next output j is programmed using a new grid, generated so that its scaling factor \(S_{j}\) is chosen from a different range than factor \(S_{j-1}\). It can be proved that the distribution of redundancy is more uniform in the whole range used to design the converter if Eq. (25) is fulfilled.

$$\begin{aligned} \begin{array}{l} i_{out1} \in \langle 0, MAX_{1}\rangle , \\ i_{{out}j} \in \langle (1-\frac{\sqrt{2}}{2})\cdot MAX_{j-1}, (2-\frac{\sqrt{2}}{2})\cdot MAX_{j-1}\rangle \end{array} \end{aligned}$$
(25)

Fig. 7 Differential nonlinearity and integral nonlinearity of a 10-bit DAC converter

Table 2 Parameters of a converter in comparison with other current-mode implementations

Figure 7 presents the INL and DNL parameters of the designed converter. Let us notice that the plots reflect the distribution of redundancy in the whole range of the converter factor. INL and DNL deviations increase from code 0 to code 256, in line with the decreasing redundancy in the first grid. Next, at code 427, after crossing the limit from Eq. (25), the increasing trend is halted because adding another grid broadens the choice of factors. A small degradation of the converter parameters is observed at the end of the range, in which the factors in outputs 3–7 have values which influence mismatch, according to Eq. (23).
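
For reference, DNL and INL curves such as those in Fig. 7 can be computed from a list of output levels as sketched below in Python. The text does not state which definition was used for the figure, so the end-point definition here is an assumption, and the function name and the four-level example are purely hypothetical.

```python
def dnl_inl(levels):
    """Return (dnl, inl) in LSB for a list of measured output levels.

    Uses the end-point definition: 1 LSB = (last - first) / (codes - 1).
    """
    if len(levels) < 2:
        raise ValueError("need at least two output levels")
    lsb = (levels[-1] - levels[0]) / (len(levels) - 1)
    # DNL: deviation of each step from the ideal 1 LSB step
    dnl = [(levels[k + 1] - levels[k]) / lsb - 1.0
           for k in range(len(levels) - 1)]
    # INL: deviation of each level from the straight end-point line
    inl = [(levels[k] - levels[0]) / lsb - k for k in range(len(levels))]
    return dnl, inl


# Hypothetical 2-bit example with a slightly wide second step.
dnl, inl = dnl_inl([0.0, 1.0, 2.2, 3.0])
print([round(v, 3) for v in dnl], [round(v, 3) for v in inl])
```

Applied to the 1024 output currents of the converter, the same routine would give one DNL value per code transition and one INL value per code.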

Converter parameters are compared in Table 2 with corresponding current-mode ASIC implementations in suitable technologies taken from the literature.

4.2 Elliptic Filter

This section presents an example of an elliptic filter implementation, taken from [35]. The filter has the following parameters: third order, 20-dB attenuation in the stop band and a 0.6-dB ripple in the pass band. The prototype of a ladder filter is presented in Fig. 8.

The inductor L was replaced by a gyrator–capacitor circuit (\(IG1-C4-IG2\)) [13, 21]. Parameters of the gyrator–capacitor prototype were calculated using the method proposed in [15]. The choice of the calculation method depends on the hardware resources of the FPAA matrix, especially on the size of the grid calculated for the RCM circuit. Note that the maximum parameter dispersion cannot be higher than the one defined in Eq. 26, which is based on Eq. 10. Table 3 presents the parameter dispersion of the gyrators, capacitances and conductances calculated for the analysed example with two methods: the Hooke–Jeeves algorithm [18] and Powell’s method.

Fig. 8 Prototype of a third-order ladder filter

Table 3 Parameter dispersion of the elliptic filter
$$\begin{aligned} {\textit{DISP}}=\langle \alpha _{11}\cdot \beta _{21}, \alpha _{11}\cdot \beta _{21}\sum \limits _{i=0}^{n_{1}-1} \sum \limits _{j=0}^{n_{2}-1}2^{i+j}\rangle \end{aligned}$$
(26)
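The double sum in Eq. 26 has a simple closed form, \((2^{n_{1}}-1)(2^{n_{2}}-1)\), which makes the upper limit of the dispersion easy to evaluate. The sketch below checks this by brute force. The function name `dispersion_range` and the numerical arguments in the usage line are hypothetical and are not values from the article.

```python
def dispersion_range(alpha11, beta21, n1, n2):
    """Return (lower, upper) limits of DISP from Eq. 26.

    alpha11 and beta21 are the unit factors appearing in Eq. 26;
    n1 and n2 are the upper summation limits (word lengths).
    """
    lower = alpha11 * beta21
    # Brute-force evaluation of the double sum of 2**(i + j)
    weight_sum = sum(2 ** (i + j) for i in range(n1) for j in range(n2))
    # Closed form: product of two geometric series
    assert weight_sum == (2 ** n1 - 1) * (2 ** n2 - 1)
    return lower, lower * weight_sum


if __name__ == "__main__":
    # Hypothetical example values, chosen only for illustration
    print(dispersion_range(1, 1, 3, 2))
```

The ratio of the upper to the lower limit depends only on the word lengths, so it can be compared directly with the parameter spread required by a filter prototype.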

A current-mode implementation of the filter can be obtained by solving node voltage equations; therefore, the parameter dispersion of the SI circuit depends on the parameter dispersion of the GC model. The dispersion should be compared with parameters \(S_{MIN}\) and \(S_{MAX}\) from Table 1, as well as with Eq. 23, which defines the accuracy of mapping circuit parameters depending on their magnitude. Both proposed methods make it possible to calculate model parameters with values that have counterparts in the lower area of the solution grid, which, in turn, ensures mapping with high accuracy. Calculations can be made automatically using the environment proposed in another work [16]. The snippet of a VHDL-AMS schematic description below presents the spread of scaling factors and the whole architecture of the filter with a balanced structure. The description is readable by the EDA system, developed by the author, which parses a filter architecture into a bit stream programming an FPAA memory. As shown, 4 pairs of current mirrors and 3 integrator cells are needed during the placement. Mirrors are numbered according to the ROWs from Fig. 4, and they are placed in pairs \(([CMXXp,CMXXm],ROW_{XX+1})\) in the array. The filter contains 14 nodes, which have to be assigned to the proper nodes in the array routing. They are labelled with symbols sXp and sXm, which correspond to \(v_{1}\ldots v_{x/2}\) and \(v_{1+x/2}\ldots v_{x}\), respectively, in Fig. 4. The \(W_{S}\) word manages the change in polarity of nodes according to data from the netlist.

figure a

Figure 9 presents the pulse response of the filter in the time domain, obtained in post-layout simulations with parasitic extraction, and Fig. 10 shows its response in the frequency domain, obtained using the FFT. The SNR (signal-to-noise ratio) of the simulated filter is 40.42 dB. The power consumption of the IP core is 22.97 mW with a 1.8 V power supply. The maximum clock frequency in integrators is 3.6 MHz.

Fig. 9 Pulse filter response obtained for a balanced structure in post-layout simulations

Fig. 10 Frequency characteristic of a filter implemented with an FPAA: solid line, ideal characteristic; asterisks, post-layout simulations of the implemented circuit

4.3 Image Processors

In vision systems, FPAA accelerators perform preprocessing of analogue signals coming from a sensor. Colour space conversion is often carried out in order to eliminate the chrominance component of the image, followed by compression using a two-dimensional discrete cosine transform (2D-DCT). This section presents results of the implementation of both circuits on the sensor. Both the DCT and the colour space conversion are transformations used in many image standards, e.g. JPEG and MPEG.

The idea of the 2D-DCT has been repeatedly discussed in the literature [2]. The hardware implementation of the transform comes down to implementing Eq. 27:

$$\begin{aligned} Y=Z^{T}C^{T}, Z=X^{T}C^{T} \end{aligned}$$
(27)

X is the frame of the processed image, and the C matrix, in the case of the \(4\times 4\) DCT, has the form given in Eq. 28:

$$\begin{aligned} C=\frac{1}{2}\cdot \left[ \begin{array}{cccc} a &{}\quad a &{}\quad a &{}\quad a\\ b &{}\quad c &{}\quad -c &{}\quad -b\\ a &{}\quad -a &{}\quad -a &{}\quad a\\ c &{}\quad -b &{}\quad b &{}\quad -c\\ \end{array} \right] \end{aligned}$$
(28)

where \(a=\frac{1}{2}\), \(b=\sqrt{\frac{1}{2}}\cos (\frac{\pi }{8})\), \(c=\sqrt{\frac{1}{2}}\cos (\frac{3\pi }{8})\). The hardware implementation of the above transformation comes down to implementing multiplications by coefficients a, b, c and the additions resulting from matrix multiplication. Implementing a two-dimensional transformation requires two computing blocks and a memory block for intermediate results. Constants in the C matrix were implemented as scaling factors in RCM mirrors. The sign of the multiplier depends on connecting the output of the mirror to the proper inverting or non-inverting node in the balanced structure. The discussed FPAA structure makes on-the-fly reconfiguration possible, which means that the second matrix multiplication can be implemented in the same multiplying block, on condition that the matrix routing is changed. Figure 11 shows the successive sequences of the operation.

Fig. 11 2D-DCT calculation with FPAA reprogramming

In sequence I, the bit stream programmes the scaling factors of the multiplying block, as well as the routing for calculating a one-dimensional DCT. In sequence II, the multiplying blocks of the configured processor calculate the first matrix operation, and clocks control loading of all the calculated data into the SI memory. Once the data have been loaded, the array routing is reprogrammed during sequence III, so that memory outputs and inputs of multiplying blocks share common nodes and the data stored in the memory are again subject to a matrix operation. Additionally, matrix output nodes for the calculated signals are selected. During sequence IV, the second calculation of the one-dimensional transform takes place, resulting in a two-dimensional transform of the original input data.

Fig. 12 2D-DCT processor’s response for input signals from Eq. 29

Figure 12 presents a sample output waveform of a two-dimensional transform, calculated for the input data given in Eq. 29. Equation 30 shows the result of a perfect 2D-DCT, while Eq. 31 presents the array response calculated and scaled to the dynamic range. The values are given in \(\upmu \hbox {A}\).

$$\begin{aligned} {\textit{DCTIN}}= & {} \left[ \begin{array}{cccc} 2 &{}\quad 2 &{}\quad 2 &{}\quad 2\\ 1.5 &{}\quad 1.5 &{}\quad 1.5 &{}\quad 1.5\\ 1 &{}\quad 1 &{}\quad 1 &{}\quad 1\\ 0.5 &{}\quad 0.5 &{}\quad 0.5 &{}\quad 0.5\\ \end{array} \right] \end{aligned}$$
(29)
$$\begin{aligned} {\textit{IDEAL}}= & {} \left[ \begin{array}{cccc} 5 &{}\quad 0 &{}\quad 0 &{}\quad 0\\ 2.231 &{}\quad 0 &{}\quad 0 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad 0 &{}\quad 0\\ 0.159 &{}\quad 0 &{}\quad 0 &{}\quad 0\\ \end{array} \right] \end{aligned}$$
(30)
$$\begin{aligned} {\textit{CALC}}= & {} \left[ \begin{array}{cccc} 4.942 &{}\quad 0 &{}\quad 0 &{}\quad 0\\ 2.282 &{}\quad 0 &{}\quad 0 &{}\quad 0\\ 0.043 &{}\quad 0 &{}\quad 0 &{}\quad 0\\ 0.162 &{}\quad 0 &{}\quad 0 &{}\quad 0\\ \end{array} \right] \end{aligned}$$
(31)
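The two-pass scheme of Eq. 27 and the numbers in Eqs. 29–31 can be cross-checked with a short script. The sketch below reuses one function, `dct_pass` (a hypothetical name), for both passes, mirroring the reuse of a single multiplying block after rerouting. It is an illustration under one stated assumption: the matrix is built from the coefficients a, b, c without the additional prefactor \(\frac{1}{2}\) of Eq. 28, because this orthonormal form is the one that reproduces the values of Eq. 30 to within about \(10^{-3}\).

```python
import math

# Coefficients a, b, c as defined after Eq. 28
COEF_A = 0.5
COEF_B = math.sqrt(0.5) * math.cos(math.pi / 8)
COEF_C = math.sqrt(0.5) * math.cos(3 * math.pi / 8)

# Assumption: orthonormal 4x4 DCT matrix (no extra 1/2 prefactor)
C_MAT = [
    [COEF_A, COEF_A, COEF_A, COEF_A],
    [COEF_B, COEF_C, -COEF_C, -COEF_B],
    [COEF_A, -COEF_A, -COEF_A, COEF_A],
    [COEF_C, -COEF_B, COEF_B, -COEF_C],
]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def dct_pass(m):
    """One pass of the shared multiplying block: M -> M^T C^T."""
    return matmul(transpose(m), transpose(C_MAT))


def dct2d(x):
    z = dct_pass(x)      # first 1D transform, result stored in memory
    return dct_pass(z)   # second 1D transform after rerouting


DCTIN = [[2.0] * 4, [1.5] * 4, [1.0] * 4, [0.5] * 4]   # Eq. 29
CALC = [[4.942, 0, 0, 0], [2.282, 0, 0, 0],
        [0.043, 0, 0, 0], [0.162, 0, 0, 0]]            # Eq. 31

Y = dct2d(DCTIN)
MAX_ERR = max(abs(Y[i][j] - CALC[i][j]) for i in range(4) for j in range(4))

if __name__ == "__main__":
    for row in Y:
        print(["%.3f" % v for v in row])
    print("max |ideal - CALC| = %.3f uA" % MAX_ERR)
```

Because both passes apply the same operation, only the routing between the memory and the multiplying block has to change between them, which is the point of sequences I to IV.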

Preprocessor parameters are compared in Table 4 with implementations of 2D-DCT \(4\times 4\) analogue processors, designed as classic dedicated circuits in the same technology. The greater processing accuracy is mainly the effect of programming the architecture while taking into account routing parasitics and the loads.

Table 4 DCT results comparison

As a second example of an analogue preprocessor, an RGB-to-YCrCb colour space converter [12] was implemented. The transformation makes it possible to eliminate redundant information about the colour and to pass only information about the brightness of a pixel to the subsequent signal processing path. It comes down to calculating a simple matrix multiplication described by Eq. 32.

$$\begin{aligned} \left[ \begin{array}{c} \pm Y\\ \pm Cr\\ \pm Cb\\ \end{array} \right] = \left[ \begin{array}{ccc} 0.299 &{}\quad 0.587 &{}\quad 0.114\\ 0.5 &{}\quad -0.419 &{}\quad -0.081\\ -0.169 &{}\quad -0.331 &{}\quad 0.5\\ \end{array} \right] \left[ \begin{array}{c} \mp R\\ \mp G\\ \mp B\\ \end{array} \right] \end{aligned}$$
(32)
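The conversion of Eq. 32 can be illustrated with a few lines of code. The sketch below is a numerical model only, not the circuit: it ignores the balanced (\(\pm\)) signalling and assumes the standard form of the coefficients, in which each chrominance row sums to zero (so the B coefficient of Cr is \(-0.081\)). The names `YCRCB_MATRIX` and `rgb_to_ycrcb` are hypothetical.

```python
# Rows: Y, Cr, Cb; columns: R, G, B (assumed standard coefficients)
YCRCB_MATRIX = [
    [0.299, 0.587, 0.114],
    [0.5, -0.419, -0.081],
    [-0.169, -0.331, 0.5],
]


def rgb_to_ycrcb(r, g, b):
    """Return (Y, Cr, Cb) for one pixel as a plain matrix-vector product."""
    return tuple(row[0] * r + row[1] * g + row[2] * b
                 for row in YCRCB_MATRIX)


if __name__ == "__main__":
    # A grey pixel carries no chrominance: Cr and Cb are (numerically) zero
    print(rgb_to_ycrcb(0.5, 0.5, 0.5))
    print(rgb_to_ycrcb(1.0, 0.0, 0.0))
```

Each matrix entry corresponds to one scaling factor, and each negative entry to a connection to the opposite-polarity node, which is how the signs are realised in the balanced structure described above.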

The preprocessor implementation algorithm is similar to the previous case and is based on implementing the matrix elements as scaling factors in RCM blocks. In this case, 6 three-output mirrors are required. The operation of the circuit was verified in post-layout simulations, in which a PSNR (peak signal-to-noise ratio) of 46.95 dB was obtained. Implementing the same structure as a dedicated circuit in the same technology after optimisation gives a PSNR of 49.16 dB, and in the 90-nm technology, 43.54 dB [28].
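For reference, the conventional PSNR definition can be sketched as follows. The article does not specify the peak value or the test image used to obtain the quoted figures, so the function `psnr_db` (a hypothetical name) and its `peak` parameter only illustrate the usual formula, \(10\log _{10}(peak^{2}/MSE)\).

```python
import math


def psnr_db(reference, measured, peak):
    """Peak signal-to-noise ratio in dB between two equally long sequences.

    reference: ideal output samples; measured: simulated output samples;
    peak: maximum possible signal value (an assumption of the caller).
    """
    if len(reference) != len(measured) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    mse = sum((r - m) ** 2 for r, m in zip(reference, measured)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    print(psnr_db([0.0, 1.0], [0.1, 0.9], 1.0))
```

Under this definition, a difference of about 2 dB between two implementations corresponds to a ratio of roughly 1.6 in mean squared error.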

5 Conclusion

The article presents the architecture of a current-mode field-programmable analog array, which addresses the problems that occur when implementing a classic FPAA in modern technologies. The proposed structures were described in detail, and their properties were discussed. The proposed solution provides many advantages over existing solutions. Firstly, it makes it possible to implement analogue reprogrammable circuits using a standard, digital CMOS technology. Furthermore, using the current mode makes it possible to implement the array in submicron technologies, in which FPAA circuits working in the voltage mode are difficult to implement. The proposed solution features high processing accuracy and resistance to mismatch, thanks to the proposed software methodology, which takes into account load parasitics and the routing of modules in the process of their programming. A method for synthesising analog structures from their VHDL-AMS descriptions, corresponding to the existing digital solutions, was also presented. The solution has versatile applications, which has been demonstrated by implementation examples of different classes of circuits: a converter, a filter and image processors. Comparing the processing accuracy with dedicated circuits designed in the same technology shows that the proposed architecture and programming methods can compete with the existing solutions and should contribute to the popularisation of reconfigurable analog circuits.