1 Introduction

Since the introduction of the transistor, the semiconductor industry has consistently delivered higher circuit performance at constant cost by scaling the transistor’s area. This scaling process (known as Moore’s law) has been followed since the 80s. However, it has been facing new constraints and challenges since 2012. Standard sub-30nm bulk CMOS technologies cannot provide sufficient performance while remaining industrially profitable. Various solutions, such as FinFETs (Auth et al. 2012) or Fully Depleted Silicon On Insulator (FDSOI) (Faynot et al. 2010) transistors, have therefore been proposed. All these solutions enabled Moore’s law scaling to continue. However, when approaching sub-10nm technology nodes, the story repeats itself: process costs and electrical issues again reduce the profitability of such solutions, and new technologies such as Gate-All-Around (GAA) (Sacchetto et al. 2009) transistors are seen as future FinFET replacement candidates. On the other hand, alternatives to overly expensive transistor scaling are currently being explored and developed by several academic research centers. Among these approaches, one solution consists of enhancing the transistor’s functionalities instead of scaling its dimensions. In other words, the transistor’s performance is boosted by adding functionalities inside the device itself, providing the same computing performance as a more advanced scaled CMOS node with a reduced fabrication cost overhead. Compared to standard Moore’s law scaling, this approach could enable a strong cost reduction thanks to relaxed constraints on device scaling.
From this perspective, devices with new or boosted functionalities (i.e., Functionality Enhanced Devices, FED) are explored in the literature and, among these, this chapter focuses on the Silicon NanoWire Field Effect Transistor (SiNWFET) (De Marchi et al. 2012). The three major enhancements of this technology are (i) dynamic majority carrier type control, (ii) dynamic subthreshold slope control and (iii) dynamic threshold voltage tuning. Thanks to these effects, SiNWFETs have been shown to enable denser logic gates than CMOS gates of equivalent performance while using fewer transistors. While several works exploring logic gate design (Gaillardon et al. 2016; Amarú et al. 2013) and circuit synthesis (Arnani et al. 2013) are reported, memory blocks using SiNWFET remain poorly explored (Shamsi et al. 2015). This work therefore explores memory blocks using SiNWFET. The following methodology was considered: (i) identification of the standard CMOS-based solution, (ii) proposition of standard replacement architecture solutions using SiNWFET, (iii) exploration of breakthrough architecture solutions and (iv) benchmark against existing solutions.
The memory market is usually segmented into two categories. The first one is the volatile memory category, which includes several technologies for various uses. Among them, the best known are the embedded SRAM caches and the standalone DRAM memories (Chang et al. 2017; Lee et al. 2014). However, even closer to the logic circuit than SRAM, some storage elements are present under the name of flip-flops. In parallel to the logic circuit development, flip-flops using SiNWFETs must be designed. The first part of this chapter explores the design of flip-flops using SiNWFET technologies. As microprocessors are expected to operate at high frequencies, this part focuses on True Single Phase Clock (TSPC) flip-flops (Tang et al. 2014), which are well suited thanks to the lack of a cross-coupled latch inside the flip-flop structure.
The other memory market category is the Non-Volatile Memory (NVM) category, best known for its high-density standalone Solid-State Drives (SSDs) and Hard Disk Drives (HDDs) (Park et al. 2015; Mamun et al. 2017). NVM co-integration with advanced CMOS nodes is required to enable microcontroller production, leading to huge design and technology complexity (i.e., costs) (Shum et al. 2017). Due to skyrocketing costs, floating gate-based NVM integration takes more and more time at each CMOS technology node (4 years for the 90 nm, more than 10 for the 28nm technology node) (Strenz 2012). Knowing this, it seems unrealistic to develop a floating gate-based NVM for SiNWFET technology. The second part of this chapter thus focuses on emerging NVM technologies (such as Filamentary-based Resistive Switching Technologies, RRAM) that provide low process cost overhead and fabrication-friendly materials. Additionally, these technologies are widely explored and seen as the replacement for floating gate-based embedded NVM by both industrial and academic researchers (Kawahara et al. 2013; Portal et al. 2017). The major contribution of this part is the use of the dynamic polarity control enabled by SiNWFET technology to enhance the density and functionality of such NVM blocks.
This chapter is organized as follows. First, the necessary background on SiNWFET technology is introduced. Then, an extensive study of TSPC flip-flops using SiNWFET is carried out. Basic TSPC flip-flops as well as enhanced TSPC flip-flops with embedded logic functions are designed and compared to CMOS-based TSPC flip-flops providing the same functionalities. Finally, this chapter addresses the design of NVM blocks using 1 Transistor-1 RRAM (1T1R) bitcells. After the basics of filamentary-based RRAM technology are presented, the dynamic polarity control of SiNWFET technology is used to provide more efficient and denser 1T1R solutions than equivalent CMOS-based solutions.

Fig. 1

a 3D view of a polarity controllable SiNWFET structure. b Simulated I-V curves of n-type (black) and p-type (red) operation in a 25nm equivalent node (Mohammadi et al. 2015) with detailed subthreshold current. c Symbol view of the PC SiNWFET with independent polarity gates and the equivalent circuits. d Symbol view of the PC SiNWFET with the polarity gate (PG) node connected to both PGs and PGd

2 Polarity Controllable SiNWFET Technology Overview

This section introduces the background on Polarity Controllable (PC) Silicon Nanowire FET (SiNWFET) technology needed for this chapter. In parallel to the evolution of regular CMOS technologies, devices providing dynamic polarity control were explored in order to enable extremely dense and low-power logic gates (Gaillardon et al. 2016), such as a 4-transistor XOR function (Amarú et al. 2013). Polarity control transistors are therefore actively studied, and various highly-scaled FET devices based on silicon nanowires (De Marchi et al. 2012; Heinzig et al. 2012), carbon nanotubes (Lin et al. 2005), graphene (Nakaharai et al. 2014), FinFETs (Zhang et al. 2014a) and WSe2-based bipolar transistors (Resta et al. 2016) have been demonstrated. Among these technologies, the Silicon NanoWire Field Effect Transistor (SiNWFET) (Marchi et al. 2014; De Marchi et al. 2012) using a gate-all-around process appears to be the most natural evolution from FinFET transistors, as its process is based on a FinFET fabrication process (Sacchetto et al. 2009). Figure 1a presents the physical structure of the considered SiNWFET transistors. Polarity controllable devices provide great flexibility thanks to the control they offer over device behavior. Indeed, by controlling the voltages on the two polarity gates (PGs and PGd), on the control gate, on the drain and on the source, several effects can be obtained: polarity control, subthreshold slope control and threshold voltage modulation. This chapter focuses on the polarity control effect presented in Fig. 1b. By changing the voltage bias of the polarity gates, it is possible to switch from an n-type MOS behavior to a p-type MOS behavior. As shown in Fig. 1c, a PC SiNWFET can be approximated as a transistor with two independent gates by biasing one of the polarity gates at gnd (to create two serial p-type transistors) or at vdd (to create two serial n-type transistors). This configuration is considered in the TSPC section (i.e., Sect. 3).
On the other hand, as shown in Fig. 1d, connecting PGd and PGs together creates a single polarity gate (PG). This PG controls the transistor type while keeping the device a single transistor. This property is used in the NVM section (i.e., Sect. 4).
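
The gate configurations of Fig. 1c and d can be summarized with a purely logical switch model. The sketch below is a behavioral abstraction only, assuming ideal switches and binary gate levels (1 for vdd, 0 for gnd); the function names are hypothetical and are not taken from the cited works.

```python
def pc_sinwfet_conducts(pgs: int, cg: int, pgd: int) -> bool:
    """Ideal-switch abstraction (assumption): the channel conducts only when
    all three gates sit at the same logic level. All gates at 1 gives n-type
    conduction, all gates at 0 gives p-type conduction."""
    return pgs == cg == pgd


def single_pg_conducts(pg: int, cg: int) -> bool:
    """Fig. 1d style wiring: PGs and PGd tied together into one PG node."""
    return pc_sinwfet_conducts(pg, cg, pg)


if __name__ == "__main__":
    for pg in (0, 1):
        kind = "n-type" if pg else "p-type"
        for cg in (0, 1):
            print(f"PG={pg} ({kind}) CG={cg} -> on={single_pg_conducts(pg, cg)}")
```

With one polarity gate held at gnd, the model conducts only when the two remaining gates are both at 0, which matches the two serial p-type transistors of Fig. 1c; holding it at vdd gives the two serial n-type case.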

3 Silicon NanoWire FET Based True Single Phase Clock Flip-Flops

This section explores the opportunities opened by Functionality-Enhanced Devices (FED) in local storage blocks (i.e., flip-flops) embedded in the logic. The contributions of this section are summarized as follows. Innovative Polarity Controllable (PC) SiNWFET-based True Single Phase Clock flip-flops embedding asynchronous set, reset and logic operations are proposed. Then, these topologies are compared with equivalent standard CMOS solutions, and the resulting energy and area gains are reported.

3.1 True Single-Phase Clock (TSPC) Flip-Flop

Because they are a key component of integrated sequential circuits, flip-flops strongly impact the area, speed and power consumption of modern digital circuits (Teh et al. 2011). Two main kinds of flip-flops are reported: dynamic and static flip-flops (Rabaey et al. 2002). Static flip-flops, which are usually built around two inverter-based latches, are highly sensitive to the inverter design and may suffer from area and delay increases. On the other hand, thanks to a simpler architecture, flip-flops based on dynamic logic offer higher operating speed and area density than static ones (Teh et al. 2011). Additionally, while most static flip-flops require both the clock signal and its inverse, dynamic flip-flops do not (Yuan and Svensson 1997), simplifying the clock tree synthesis. Using a single-phase clock and dynamic logic therefore leads not only to a compact design but also to a faster response. Finally, a unique feature of the TSPC flip-flop is the reduction of the overall setup time and delay obtained by embedding logic gates inside the structure (Rabaey et al. 2002). For these reasons, the TSPC flip-flop, widely used in many high-speed applications (Nomura 2008), is explored in this chapter. A CMOS TSPC flip-flop can be built with only 9 transistors, which is very compact compared to the static version with 22 transistors (Rabaey et al. 2002). A TSPC flip-flop with asynchronous reset and set requires 6 additional transistors for pulling up to vdd or pulling down to gnd at each stage.

Fig. 2

Schematic of a CMOS-based TSPC flip-flop embedding an asynchronous set and reset through S and R signals (Golinescu 1999; Yuan and Svensson 1997)

As depicted in Fig. 2, the CMOS TSPC flip-flop is composed of three stages. (i) A precharge stage with a clock low enable. This stage is transparent when CLK \(=\) 0, i.e. the node X is updated with the value of D only when CLK \(=\) 0. (ii) A latch stage, storing the value of node X at the rising edge of CLK. When CLK \(=\) 0, node Y is precharged to vdd. At the rising edge of CLK, node Y can be updated to gnd if node X \(=\) 1. Note that even if node X can be pulled down when CLK \(=\) 1, the latched value at node Y cannot be modified. (iii) A precharge stage with clock high enable. This last stage is transparent when CLK \(=\) 1: it propagates node Y only when CLK \(=\) 1. However, as it is based on precharging and discharging, the TSPC flip-flop relies on internal capacitance for storage, which requires periodic refreshing (on the order of milliseconds) (Rabaey et al. 2002). In addition, logic gates can be directly inserted into the first stage of the CMOS TSPC flip-flop. Figure 3 depicts a TSPC flip-flop with a prior AND function. The setup time of the TSPC flip-flop alone increases, but compared to an AND gate followed by a standard TSPC flip-flop, the overall setup time decreases (Rabaey et al. 2002).
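
The three-stage behavior described above can be captured in a small behavioral model. This is a sketch under stated assumptions, not the simulated circuit: nodes are ideal binary values, charge leakage is ignored, set and reset are omitted, and the first and third stages are assumed to be inverting, as in the classical TSPC structure (so X holds the complement of D and the third stage drives the inverted output). The function and variable names are hypothetical.

```python
def tspc_step(state, clk, d):
    """Advance the (x, y, qb) node values for one clock level and data value."""
    x, y, qb = state
    # Stage 1: transparent when CLK = 0 (assumed inverting, so X = not D).
    # When CLK = 1 the node can only be pulled down (when D = 1).
    if clk == 0:
        x = 1 - d
    elif d == 1:
        x = 0
    # Stage 2: Y is precharged to vdd when CLK = 0 and can be discharged
    # to gnd when CLK = 1 if X = 1.
    if clk == 0:
        y = 1
    elif x == 1:
        y = 0
    # Stage 3: transparent when CLK = 1 (assumed inverting); holds otherwise.
    if clk == 1:
        qb = 1 - y
    return (x, y, qb)


def capture(state, d):
    """Apply one full clock cycle (low phase, then high phase) with stable D."""
    state = tspc_step(state, 0, d)
    state = tspc_step(state, 1, d)
    return state, 1 - state[2]  # Q is the complement of the stage-3 output


if __name__ == "__main__":
    state = (1, 1, 1)
    for d in (1, 0, 1, 1, 0):
        state, q = capture(state, d)
        print(f"D={d} -> Q={q}")
```

In this model, changing D while CLK \(=\) 1 can only pull X down, so the value latched on Y, and therefore Q, is unaffected until the next rising edge.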

Fig. 3

Schematic of a CMOS-based TSPC flip-flop embedding an asynchronous set and reset through S and R signals (Golinescu 1999; Yuan and Svensson 1997). The AND gate is highlighted in red

3.2 Standard PC SiNWFET Flip-Flop

This section proposes to explore the opportunities opened by PC SiNWFET in TSPC flip-flop design. First, circuit design opportunities are explored. Then, circuit functionality is verified through electrical simulations. Finally, the PC SiNWFET flip-flops area and delay are compared to standard CMOS TSPC flip-flops and the results are discussed.

3.2.1 Structure Modifications

As introduced before, PC SiNWFET leads to area and timing efficiency because a single device is equivalent to two serial transistors (as shown in Fig. 1). In standard CMOS-based TSPC flip-flops, as shown in Fig. 3, serial transistor configurations are widely used. In addition, PC SiNWFET can be dynamically reconfigured from n- to p-type (Zhang et al. 2013), depending on the signal applied to its gates. Therefore, it is possible to dynamically convert a pull-up transistor into a pull-down one and vice versa. A combination of these two properties is used to compact the original TSPC flip-flop design.

Figure 4 depicts a PC SiNWFET TSPC flip-flop with asynchronous set and reset operations (High Enabled). The novel circuit consists of only 8 PC SiNWFETs and its schematic fits the layout style in Bobba et al. (2012), consuming only 3 tiles. By applying the design rules of Fig. 1, all control gates and polarity gates are fully used. Compared with the CMOS design (Fig. 3), the novel circuit removes the pull-up and pull-down transistors at the first and third stages. Transistor N1 (highlighted by an orange rectangle in Fig. 4) plays a dual role, as it replaces part of the pull-up and pull-down branches of the first stage of Fig. 3. Transistor N2 (in red) plays an identical role for the third stage. In addition to merging serial transistors into single devices, improvements are made at the two precharge stages.

TSPC flip-flop first stage improvements

Asynchronous set requires node X to be pulled down to gnd at the first stage even when CLK \(=\) 0 and D \(=\) 0. In CMOS design, an additional pull-down transistor, controlled by signal S, is required in order to avoid a path from vdd to gnd when CLK \(=\) 1 and D \(=\) 0. Then, to prevent another path from vdd to gnd when CLK \(=\) 0 and D \(=\) 0, a pull-up transistor, also controlled by signal S, is added. The transistor N1 in Fig. 4 realizes this dual functionality. When set is enabled, transistor N1 switches from n-type to p-type, while its source is set to vdd. Thus, transistor N1 is only in on state when CLK \(=\) 0 and X \(=\) 0 (when D \(=\) 0, X \(=\) 0), which ensures that node Y can be pulled up to vdd whatever CLK and D are. In Fig. 3, asynchronous set path is highlighted in dashed arrow while asynchronous reset is in solid line arrow.

TSPC flip-flop third stage improvements

Asynchronous set requires immediate logic-high output when enabled. Therefore, the third stage should be pulled down once set is enabled even when CLK \(=\) 0. In CMOS design, a pull-down transistor is required. However, in PC SiNWFET design, similarly, transistor N2 switches from p-type to n-type, while its source is set to gnd when set is enabled. As node Y is pulled up to vdd when set is enabled, transistor N2 is in on state, thereby pulling down node Q- to gnd.

Fig. 4

Schematic of the proposed PC SiNWFET-based TSPC flip-flop embedding asynchronous set and reset features

3.2.2 Transient Validation

To validate the correct behavior of the cell under asynchronous set and reset, electrical simulations are performed, and transient waveforms are shown in Figs. 5 and 6. A simple 22nm PC SiNWFET table-based compact model, derived from Zhang et al. (2013), is used with the HSPICE simulator. Figure 5 verifies that the output Q can be pulled up once set is enabled, even in the most challenging case (CLK \(=\) 0, D \(=\) 0 and Q \(=\) 0). For asynchronous reset, the most challenging case happens when CLK \(=\) 0, D \(=\) 1 and Q \(=\) 1. In Fig. 6, the output Q is observed to be correctly pulled down when the reset operation is triggered. Once the set/reset signals are de-asserted, the output Q switches again according to the next clock rising edge.

Fig. 5

Electrical simulation waveform of an asynchronous set in a PC SiNWFET-based TSPC flip-flop

Fig. 6

Electrical simulation waveform of an asynchronous reset in a PC SiNWFET-based TSPC flip-flop

3.2.3 Circuit-Level Performances Estimations

To evaluate the performance of the proposed flip-flop design, four major metrics are considered: area, setup time, hold time and clock-to-Q delay. In this section, results obtained by electrical simulations are compared between TSPC flip-flop designs implemented both in traditional CMOS (Fig. 2) and in PC SiNWFET (Fig. 4), using a 22nm technology node. Note that static flip-flops are out of the scope of this chapter and are therefore not discussed. For the CMOS technology, the PTM 22nm LSTP FinFET model (PTM 2021) is considered. To accurately measure the minimum setup time, hold time and clock-to-Q delay, a binary search approach is used, with a delay tolerance corresponding to 10% of the reference delay and a resolution of 0.01 ps (Cadence 2009).
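
The binary search mentioned above can be sketched as follows. The pass criterion (clock-to-Q delay within 10% of the reference delay) and the 0.01 ps resolution come from the text; everything else is an assumption made for illustration: `toy_clk_to_q` is a made-up analytical stand-in for a circuit simulation, and the search bounds are arbitrary.

```python
import math


def toy_clk_to_q(setup_ps: float) -> float:
    """Hypothetical delay model standing in for a simulator run: the
    clock-to-Q delay grows as the data arrives closer to the clock edge,
    and the capture fails (infinite delay) at or below 5 ps."""
    if setup_ps <= 5.0:
        return math.inf
    return 20.0 * (1.0 + 0.5 / (setup_ps - 5.0))


def min_setup_time(measure, ref_delay_ps, lo_ps, hi_ps,
                   tolerance=0.10, resolution_ps=0.01):
    """Binary search for the smallest setup time whose clock-to-Q delay
    stays within `tolerance` of the reference delay.
    Assumes `lo_ps` fails and `hi_ps` passes the criterion."""
    limit = ref_delay_ps * (1.0 + tolerance)
    while hi_ps - lo_ps > resolution_ps:
        mid = 0.5 * (lo_ps + hi_ps)
        if measure(mid) <= limit:
            hi_ps = mid   # still passing: try a tighter setup time
        else:
            lo_ps = mid   # failing: back off
    return hi_ps


if __name__ == "__main__":
    t_setup = min_setup_time(toy_clk_to_q, ref_delay_ps=20.0,
                             lo_ps=0.0, hi_ps=100.0)
    print(f"minimum setup time ~ {t_setup:.2f} ps")
```

In an actual flow, `measure` would launch one transient simulation per candidate setup time; the same loop applies to the hold time with the search direction adapted.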

Table 1 Comparison between FinFET-based and PC SiNWFET-based TSPC flip-flops. SiNWFET area \(=\) 1.5 MOS area

Table 1 compares the two TSPC flip-flop implementations. Thanks to the compactness of PC SiNWFETs, an area saving of up to 20% is achievable. Regarding timing performance, the PC SiNWFET TSPC flip-flop reduces its internal delay by 30% on average. These performance gains come from the area reduction given by PC SiNWFET as well as from the reduction in intrinsic parasitic capacitance obtained by using a single device instead of two serial CMOS transistors.

3.3 Enhanced PC SiNWFET FF with Logic Operations

In addition to the realization of more compact pull-up and pull-down networks, PC SiNWFETs are also used to implement the AND logic function natively (Zhang et al. 2013). This feature can be efficiently embedded into the PC SiNWFET flip-flop. For instance, Fig. 7 shows a PC SiNWFET flip-flop with an AND gate embedded in the first stage. In the PC SiNWFET implementation, the first stage is a clocked AND gate, derived from Zhang et al. (2013). The following stages remain unchanged compared to Fig. 4. Note that the clock signal in the first stage is wired to the low-leakage controllability gate, leading to higher power efficiency at the cost of internal delay. The clock signal replaces the fixed gnd biases at the first stage of the CMOS design. When CLK \(=\) 0, the first stage works as an AND gate, i.e., X is equal to A AND B. When CLK \(=\) 1, node X is pulled up to vdd, as opposed to the regular CMOS design where node X is left untouched. Therefore, an additional inverter is inserted to avoid the conditions where a path from vdd to gnd can be created (when CLK \(=\) 1).

Fig. 7

Schematic of the proposed PC SiNWFET-based TSPC flip-flop embedding AND gate and asynchronous set and reset features

Electrical simulations, depicted in Fig. 8, validate the AND/TSPC flip-flop functionality. Indeed, they show that the output Q equals A AND B at each clock rising edge. Finally, the performance of the CMOS design (Fig. 3) and of the PC SiNWFET design (Fig. 7) are compared in Table 2. It shows that the area shrinks by 21%, while the delay reduces by 6% on average and leakage power drops by 45%. The leakage gain is attributed to the power efficiency of the PC SiNWFET AND gate (Zhang et al. 2013). Nevertheless, note that the timing performance gain is lowered because of the use of the low-leakage controllability gate of PC SiNWFETs.

Fig. 8

Electrical simulation waveform of an AND operation in a PC SiNWFET-based AND-gate embedded TSPC flip-flop

Table 2 Comparison between FinFET-based and PC SiNWFET-based TSPC flip-flops with embedded AND operation. SiNWFET area \(=\) 1.5 MOS area

3.4 Discussion and Conclusions

In this section, TSPC flip-flop circuit designs leveraging the compactness offered by PC SiNWFETs are explored and various solutions are proposed. Results obtained by electrical simulations show that the PC SiNWFET implementation improves area, delay and leakage power by nearly 20%, 30% and 7% respectively compared to the CMOS design. In addition, this section showed that an AND gate can be embedded into the structure. This leads to an area reduction of 21%, a delay reduction of 6% and a leakage reduction of 45% on average. This shows that, beyond improvements to standard logic functions, PC SiNWFETs also enhance the flip-flop, enabling area and energy reductions while adding functionalities. The following section of this chapter focuses on the opportunities opened by PC SiNWFET in the case of NVM co-integration.

4 Emerging Resistive Memories Architectures Using PC SiNWFETs

In this section, we investigate the implications for dense memory arrays of using PC SiNWFETs as selectors in the bitcell. To that end, this section first provides an extensive overview of filamentary RRAM technologies and programming conditions. Then, it explores the opportunities opened by the fine-grain dynamic polarity control offered by PC SiNWFET technology. Finally, two PC SiNWFET-based bitcells are proposed and explored from an array point of view. The proposed PC SiNWFET RRAM bitcells enable low voltage operation (no gate-overdrive required) while enabling from 1.5x up to 2.45x compared to CMOS-based 2T1R bitcells.

4.1 RRAM Technology

In parallel to these technology developments, non-volatile resistance switching devices enabled by simple technology stacks and materials are seen by both industry and academia as an attractive solution for the future of Non-Volatile Memories (NVM) (Burr et al. 2008). These two-terminal devices are widely explored thanks to the very high integration density enabled by crosspoint architectures (Patel and Friedman 2014). However, these high-density solutions suffer from medium-voltage constraints during programming operations (i.e., 3 to 5 V of programming voltage) and a huge periphery area overhead (Levisse et al. 2017). For these reasons, 1-Transistor 1-Resistance (1T1R) architectures are still explored by industry (Wei et al. 2011) for NOR memory replacement (Giraud et al. 2017). The main resistive memory (RRAM) technologies are the Phase Change Memories (PCM) (Wong et al. 2010; Burr et al. 2016), the Magneto-resistive Memories (MRAM) (Endoh et al. 2016; Apalkov et al. 2013) and the Filamentary Resistive Memories (Oxide-based OxRAM and Conductive-bridge CBRAM) (Vianello et al. 2014; Wong et al. 2012). While the exploration of non-volatile resistive switching properties started in the 1960s with the study of reversible breakdown in Metal/Oxide/Metal (Au/SiO/Au and Au/SiO/Al) stacks (Nielsen and Bashara 1964; Simmons and Verderber 1967), it was later largely forgotten with the continuous scaling of floating gate NVMs. However, with the worries about the “scaling wall” and the end of Moore’s Law (End of Moore’s Law 2016, 2013; Haron and Hamdioui 2008), it was revived (Beck et al. 2000; Zhuang et al. 2002) and promoted as a future replacement candidate for DRAM (Prenat et al. 2014) and Flash memories (Kawahara et al. 2013; Baek et al. 2005; ExtremeTech 2013; Computerworld 2013).
Today, filamentary RRAMs are seen as the most promising candidates among these technologies (compared to PCM and MRAM) thanks to their CMOS-compatible and extremely simple fabrication process (Vianello et al. 2014; Wong et al. 2012). Filamentary RRAMs are organized in three families: Oxide-based (OxRAM), Conductive-Bridge (CBRAM) and Hybrid. While each family relies on a different physical effect, the electrical behaviors of OxRAM, CBRAM and Hybrid are almost identical. All these memories rely on the creation (named set operation) and destruction (named reset operation) of a Conductive Filament (CF) inside an insulating material. This leads to a controllable variation of the equivalent resistance state of the material.

4.1.1 Filamentary Based RRAM Technologies

In OxRAM technology, the CF is made of oxygen vacancies in a Transition-Metal-Oxide (TMO) switching layer. Under the effect of the electric field and the Joule effect, the oxygen ions migrate inside the TMO. The switching materials are generally made of metal oxides such as HfOx (Goux et al. 2011), AlOx (Lee et al. 2010), NiOx (Seo et al. 2004), TiOx (Kwon et al. 2010) or TaOx (Lee et al. 2011) sandwiched between two metal electrodes (the Top Electrode, TE, and the Bottom Electrode, BE). Two different kinds of OxRAM exist. Unipolar (or Non-Polar) OxRAMs rely only on the thermal effect for the filament destruction and use symmetric stacks such as Pt/HfO2/Pt (Cagli et al. 2011), Pt/NiO/Pt (Seo et al. 2004) or TiW/SiOx/TiW (Chen et al. 2016). Bipolar OxRAMs use the electric field effect to amplify the thermal effect during reset and use non-symmetric stacks with a vacancy scavenging layer, such as Ti/HfO2 (Vianello et al. 2014). Bipolar OxRAM devices enable a smaller reset current and better switching characteristics than non-polar devices. In CBRAM technology, the CF creation and destruction rely on the electro-migration of metal ions from the TE inside an insulating material. Various insulator/metal couples can be used, such as GeS2/Ag (Jameson et al. 2012) or Al2O3/Cu (Belmonte et al. 2013). Finally, Hybrid resistive memories (Vianello et al. 2014) are a mix between OxRAM and CBRAM. Oxygen vacancies are used to improve the mobility of metal ions during the set and reset operations (Molas et al. 2014; Nail et al. 2016). For all the previously introduced filamentary RRAM technologies (OxRAM, CBRAM or Hybrid), the electrical behavior is almost identical and is introduced in Fig. 9. At the beginning of the RRAM life, an electro-forming operation is mandatory to perform a first breakdown of the insulator material and create the first oxygen vacancy (OxRAM) or metal (CBRAM) CF (Xu et al. 2008).
A compliance current is used to limit the current and enable a reversible breakdown. Once the electro-forming step is performed, the RRAM is in a Low Resistance State (LRS). From an LRS, by applying a reverse voltage across the device, a part of the CF can be destroyed, changing the RRAM resistance to a High Resistance State (HRS) (i.e., a reset operation). Finally, a set operation is performed by applying a positive voltage pulse across the RRAM. The set operation is similar to an electro-forming operation but with lower voltages.
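
The forming, reset and set sequence described above can be summarized as a small state machine. This is a behavioral sketch only: the threshold voltages below are hypothetical placeholders rather than values from the cited works, and the compliance current, switching time and resistance values are not modeled.

```python
from dataclasses import dataclass


@dataclass
class FilamentaryRRAM:
    # Hypothetical thresholds in volts, chosen only so that v_set < v_form.
    v_form: float = 3.0
    v_set: float = 1.5
    v_reset: float = 1.5
    state: str = "PRISTINE"   # PRISTINE -> LRS <-> HRS

    def apply(self, v: float) -> str:
        """Apply a voltage pulse (positive = forming/set polarity)."""
        if self.state == "PRISTINE" and v >= self.v_form:
            self.state = "LRS"    # electro-forming creates the first filament
        elif self.state == "LRS" and v <= -self.v_reset:
            self.state = "HRS"    # reset: reverse voltage breaks part of the CF
        elif self.state == "HRS" and v >= self.v_set:
            self.state = "LRS"    # set: positive pulse restores the CF
        return self.state


if __name__ == "__main__":
    cell = FilamentaryRRAM()
    for pulse in (1.5, 3.0, -1.5, 1.5):
        print(f"pulse {pulse:+.1f} V -> {cell.apply(pulse)}")
```

A pulse at the set voltage leaves a pristine cell unchanged in this model, which reflects the higher voltage needed for electro-forming than for subsequent set operations.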

Fig. 9

I-V curve of a filamentary RRAM with detailed electroforming, reset and set operations, inspired by Thammasack et al. (2017)

It is worth noting that the electro-forming step is a critical operation for filamentary RRAM, as it requires high voltages and long times (Vianello et al. 2014; Lorenzi et al. 2012) (on the order of a \(\upmu \)s at 3 V for an HfO2-based OxRAM). From a test point of view, it may be critical and cause the Electrical Wafer Sort test to be overly long and not cost effective. As this point is poorly documented and as most device development teams are trying to lower the electro-forming voltage or create forming-free devices (Kim et al. 2016; Chakrabarti et al. 2014), the following assumes that the electro-forming operation has already been performed and focuses on the set and reset operations. Figure 10 presents the relation between the programming current (Iprog) and the LRS value during a set operation. This direct relation between the programming current and the LRS value is explained by the fact that a higher current leads to a wider CF, which results in a lower resistance state. Figure 10a (inspired by Vianello et al. (2014) and Fackenthal et al. (2014); the filament scaling insets are inspired by Vandelli et al. (2011)) shows that this relation is commonly considered for both OxRAM and CBRAM. Finally, Fig. 10b (inspired by Garbin et al. (2015)) shows the cumulative distribution of the achieved LRS. A lower Iprog results in a wider distribution, while a higher Iprog results in a narrower distribution. This behavior can be explained physically by the width of the created CF. A higher Iprog produces a wider filament, which is less sensitive to variations than the small CF resulting from a lower Iprog.

Fig. 10

a Evolution of the LRS value versus the Iprog current for various CBRAM and OxRAM technologies, inspired by Vianello et al. (2014), Fackenthal et al. (2014) and Vandelli et al. (2011). It shows a noticeable dependency and the corresponding modeled CF size for various Iprog currents. b Variability of the LRS value for various Iprog currents. It shows that the lower the Iprog, the wider the LRS distribution, inspired by Garbin et al. (2015)

Figure 11a and b, inspired by Vianello et al. (2014), show the relation between the programming time and the voltage applied across the RRAM device. During a set pulse, the time required to start the switching operation reduces exponentially with the applied voltage. In a reset operation, the time required to reach a given HRS value (in this example, 5 times the LRS value) also depends exponentially on the applied voltage. Finally, Fig. 11c (also inspired by Vianello et al. (2014)) shows the relationship between the achieved HRS value and the applied voltage for a constant programming time (1 \(\upmu \)s), starting from a constant LRS (here 3 k\(\Omega \)).

Fig. 11

a Evolution of the set time versus the set voltage. b Evolution of the reset time versus the reset voltage for a constant HRS/LRS ratio. c Evolution of the HRS value versus the reset voltage for a constant time

This section introduced the metrics and trade-offs of filamentary RRAM technologies. After a first electro-forming operation requiring a long time and a high voltage, the RRAM device ends up in an LRS. From an LRS, a reset operation can be performed by applying a negative voltage across the RRAM. A higher reset voltage induces a higher HRS value at constant time or a faster reset operation for a given HRS target. From an HRS, a set operation can be performed by applying a positive voltage across the RRAM. A higher voltage induces a faster set operation. Once the switching happens, the current flowing through the RRAM must be limited to ensure a reversible switching. The resulting LRS value depends on the maximum Iprog current value through the RRAM.

4.1.2 RRAM Electrical Models

Various filamentary RRAM models are reported in the literature (Hajri et al. 2017; Strukov et al. 2009; Strukov and Williams 2009; Pickett et al. 2009; Kvatinsky et al. 2012, 2015; Jiang et al. 2016; Bocquet et al. 2014b). However, not all of these models are suitable for fast and accurate electrical simulations of RRAM memory arrays. Indeed, to the best of our knowledge, only Jiang et al. (2016) and Bocquet et al. (2014b) include the relationship between the programming current Iprog and the obtained resistance value. Additionally, the model introduced in Bocquet et al. (2014a, b) is compiled for the Eldo simulator (Platform 2021) and fitted with up-to-date silicon data (Vianello et al. 2014). This simulation setup is therefore considered in the following. The model used to simulate the RRAM relies on electric field-induced creation/destruction of oxygen vacancies within the switching layer, as presented in Bocquet et al. (2014a, b). The memory resistance is directly linked to the radius of the conductive filament (CF), which is calculated using a single master equation continuously accounting for both electro-forming/set and reset. The model takes into account various phenomena, including the dependency of the switching time on the applied voltage for all operations, the evolution of the HRS value versus the applied voltage during reset, the evolution of the LRS value versus Iprog, and the temperature impact on all operations (i.e., the relationships from Figs. 10 and 11 are modeled). In the following simulations, the RRAM electro-forming step is considered as already performed.

4.2 PC SiNWFET-OxRAM Co-Integration

In this section, we propose to overcome the 1T1R VT loss identified in Jovanovic et al. (2016) and Portal et al. (2017) with a groundbreaking solution. Figure 12a, b recalls the operation of a 1T1R bitcell during set and reset operations. During a reset operation, the Vgs of the n-type transistor is referenced to the other side of the transistor, which prevents the internal node of the 1T1R bitcell from exceeding Vreset-VT (where VT is the n-type threshold voltage). This effect forces an increase of the 1T1R programming voltages and thus of the overall energy cost of the reset operation. It also reduces the transistor reliability because of the additional voltage stress. One solution (Portal et al. 2017), presented in Fig. 12c, d, consists in using a different transistor for each operation in order to keep a good control of the transistor Vgs. Thereby, for both set and reset operations, the transistor Vgs is well controlled between the SourceLine (SL) and the WordLine (WL). However, adding a supplementary p-type transistor strongly increases the overall bitcell area, as shown later on. The proposed solution consists in using a transistor that can be dynamically switched between p-type and n-type. This solution is introduced in Fig. 12e, f for set and reset operations. A set operation is performed by using the PC SiNWFET as an n-type transistor (i.e., by applying the set voltage (Vset) on the Polarity Gate (PG)), while a reset operation is performed by using the PC SiNWFET as a p-type transistor (i.e., by applying gnd to the PG). The resulting bitcell is named 1 Polarity Controllable Transistor-1 RRAM (1PCT1R).
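
The VT loss described above can be illustrated with a first-order sketch. It assumes an ideal n-type pass transistor whose source node cannot rise above the gate voltage minus VT; the function name and the numerical values (1.8 V reset voltage, 0.5 V threshold) are illustrative assumptions and are not taken from this work.

```python
def max_internal_node_voltage(v_gate: float, v_drive: float, v_t: float) -> float:
    """First-order limit on the internal node of a 1T1R bitcell.

    An ideal n-type pass transistor stops conducting once its source rises
    to v_gate - v_t, so the internal node follows the driven line only up
    to that level.
    """
    return min(v_drive, v_gate - v_t)


# Hypothetical values: the gate and the driven line are both at Vreset.
V_RESET = 1.8  # assumed reset voltage (V)
V_T = 0.5      # assumed n-type threshold voltage (V)

node = max_internal_node_voltage(V_RESET, V_RESET, V_T)
loss = V_RESET - node  # voltage that never reaches the RRAM

# Overdriving the gate by VT recovers the full swing in this simple model.
node_overdriven = max_internal_node_voltage(V_RESET + V_T, V_RESET, V_T)
```

In this simplified picture, the only ways to recover the lost voltage are a gate overdrive (1T1R) or a p-type device for the reset (2T1R, 1PCT1R).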

Fig. 12

Schematic of 1T1R, 2T1R and 1PCT1R bitcells during set and reset operations. a and b show a 1T1R, c and d a 2T1R, and e and f a 1PCT1R bitcell. Vgs is highlighted for each structure

As briefly introduced in Shamsi et al. (2015), co-integrating PC SiNWFET and RRAM technologies may also be interesting from a security point of view. By symmetrizing the programming current Iprog during set and reset operations, it smooths the memory power trace and thus enhances its resilience to Side Channel Attacks. In this work, we only focus on the memory array architecture, density and reliability considerations.

4.3 Bitcell Design

In this section, the physical layout of 1T1R bitcells is reported as proposed in Shen et al. (2012) and Chang et al. (2014) (Fig. 13a), and the physical layout of 2T1R bitcells is introduced based on the design rules of 28nm process technologies (Fig. 13b). As the drains of the p-type and n-type transistors cannot be easily merged, they are connected through a metal 1 interconnect (hatched purple), and the RRAM device is integrated in a via (yellow cross) between metal 1 and metal 2 (yellow). The SourceLines (SL) are integrated in metal 1, in parallel with the WordLines controlling the p-type (WLp) and the n-type (WLn) transistors. The p-type transistor width is sized to provide the same drive as a minimum size n-type transistor; its width is therefore 160 nm instead of 80 nm. As the largest transistor defines the bitcell height, the n-type transistor width is also increased to 160 nm for layout density and uniformity considerations (as a consequence, a lower WL bias must be applied to keep a lower set programming current). Considering the minimum spacing between active regions in the same well (for transistors of the same type) and in two different wells (for n-to-p type transistor spacing), the minimum 2T1R bitcell area is determined. This area is estimated at 420 nm * 240 nm (0.1008 \(\upmu \text {m}^{2}\), i.e., \(40.3\,\text {F}^2\)) for a 2T1R bitcell versus 160 nm * 194 nm (0.031 \(\upmu \text {m}^{2}\), i.e., \(12.4\,\text {F}^2\)) for a 1T1R bitcell.

Fig. 13

Physical Layout considerations for a 1T1R and b 2T1R bitcells. In the 2T1R, the n-type MOS could be considered minimum size to respect the 2x ratio between n-type and p-type current drive on CMOS technologies

As a reminder, the area is expressed here in \(\text {F}^2\). In high-density memories, F is assumed to be the half pitch of the first metal level. In the 28nm technology used as reference here, F \(=\) 50 nm. In this section, two PC-based bitcells are presented. The 1PCT1R bitcell is based on a standard 1T1R structure, whereas the 1XPCT1R exhibits an innovative cross-shaped structure. For each bitcell, the physical structure and operating conditions are compared to those of 1T1R and 2T1R MOS-based bitcells. As the PC-based transistor has an additional terminal (the Polarity Gate), a control line is added in the memory array to bias the Polarity Gates: the Polar Line (PL). The design environment (models and physical design assumptions) is introduced first, followed by the bitcells, their physical design and their functional validation. It is worth noting that some interesting structures could be envisioned in which the BLs or the SLs are merged with the PLs. However, while the memory is being biased, transient effects could cause unexpected programming operations in the array. Consequently, the focus is put on a standard memory architecture to study scaling properties, at the cost of an additional control line. Moreover, memory architectures embedding an additional control line (such as the PL) are standard in eFlash memories (Do et al. 2016).
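
The conversion between physical dimensions and the normalized area can be sketched as follows. The helper names are illustrative; the dimensions and the value F = 50 nm are the ones quoted in the text for the 1T1R and 2T1R bitcells.

```python
F_NM = 50.0  # half pitch of the first metal level in the 28nm reference (nm)


def area_um2(width_nm: float, height_nm: float) -> float:
    """Bitcell area in square micrometers (1 um^2 = 1e6 nm^2)."""
    return width_nm * height_nm * 1e-6


def area_in_f2(width_nm: float, height_nm: float, f_nm: float = F_NM) -> float:
    """Bitcell area normalized to the squared half pitch F^2."""
    return (width_nm * height_nm) / (f_nm ** 2)


# Dimensions quoted in the text (width, height) in nm.
bitcells = {"1T1R": (160, 194), "2T1R": (420, 240)}

for name, (w, h) in bitcells.items():
    print(f"{name}: {area_um2(w, h):.4f} um^2, {area_in_f2(w, h):.1f} F^2")
```

Normalizing by F² makes bitcells comparable across technology nodes, since it removes the dependence on the absolute metal pitch.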

4.3.1 Physical Considerations and Simulation Environment

In order to compare it with advanced CMOS, a 22nm gate length SiNWFET transistor is considered, together with the design rules of a 28nm process node to determine the minimum spacing between the transistors, the drain/source area and the minimal BEoL spacing (F \(=\) 50 nm). In Mohammadi et al. (2015), Zhang et al. (2014b) and Zografos et al. (2014), the operation of a 22nm gate length SiNWFET device was demonstrated through physical (TCAD) simulations. A basic SiNWFET electrical model was described in Zhang et al. (2013, 2014b). This model is based on a parametric table extracted from TCAD simulations whose basic parameters are fitted on the measured device characteristics introduced previously (Marchi et al. 2014). Access resistances are estimated according to the device geometry. Each capacitance is extracted from TCAD simulations as an average value over all possible bias conditions. Moreover, the OxRAM connected in series with the SiNWFET is simulated with the OxRAM model proposed in Bocquet et al. (2014b). Note that the forming step is not shown in the simulations below.

4.3.2 Standard 1PCT1R Bitcell

Figure 14a shows the physical layout of a PC SiNWFET transistor. In the schematics, the transistor is represented as an active line between two drain/source contacts (see Fig. 14a, b). The Control Gate (CG) and the Polarity Gate (PG) are represented as polysilicon gates even though, in reality, these gates wrap around the stacked nanowires and do not require additional spacing with the neighboring polysilicon wires. The two PGs are connected together on one side of the transistor with a poly wire, while the CG contact is placed on the other side to optimize the layout, as done in Bobba et al. (2012). Figure 14b shows the organization of PC SiNWFET and OxRAM devices in memory arrays. The OxRAM is integrated on a via between the metal 1 BL and the PC SiNWFET drain. The sources of two PC SiNWFETs are connected together to the SL to enable a denser bitcell organization. The BL and the PL run vertically while the SL and the WL run horizontally. For the area estimation, 22nm SiNWFET transistors (Mohammadi et al. 2015) and 28nm FinFET CMOS design rules are considered. Following these design rules, and considering the minimum active area and the minimum spacing between two active areas in 28nm technology, the minimum 1PCT1R bitcell occupies a 262 nm * 245 nm area (0.064 \(\upmu \text {m}^{2}\), i.e., \(25.67\,\text {F}^2\)). This bitcell is bigger than a standard 1T1R bitcell (\(12.4\,\text {F}^2\)); however, it is expected to provide the performance of the 2T1R bitcell, which occupies \(40.3\,\text {F}^2\).

Fig. 14

Layout of a 2\(\,\times \,\)2 bits 1PCT1R cell array in a 25nm SiNWFET process. Transistors drains are shared to reduce the bitcell area

During programming operations in 1PCT1R arrays, two constraints must be met: (1) the accessed bitcell has to be activated; (2) the non-accessed bitcells have to be disabled to avoid parasitic write operations. As the PC SiNWFET and OxRAM behaviors depend on the relative voltages between the PL, SL, BL and WL terminals, the applied voltages can be either positive or negative. However, to simplify the design and to avoid triple-well isolation, only positive voltages are used. For a set operation (as shown in Fig. 12e), the PG is biased at a high voltage to ensure n-type operation. Then, the CG is biased at the set gate voltage (VGset) in order to control the set current. During a reset operation (as shown in Fig. 12f), the PG is tied to ground (gnd) in order to ensure p-type operation. Then, the CG is also tied to gnd so that the reset is performed with a p-type transistor. Table 3 summarizes the bias voltages used for set and reset operations for the selected and non-selected bitcells. During memory array operations, all the PC SiNWFETs are set to the same polarity (all n-type for set and read operations and all p-type for the reset operation).
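
The polarity selection described above can be summarized as a small lookup, given here as a minimal sketch. Voltage levels are kept symbolic because the numerical values of Table 3 are not reproduced in this text; the type and function names are illustrative, the "high" PG level for read is inferred from the n-type requirement, and the CG level for read is left unspecified because the paragraph does not state it.

```python
from typing import NamedTuple, Optional


class SelectorBias(NamedTuple):
    polarity: str       # "n-type" or "p-type"
    pg: str             # symbolic Polarity Gate level
    cg: Optional[str]   # symbolic Control Gate level, None when not stated


# Selected-bitcell biasing as described in the text (symbolic levels only).
BIAS = {
    "set": SelectorBias("n-type", "high", "VGset"),
    "reset": SelectorBias("p-type", "gnd", "gnd"),
    "read": SelectorBias("n-type", "high", None),  # CG level not given here
}


def selector_bias(operation: str) -> SelectorBias:
    """Return the symbolic selector bias for a set, reset or read operation."""
    try:
        return BIAS[operation]
    except KeyError:
        raise ValueError(f"unknown operation: {operation!r}") from None
```

For instance, `selector_bias("reset")` returns a p-type configuration with both gates at gnd, which is the case that removes the VT loss of the n-type 1T1R selector.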

Table 3 Overview of the programming voltages and PC SiNWFET type for set, reset and read operations

Figure 15 presents the simulation waveforms of a 2\(\,\times \,\)2 array of 1PCT1R bitcells as shown in Fig. 14. Each bitcell is first set to LRS, then reset to HRS. In between, each bit is read to validate the programmed resistance state. The PL, WL, SL and BL voltages are controlled as introduced in Table 3. As each bitcell is programmed, the current through all the bitcells is shown and indicates that no parasitic programming operation occurs on unselected bitcells.

Fig. 15

Waveforms of set and reset operations in a 1PCT1R bitcell 2\(\,\times \,\)2 array. WL, PL, BL voltages and OxRAM current are shown. The immunity to programming disturb in non-selected bitcells is ensured

4.3.3 Breakthrough 1XPCT1R Bitcell

Thanks to the regularity of memory array structures, more flexibility is allowed with the design rules than with standard logic physical design rules. In this section, under the assumption that gates can be deposited in both vertical and horizontal directions, an innovative bitcell using PC SiNWFET transistors organized in a cross shape is proposed. The cross-shaped 1PCT1R bitcell (1XPCT1R) is validated through a physical layout feasibility study and electrical simulations. Figure 16 presents the 1XPCT1R schematic diagram. Four 1PCT1R bitcells are organized in a cross shape with a common transistor source. The CGs of transistors T1 and T2 (resp. T3 and T4) are connected together to WL0 (resp. WL1). The PGs of T2 and T4 (resp. T1 and T3) are shared and connected to PL0 (resp. PL1). The T2 and T3 OxRAMs are connected to BL1, while the T1 OxRAM is connected to BL0 and the T4 OxRAM to BL2.
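
The connectivity of one cross can be written down as a small table to check that each of the four bits remains individually addressable. The sketch below only encodes the connections listed in the paragraph above (the common source on the SL is implicit); the dictionary layout and the function name are illustrative.

```python
# Connections of the four transistors of one 1XPCT1R cross, as listed above.
CROSS = {
    "T1": {"WL": "WL0", "PL": "PL1", "BL": "BL0"},
    "T2": {"WL": "WL0", "PL": "PL0", "BL": "BL1"},
    "T3": {"WL": "WL1", "PL": "PL1", "BL": "BL1"},
    "T4": {"WL": "WL1", "PL": "PL0", "BL": "BL2"},
}


def select(wl: str, bl: str, cells: dict = CROSS) -> list:
    """Return the transistors of the cross addressed by a (WL, BL) pair."""
    return [
        name
        for name, pins in cells.items()
        if pins["WL"] == wl and pins["BL"] == bl
    ]


# Each bit is reached by exactly one (WL, BL) pair.
for name, pins in CROSS.items():
    assert select(pins["WL"], pins["BL"]) == [name]
```

Although T2 and T3 share BL1, they sit on different WLs, so a (WL, BL) pair never addresses more than one bit of the cross.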

Fig. 16

Schematic of a 2\(\,\times \,\)2 bits 1XPCT1R bitcell array. Each 1XPCT1R bitcell is constituted of 4 SiNWFETs with common drain

In Fig. 17, the layout array organization is shown; the 1XPCT1R is a cross-shaped bitcell. Each arm of the cross supports a PC SiNWFET (green) and an OxRAM memory (yellow squares). The minimum size replicable block is a 20-bit block occupying 0.828 \(\upmu \text {m}^{2}\). It leads to 0.041 \(\upmu \text {m}^{2}\) per bit (almost half of the standard 1PCT1R area, 0.064 \(\upmu \text {m}^{2}\)) for a 22nm physical rules PC SiNWFET technology node. Figure 17 also presents the detailed physical layout of a 4-bit 1XPCT1R block. The common SL is drawn using a metal 3 vertical wire. The connection between the SL and the common source of the transistors is made through a metal 1 wire used to shift the contact over the T1 transistor. Thereby, BLs (resp. WLs) are drawn using metal 2 horizontal lines and are connected to the OxRAMs (resp. CGs). Each transistor drain supports an OxRAM.

Fig. 17

Physical description of a 20-bits bitcell array and in the insert, a detailed layout of a 2\(\,\times \,\)2 bits 1XPCT1R bitcell array. Equivalent bit density is almost 2 times higher than for 1PCT1R bitcell

This 1XPCT1R array organization needs specific border bitcells, and some bits have to be sacrificed at the border. To make all the BLs, WLs, SLs and PLs accessible, the border crosses are cut and some bits are not connected, as presented in Fig. 18. The incomplete crosses containing no common SL are sacrificed. This represents one bit out of six for the first and last BLs and SLs. To ease the addressing, the first and last BLs and SLs can be left unaddressed. As memory arrays are classically surrounded by a ring of dummy cells, the sacrificed BLs and SLs can be considered as part of the dummy ring and do not impact the memory area.

Fig. 18

Array of 1XPCT1R bitcells with detailed WL, BL, SL and PL. Border bitcells are detailed: unconnected OxRAMs are highlighted in red and incomplete crosses are cut

As before, the programming operations are defined by relative voltage differences and can be referenced to gnd. A set operation is performed with all the PC SiNWFETs in n-type. First, all the SLs, BLs and WLs are put at gnd. Then, the selected WL is biased to VGset and the writing pulse is applied on the selected BL. During a reset operation, all the PC SiNWFETs are put in p-type (PL voltage at gnd) and the array WLs, BLs and SLs are biased at the reset voltage (Vreset). Finally, the selected WL is pulled down to gnd and the writing pulse (from Vreset to gnd) is applied on the selected BL. Due to its non-standard array organization, several cases are possible: two bitcells with a common WL, a common PL, a common WL and SL, a common BL and SL, a common PL and SL, or a common BL. When a non-selected bitcell shares common lines with the selected bitcell, immunity to write disturb has to be demonstrated. Shared WL, SL and BL are standard non-selected bitcell cases (as shown in Fig. 17). A shared PL is not critical for the 1XPCT1R because all the PLs have the same bias during programming or read operations. Figure 19 presents the disturb immunity for a common WL and SL and for a common BL and SL. During both reset and set operations, write disturb is avoided by the WL, SL, BL and PL voltages.
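
The set and reset biasing sequences described above can be written as ordered steps. This is a minimal sketch with symbolic levels only; the function name and the tuple format are illustrative, and the "high" PL level during set is an assumption that follows from the n-type requirement.

```python
def programming_sequence(operation: str, selected_wl: str, selected_bl: str) -> list:
    """Return the ordered (line, symbolic level) steps of a 1XPCT1R write."""
    if operation == "set":
        return [
            ("all PL", "high"),                # all selectors in n-type (assumed level)
            ("all SL, BL, WL", "gnd"),         # idle array at gnd
            (selected_wl, "VGset"),            # select the row, limit the set current
            (selected_bl, "write pulse"),      # program the selected bit
        ]
    if operation == "reset":
        return [
            ("all PL", "gnd"),                 # all selectors in p-type
            ("all SL, BL, WL", "Vreset"),      # idle array at Vreset
            (selected_wl, "gnd"),              # select the row
            (selected_bl, "Vreset -> gnd"),    # program the selected bit
        ]
    raise ValueError(f"unknown operation: {operation!r}")


for line, level in programming_sequence("reset", "WL0", "BL1"):
    print(f"{line}: {level}")
```

In both sequences the array is first brought to a uniform idle level, so that only the selected WL and BL ever see a voltage difference large enough to program a bit.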

Fig. 19

Waveforms of set and reset operations for a 1XPCT1R bitcell array. For each operation, immunity of the unselected bitcells that share a WL, SL or BL with the selected one is ensured

4.4 Performances Metrics

In this section, we explore the performance offered by various RRAM-based architectures using PCT transistors as selectors. We compare PCT-based RRAM bitcells with CMOS-based 1T1R and 2T1R bitcells for Set, Reset and Read operations. Overall, we show that (i) during Set operations, PCT-based bitcells offer performance equivalent to CMOS-based bitcells; (ii) during Reset operations, PCT-based bitcells solve the overdrive issue identified in CMOS-based bitcells at the cost of only a slight area overhead; and (iii) some PCT-based bitcells enable better read performance than CMOS-based bitcells thanks to a lower contact density along the memory lines.

In order to compare CMOS and PCT-based bitcells, we follow the considerations described in Sect. 4.3.1 and rely on a 22nm SiNWFET low-power technology (Zhang et al. 2013, 2014b). To ensure a fair apples-to-apples comparison with CMOS, we consider a low-power CMOS technology: the 28nm FDSOI technology PDK from STMicroelectronics. Finally, the RRAM technology is simulated with the filamentary model from Bocquet et al. (2014b), calibrated on characterization data (Vianello et al. 2014). Further information about the simulation framework is available in Levisse et al. (2019).

4.4.1 Performances in Set Operation

While the first claim of PCT-based bitcells is to achieve a groundbreaking performance improvement during Reset thanks to the good gate-source voltage (Vgs) control, it is important to show that using such selectors does not degrade the performance in Set. To that end, we simulated set operations using PCT and CMOS selectors. Figure 20 shows the set time versus the BL-SL voltage difference for PCT and CMOS-based bitcells. As the two technologies do not behave exactly the same, we tuned the gate voltage to ensure that the programming current is the same for both bitcells. Thereby, in order to target a \(60\,\upmu \text {A}\) programming current, we apply 0.6 V for a 6-nanowire PCT SiNWFET transistor and 0.9 V for an 80 nm wide MOS transistor. With this flow, we ensure that PCT and CMOS bitcells have the same set performance.

Fig. 20

Set time versus BL-SL voltage difference for minimum size 1T1R MOS bitcell (red) and PCT-based bitcell (blue)

4.4.2 Performances in Reset Operation

In this section, we explore the performance gains enabled during reset operation by the proposed PCT-based bitcells. We compare them with CMOS-based 1T1R and 2T1R bitcells. The considered 1T1R bitcell is based on a minimum size transistor (W \(=\) 80 nm) following the design proposed in Shen et al. (2012). For the 2T1R, we consider three different configurations: (i) minimum size p-type (80 nm), (ii) medium size p-type (120 nm width) and (iii) double size p-type (160 nm). In order to ensure layout regularity, we size the n-type according to the p-type. Then, during the set operation, as described in Sect. 4.4.1, we underdrive the n-type transistor gate to keep a \(60\,\upmu \text {A}\) programming current (Iprog). Figure 13 shows the bitcell layout for the 2T1R (configuration (iii)) and for the 1T1R bitcell. From an area perspective, a wider p-type leads to a bigger bitcell: configuration (i) enables a \(30.3\,\text {F}^2\) bitcell (\(0.0756\,\upmu \text {m}^2\)) while configuration (ii) leads to a \(33.6\,\text {F}^2\) area. Finally, configuration (iii) is the biggest 2T1R bitcell with a \(40.3\,\text {F}^2\) bitcell area.

We perform reset operations with various BL, SL and WL voltages for all the 1T1R, 2T1R and 1PCT1R bitcells. We then define the reset time as the time required for the RRAM resistance to reach an HRS/LRS ratio of 10. Figure 21 shows the reset time versus the SL-BL voltage for all the considered bitcells. As expected, the 1T1R bitcell requires a huge gate overdrive to achieve sub-100 \(\upmu \)s reset times. As a reference, 1T1R bitcells demonstrated in the literature require from 3 to 5 V to enable sub-100 ns reset operations (Grossi et al. 2018; Chen et al. 2012; Yi et al. 2011). In red, the 1T1R bitcell with its gate overdriven from 1.7 V up to 2 V shows poor reset performance while causing high voltage stress on the transistor. Alternatively, 2T1R bitcells (in blue) show better reset performance and transistor reliability at the price of bigger bitcells (more than \(30.3\,\text {F}^2\)). Finally, the proposed PCT-based bitcells are represented in green. When performing the set operation in n-type configuration and the reset in p-type configuration with a standard PCT bitcell, the performance is equivalent to that of 2T1R bitcells. At the same time, it enables an area reduction ranging from 1.35\(\times \) (\(25\,\text {F}^2\) vs \(33.6\,\text {F}^2\)) with a 1PCT1R vs medium size 2T1R, up to 2.6\(\times \) (\(16\,\text {F}^2\) vs. \(40.3\,\text {F}^2\)) with a 1XPCT1R vs double size 2T1R. It is interesting to note that the PCT-based bitcell (1.2 V Vgs) shows the same performance as a strongly overdriven (2 V Vgs) 1T1R bitcell for a 1.85 V SL-BL voltage. On the other hand, it scales better at higher SL-BL voltage differences, as its Vgs is independent of the SL-BL voltage. This effect leads to a 105x reset time reduction at 2.2 V SL-BL versus the 2 V overdriven 1T1R CMOS bitcell.
Compared to 2T1R CMOS bitcells, while it shows performance equivalent to the medium size p-type (\(33.6\,\text {F}^2\)) at 1.6 V SL-BL, it is outperformed by the double size (\(40.3\,\text {F}^2\)) 2T1R bitcell as the p-type series resistance becomes lower. Note that the PCT-based bitcell performance during reset can be improved by increasing the number of stacked nanowires. Ultimately, PCT-based bitcells with n-type set and p-type reset enable up to 75x performance improvement at a 2.2 V SL-BL voltage difference versus CMOS-based 2T1R. At a lower operating voltage (1.8 V SL-BL), PCT-based bitcells outperform CMOS-based 2T1R by 5x. Alternatively, a p-type set and an n-type reset can enable a strong performance improvement (up to 500x at 1.8 V SL-BL), but this would require fabricating the RRAM stack in reverse order (top electrode first) and may imply a different behavior. Equivalent behavior could be observed with 2T1R (Portal et al. 2017); however, it would not improve the area much.
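
The reset time criterion used above (resistance reaching an HRS/LRS ratio of 10) can be extracted from a simulated resistance trace as sketched below. The function name, the linear interpolation between samples and the sample trace are illustrative assumptions; the trace does not come from the simulations of this work.

```python
from typing import Optional, Sequence


def reset_time(
    times: Sequence[float],
    resistances: Sequence[float],
    r_lrs: float,
    ratio: float = 10.0,
) -> Optional[float]:
    """Time at which the resistance first reaches ratio * r_lrs.

    Linearly interpolates between the two samples surrounding the crossing.
    Returns None if the target is never reached within the trace.
    """
    target = ratio * r_lrs
    prev_t = prev_r = None
    for t, r in zip(times, resistances):
        if r >= target:
            if prev_r is None:
                return t  # target already met at the first sample
            frac = (target - prev_r) / (r - prev_r)
            return prev_t + frac * (t - prev_t)
        prev_t, prev_r = t, r
    return None


# Hypothetical trace: times in seconds, resistances in ohms, LRS = 20 kOhm.
t_samples = [0.0, 1e-7, 2e-7, 3e-7]
r_samples = [20e3, 50e3, 150e3, 400e3]
t_reset = reset_time(t_samples, r_samples, r_lrs=20e3)
```

With this trace, the 200 kΩ target is crossed between the third and fourth samples, so the extracted reset time falls between 200 ns and 300 ns.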

Figure 22 shows the energy consumed during a reset operation performed at 1.8 V SL-BL voltage for all the bitcells under study. As expected from Fig. 21, the programming energy of PCT-based bitcells lies between those of the medium size and double size CMOS-based 2T1R bitcells (from \(33.6\,\text {F}^2\) to \(40.3\,\text {F}^2\)). Equivalent reset energies (1 to 10 nJ) cannot be achieved with a minimum size CMOS 1T1R bitcell without a strong Vgs overdrive (more than 2 V). It is important to note that we do not consider here the energy needed to generate such voltages. While it could seem fair to consider the highest density bitcells and overdrive them, doing so stresses the gate oxide, diffusions, etc. of the array and periphery (bitcells sharing the WL are stressed as well) and may lead to early memory failure (Federspiel et al. 2012).

Fig. 21

Reset time versus SL-BL voltage difference for CMOS 1T1R bitcell (red) and various gate overdrive voltages (from 1.7 to 2 V), CMOS 2T1R bitcells (blue) with various p-type transistor size (from 80 nm up to 160 nm width) and PCT-based bitcells (green) for n-type and p-type reset

Fig. 22

Energy consumed during reset operations for various bitcell architectures versus the required programming voltage. The 1T1R bitcell requires an increase of the programming voltage (red), inducing a reduction of the MOS transistor reliability. PC SiNWFET-based bitcells (green) enable 2T1R (blue) operation voltages without overdrive while using a single PC SiNWFET transistor per OxRAM bitcell

4.4.3 Performances in Read Operation

This subsection explores the impact of the proposed bitcells on read operations. We compare the proposed PCT bitcells with regular CMOS-based 1T1R and 2T1R bitcells. While most of the results in this section are expected (i.e., bigger bitcells mean a lower read frequency), we show that the 1XPCT1R bitcell has a shorter access time than smaller bitcells thanks to its unconventional array organization. By using more metal lines per array, as shown in Table 4, it reduces the parasitic capacitance on the SL, BL, PL and WL, enabling faster charge and discharge of the lines and thereby faster read operations. The resulting contact density per line is 4/5th of that of regular bitcells. While the read energy is slightly increased, performances for a 65kbit array are improved by 12% compared to a 32% bigger 1T1R-based array.

Table 4 Number of BLs for standard arrays (1T1R, 1PCT1R) and for 1XPCT1R array

We simulated both CMOS and PCT-based arrays during read operations for various memory sizes. We considered a BL precharge and discharge to perform a read and assumed that a sense amplifier is able to achieve a read operation out of the read discharge. Following Sect. 4.4.1, we consider a \(60\,\upmu \text {A}\) programming current (i.e., leading to a 20 k\(\Omega \) LRS state). We assume that a read operation is triggered by the WL activation after the precharge. Overall, we consider in the read time (i) the WL charge and discharge time and (ii) the time for the BL to discharge through the accessed bitcell. To do so, we consider the extracted parasitics of a 28nm CMOS technology node Back-End-of-Line.
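
The two contributions to the read time listed above can be illustrated with a lumped first-order estimate. This sketch is not the simulation flow used in this work, which relies on extracted parasitics: it assumes a single lumped BL capacitance discharging exponentially through the cell resistance, and all names and numerical values other than the 20 k\(\Omega \) LRS are hypothetical.

```python
import math


def bl_discharge_time(r_cell: float, c_bl: float, v_pre: float, v_sense: float) -> float:
    """Time for a lumped RC bitline to fall from v_pre to v_sense."""
    if not 0.0 < v_sense < v_pre:
        raise ValueError("v_sense must lie strictly between 0 and v_pre")
    return r_cell * c_bl * math.log(v_pre / v_sense)


def read_time(
    t_wl: float,
    r_cell: float,
    n_cells_on_bl: int,
    c_per_cell: float,
    v_pre: float,
    v_sense: float,
) -> float:
    """WL charge/discharge time plus BL discharge through the accessed cell."""
    c_bl = n_cells_on_bl * c_per_cell  # BL capacitance grows with its length
    return t_wl + bl_discharge_time(r_cell, c_bl, v_pre, v_sense)


# Hypothetical parasitics: 0.1 fF per cell, 0.1 ns WL time, sensing at half swing.
R_LRS = 20e3  # LRS obtained with the 60 uA programming current (ohm)
t_256 = read_time(1e-10, R_LRS, 256, 1e-16, v_pre=1.0, v_sense=0.5)
t_512 = read_time(1e-10, R_LRS, 512, 1e-16, v_pre=1.0, v_sense=0.5)
```

In this model, lowering the capacitance seen per line shortens the BL discharge in direct proportion, which is the kind of effect a lower contact density along a line would produce.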

Figure 23, left-hand side, presents the read time versus the BL length in a memory array in which the SL is 512 bitcells long. Figure 23, right-hand side, presents the read time of the different bitcells normalized to that of the 1XPCT1R bitcell-based array. We show gains of 67% and 15% for the 1XPCT1R compared to CMOS-based 2T1R and 1T1R bitcells, respectively, for a 512-bitcell-long BL. Concerning standard 1PCT1R bitcells, the performance gain versus 2T1R is 30%, while it shows an 8.6% performance degradation versus 1T1R due to its lower density. Finally, we demonstrate 17% and 27% performance improvements for the 1XPCT1R compared to 1T1R and 1PCT1R, respectively. In detail, while the gate capacitance of the PCT is higher than that of its CMOS counterpart, for the same memory size the resulting WL charge only represents 3% (respectively 1%) of the BL discharge for a 256\(\times \)256 PCT (respectively 1T1R) array and 6% (respectively 2%) for a 512 \(\times \) 512 array. When considering non-square arrays, if the PL or SL are long enough, they may take longer to charge than the BL takes to discharge. In that case, the SL length limits the performance of the 1XPCT1R array. As the PL is connected to two polarity gates (PG) while the SL is connected to a single transistor drain, the PL parasitic capacitance is higher. In this context, the PL charging time limits the read speed when a read is performed right after a reset operation (as shown in Fig. 19). As the WL and the BL run in the same direction in 1XPCT1R bitcell arrays, a longer WL charge also corresponds to a longer BL discharge, which mitigates the impact of the WL charge on the read performance and keeps its effect low, as introduced for square arrays. Figure 24 shows the read time ratio between CMOS 1T1R and 1XPCT1R bitcell arrays for various array sizes (BLs and SLs). The green area corresponds to array sizes for which 1XPCT1R arrays perform better than CMOS-based 1T1R.
On the other hand, the red area corresponds to array sizes for which 1T1R is more profitable. For all the aforementioned reasons, 1T1R arrays are more profitable for wide (more than 300 BLs) and thin (fewer than 100 SLs) arrays.

Fig. 23

a Read time versus BL length for a 512 Bitcells long SL for CMOS and PCT bitcells. b Normalized read time versus 1XPCT1R bitcell array

Fig. 24

1XPCT1R over 1T1R read time ratio versus array size (BLs and SLs). Except for extremely wide arrays, 1XPCT1R arrays are more profitable than CMOS-based ones

In this section, the difference in energy consumption between the previously studied architectures is explored, with emphasis on the selector voltage stress and the bitcell area. For 1T1R, 2T1R and 1PCT1R, using the performances reported in Vianello et al. (2014), the reset energy versus the reset voltage for a 100 \(\upmu \)A set current (LRS \(=\) 10 k\(\Omega \)) is extracted and plotted in Fig. 22. For each reset pulse, the current is integrated from the beginning of the reset operation until the HRS value reaches 100 k\(\Omega \). This value is then multiplied by the voltage difference between the SL and the BL in order to extract the reset energy. In the 1T1R configuration (red curves), due to the VT loss through the n-type transistor, a higher voltage is needed to ensure a successful operation. 1T1R bitcells require more than 1.8 V on at least one of their terminals to perform energy-efficient programming operations (the dotted red line represents the programming energy when only the WL voltage is increased and the BL is kept at 1.2 V, the solid red line the energy when both SL and WL voltages are increased). On the other hand, a reset operation with a p-type transistor enables energy-efficient programming without managing high voltages and without overstressing the memory array selection transistors (green curve for PC SiNWFET-based bitcells and blue curve for 2T1R).
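
The energy extraction described above (current integrated until the resistance reaches 100 k\(\Omega \), then multiplied by the SL-BL voltage) can be sketched as follows. The trapezoidal integration, the function name and the sample waveform are illustrative assumptions; the waveform is not taken from the simulations of this work.

```python
from typing import Sequence


def reset_energy(
    times: Sequence[float],
    currents: Sequence[float],
    resistances: Sequence[float],
    v_sl: float,
    v_bl: float,
    r_target: float = 100e3,
) -> float:
    """Reset energy: integrated current times the SL-BL voltage difference.

    The current is integrated (trapezoidal rule) from the start of the pulse
    up to the first sample at which the resistance has reached r_target.
    """
    charge = 0.0
    for i in range(1, len(times)):
        if resistances[i - 1] >= r_target:
            break  # target HRS already reached, stop integrating
        dt = times[i] - times[i - 1]
        charge += 0.5 * (currents[i] + currents[i - 1]) * dt
    return charge * (v_sl - v_bl)


# Hypothetical waveform: times in s, currents in A, resistances in ohm.
t = [0.0, 1e-9, 2e-9, 3e-9]
i_rram = [100e-6, 80e-6, 20e-6, 5e-6]
r_rram = [10e3, 40e3, 120e3, 300e3]
energy = reset_energy(t, i_rram, r_rram, v_sl=1.8, v_bl=0.0)
```

Because the integration stops once the target HRS is reached, a faster reset at the same SL-BL voltage directly translates into a lower extracted energy.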

Table 5 summarizes the area, programming time, read time and programming voltages of the bitcells considered in this work: CMOS-based 1T1R and 2T1R, as well as PCT-based 1PCT1R and 1XPCT1R.

Table 5 Summary of the CMOS and PC SiNWFET-based bitcells area and required voltages

4.5 Conclusions

In this section, the opportunities opened by the use of PC SiNWFET transistors for designing OxRAM memory arrays are explored. Two innovative bitcells (1PCT1R and 1XPCT1R) using PCT SiNWFETs are presented and validated through electrical simulations. Then, the area and energy consumption of these bitcells during read and write operations are compared with those of 1T1R and 2T1R CMOS-based bitcells. Thanks to their dynamic p-type/n-type switching, PCT-based bitcells enable a reset voltage as low as that of the 2T1R bitcell while being more compact (0.041 \(\upmu \text {m}^{2}\) for the 1XPCT1R and 0.064 \(\upmu \text {m}^{2}\) for the 1PCT1R, versus 0.1008 \(\upmu \text {m}^{2}\) for the 2T1R, i.e., a 1.6x area reduction for the 1PCT1R and 2.45x for the 1XPCT1R bitcell). On the other hand, as 1T1R bitcells require higher operating voltages (gate overdrive) to reach equivalent programming time and energy, they may suffer from a degraded lifetime. Finally, we demonstrate that, thanks to a breakthrough array organization, the 1XPCT1R bitcell provides a lower contact density along the memory lines and enables better read performance than denser bitcells (17% versus CMOS-based 1T1R and 67% versus CMOS-based 2T1R). This section was based on the works presented in Levisse et al. (2019). While it is outside the scope of this chapter, recent works demonstrated that PCT-based technologies are a great enabler for machine learning acceleration by enabling native transpose access and dense peripheral circuitry (Levisse et al. 2019).

5 Discussion and Perspectives

This chapter explored the opportunities opened by FEDs, such as the PCT SiNWFET technology, from a memory perspective. First, True Single-Phase Clock (TSPC) flip-flops using PCT SiNWFETs were explored. Then, co-integration with emerging filamentary resistive non-volatile memories (RRAM) was explored. The contributions of this chapter are summarized below. (i) Various PCT SiNWFET-based flip-flop topologies were proposed, simulated and compared to standard CMOS-based TSPC flip-flops. Significant gains in area, delay and leakage power, up to 20%, 30% and 7% respectively compared to the CMOS design, are demonstrated. On the embedded AND gate TSPC topology, PCT SiNWFET brings gains in area, delay and leakage of up to 21%, 6% and 45% respectively. (ii) The dynamic polarity control allowed by PCT SiNWFETs is used to improve the programming operations in filamentary RRAM memories. Two innovative PCT SiNWFET-based bitcell architectures enabling 1.35x and 2.6x area reduction compared to CMOS-based 2T1R architectures, without performance degradation, are proposed. Even though PCT SiNWFET bitcells consume more area than the CMOS-based 1T1R bitcell, they do not require gate overdrive during the reset operation, which strongly reduces the reliability issues on the selection transistor. For equivalent performance, PC SiNWFET bitcells enable a 1.85 V SL-BL reset operation at Vgs \(=\) 1.2 V versus 2 V for the CMOS-based 1T1R. They also enable up to 105x and 75x reset time reduction versus the 2 V overdriven 1T1R and the medium size 2T1R, respectively. Finally, thanks to an unconventional organization, the 1XPCT1R shows better read performance than both CMOS-based bitcells (17% versus 1T1R and 67% versus 2T1R). This work appears as a cornerstone for the design of SiNWFET-based memories, which are themselves required to design more specific memory blocks such as SRAM macros.
Additionally, this chapter opens the way for NVM integration with PCT technologies and, by exploring some basic PCT-based 1T1R bitcells, for further work on topics such as the exploration of peripheral circuitry design using PCT properties.