1 Introduction

One of the most significant aspects of an event is its time of occurrence, and the management of the temporal properties associated with events is thus one of the cutting-edge issues in electronic processing.

The research focuses on the high-performance processing of temporal information, starting from its measurement and moving on to management and, most crucially, transmission.

The Time-to-Digital Converter (TDC), a device that makes it possible to measure events’ times in digital form, is nowadays the primary method for making that measurement. The TDC detects events and provides a digital code indicating when they occur. Because of the rising throughput (50–100 million measurements per second are becoming more frequent) and the rising number of parallel channels, the most recent generation of TDCs must deal with several operations beyond the measurement of the time an event occurs, operations that translate into real architectural challenges [4].

System performance is greatly influenced by data routing, undoubtedly a foundational component of these systems. To overcome the restrictions of traditional timestamp management, a novel approach has been conceived and created to perform parallel-to-serial conversion while keeping the chronological order of the output measurements. The new approach is called BeltBus, and it gathers timestamps from the channels and sends them to the serial output like cargo on a conveyor belt. This technique is far from simple, if only because part of the TDC architecture is not synchronized with the system clock, making the crossing between clock domains critical [2]. The immediate result is vastly more straightforward processing in the phases after the measurement (for example, the histogram), avoiding multiplexers, which necessarily become slower as the number of channels increases, and eliminating the shared parallel bus.

The number of channels needed by an ever-increasing number of applications has reached a point where it is impossible to concentrate the acquisition and processing of the information transmitted by each of them in a single integrated circuit, even in an Application Specific Integrated Circuit (ASIC). An essential synchronization issue arises when the channels are divided across many processing circuits, made worse by the fact that time measurements are always differential measurements. Finding a solution to the synchronization issue in distributed systems has taken on fundamental significance, in part because the jitter of the signals precludes the possibility of distributing the same timing signal to every component of the system. Three distinct strategies for correcting the synchronization error were developed and also evaluated from the implementation perspective.

Finally, a TDC instrument that is entirely reconfigurable and based on an FPGA chip has been developed. The instrument underwent testing and validation with outside academic and industrial institutions, with excellent outcomes.

2 A New State-of-the-Art TDC in FPGA

To suit the requirements of a wide range of applications, numerous TDCs based on FPGA devices are available in various feature and performance combinations.

The proposed TDC [2] maximizes system performance by combining several high-performance tapped delay lines in parallel for multichannel operation, an interpolation mechanism together with a sub-interpolation that improves the resolution well below the propagation delay of a delay-line bin, and a calibration process that maintains linearity.

In doing so, a new fully configurable 16-channel TDC architecture has been created, with a dynamic range above 10 s, a resolution around 300 fs, and a single-shot precision just above 10 ps. Moreover, high linearity, 250 fs differential and 2.5 ps integral, has been verified on the 16 independent channels without performance degradation [2].
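
As an illustration of how coarse counting and fine interpolation combine, the following sketch reproduces the classical counter-plus-interpolator (Nutt) principle; the clock period and the composition formula are assumptions chosen for illustration, not the exact implementation of [2].

```python
# Minimal sketch of the classical counter-plus-interpolator (Nutt) principle:
# a coarse counter running at T_CLK sets the dynamic range, while the fine
# code from the tapped delay line refines the timestamp well below T_CLK.
# The numbers here are illustrative assumptions, not the design of [2].

T_CLK = 2.5e-9   # system clock period, e.g. 400 MHz (assumed value)
LSB = 300e-15    # fine resolution after sub-interpolation (from the text)

def timestamp(coarse_count: int, fine_code: int) -> float:
    """Combine the coarse counter value with the interpolator code.

    The fine code measures the delay between the event and the next
    clock edge, so it is subtracted from the coarse timestamp.
    """
    return coarse_count * T_CLK - fine_code * LSB

# Example: event in the 1000th clock period, 1234 delay-line taps crossed.
print(timestamp(1000, 1234))   # -> about 2.49963e-06 s
```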

The design was challenging. First, in the regular usage of an FPGA device, the propagation delays across the buffers of the delay lines are not equal; the buffer delays are consequently too dissimilar for the processing requirements. Furthermore, the partition of the FPGA into clock regions introduces additional delay variations as the signal moves from one region to the next. As seen in Fig. 1, the delay differences affect both resolution and linearity.

Fig. 1 Histogram of the propagation delays of the buffers making up the implemented tapped delay line. The diagram shows how the tapped delay line can span different clock areas, causing additional delays mainly due to crossing several clock regions. The most significant value of delay is named “ultra-bin”

For this reason, sub-interpolation and calibration, which are necessary to address this issue, have received much attention. Sub-interpolation is used to correct the resolution, and calibration compensates for the linearity. This ground-breaking architecture was created to have a high dynamic range, high resolution, high accuracy, high linearity, and multichannel operability; all the details can be found in [2].
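
As an example of how such a calibration can work in principle, the sketch below applies statistical (code-density) calibration, a standard technique for estimating non-uniform bin widths like the ultra-bins of Fig. 1; it is a plausible illustration, not necessarily the exact procedure of [2].

```python
import numpy as np

# Sketch of statistical (code-density) calibration: with asynchronous input
# events, hits distribute uniformly in time, so each bin's hit count is
# proportional to its real propagation delay. Shown as an illustration only.

def calibrate_bins(hit_counts: np.ndarray, t_clk: float) -> np.ndarray:
    """Map each raw delay-line code to a calibrated time within T_CLK."""
    widths = hit_counts / hit_counts.sum() * t_clk      # estimated bin widths
    edges = np.concatenate(([0.0], np.cumsum(widths)))  # cumulative delays
    return (edges[:-1] + edges[1:]) / 2                 # bin centers (LUT)

# Example: 4 buffers where the third one crosses a clock region (wider bin).
lut = calibrate_bins(np.array([100, 105, 290, 98]), t_clk=2.5e-9)
```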

Special attention has been paid to the feasibility of migrating the new TDC firmware among different FPGA devices. Individual users are frequently unwilling to give up their older devices, especially for interventions that can be difficult and expensive. Making firmware transfer between various devices simple is clearly not a straightforward objective: there are common-sense guidelines, rather than set criteria or design rules, for doing so. Implementing a TDC with such excellent performance requires understanding and exploiting the inherent FPGA structure, so the firmware cannot be transferred rapidly between devices made by different manufacturers; porting effectively becomes new work because the TDC core must change. Additionally, the challenge increases if the primitives point to global logic functions rather than specific hardware, giving the synthesizer freedom over what will be implemented. This is true, for instance, of Intel (Altera) FPGAs.

3 High-Performance Data-Serialization

A physical event is first detected and converted into an electrical signal by detectors and discriminators as part of the time-resolved measurement process. Then, the TDC digitally encodes the event’s instant of occurrence to provide a timestamp. Timestamps play a crucial role in a TDC system because they must reach the elaboration modules to provide different types of information, like a histogram or a count of the acquired measures. They arrive in parallel from the different channels of the TDC core. In a world where speed and measurement rate are becoming critical in many applications, multichannel architectures are essential because they enable the simultaneous detection of more physical events [1]. To meet this need, time-resolved setups are adding more channels. While adopting a multichannel structure has many benefits, it also adds complexity to managing the available resources. Typically, a parallel bus sends the timestamps from the TDC core to the elaboration modules.

Additional restrictions on the incoming timestamps are also necessary if further elaboration is provided: they cannot be delivered in random order and must follow the temporal order. With a small number of channels, ordering may be straightforward, but it becomes more challenging as the number increases. As a result, a purely parallel architecture forces a sacrifice of performance, ease of implementation, and reusability.

To get beyond the restrictions of traditional timestamp management, a unique module that can perform a parallel-to-serial conversion while preserving the output measurements’ chronological sequence has been designed and created. The new solution is called BeltBus, and it gathers timestamps from the channels and sends them to a serial output (like a conveyor belt with packages, as Fig. 2 makes evident).

Fig. 2 Serialization structure implemented in the BeltBus. The timestamps from the parallel channels propagate along the nodes and reach the serial output. The internal resources available in each serial node are highlighted

The parallel-to-serial conversion made possible by the BeltBus architecture and its synchronization mechanism produces a series of temporally ordered measurements. Thanks to this elaboration structure, the trade-off between multichannel structures and elaboration complexity can be resolved. The timestamps reach external modules via the BeltBus’ single serial output, where they can be used to perform various functions without the need for a pre-processing phase. Furthermore, a sizable amount of routing resources is saved compared to parallel outputs.
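
A behavioral model may help clarify the BeltBus contract: timestamps arrive already ordered on each channel and must leave on one serial output in global chronological order. The following Python sketch models only this functional behavior, not the hardware node chain of Fig. 2.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Behavioral model of the BeltBus contract: per-channel ordered timestamp
# streams are merged into a single serial stream ordered by time.

def beltbus_serialize(channels: Iterable[Iterable[float]]) -> Iterator[Tuple[float, int]]:
    """Merge per-channel (already ordered) timestamp streams into one
    serial stream ordered by time; each output item carries its channel id."""
    tagged: List[List[Tuple[float, int]]] = [
        [(t, ch) for t in stream] for ch, stream in enumerate(channels)
    ]
    return heapq.merge(*tagged)

# Example with three channels; the output is globally time-ordered.
ch0, ch1, ch2 = [1.0, 4.0, 9.0], [2.0, 3.0], [5.0, 8.0]
print(list(beltbus_serialize([ch0, ch1, ch2])))
# [(1.0, 0), (2.0, 1), (3.0, 1), (4.0, 0), (5.0, 2), (8.0, 2), (9.0, 0)]
```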

Finally, because it enables combining the outputs of numerous TDCs, this structure is particularly suited for multi-board applications.

However, this structure has some limitations. First, the operating frequency is reduced because serialization is limited by the bus frequency. Second, throughput is constrained because there are no separate buses: all the measures must flow through the same bus.

The performance of the BeltBus module was assessed with a Cocotb co-simulation, which connects Python test benches to GHDL, an open-source VHDL simulator for Linux. To run the co-simulation, a folder containing the test bench, Makefile, and top module is prepared and opened in a terminal, where typing “make” starts the run. As soon as the simulator starts, Cocotb hands control over to GHDL and back to the test benches. An info message is produced on the monitor after each overflow, giving the total number of overflows. The simulation time and the number of coarse overflows that should occur can be chosen in the test.
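
For illustration, a minimal Cocotb test along the lines described above could look like the following sketch; the DUT port names (clk, overflow) are hypothetical placeholders rather than the actual BeltBus interface.

```python
import cocotb
from cocotb.clock import Clock
from cocotb.triggers import RisingEdge

# Minimal Cocotb test bench sketch for a setup like the one described
# (GHDL as VHDL simulator, `make` as entry point). The DUT ports used here
# are illustrative assumptions, not the real BeltBus ports.

@cocotb.test()
async def count_coarse_overflows(dut):
    """Drive the clock and report every coarse-counter overflow."""
    cocotb.start_soon(Clock(dut.clk, 10, units="ns").start())  # 100 MHz clock
    overflows = 0
    for _ in range(100_000):                  # simulation length chosen in the test
        await RisingEdge(dut.clk)
        if dut.overflow.value == 1:           # hypothetical overflow flag
            overflows += 1
            dut._log.info(f"overflow #{overflows}")
```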

4 Multi-TDCs Synchronization Techniques

Due to the growing complexity of the observed events, many applications would benefit from connecting many TDC measuring channels while maintaining high-resolution measurement [3]. Making a network of independent time-meters, or TDCs, in which every device can communicate with the others, would be an intriguing idea. Due to the differences in the clock frequencies of the various network components and the variations introduced by the FPGA-based TDC, synchronizing numerous TDCs is challenging and complex.

The critical point is that the increasing number of channels demanded by more and more applications forces the use of several devices as implementation hosts. This makes it hard to compare and jointly use timestamps from many TDCs; the research has therefore aimed to develop methods for treating timestamps from various TDCs as though the same machine produced them.

The research has introduced a new method of synchronization that makes timestamps from several TDCs appear to have been generated by the same device.

Many operating parameters are unavoidably dispersed when using systems with different physical properties. The first crucial parameter is the clock period TCLK. Despite their apparent similarity, two different boards may have mismatched clock frequencies. Since the interpolation process yields a resolution that is proportional to the clock period TCLK [2], errors in clock frequency immediately result in resolution errors. Gain errors are these inconsistencies in the resolution LSB.

Moreover, the TDCs turning on at different times introduces a further dispersion: an offset issue arises when different TDCs turn on at somewhat different instants. In a network with two or more TDCs, each source of error must be considered to provide synchronization and enable communication.

Mismatches in the periods of the various TDC clocks are the root cause of the Gain Error. Because it results from a mismatch between the LSBs, the gain error grows proportionally with the timestamp value.

The Offset Error results mainly from discrepancies in the instants when the various TDCs are turned on. Consider two TDCs with the same clock period that turn on at different times: the timestamps of the two devices would then differ by a fixed amount.

In a network of TDCs, gain and offset errors regularly occur together, and they add up, creating a growing misalignment between the timestamps of the devices.
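
A small numerical model can make the interplay of the two errors concrete; all values below are made up for illustration.

```python
# Illustrative model of gain and offset errors (all numbers are assumed).
# Each TDC counts ticks of its own physical clock but converts them to time
# using the nominal period, so a period mismatch makes the error grow with time.

T_NOM = 2.500e-9      # nominal clock period of every TDC
T2_REAL = 2.501e-9    # actual period of TDC2 (gain error source)
T_ON2 = 1e-6          # TDC2 switched on 1 us after TDC1 (offset error source)

def reported(true_time: float, t_real: float, t_on: float) -> float:
    """Timestamp a TDC reports for an event at absolute time `true_time`."""
    ticks = (true_time - t_on) / t_real   # ticks elapsed since power-on
    return ticks * T_NOM                  # converted with the nominal LSB

for t in (1e-3, 1e-2, 1e-1):
    err = reported(t, T2_REAL, T_ON2) - reported(t, T_NOM, 0.0)
    print(f"t = {t:g} s -> mismatch = {err:.3e} s")
# The mismatch contains a fixed part (the offset) plus a part that grows
# linearly with the timestamp (the gain error).
```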

In the quantitative analysis, the global variance σ²CH of the single-channel measure includes the contribution of the fluctuation of the measure on the generic TDC channel, the variance due to non-ideal features of the frontend, and the intrinsic jitter of the signal. The variance σ²REF is usually the sum of the variance of the TDC channel measuring the REF signal and the variance of the frontend to which the REF is connected.

The intrinsic jitter of the signal governs the accuracy of the synchronization algorithms and is a component influenced by the accuracy and non-idealities of the instrument used. To lower the precision loss brought on by the jitter of the periodic REF signal, a high-order mean filtering is applied. As a result, the REF signal’s contribution to the compensation is minimized, becoming insignificant compared to that of the channel.
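
As a sketch of this filtering (the order and structure are illustrative assumptions), a running mean over many REF-period measurements suppresses the REF jitter contribution by roughly the square root of the filter order:

```python
from collections import deque

# Sketch of a high-order mean filter: averaging N consecutive REF-period
# measurements divides the REF jitter contribution by about sqrt(N),
# making it negligible against the channel's own variance.

class MeanFilter:
    def __init__(self, order: int = 1024):   # filter order is an assumption
        self.window = deque(maxlen=order)

    def update(self, ref_period_measure: float) -> float:
        """Feed one REF-period measurement, return the filtered estimate."""
        self.window.append(ref_period_measure)
        return sum(self.window) / len(self.window)
```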

A synchronization signal is generated and distributed among the various devices as part of the method designed to correct the faults discussed before and realign the timestamps. Several algorithms have been examined based on this signal.

A representation of the synchronization process based on a standard reference signal is reproduced in Fig. 3.

Fig. 3 Several TDCs, implemented on two FPGAs, share a common REF signal. Note that each TDC has i data-generation channels plus a separate channel for the REF signal

To ensure synchronization, the easiest way would be to distribute a high-frequency clock to all FPGAs and use it as a common clock. This approach is incredibly straightforward, but it has the drawback of degrading the timing signals and making signal integrity challenging to maintain. Moreover, the distribution of the clock signal across all FPGAs would amplify its jitter, which would be impossible to minimize. Due to these problems, a low-frequency signal that does not affect signal integrity or transfer rate was chosen to be distributed among the network’s components.

The only measurements that matter for multichannel applications are relative ones.

The first method can be referred to as Reference-Based Gain-Error Synchronization. It is the most intuitive because it compares measurements from the various TDCs. By using one of the TDCs as a reference and computing a correction factor for all the others, the objective is to make all timestamps homogeneous. The timestamps from the generic TDCi are normalized with respect to a compensation factor calculated using TDC1 as a benchmark; this amounts to rescaling LSBi to LSB1’s value.
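
In sketch form (variable names are illustrative), each TDC measures the shared REF period in its own LSBs, and the ratio to TDC1’s measurement yields the correction factor:

```python
# Sketch of reference-based gain-error compensation: every TDC measures the
# period of the shared REF signal in its own LSBs, and TDC1's measurement
# defines the benchmark. The values below are illustrative assumptions.

def gain_correction_factor(ref_ticks_tdc1: float, ref_ticks_tdci: float) -> float:
    """Factor that rescales TDCi timestamps onto TDC1's LSB."""
    return ref_ticks_tdc1 / ref_ticks_tdci

def normalize(timestamp_ticks_tdci: float, k_i: float) -> float:
    """Express a TDCi timestamp (in ticks) in TDC1-equivalent ticks."""
    return timestamp_ticks_tdci * k_i

# Example: TDCi's clock is slightly slow, so it counts fewer ticks per REF period.
k = gain_correction_factor(400_000, 399_840)
print(normalize(1_000_000, k))   # TDCi timestamp rescaled to TDC1's scale
```

Note that the normalization requires a multiplication, which, as discussed below, is the main implementation drawback of this family of methods.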

The second method can be referred to as Self Gain-Error Synchronization. It does not use a reference device because it is based on each TDC’s self-compensation: every TDC refers its own measurements to the period of the synchronization signal it observes.

The third method can be referred to as Offset-Error Synchronization; it synchronizes the network through a continuous offset error compensation. An initial zeroing after all the TDCs have been turned on would be sufficient to correct the offset error. By measuring the offset between its first timestamp and the REF signal, each TDC can be re-zeroed at a rate equal to the REF signal frequency. Gain errors exist in addition to offset errors, but they can be ignored provided the zeroing procedure is repeated faster than the drift induced by the difference between the two frequencies.
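
A sketch of this continuous zeroing, using only additions and subtractions (names are illustrative), could be:

```python
# Sketch of continuous offset compensation: at every REF event, each TDC
# records its local timestamp of REF and subtracts it from subsequent event
# timestamps, re-zeroing at the REF rate. Only additions and subtractions
# are needed. Illustrative code, not the actual firmware.

class OffsetZeroing:
    def __init__(self):
        self.last_ref_ts = 0

    def on_ref_event(self, local_ts_of_ref: int) -> None:
        """Called whenever the shared REF edge is timestamped by this TDC."""
        self.last_ref_ts = local_ts_of_ref

    def align(self, event_ts: int) -> int:
        """Event timestamp re-expressed relative to the latest REF edge."""
        return event_ts - self.last_ref_ts
```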

Because the first two strategies employ the same compensation engine in two different ways, their results are comparable. Both enable straightforward offset error correction through an initial zeroing, together with thorough gain error correction. Their main drawback is that they involve multiplications, which are pretty challenging to implement in FPGA designs without dedicated hardware.

In contrast to the first two approaches, the accuracy of the third one improves as the REF frequency increases, at the cost of compromising the signal’s integrity. On the other hand, the third compensation method is far easier to realize in an FPGA design than the others because it only calls for additions and subtractions, which are straightforward to carry out in an FPGA.

5 Felix Instrument

The research activity’s findings have undergone comprehensive validation in numerous experimental settings, verifying the anticipated performance and functional requirements. Rather than reporting every single experiment, which would demand an unnecessarily extensive treatment given their number and complexity, it is more meaningful to present one of the instruments developed for the experimental activities.

The Felix board is one of the instruments developed for practical applications in several settings and is an example of technology transfer of the research outputs.

The Felix board (Fig. 4) is programmable and implements the best-suited TDC architecture on FPGA. In addition to the cutting-edge measurement performance, its most crucial features are portability, ease of use, and compactness, up to plug-and-play operation.

Fig. 4 The instrument’s hardware board. The FPGA device can be seen at the center of the PCB

The instrument’s cost-to-performance ratio is quite good. It features two input channels (START and STOP) plus a SYNC input, used, for example, when synchronization to a laser is required. Thanks to its configuration, it is suitable for testing detectors (e.g., CDL, SiPM, SPAD, PMT); it can perform time-correlation measurements and complete rapid characterization of measurement systems. The device comes with various firmware and software modules, including a sophisticated Graphical User Interface (GUI) and a software Application Programming Interface (API) that support the user in all conceivable application scenarios.

At acquisition rates up to 100 Msps, the single-shot precision settles below 10 ps r.m.s. The resolution (LSB) reaches 250 fs with a short FSR. Dead time is less than 5 ns, whereas differential and integral nonlinearity (DNL, INL) are measured at less than 85 fs and 5.6 ps, respectively. The frontend circuits translate an external analog timing signal into digital pulses using a comparator with a fully configurable threshold.

It is difficult to compare the proposed design with existing alternatives in terms of both small size and high performance: in actuality, no programmable devices simultaneously maximize both of the features requested by modern uses. An application of the instrument in a configuration for coincidence measurements is shown in Fig. 5.

Fig. 5 Photo of the device captured while being used for coincidence measurements. The beam from a laser source is delivered, through a photodetector, to the SYNC input of the presented instrument and, in parallel, to the setup where the coincidence between the two likely emissions resulting from excitation is verified. The instrument acquires the timestamps of the laser source (cause) and the timestamps from two detectors that collect the sample’s emissions (effects). Here, the device decides whether two detected events caused by the same laser pulse coincide. Further processing is possible, such as computing the statistical frequency between causes and effects on a histogram. The PC’s only role is to display the processing results

The instrument’s core is an FPGA implementing the firmware that realizes the TDC architecture and allows the software to communicate with the outside world, such as a PC. The instrument’s connection ports include the CH1, CH2, and SYNC channels and a USB connector providing both communication and power supply. The FPGA device is a 28 nm Xilinx Artix-7, on which different processing modules are structured as IP–Cores alongside the primary core that implements the TDC. The software is made up of plug-ins, one for each type of firmware processing. As a result, the user can entirely modify the instrument by adding their own IP–Cores and accompanying software plug-ins. A synoptic diagram of the instrument’s architecture is shown in Fig. 6, emphasizing the hardware, firmware, and software elements and the connections between them.
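
As a purely hypothetical illustration of this plug-in pattern (none of these names belong to the actual Felix API), a software plug-in could be modeled as follows:

```python
from abc import ABC, abstractmethod

# Hypothetical illustration of the plug-in pattern described above:
# one software plug-in per firmware processing IP-Core. All class and
# method names are invented for this sketch.

class FelixPlugin(ABC):
    """One plug-in per firmware IP-Core: it decodes that core's data stream."""

    ip_core_id: int   # which IP-Core's output this plug-in handles

    @abstractmethod
    def process(self, payload: bytes) -> None:
        """Handle one data packet produced by the matching IP-Core."""

class HistogramPlugin(FelixPlugin):
    ip_core_id = 1    # invented identifier for a histogram IP-Core

    def __init__(self, n_bins: int = 4096):
        self.bins = [0] * n_bins

    def process(self, payload: bytes) -> None:
        for code in payload:                    # toy decoding: one byte per bin index
            self.bins[code % len(self.bins)] += 1
```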

Fig. 6 A schematic diagram of the instrument’s architecture, comprising hardware, firmware, and software. The firmware and software, highlighted in yellow, are the parts the user can easily program to best suit the operational requirements

The instrument has been tested in the lab and in many specific application scenarios, fully validating its functions and features.

6 Conclusions and Future Developments

An innovative, fully configurable TDC architecture for implementation on FPGA has been designed, engineered, and experimentally validated. Its features exceed those of most comparable ASIC and FPGA solutions at the state of the art, providing a dynamic range above 10 s, resolution around 300 fs, single-shot precision just above 10 ps, high linearity (250 fs differential and 2.5 ps integral), and up to 16 independent channels without any performance degradation.

To face the modern architectural challenges of these systems, i.e., throughput beyond tens of millions of measurements per second and a rapidly growing number of parallel channels, two primary topics have been investigated and innovative solutions proposed.

To address the keystone issue of highly efficient data routing, an innovative technique named BeltBus has been developed; it performs parallel-to-serial conversion while maintaining the chronological order of the output measurements, overcoming the limitations of standard timestamp management.

The ever-increasing number of parallel channels will make it mandatory to partition them among different processing circuits that must be synchronized. A strategy for periodic correction of the synchronicity error in distributed systems has been devised, and consequently, three different operative methods have been derived.

The evolutionary trend of these systems is not to improve the resolution, already at levels beyond the need, but to increase the number of channels. Therefore, the natural evolution of this research will be to endow the techniques presented with the ability to manage hundreds of channels in parallel.