Introduction

Several books [1–3] are devoted to the theory and practice of signal sampling. Proper data sampling procedures allow information about an event or a signal to be extracted in the form of individual data. Those data can be arranged in one or more columns, which are usually called vectors or matrices. Signal sampling constitutes a basic issue in a variety of situations. Correct sampling is necessary for obtaining a sequence of data that correctly accounts for whatever phenomenon we want to examine. In the simplest case, a continuous trace reports the values of a given quantity that depends on a single independent variable: this is the case of a univariate signal.

Whatever variable is appropriate to describe the occurrence of a dynamic natural process (change in temperature at passing time or at varying location, change in volume at varying external pressure, change in distance from the ground of a falling body, etc.), the relevant values change progressively by increments that are so small as to be hardly conceivable.

Most natural processes, as well as most human-driven ones, occur continuously. This means that subsequent steps differ from each other to such a small extent that they are not perceivable or directly measurable by human senses or even by instruments, in agreement with the term ‘conceivable’ used above. Mathematicians define the whole matter as follows: the infinitesimal change in the quantity y describing the event is indicated by dy and corresponds to an infinitesimal change of the independent variable (e.g., time), dt. Footnote 1 For this reason, dy and dt are the general symbols used for infinitesimal changes in the y and t quantities, respectively. To account for what happens ‘at a given instant’, rather than over a finite interval, they should be preferred to Δy and Δt, which indicate finite variations. This notwithstanding, the rate at which the event occurs is a finite quantity, because the ratio between the two infinitesimal changes expresses a finite value: dy/dt. This instantaneous rate corresponds mathematically to the slope of the local tangent to the function expressing the dependence of y on t. On the contrary, the Δy/Δt incremental ratio represents the mean rate within the Δt interval, more or less close to the instantaneous rate at the midpoint of Δt. Notably, the expression ‘at a given instant’ may refer to a ‘time instant’, to a ‘space instant’ or to any ‘independent variable instant’, represented by t, chosen here to describe the progress of any event under observation.
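The difference between the incremental ratio Δy/Δt and the instantaneous rate dy/dt can be checked numerically. The following is only a minimal sketch: the function y(t), the instant t = 1 and the Δt values are assumptions chosen for illustration and do not come from any measurement.

```python
import math

def y(t):
    # Hypothetical smooth process: exponential relaxation towards 1
    return 1.0 - math.exp(-t)

def dy_dt(t):
    # Analytical instantaneous rate, i.e., the derivative of y(t)
    return math.exp(-t)

def mean_rate(t_mid, delta_t):
    # Incremental ratio (y2 - y1) / (t2 - t1) over an interval centred on t_mid
    t1 = t_mid - delta_t / 2.0
    t2 = t_mid + delta_t / 2.0
    return (y(t2) - y(t1)) / (t2 - t1)

t_mid = 1.0
for delta_t in (1.0, 0.1, 0.001):
    # The mean rate approaches the instantaneous rate as delta_t shrinks
    print(delta_t, mean_rate(t_mid, delta_t), dy_dt(t_mid))
```

As Δt decreases, the printed mean rate gets closer and closer to the instantaneous rate at the midpoint of the interval.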

The most direct way to monitor the changes occurring in the course of an event consists in ‘recording’ it by our own senses, which follow the movement of an object, the increase or decrease of the loudness of a sound, the increase or decrease of the temperature (Θ) in the surrounding environment, etc., as time flows, as the location changes, or as any other variable changes. By considering common instruments used to follow the evolution of the last exemplified event, the temperature may be measured on the basis of the ‘standardized’ change in volume of a liquid or of a gas: the continuous changes in temperature over time are transduced into the corresponding continuous changes in volume, which may be read or even recorded. In other words, two subsequent Θ temperature values differ from each other by an infinitesimal amount, dΘ, similarly to the corresponding change in volume, dV, of the liquid or the gas used, and are recorded at an infinitesimal time difference, dt.

Data sampling

Simple signals

Once we choose to read the temperature with a thermometer at given constant Δt intervals, every minute, every hour, or every day, we are sampling Θ on the independent variable time. It will be clear from the following how crucial the sampling operation is.

Criteria for correct sampling are mandatory. Correct sampling means picking up every significant change of the monitored quantity: we fill in a matrix consisting of two columns (for instance, time, t, and temperature, Θ) and of as many rows as readings are made. No meaningful change in Θ over time is lost only if sampling is correctly performed. On the other hand, incorrect sampling means missing information about the event under observation. For instance, we can choose to collect air samples at a crossroad, to determine the NOx content due to the emissions of the cars. Let sampling occur manually every hour, the time interval being conditioned by the lack of a suitable in situ gas sensor. There is no possibility of changing the workers’ shifts, because lunch time is untouchable! However, let the rush hour occur just within the interval between two subsequent sampling operations: the peak of air poisoning is lost. Similarly, in a space-located example, a source of poisoning may remain unidentified if the water sampling sites within a basin are not properly chosen.
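The crossroad example can be mimicked with a few lines of code. This is a sketch under stated assumptions: the NOx trace below is entirely hypothetical (a flat background plus a narrow Gaussian rush-hour peak), and the numerical values have no experimental basis.

```python
import numpy as np

def nox(t_hours):
    # Hypothetical NOx trace (arbitrary units): background of 20 plus a
    # rush-hour peak centred at 8.5 h with a standard deviation of 0.15 h
    return 20.0 + 80.0 * np.exp(-0.5 * ((t_hours - 8.5) / 0.15) ** 2)

hourly = np.arange(0.0, 24.0, 1.0)     # manual sampling, once per hour
frequent = np.arange(0.0, 24.0, 0.1)   # e.g., a sensor read every 6 min

# The hourly readings fall on both sides of the peak and barely see it
print("maximum seen with hourly sampling:", nox(hourly).max())
print("maximum seen with 6 min sampling: ", nox(frequent).max())
```

With these assumed values the hourly readings stay close to the background level, whereas the denser sampling records the peak.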

A visual example of how critical proper sampling is can be given by sequences of pictures. Figure 1S in Supplementary Material describes the rotation of a needle at a velocity such that two subsequent pictures are taken at constant time intervals (sampling interval or sampling period, Δt), shorter than that required to cover half a round (Δt < T P/2, where T P is the rotation period, i.e., the time of a complete round). The needle rotates clockwise and, looking at the sequence of pictures, the perceived direction of rotation is correct. On the other hand, if Δt is not shorter than the time required by half a round (Δt ≥ T P/2), the rotation may appear to occur counterclockwise: aliasing has occurred. In the same Fig. 1S, the position of the tip of the needle is plotted vs. the time corresponding to the different positions: a sine (or cosine, depending on the starting point) function is depicted. Footnote 2 The difference in the independent variable, t, between two subsequent maxima, i.e., the period of the trigonometric function, coincides with that of rotation, namely the time difference between two subsequent equal positions of the needle. Correct or incorrect sampling of the needle rotation corresponds to correct or incorrect sampling of the corresponding trigonometric functions in Fig. 1S: the effect of aliasing is also shown with respect to the period of the sine and cosine functions.
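The apparent direction of rotation of the needle can be sketched numerically. The code below assumes that an observer links two consecutive pictures by the shortest arc, so that the true angular step is wrapped into the interval [−π, π); the function name and the period value are hypothetical and serve only this illustration.

```python
import math

def apparent_step(delta_t, period):
    """Angle (rad) by which the needle seems to advance between two pictures.

    Positive values mean rotation in the true direction, negative values
    rotation in the opposite direction.
    """
    true_step = 2.0 * math.pi * delta_t / period
    # Wrap the true step into [-pi, pi): the shortest arc between two pictures
    return (true_step + math.pi) % (2.0 * math.pi) - math.pi

T_P = 1.0  # rotation period in s (assumed value)
for delta_t in (0.1, 0.4, 0.6, 0.9):
    step = apparent_step(delta_t, T_P)
    direction = "true direction" if step > 0 else "reversed"
    print(f"dt = {delta_t:.1f} s: apparent step {math.degrees(step):7.1f} deg ({direction})")
```

For sampling periods shorter than half the rotation period the apparent step has the sign of the true rotation; for the two longer sampling periods in the loop the sign is reversed.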

In everyday life, a practical example of incorrect sampling is found in old-fashioned western movies, where the individual pictures of the film (the individual samples) are taken at constant time intervals, Δt, that are longer than that required to avoid aliasing: it may happen that the spokes in the wheels of a coach curiously seem to rotate in the sense opposite to that expected from the direction of motion of the coach itself. An analogous effect can be observed in videos of the propellers of an aircraft or of the vanes of a fan.

The condition under which a sequence of pictures, taken at constant Δt sampling intervals, is suitable to represent the process properly, is expressed by the so-called Nyquist Footnote 3 criterion in terms of Δt, T P, and the relevant frequencies ν sampl and ν, respectively. Making reference to the example of the rotating needle of Fig. 1S, the well known Eq. (1) describes the relation between rotation period, T P, and frequency of rotation, ν:

$$\nu = \frac{1}{{T_{\rm P}}}$$
(1)

Similarly, the relation between the sampling interval, Δt, and the corresponding sampling frequency, ν sampl, is given by:

$$\nu_{\rm sampl} = \frac{1}{\Delta t}$$
(2)

The Nyquist criterion expresses the condition for a correct sampling that can be written in different forms, all expressing the same condition:

$$\Delta t < \frac{{T_{\rm P}}}{2}$$
(3)

i.e.:

$$\Delta t < \frac{1}{2\nu}$$
(4)

or:

$$\nu_{\rm sampl} > 2\nu$$
(5)

Summarizing and integrating: the period T P, in seconds (s), is the time required by the sine or cosine function to assume equal values in subsequent cycles, i.e., the time between two identical positions of the needle. The sampling period Δt is also expressed in s, and corresponds to the time elapsed between two subsequent samplings, i.e., in the example of a film, between two subsequent pictures. The frequencies ν and ν sampl are expressed in s⁻¹ (also referred to as hertz, Hz). ν corresponds to the number of maxima of the sine or cosine function in 1 s; in the cited example it corresponds to the rotation frequency, that is, the number of rotations of the needle per unit time. The sampling frequency, ν sampl, corresponds here to the number of samplings, i.e., of pictures, per second.
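Equation (5) and its consequences can be tried out on sampled sines. The following sketch uses assumed values for the sampling frequency and for the signal frequencies; the helper names are hypothetical. It relies on the fact that a sine of frequency ν, sampled at ν sampl, produces the same samples as a sine whose frequency is ν folded into the range from 0 to ν sampl/2.

```python
import numpy as np

def satisfies_nyquist(nu, nu_sampl):
    # Eq. (5): the sampling frequency must exceed twice the signal frequency
    return nu_sampl > 2.0 * nu

def apparent_frequency(nu, nu_sampl):
    # Lowest frequency (Hz) of a sinusoid passing through the same samples:
    # nu is folded into the range [0, nu_sampl / 2]
    return abs(nu - nu_sampl * round(nu / nu_sampl))

nu_sampl = 10.0                   # sampling frequency in Hz (assumed value)
t_k = np.arange(20) / nu_sampl    # sampling instants, spaced by 1 / nu_sampl

for nu in (1.0, 4.0, 9.0):        # signal frequencies in Hz (assumed values)
    print(nu, satisfies_nyquist(nu, nu_sampl), apparent_frequency(nu, nu_sampl))

# A 9 Hz sine sampled at 10 Hz gives the samples of a 1 Hz sine of opposite sign
same_samples = np.allclose(np.sin(2 * np.pi * 9.0 * t_k),
                           -np.sin(2 * np.pi * 1.0 * t_k))
print(same_samples)
```

The sign inversion of the aliased sine is the counterpart of the apparent reversal of the rotation direction of the needle.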

In Fig. 1, examples are shown of sine functions with different frequencies, sampled in agreement or in disagreement with the Nyquist criterion.

Fig. 1

The same sampling period on simple sine signals (blue lines) possessing different frequency values. Sampling at \(\Delta t = \frac{1}{2\nu}\) (a); \(\Delta t > \frac{1}{2\nu}\) (b); \(\Delta t < \frac{1}{2\nu}\) (c). Aliasing occurs in (a) and in (b) (see red-dashed trace)

Equations (3)–(5) are valid whenever an event may be described by a single sine or cosine function, depending on the initial condition chosen. However, it very often happens that a signal is too complex to be accounted for by a single sine or cosine function. Whenever the signal is a complex one, the correct sampling period, Δt, is given quantitative meaning by defining the frequency in the frame of the analysis of the signal, as will be explained in the following sections. This requires the transformation of the signal from the original domain (for instance, time or space domain) into the corresponding frequency domain.

Complex signals

In a complex periodic signal, several individual sines and cosines at different frequencies have to be summed to account for the whole process. Nothing is lost in the relevant description provided that Δt is lower than half the shortest period, corresponding to the highest frequency among the individual signals.Footnote 4 The individual sine and cosine functions are called components; altogether, they describe the overall periodic event. Equations (3) and (4) become, respectively:

$$\Delta t < \frac{{T_{\text{Pmin}}}}{2}$$
(6)

and

$$\Delta t < \frac{1}{{2\nu_{ \text{max} }}}$$
(7)

where T Pmin and ν max are referred to the individual signal with the shortest period, corresponding to the highest frequency.

Noteworthy, the Nyquist criterion is often expressed directly with reference to a signal consisting of different components, in the form of Eq. (7). It follows that the definition of the useful lower limit for the sampling frequency, ν sampl, is:

$$\nu_{\text{sampl}} > 2 {\nu}_{ \text{max} }$$
(8)
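As a worked instance of Eqs. (7) and (8), the limits on Δt and on ν sampl can be computed from a list of component frequencies. In this sketch the four frequencies are those appearing in Eq. (14), read here as values in Hz (an assumption); the function name is hypothetical.

```python
def nyquist_limits(frequencies_hz):
    # Only the highest-frequency component matters, see Eqs. (7) and (8)
    nu_max = max(frequencies_hz)
    dt_limit = 1.0 / (2.0 * nu_max)    # the sampling period must stay below this
    nu_sampl_limit = 2.0 * nu_max      # the sampling frequency must stay above this
    return dt_limit, nu_sampl_limit

dt_limit, nu_sampl_limit = nyquist_limits([1.0, 2.5, 10.0, 25.0])
print(f"sampling period below {dt_limit} s, sampling frequency above {nu_sampl_limit} Hz")
```

Both limits are strict: sampling exactly at the limiting values does not satisfy the criterion.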

As an example, in Fig. 2, four trigonometric functions (individual signals) are used to build up the complex signal [4].

Fig. 2

The complex signal in the top part of the figure (Y, in red color) is given by the sum of two sine functions (s1 and s2) and two cosine functions (c1 and c2), each one with different amplitude and frequency values

Actually, in the frame of an experiment it is often requested to follow the opposite path, i.e., to identify the sine and cosine linear combination suitable to reconstruct a complex signal.

As mentioned at the beginning of the article, similarly to what happens for many natural events, an instrumental signal may vary continuously: it may be recorded by a system suitable to register continuous changes of the measured quantity as a function of a suitable independent variable. This is an analog recorder. As an example, the changes in atmosphere composition with location or with the seasons, which are analog processes, may be monitored by an analog recorder (see Fig. 3).

Fig. 3

Some examples of analog recorders. Left analog recording thermometer (from https://www.weather.gov/unr/1943-01-22); right XY analog oscilloscope (from http://www.gwinstek.com/upload/website/8d7367f7d206f3d63218396ab8a9a178.jpg)

Sampling the analog signal of the recorder is necessary to obtain discrete data, i.e., numbers, suitable for computer handling: the computer is a digital device Footnote 5 that manipulates digital data. Proper sampling of the continuous signal is necessary to perform proper analog-to-digital conversion (ADC): computers should be fed with vectors or sequences of data.Footnote 6 As an example, Fig. 4 shows how the continuous signal constituting a chromatogram is actually sampled, stored and displayed as a sequence of discrete data.

Fig. 4

Zooming shows how a chromatographic peak, collected by an analog recorder, may be converted into a sequence of discrete data (highlighted in red). They can be displayed as such or interpolated by a proper algorithm to lead back to a continuous trace

Since the original continuous signal should be reconstructed from the sequence of discrete points obtained by sampling, some practical ways to operate should be described. Aiming at a good reconstruction of the continuous signal by interpolation of the points constituting the sampled sequence, it is strongly advisable to follow a general rule of thumb, i.e., to pick up 10 points for every cycle of the highest frequency present in a signal. This practical rule, also referred to as the ‘Real World Sampling Theorem’ [5], implies a suggested sampling frequency much higher than the minimum value fixed by the Nyquist criterion. By following this choice, a piecewise interpolation procedure [6] consisting of a series of quadratic or even linear equations will give back the original signal with quite good accuracy. On the contrary, by adopting the minimum sampling frequency fixed by the Nyquist criterion, interpolation to reconstruct the original signal is much more demanding and computationally intensive.
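The effect of the rule of thumb can be explored with a piecewise linear interpolation, performed here with NumPy's `np.interp`. This is only a sketch under assumptions: a single 1 Hz sine stands for the highest-frequency component, and the second sampling density (2.5 points per cycle, only slightly above the Nyquist limit of 2) is chosen arbitrarily for comparison.

```python
import numpy as np

def interpolation_error(points_per_cycle, nu=1.0, n_cycles=5):
    # Sample a sine of frequency nu at the requested density
    dt = 1.0 / (nu * points_per_cycle)
    t_s = np.arange(0.0, n_cycles / nu + dt / 2.0, dt)
    x_s = np.sin(2 * np.pi * nu * t_s)
    # Reconstruct on a fine grid by piecewise linear interpolation
    t_fine = np.linspace(0.0, t_s[-1], 5001)
    x_rec = np.interp(t_fine, t_s, x_s)
    # Largest deviation between reconstructed and original signal
    return np.max(np.abs(x_rec - np.sin(2 * np.pi * nu * t_fine)))

print("10 points per cycle: ", interpolation_error(10))
print("2.5 points per cycle:", interpolation_error(2.5))
```

With 10 points per cycle the linear reconstruction deviates from the sine by a few percent of its amplitude at most, while at 2.5 points per cycle the same simple procedure fails, although the Nyquist criterion is formally satisfied.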

A further complication arises whenever a signal is an even more complex one, the highest frequency present being not always the same along the occurrence of the event. It is evident that changes of the sampling period may be suitable to avoid sampling, storing and elaborating redundant, meaningless data. Sectioning in some way different portions of the signal, looking at it through a window, may be advantageous. This operation (windowing), will be described in a further contribution, in a more general frame. An effective approach to economically handle a small number of data, as that described hereafter, may be followed in specific situations.

Technicians, (skilful) chemistry students, and researchers are well aware of a practical case of proper (or improper) sampling in the course of a routine laboratory operation. When performing a pH-metric titration by hand, using a graduated glass burette, they read pH values that differ from each other by discrete quantities, in correspondence to subsequent additions of discrete volumes V of titrant. To represent the titration curve accurately in its most critical part, i.e., close to the endpoint, it is very important that the larger the change in the measured pH, ΔpH, the smaller the corresponding finite volume of titrant added, ΔV. The sampling frequency necessary to describe the titration curve correctly has to increase with the rate at which pH changes, expressed by ΔpH/ΔV. If we do not ‘sample the event’ properly, the description of the titration procedure is unsatisfactory: we may lose the most significant points of the relevant pH vs. V curve.

Figure 5 shows an example of such a titration curve and of the relevant derivative, which accounts well for the occurrence of smooth and of rapid changes in correspondence to different parts of the titration curve. Different sampling frequencies are suitable to adequately describe the curve ‘locally’: the Nyquist criterion requires different sampling frequencies for different segments of the signal.

Fig. 5

Titration curve of 10 mL NaOH 0.1 M with HCl 0.1 M, considering different sampling periods (a); derivative of the titration curve (b)

The way in which automatic titration systems generally work is quite interesting in this respect. If the additions are made by finite volume increments, the computer managing the titration decreases each titrant addition when pH changes rapidly, in order not to lose data in the parts of the pH vs. V curve close to the endpoint.

The case of a continuously varying signal may be represented once more by an automatic titrator in which the titrant is added by a continuous flow. A procedure in which pH is sampled at discrete times may be adopted: the titrant flow is slowed down when ΔpH increases. The two-column matrix of pH and corresponding V values is efficiently managed by the computer. The procedure is quite different from what usually happens in nature: we cannot slow down the occurrence of a natural event as easily as we slow down the titrant flow!

It is often of basic importance to know the rate at which the measured quantity changes, possibly ‘before’ the changes occur. In this case, we must have, in one way or another, a good idea about the ‘shape’ of the signal: it is not trivial to plan an effective ‘online’ way to proceed, unless we have access to a signal that is representative of what we are going to sample. Otherwise, the event has to proceed smoothly enough not to present sudden, unexpected changes. The trend of the potentiometric pH vs. V titration is once more helpful to illustrate what happens in this respect. The curve sends us ‘messages’ about its tendency to change more rapidly. It does not exhibit discontinuities, so that the extent of the subsequent additions and of the parallel readings may be modulated ‘online’: the higher the ΔpH/ΔV change at the n-th titrant addition, the lower will be the ΔV at the (n + 1)th addition.
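The ‘online’ modulation of the additions can be sketched for the titration of Fig. 5 (10 mL NaOH 0.1 M with HCl 0.1 M). The code below is not the algorithm of any commercial titrator: the pH is computed for ideal strong electrolytes, and the update rule, the target ΔpH per addition and the limits on ΔV are all assumptions made for illustration.

```python
import math

KW = 1.0e-14  # ionic product of water at 25 °C

def ph(v_acid_ml, v_base_ml=10.0, c_base=0.1, c_acid=0.1):
    """pH of a strong base titrated with a strong acid (ideal behaviour assumed)."""
    # Excess of base over acid in mol/L; equals [OH-] - [H+]
    excess = (c_base * v_base_ml - c_acid * v_acid_ml) / (v_base_ml + v_acid_ml)
    root = math.sqrt(excess * excess + 4.0 * KW)
    if excess >= 0.0:
        h = KW / ((excess + root) / 2.0)   # numerically stable on the basic side
    else:
        h = (-excess + root) / 2.0
    return -math.log10(h)

def adaptive_titration(v_end=20.0, dv_max=1.0, dv_min=0.01, target_dph=0.2):
    # Hypothetical rule: the next addition aims at a pH change of target_dph,
    # estimated from the slope observed at the previous addition
    v, dv = 0.0, dv_max
    points = [(v, ph(v))]
    while v < v_end:
        v_new = min(v + dv, v_end)
        ph_new = ph(v_new)
        slope = abs(ph_new - points[-1][1]) / (v_new - v)
        points.append((v_new, ph_new))
        v = v_new
        dv = min(dv_max, max(dv_min, target_dph / slope)) if slope > 0 else dv_max
    return points

points = adaptive_titration()
print(len(points), "readings; smallest addition:",
      min(b[0] - a[0] for a, b in zip(points, points[1:-1])), "mL")
```

With these assumed settings the additions start at 1 mL, shrink to about 0.01 mL around the endpoint at 10 mL, and grow again afterwards.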

More about sampling: more about discretization or Analog-to-Digital Conversion (ADC)

An instrumental signal is often represented by a more or less complex function of one (or even more) independent variable(s). The sequence of sampled data sent to the computer should bear the whole information content of the continuous trace; in principle, proper sampling implies that a suitable interpolation of the sampled points is capable of fitting the continuous signal.

Whenever nothing indicates how the sampling frequency should be varied over different portions of the signal (as is possible, instead, in the case of the automatic titrator), it is necessary to fix a constant, suitable sampling frequency that obeys the Nyquist criterion over the whole signal.

To account for how the sampling operation on a continuous (univariate) signal is usually dealt with in dedicated textbooks, the definition of the delta function, also called impulse, or Dirac delta function, δ, is necessary [1, 2]. It is defined as an infinitely high, infinitely thin spike, subtending a unitary area: it constitutes the limit of a sequence of zero-centred normal distributions as the standard deviation tends to 0. These properties make the δ function suitable to pick up the value of a continuous function, f(t), at a given t value.Footnote 7 An array of δ functions, spaced from each other by Δt units and extending from −∞ to +∞, describes the so-called comb function: a sampling function extended over the whole possible range of the signal (see Fig. 6).
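The ability of the δ function to pick up the value of f(t) at a given t can be checked numerically by replacing δ with narrower and narrower unit-area normal distributions. In this sketch the function f(t) = cos t, the point t0 = 0.3 and the standard deviations are arbitrary assumptions, and the integral is approximated by a simple sum on a fine grid.

```python
import numpy as np

def sift(f, t0, sigma, n=20001):
    # Integrate f(t) times a unit-area normal distribution centred on t0
    t = np.linspace(t0 - 10.0 * sigma, t0 + 10.0 * sigma, n)
    gauss = np.exp(-0.5 * ((t - t0) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return np.sum(f(t) * gauss) * (t[1] - t[0])

t0 = 0.3
for sigma in (1.0, 0.1, 0.001):
    # The integral tends to f(t0) as the distribution becomes a thin spike
    print(sigma, sift(np.cos, t0, sigma), np.cos(t0))
```

As the standard deviation tends to 0, the integral tends to the value of the function at t0, which is the property exploited by the comb function.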

Fig. 6

The comb function, defined in Eq. (9) for Δt = 1

Sampling of a continuous signal by adopting Δt sampling period may be then represented by the scheme in Fig. 7, which is expressed by the following notation:

Fig. 7

Sampling of a continuous signal by adopting a Δt sampling period equal to 1

$$x(k) = f(t){\text{comb}}_{\Delta t} (t) = \sum\limits_{ k=- \infty }^{ + \infty } {[f(t)\delta (t - k\Delta t)]}$$
(9)

In conclusion, the operation leading to a sequence x(k) is performed by sampling the continuous signal, represented here by f(t), through a train of equally spaced unit impulses, as expressed by the comb function in Fig. 6, with Δt sampling period (Fig. 7), computed in agreement with the Nyquist criterion. As a result, the continuous signal f(t) is converted into a sequence x with index k, x(k), which corresponds to the discretization of f(t), i.e., to the conversion of the analog signal f(t) into a digital (discrete) signal x(k). Figure 8a reports a continuous signal, which is correctly sampled in Fig. 8b and undersampled in Fig. 8c, leading to the clearly misleading sequence in Fig. 8d. The already discussed aliasing phenomenon has occurred in this case.

Fig. 8

Continuous signal (a); correct sampling (b); undersampling (c); sequence of undersampled data (d)

On the other hand, the inverse path is possible by the interpolation operation, which performs Digital-to-Analog Conversion. Scheme 1 sketches the operations that have been described in the foregoing.

Scheme 1

Analog-to-digital conversion (ADC) and digital-to-analog conversion (DAC)

Transform techniques

Generally, in mathematics, a transform means a mathematical operation to be applied on a function. We are familiar with some transform procedures, such as the logarithmic transform: some calculations are made easier by transforming the variables into their logarithms. For example, multiplications and divisions are substituted by sums and subtractions, respectively. Furthermore, in some cases logarithms allow more compact and effective representation of data over a wide range, as it happens in the case of the acidity of a solution, expressed by pH, which is the logarithm of the reciprocal of the hydrogen ion activity.

When dealing with a signal, either a continuous signal described by a function f(t) or a discrete signal represented by a sequence x(k), the term transform generally assumes quite a different meaning. Essentially, calculating the transform of a signal leads to an alternative representation of the information contained in the signal itself, using an alternative domain. To give the reader an intuitive explanation of this rather abstract definition, it is useful to recall Fig. 2: the red signal (Y) represents the values of a dependent variable measured as a function of an independent variable, e.g., time. In other words, Y represents the measured values in the time domain. However, Y can also be represented as the sum of the four blue signals (s1, s2, c1 and c2), which correspond to two sine and two cosine functions with different frequencies and amplitudes. Therefore, an alternative representation of the same signal Y could consist in reporting the amplitudes of the sines and cosines as a function of their frequencies. This is a possible example of transform of the signal, from the original domain (time) to an alternative domain (frequency). In this case, the sine and cosine functions are used as building blocks to fit the original signal: each sine and cosine function with a given frequency contributes to the original signal with a weight equal to its amplitude. In other words, sines and cosines with different frequencies and with unitary amplitude constitute an orthogonal set of functions Footnote 8 that can be used as a basis set: a set of basis functions constituting the building blocks to reconstruct the original signal. Each one of these functions can be properly weighted by multiplying it by a coefficient (the amplitude), and the sum of all the weighted basis functions gives the original signal.

In the real world, however, it is generally not possible to represent the original signal exactly through the combination of a finite number of functions, contrary to what is reported in the example of Fig. 2. Therefore, a linear combination of the basis functions will be calculated, in such a way that the resulting g(t) function approximates as closely as possible the f(t) function representing the original signal. While doing this, g(t) will be calculated so as to keep the number of basis functions as low as possible and to achieve, at the same time, the best approximation of f(t).

Let u′(h) be the vector bearing the proper coefficients of the basis set, i.e., the weights of each basis function s′ h (t). If only the first N coefficients are different from zero or if the first N coefficients are sufficient to obtain a satisfactory representation of the function f(t), then g(t) will be defined as the linear combination given by:

$$g\left( t \right) = \sum\limits_{h = 0}^{N - 1} {u^{\prime}(h)s^{\prime}_{h} (t)}$$
(10)

Similarly, when passing from the continuous function f(t) to the sequence x(k), the sequence x̂(k) approximating x(k) is defined according to:

$$\hat{x}\left(k \right) = \mathop \sum \limits_{h = 0}^{N - 1} u\left(h \right)s_{h} (k)$$
(11)

where u(h) is the coefficient vector and the s h (k) are the sequences that constitute the basis set.

Once the basis set is chosen, the coefficients of the different elements are found by the so-called transform techniques [7, 8].
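Equation (11) can be illustrated with a small basis set of sampled cosines and sines. The sketch below does not reproduce a specific transform technique: the coefficients u(h) are obtained by ordinary least squares (`np.linalg.lstsq`) and, as a cross-check valid for an orthogonal basis, by projecting the sequence onto each basis sequence. The test sequence and the basis are assumptions chosen for illustration.

```python
import numpy as np

n_points = 64
k = np.arange(n_points)

# Columns are the basis sequences s_h(k): a constant, then cosines and sines
basis = np.column_stack([
    np.ones(n_points),
    np.cos(2 * np.pi * 1 * k / n_points),
    np.sin(2 * np.pi * 1 * k / n_points),
    np.cos(2 * np.pi * 2 * k / n_points),
    np.sin(2 * np.pi * 2 * k / n_points),
])

# Hypothetical sequence built from three of the five basis sequences
x = 1.5 + 2.0 * basis[:, 1] - 0.5 * basis[:, 4]

# Coefficients u(h) by least squares
u_lstsq, *_ = np.linalg.lstsq(basis, x, rcond=None)

# Coefficients by projection, which works because the columns are orthogonal
u_proj = basis.T @ x / np.sum(basis ** 2, axis=0)

x_hat = basis @ u_lstsq   # Eq. (11): weighted sum of the basis sequences
print(np.round(u_lstsq, 6))
print(np.allclose(x_hat, x))
```

Because the basis sequences are mutually orthogonal, each coefficient can be computed independently of the others, which is what makes such basis sets convenient.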

The Fourier analysis

Jean-Baptiste Joseph Fourier was born in Auxerre, France, in 1768 and died in Paris in 1830. He was a mathematician and physicist, especially known for elaborating the mathematical tools of Fourier series and, in general, of Fourier analysis. He also developed relevant applications to problems of heat transfer and vibrations.

Functions

Invertibility, a property exhibited by the logarithmic transform, is also a property of the Fourier Transform (FT) [5, 9, 10]. At first, let us consider FT applied to a signal represented by a function: the inverse Fourier Transform (FT⁻¹) reconstructs the original signal (see Fig. 9). Notably, perfect reconstruction requires that the frequencies of the trigonometric functions are all multiples of a single fundamental frequency.

Fig. 9

Fourier Transform (FT) of a signal and inverse Fourier Transform (FT⁻¹) of the corresponding spectrum

It has been mentioned above that the transform of a complex signal consists of the calculation of individual signals whose sum gives back the complex signal itself or its approximation, g(t), according to Eq. (10). One possible approach is to choose sine and cosine functions as individual signals: sines and cosines constitute the basis set of FT. As defined in Footnote 8 and sketched in Fig. 10, they constitute an orthogonal basis set. The sum of a proper number of these trigonometric functions with suitable angular frequencies and amplitudes reconstructs the complex signal, avoiding overlapping information, thanks to their mutual orthogonality. It is worth underlining once more that orthogonality is a fundamental requirement for the best effectiveness of the representation: the sine and the cosine with the same argument carry no redundant information with respect to each other. Sines and cosines constitute the most widely used set of elementary functions. An operator, specifically the Fourier Transform algorithm, constitutes the tool for computing the amplitudes to assign to each individual sine and cosine component.

Fig. 10

Sines and cosines are orthogonal to each other: plot of sin θ vs. cos θ (upper left), with θ varying from 0 to 2π, together with the corresponding values of sin θ and of cos θ as a function of θ. In the lower right plot, the area corresponding to positive values of sin θ cos θ (red) is equal to the area corresponding to negative values of sin θ cos θ (blue); hence, the integral of sin θ cos θ over the interval from 0 to 2π is equal to 0
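The balance between positive and negative areas described in the caption of Fig. 10 can be verified numerically. The following sketch integrates sin θ cos θ over one period with a trapezoidal rule written explicitly; the grid density is an arbitrary assumption.

```python
import numpy as np

theta = np.linspace(0.0, 2.0 * np.pi, 100001)
d_theta = theta[1] - theta[0]

def integral(values):
    # Trapezoidal rule over the interval from 0 to 2*pi
    return np.sum((values[:-1] + values[1:]) / 2.0) * d_theta

product = np.sin(theta) * np.cos(theta)
positive_area = integral(np.maximum(product, 0.0))   # lobes where the product is > 0
negative_area = integral(np.minimum(product, 0.0))   # lobes where the product is < 0

print(positive_area, negative_area, positive_area + negative_area)
# For comparison, a function is not orthogonal to itself
print(integral(np.sin(theta) ** 2))
```

The two areas cancel, so the integral of the product vanishes, whereas the integral of sin² θ over the same interval is different from zero (it equals π).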

f(t) being a continuous function of the t variable, the FT operator applied to f(t) generates the r(ν) function in the so-called alternative (to t) frequency (ν) domain; r(ν) is in turn a continuous function of the ν variable. On the other hand, the inverse transformation operator, FT⁻¹, regenerates f(t) from r(ν):

$$r\left(\nu \right) = {\text{FT}}\left[{f\left(t \right)} \right]$$
(12)
$$f\left(t \right) = {\text{FT}}^{- 1} \left[{r\left(\nu \right)} \right]$$
(13)

ν is a variable expressing frequency: r(ν) gives a quantitative account of the frequency content of f(t). The procedure that allows the determination of the frequency content of a signal, namely the Fourier analysis, is suitable to draw the spectrum of the signal, i.e., to represent the signal by the relevant frequencies. In other words, a signal f(t) in the original domain is transformed into r(ν), defined in the relevant frequency domain, through the FT algorithm.

As already cited, Fig. 2 reports a continuous signal, expressed by the f(t) function, together with the sine and cosine components necessary to draw it. It is a relatively simple function: only two sine and two cosine functions are necessary to account for it (see Table 1). The relevant spectra, in terms of cosine and sine coefficients for four different frequency values, are reported in Fig. 11.

Table 1 Cosine (A) and sine (B) coefficients of the basis functions used to generate the complex continuous signal of Fig. 2

Fig. 11

Spectra of the complex continuous signal reported in Fig. 2: cosine (A) and sine (B) coefficients

The complex signal in Fig. 2 can be expressed by:

$$g\left(t\right) = \mathbf{3}\sin\left(2\pi\,\mathbf{10}\,t\right) + \mathbf{2}\sin\left(2\pi\,\mathbf{2.5}\,t\right) + \mathbf{2}\cos\left(2\pi\,\mathbf{25}\,t\right) + \mathbf{4}\cos\left(2\pi\,\mathbf{1}\,t\right)$$
(14)

In the present case, g(t) coincides with f(t).
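The amplitudes appearing in Eq. (14) can be recovered numerically from a sampled version of g(t). The sketch below uses NumPy's `np.fft.rfft`; the sampling frequency (100 Hz), the duration (2 s) and the reading of the frequencies in Hz are assumptions. The duration is chosen so that all four frequencies are integer multiples of the fundamental frequency 1/duration = 0.5 Hz.

```python
import numpy as np

nu_sampl = 100.0   # sampling frequency in Hz (assumed; above 2 * 25 Hz, see Eq. (8))
duration = 2.0     # observation time in s (assumed)
n = int(nu_sampl * duration)
t = np.arange(n) / nu_sampl

# Sampled version of Eq. (14)
g = (3 * np.sin(2 * np.pi * 10 * t) + 2 * np.sin(2 * np.pi * 2.5 * t)
     + 2 * np.cos(2 * np.pi * 25 * t) + 4 * np.cos(2 * np.pi * 1 * t))

spectrum = np.fft.rfft(g)
freqs = np.fft.rfftfreq(n, d=1.0 / nu_sampl)

# Scaling valid for frequencies strictly between 0 and nu_sampl / 2
a = 2.0 * spectrum.real / n     # cosine amplitudes A
b = -2.0 * spectrum.imag / n    # sine amplitudes B

for nu in (1.0, 2.5, 10.0, 25.0):
    h = int(round(nu * duration))   # index of the harmonic with frequency nu
    print(f"{freqs[h]:5.1f} Hz   A = {a[h]:5.2f}   B = {b[h]:5.2f}")
```

All the other harmonics have amplitudes that vanish within rounding errors, so that plotting a and b against freqs gives the two spectra of cosine and sine coefficients.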

Summarizing, we can list the amplitudes of the cosine functions (A i) and of the sine functions (B i) together with the corresponding frequencies, ν i, as reported in Table 1. The spectra reported in Fig. 11 are, therefore, the plots of the A and of the B amplitudes vs. ν.

Quite important operations are easier to perform on spectra than on the corresponding original signals. Furthermore, the analysis of the spectra is often more user-friendly than that of the signal itself: the spectra make explicit information that is often hidden in the original signals. Figure 12 reports some examples of instrumental signals (left-hand side) that are generally examined by looking at the corresponding spectra (right-hand side).

Fig. 12

Near infrared interferogram (a) and corresponding spectrum (b); NMR free induction decay (FID) signal (c) and corresponding spectrum (d)

Sequences

Let us now consider a signal represented by a sequence x(k), consisting of N equally spaced points (where N is an even number), so that 0 ≤ k < N. Whenever we analyze a digital signal, i.e., a sequence of data, possibly obtained by properly sampling an originally continuous signal as defined in Eq. (9), we have to use the Discrete Fourier Transform (DFT) algorithm. The spectrum of the original signal is accounted for by representations that will be described hereafter: the analysis of the spectrum of the signal is actually necessary to perform correct sampling, according to the Nyquist criterion illustrated above [see Eq. (8)].

At this point, the reader is presumably worried about an evident dilemma, which requires an important consideration. The dilemma is: to sample a continuous signal properly, we must know its spectrum. However, to calculate its spectrum, we should first sample the signal properly. Which came first, the chicken or the egg? Quoting exactly from Ref. [5]: “The way out from this dilemma is to use common sense and a little intuition. That is, to sample a function when we do not know his bandwidth (the difference between the upper and lower frequencies) a priori, we should choose the sampling rate such that a smooth curve drawn through the samples will ‘strongly resemble’ the original function. If we follow this original rule of thumb, then for most practical applications the function will be sampled fine enough.”

Having pointed out this issue, let us go back to the FT of sequences. The general expression for the sines and cosines suitable to account for the x(k) sequence (Footnote 10), in agreement with what has been discussed for the Fourier analysis, makes use of an integer h such that 0 ≤ h ≤ N/2. Therefore, h accounts for discrete values of ν; the terms with different frequency are also called harmonics, or simply frequencies. If N points constitute the sequence in the original domain, N harmonic terms (cosines and sines) exactly describe the signal in the frequency domain. At increasing h within the given interval, the product 2πh describes:

  1. a null frequency term, i.e., a constant term, accounting for the offset of the signal;

  2. a unitary frequency, i.e., a frequency for a period as long as the whole x sequence;

  3. a frequency corresponding to a period of one half the length of x, and so on, until, for h = N/2, the highest possible frequency.
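The three cases listed above can be made concrete with a short sketch. The sequence length N = 8 and the helper name `cosine_harmonic` are illustrative assumptions, not values taken from the text.

```python
import math

def cosine_harmonic(h, n_points):
    """Return cos(2*pi*h*k/N) for k = 0..N-1, i.e., the h-th cosine term."""
    return [math.cos(2 * math.pi * h * k / n_points) for k in range(n_points)]

N = 8  # illustrative even sequence length

constant = cosine_harmonic(0, N)      # h = 0: constant (offset) term, all ones
fundamental = cosine_harmonic(1, N)   # h = 1: one full period over the whole sequence
highest = cosine_harmonic(N // 2, N)  # h = N/2: alternates between +1 and -1

# Period of each harmonic in units of the sampling interval (h = 0 has no period)
periods = {h: N / h for h in range(1, N // 2 + 1)}
```

The shortest period, reached at h = N/2, spans two sampling intervals, which is why no higher frequency can be represented by the sequence.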

The knowledge of the N values of the x(k) sequence allows the solution of a system consisting of N equations with N unknowns.

The FT algorithm furnishes the values of the \(A_{\text{h}}\) and \(B_{\text{h}}\) coefficients, constituting the cosine and sine amplitudes at the different frequencies, respectively, that are necessary to account for every point of the x(k) sequence. The series of cosines accounts for the symmetric portion of the sequence, and that of sines for the antisymmetric one (see the sine and cosine functions in Fig. 10). The value of the generic x(k) point of the sequence is given by:

$$x\left( k \right) = \sum\limits_{h = 0}^{N/2} {\left[ {A_{\text{h}} \cos \frac{2\pi hk}{N} + B_{\text{h}} \sin \frac{2\pi hk}{N}} \right]}$$
(15)

where, for h = 0, the cosine term equals 1 and the sine term equals 0, so that the \(A_{0}\) term is multiplied by 1, i.e., is constant, and \(B_{0}\) is absent. Similarly, for h = N/2 the sine term equals sin(πk) = 0 at every k, so that \(B_{N/2}\) is absent. Therefore, the term \(A_{0}\) is a constant term, often referred to as the continuous component (Footnote 11).
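Writing Eq. (15) once for each of the N points gives the system of N equations with N unknowns mentioned above. The sketch below solves it directly for a made-up sequence with N = 4. The helper names and the small Gaussian elimination routine are illustrative assumptions, used here only to avoid external libraries; this is not how the FT algorithm itself proceeds.

```python
import math

def solve_linear_system(matrix, rhs):
    """Solve matrix * unknowns = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix, copied so that the inputs are left untouched
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    unknowns = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * unknowns[c] for c in range(r + 1, n))
        unknowns[r] = (aug[r][n] - tail) / aug[r][r]
    return unknowns

def trig_row(k, n_points):
    """Factors multiplying [A_0, A_1, B_1, ..., A_{N/2}] in Eq. (15) at point k."""
    half = n_points // 2
    row = [1.0]  # h = 0: cosine term equals 1, no sine term
    for h in range(1, half):
        angle = 2 * math.pi * h * k / n_points
        row += [math.cos(angle), math.sin(angle)]
    row.append(math.cos(math.pi * k))  # h = N/2: cosine only, the sine term vanishes
    return row

x = [2.0, 1.0, -1.0, 0.5]  # made-up sequence, N = 4
N = len(x)
system = [trig_row(k, N) for k in range(N)]    # N equations, one per point
coefficients = solve_linear_system(system, x)  # [A_0, A_1, B_1, A_2]
rebuilt = [sum(c * t for c, t in zip(coefficients, trig_row(k, N))) for k in range(N)]
```

Each row has exactly N entries because the two sine terms with h = 0 and h = N/2 are dropped; `rebuilt` reproduces `x` to within rounding error.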

The cosine and sine coefficients, constituting the spectrum of the signal, are given by:

$$A_{\text{h}} = \frac{2}{N}\sum\limits_{k = 0}^{N - 1} x \left( k \right){ \cos }\frac{2\pi hk}{N}\quad \quad \quad 0 \le h \le N/ 2$$
(16)

and

$$B_{{\text {h}}} = \frac{2}{N}\mathop \sum \limits_{k = 0}^{N - 1} x\left(k \right){ \sin }\frac{2\pi hk}{N}\quad \quad \quad 0 < h < N/ 2$$
(17)

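Therefore, the spectrum of the x(k) sequence consists of N/2 + 1 cosine terms (actually, N/2 true cosine terms + the continuous component) and N/2 − 1 sine terms, i.e., N coefficients in total. Note that, with the common factor 2/N of Eq. (16), the two end terms \(A_{0}\) and \(A_{N/2}\) are twice the values required by Eq. (15): they have to be halved (equivalently, normalized by 1/N) for Eq. (15) to reproduce x(k) exactly.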

Summarizing, in Eq. (15) all the coefficients \(A_{\text{h}}\) and \(B_{\text{h}}\) defined in Eqs. (16) and (17) are multiplied by the corresponding trigonometric functions evaluated at a given k value, and their sum yields the corresponding x(k) point of the sequence.
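The round trip from the sequence to its coefficients and back can be sketched as follows. The function names and the test sequence are illustrative assumptions. One more assumption is flagged in the code: the h = 0 and h = N/2 cosine terms are halved in the synthesis step, because the common 2/N factor of Eq. (16) makes them twice as large as Eq. (15) requires.

```python
import math

def cosine_coefficients(x):
    """A_h from Eq. (16), for 0 <= h <= N/2."""
    n = len(x)
    return [2.0 / n * sum(x[k] * math.cos(2 * math.pi * h * k / n) for k in range(n))
            for h in range(n // 2 + 1)]

def sine_coefficients(x):
    """B_h from Eq. (17); B_0 and B_{N/2} are stored as 0 because they are absent."""
    n = len(x)
    b = [0.0] * (n // 2 + 1)
    for h in range(1, n // 2):
        b[h] = 2.0 / n * sum(x[k] * math.sin(2 * math.pi * h * k / n) for k in range(n))
    return b

def synthesize(a, b, n):
    """Rebuild x(k) with the sum of Eq. (15).

    Assumption of this sketch: the h = 0 and h = N/2 cosine terms are halved,
    as the 2/N normalization of Eq. (16) requires for an exact round trip.
    """
    half = n // 2
    out = []
    for k in range(n):
        value = 0.0
        for h in range(half + 1):
            weight = 0.5 if h in (0, half) else 1.0
            angle = 2 * math.pi * h * k / n
            value += weight * a[h] * math.cos(angle) + b[h] * math.sin(angle)
        out.append(value)
    return out

x = [1.0, 3.0, 2.0, -1.0, 0.5, 4.0, -2.0, 1.5]  # made-up sequence, N = 8
A = cosine_coefficients(x)  # N/2 + 1 = 5 cosine terms
B = sine_coefficients(x)    # N/2 - 1 = 3 independent sine terms
x_rebuilt = synthesize(A, B, len(x))
```

With this convention `A[0] / 2` is the mean of the sequence, i.e., its offset, and `x_rebuilt` agrees with `x` to within rounding error.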

It is noteworthy that, for certain treatments of the signal that will be considered in Part 2 of this contribution, such as the application of low-pass filters, the plot of the vector modulus \(M_{\text{h}} = \sqrt{A_{\text{h}}^{2} + B_{\text{h}}^{2}}\) vs. frequency, ν, is sufficient to apply the filter properly.
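As an illustration of the modulus, the sketch below computes it for a made-up signal containing a sine of amplitude 3 at h = 2 and a cosine of amplitude 1 at h = 5. The signal, its length N = 16 and the function name `modulus_spectrum` are assumptions chosen only for this example.

```python
import math

def modulus_spectrum(x):
    """Return M_h = sqrt(A_h**2 + B_h**2) for 0 <= h <= N/2, using Eqs. (16) and (17)."""
    n = len(x)
    moduli = []
    for h in range(n // 2 + 1):
        a_h = 2.0 / n * sum(x[k] * math.cos(2 * math.pi * h * k / n) for k in range(n))
        # For h = 0 and h = N/2 this sum vanishes, consistent with B_h being absent there
        b_h = 2.0 / n * sum(x[k] * math.sin(2 * math.pi * h * k / n) for k in range(n))
        moduli.append(math.hypot(a_h, b_h))
    return moduli

N = 16
signal = [3.0 * math.sin(2 * math.pi * 2 * k / N) + 1.0 * math.cos(2 * math.pi * 5 * k / N)
          for k in range(N)]
M = modulus_spectrum(signal)
```

Here `M[2]` and `M[5]` recover the two amplitudes regardless of whether the component is a sine or a cosine, while all other entries are zero to within rounding error. The modulus thus shows how much of each frequency is present without reference to its phase.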

Conclusions and perspectives

This contribution, devoted to the treatment of analog and digital signals, discusses how the analog representation of a natural event or of the response of an instrument can be reversibly translated into digital form, i.e., into a sequence, which carries the same information content. Practical indications are given in this respect. One of the ‘transform techniques’, the Discrete Fourier Transform, is especially popular, and the characteristics of the frequency spectrum obtained by this algorithm are discussed.

In an upcoming part of this contribution, additional advantages and tools made available once a signal is ‘Fourier transformed’ will be discussed, with specific attention to situations that are most familiar to chemists. Mathematical operations such as convolution and deconvolution of sequences are much less known to chemists than the four basic arithmetic operations. This notwithstanding, they are more or less ‘hidden’ inside many instrumental signals (spectra, chromatograms, etc.) that chemists deal with every day, and they can constitute a powerful tool for the effective treatment of data.