In 1967, a new technique of interferometry was developed in which the receiving elements were separated by such a large distance that it was expedient to operate them independently with no real-time communication link. This was accomplished by recording the data on magnetic tape for later cross-correlation at a central processing station. The technique was called very-long-baseline interferometry (VLBI), a term recalling the earlier long-baseline interferometers at Jodrell Bank Observatory, in which the elements were connected by microwave links that had reached 127 km in length. The principles involved in VLBI are fundamentally the same as those involved in interferometers with connected elements. The tape recorder and its successor, disk storage, can be considered as an IF delay line of limited capacity with an unusually long propagation time, weeks instead of microseconds. The use of tape and disk recording media is motivated entirely by economics and places substantial limitations on the system. Satellite links have been demonstrated (Yen et al. 1977), but their high cost discourages their use.

Tape recorders have since been entirely replaced by disk storage. Data can also be transmitted to correlation facilities via the Internet in quasi real time. However, latency and throughput are significant issues, and data buffering is usually required.

9.1 Early Development

The motivation to develop VLBI came from the realization that many radio sources have structures that cannot be resolved by interferometers with baselines of a few hundred kilometers. By the mid-1960s, it was well known that scintillation (discussed in Chap. 14) and time variability of the radiation from quasars implied angular sizes of < 0.01″. Maser emission from OH molecules at 18-cm wavelength was unresolved at 0.1″. Low-frequency burst radiation from Jupiter was believed to emanate from regions of small angular size. The aim of the first VLBI experiments was to measure the angular sizes of these radio sources. It is instructive to consider the operation of these early VLBI experiments in their most primitive form. Consider two telescopes with system temperatures \(T_{S1}\) and \(T_{S2}\), which are pointed at a compact source giving antenna temperatures \(T_{A1}\) and \(T_{A2}\). Each station records N data samples within the coherence time, that is, the interval during which the independent oscillators remain sufficiently stable that fringes can be averaged. In the subsequent processing, these data streams are aligned, cross-correlated, and time-averaged after removing the quasi-sinusoidal fringes. The expected correlation for a point source is

$$\displaystyle{ \rho _{0} \simeq \eta \sqrt{ \frac{T_{\!A1 } T_{\!A2 } } {(T_{\!S1} + T_{\!A1})(T_{\!S2} + T_{\!A2})}}\;, }$$
(9.1)

where η is a factor of value ∼0.5 to account for losses due to quantization and processing (see Sect. 9.7). Here, it is convenient to consider a normalized form of the visibility:

$$\displaystyle{ \mathcal{V}_{N} = \frac{\rho } {\rho _{0}} = \frac{\rho } {\eta }\sqrt{ \frac{T_{\!S1 } T_{\!S2 } } {T_{\!A1}T_{\!A2}}}\;, }$$
(9.2)

where ρ is the measured correlation, and we assume \(T_{A} \ll T_{S}\). The rms noise level is

$$\displaystyle{ \varDelta \rho \simeq \frac{1} {\sqrt{N}} \simeq \frac{1} {\sqrt{2\varDelta \nu \tau _{c}}}\;, }$$
(9.3)

where \(\varDelta \nu\) is the IF bandwidth, and \(\tau_{c}\) is the coherent integration time. Hence, from Eqs. (9.1)–(9.3), the signal-to-noise ratio is

$$\displaystyle{ \frac{\rho } {\varDelta \rho } =\eta \mathcal{V}_{N}\sqrt{\frac{T_{\!A1 } T_{\!A2 } } {T_{\!S1}T_{\!S2}} (2\varDelta \nu \tau _{c})}\;. }$$
(9.4)

If the minimum useful signal-to-noise ratio is 4, the smallest detectable flux density is as follows, from Eqs. (1.3), (1.5), and (9.4):

$$\displaystyle{ S_{\mathrm{min}} \simeq \frac{8k} {\mathcal{V}_{N}\eta }\sqrt{\frac{T_{\!S1 } T_{\!S2 } } {A_{1}A_{2}}} \frac{1} {\sqrt{2\varDelta \nu \tau _{c}}}\;, }$$
(9.5)

where k is Boltzmann’s constant, and \(A_{1}\) and \(A_{2}\) are the antenna collecting areas. Typical parameters in 1967 were \(A \simeq 250\) m² (25-m-diameter telescope), \(T_{S} \simeq 100\) K, \(\eta \simeq 0.5\), and \(N = 1.4 \times 10^{8}\) bits (one bit per sample), the capacity of a tape at a density of 800 bpi (bits per inch) used in the NRAO Mark I system, which was based on standard IBM-compatible technology. For an unresolved source, \(S_{\mathrm{min}} \simeq 2\) Jy. The development after three decades is indicated by the following parameter values: \(A \simeq 1600\) m² (64-m-diameter telescope), \(T_{S} \simeq 30\) K, and \(N = 5 \times 10^{12}\) bits, the capacity of an instrumentation tape operated at 64 MHz bandwidth. For \(\mathcal{V}_{N} = 1\), Eq. (9.5) gives \(S_{\mathrm{min}} \simeq 0.6\) mJy. In both examples, the coherence time is assumed to be greater than the running time of the tape. The source size can be estimated from a single measurement of \(\mathcal{V}_{N}\) by comparison with the visibility expected for a symmetric Gaussian model. Hence, as in Fig. 1.5, the full width at half-maximum, a, is given by

$$\displaystyle{ a = \frac{2\sqrt{\ln 2}} {\pi u} \sqrt{-\ln \mathcal{V}_{N}}\;, }$$
(9.6)

where u is the projected baseline (in wavelengths).
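As a minimal sketch of how Eq. (9.6) is applied, the following Python functions convert a normalized visibility into a Gaussian full width at half-maximum and back again. The function names and the sample measurement (\(\mathcal{V}_{N} = 0.5\) on a projected baseline of \(10^{7}\) wavelengths) are hypothetical and chosen only for illustration; the forward model assumes the circular Gaussian visibility \(\exp [-(\pi au)^{2}/(4\ln 2)]\), which is the inverse of Eq. (9.6).

```python
import math

RAD_TO_MAS = 180.0 / math.pi * 3600.0 * 1000.0  # radians to milliarcseconds


def gaussian_fwhm(v_norm: float, u_wavelengths: float) -> float:
    """Eq. (9.6): FWHM in radians of a circular Gaussian source."""
    if not 0.0 < v_norm <= 1.0:
        raise ValueError("normalized visibility must lie in (0, 1]")
    scale = 2.0 * math.sqrt(math.log(2.0)) / (math.pi * u_wavelengths)
    return scale * math.sqrt(-math.log(v_norm))


def gaussian_visibility(fwhm_rad: float, u_wavelengths: float) -> float:
    """Normalized visibility of a circular Gaussian (inverse of Eq. 9.6)."""
    return math.exp(-(math.pi * fwhm_rad * u_wavelengths) ** 2 / (4.0 * math.log(2.0)))


# Hypothetical measurement: V_N = 0.5 at u = 1e7 wavelengths.
fwhm_mas = gaussian_fwhm(0.5, 1.0e7) * RAD_TO_MAS
print(f"FWHM = {fwhm_mas:.2f} mas")
```

A single visibility amplitude constrains only one size parameter, so the result is meaningful only to the extent that the Gaussian model is a reasonable description of the source.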

VLBI can be used only to study objects of exceedingly high intensity. Thus, the emission processes must normally be of nonthermal origin. To be detected on a baseline of length D, the source must be smaller than the fringe spacing. Since the flux density S is \(2kT_{B}\varOmega /\lambda ^{2}\), where \(T_{B}\) is the brightness temperature, λ is the wavelength, and Ω is the source solid angle, the minimum detectable brightness temperature is

$$\displaystyle{ (T_{B})_{\mathrm{min}} \simeq \frac{2} {\pi k}D^{2}S_{\mathrm{ min}}\;, }$$
(9.7)

since \(\varOmega \simeq \pi (\lambda /2D)^{2}\). If \(D = 10^{3}\) km and \(S_{\mathrm{min}} = 2\) mJy, then \((T_{B})_{\mathrm{min}} \simeq 10^{6}\) K. Therefore, observations of thermal phenomena occurring in molecular clouds, compact HII regions, and most stars are generally not possible. On the other hand, synchrotron sources such as supernova remnants, radio galaxies, and quasars, which are limited to \(10^{12}\) K by Compton losses; masers in which \(T_{B} \simeq 10^{15}\) K; and pulsars can be readily studied.
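The arithmetic behind Eq. (9.7) can be checked with a short Python sketch. The function name is hypothetical; the inputs are the baseline length and flux density quoted in the text. The wavelength cancels, so only D and the minimum flux density enter.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann's constant, J/K
JANSKY = 1.0e-26            # 1 Jy in W m^-2 Hz^-1


def min_brightness_temperature(baseline_m: float, s_min_jy: float) -> float:
    """Eq. (9.7): (T_B)_min ~ 2 D^2 S_min / (pi k), returned in kelvin."""
    return 2.0 * baseline_m ** 2 * s_min_jy * JANSKY / (math.pi * K_BOLTZMANN)


# D = 1000 km and S_min = 2 mJy, as in the text.
tb_min = min_brightness_temperature(1.0e6, 2.0e-3)
print(f"(T_B)_min = {tb_min:.1e} K")
```

The result is a little over \(9 \times 10^{5}\) K, consistent with the order-of-magnitude value quoted above. Because the expression scales as \(D^{2}\), longer baselines detect only correspondingly brighter sources at a fixed flux density limit.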

Three things were accomplished by early VLBI measurements:

  1. Simple intensity distributions were derived by comparing measured visibilities with source models.

  2. The distribution of the various spectral components of masers was mapped by comparing fringe frequencies for different spectral features.

  3. Source positions were measured to an accuracy of ∼1″ and baselines to an accuracy of a few meters.

For a review of early techniques, see Klemperer (1972). Since then, the technique has moved steadily toward the mainstream of interferometry in terms of being able to produce reliable images of complex radio sources. The principal reason for this is the use of phase closure (see Sect. 10.3), which provides most of the phase information when a large enough number of antennas is available in the VLBI network. A list of various VLBI networks is shown in Table 9.1.

Table 9.1 Examples of VLBI arrays

It is interesting to note that the correlation of data in the earliest systems was accomplished in software on general-purpose computers. After about 30 years, during which correlation was done with custom-built hardware, this task has largely reverted to general-purpose computers because of the rapid growth of their capabilities (Deller et al. 2007, 2011).

9.2 Differences Between VLBI and Conventional Interferometry

In this section, we briefly discuss the differences between VLBI and connected-element interferometry. Later sections in this chapter elaborate on these differences. Before beginning, we emphasize the theoretical unity of interferometry. The fundamental aim of all interferometry is to measure the coherence properties of the electromagnetic field. Thus, the principles of connected-element interferometry and VLBI are basically identical. However, certain special techniques used in VLBI are needed because of the particular observational constraints. As the continuity of (u, v) coverage is improved, from a few meters to more than \(10^{5}\) km (with the largest spacing achieved by elements on distant satellites), and fiber-optic or other advanced communication systems make recording unnecessary, the concept of VLBI as a distinct technique will become a matter of history. Here, we deal with certain limitations that make classical VLBI practices somewhat distinct from those of connected-element interferometry.

Early VLBI experiments were conducted by organizing a diverse group of observatories that had been constructed for general radio astronomical research. Each telescope had its own limitations, calibration procedures, and management personnel. Various networks were formed to standardize procedures and automate the execution of VLBI experiments. Such ad hoc VLBI networks operated on an intermittent basis, and during observations, the communication between elements to verify proper operation was limited. Small amounts of data from strong sources could be transmitted from the antennas to the correlator over telephone lines and cross-correlated to determine the instrumental delays and to check that the equipment was working properly. Later, arrays dedicated to VLBI were brought into operation [see, e.g., Napier et al. (1994)].

In VLBI, one has less control over the system stability because independent frequency standards are used at each element. Frequency offsets in the standards can cause instrumental timing errors. These errors usually include an epoch error of a few microseconds and a drift of a few tenths of a microsecond per day (Sect. 9.5). Therefore, the correlation function of the received signals [with respect to time offset, τ, as defined in Eq. (3.27)] must be measured to determine and track the instrumental delay. In contrast, delay errors in connected-element interferometers, due mainly to baseline errors and atmospheric propagation delays, are usually less than 30 ps, corresponding to 1 cm of path length. These errors are negligible for bandwidths less than 1 GHz. Thus, the response in connected-element, delay-tracking interferometers is always centered on the white light fringe. Delay becomes important only when the field of view becomes too large for the bandwidth (see Sects. 2.2 and 6.3) or when spectral line measurements are made by introducing time offsets. In VLBI, it is necessary to search a range of delay values to find the correct time relationship that maximizes the correlation. Correlations for a number of delay offsets are usually formed simultaneously, so a VLBI correlator may resemble a digital spectral correlator, although the number of frequency channels may be less than generally used for spectral line observations. The frequency offsets in the standards, which cause drifts with time in the instrumental delay, also introduce offsets in the fringe frequency. Thus, analysis of a VLBI experiment must begin with a two-dimensional search in delay and fringe frequency (delay rate) to find the peak of the correlation function. This process is referred to as fringe finding (see Sect. 9.3.4).
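The two-dimensional search described above can be illustrated with a small, noise-free Python sketch. It is a brute-force grid search under stated assumptions, not the algorithm of any particular correlator: production systems normally use Fourier transforms for speed. The function names, channel layout, averaging interval, and the injected residual delay and fringe rate are all hypothetical.

```python
import cmath
import math


def simulate_visibilities(delay_s, rate_hz, freqs_hz, times_s):
    """Noise-free residual visibilities exp[j*2*pi*(nu'*tau + f*t)] on a (channel, time) grid."""
    return [[cmath.exp(2j * math.pi * (nu * delay_s + rate_hz * t)) for t in times_s]
            for nu in freqs_hz]


def fringe_search(vis, freqs_hz, times_s, trial_delays, trial_rates):
    """Return the trial (delay, rate) that maximizes the coherent sum, and its amplitude."""
    n_samples = len(freqs_hz) * len(times_s)
    best_delay, best_rate, best_amp = None, None, -1.0
    for tau in trial_delays:
        # Counter-rotate the phase slope across the band for this trial delay.
        delay_rot = [cmath.exp(-2j * math.pi * nu * tau) for nu in freqs_hz]
        for rate in trial_rates:
            # Counter-rotate the phase drift in time for this trial fringe rate.
            rate_rot = [cmath.exp(-2j * math.pi * rate * t) for t in times_s]
            total = 0j
            for d_rot, row in zip(delay_rot, vis):
                total += d_rot * sum(v * r for v, r in zip(row, rate_rot))
            amp = abs(total) / n_samples
            if amp > best_amp:
                best_delay, best_rate, best_amp = tau, rate, amp
    return best_delay, best_rate, best_amp


# Hypothetical setup: 16 channels of 0.5 MHz and 32 one-second averages.
freqs = [(k + 0.5) * 0.5e6 for k in range(16)]
times = [float(n) for n in range(32)]
true_delay = 4 * 1.0e-7   # 0.4 microseconds of residual delay
true_rate = 4 * 0.005     # 0.02 Hz of residual fringe rate

vis = simulate_visibilities(true_delay, true_rate, freqs, times)
trial_delays = [i * 1.0e-7 for i in range(-10, 11)]
trial_rates = [i * 0.005 for i in range(-10, 11)]
found_delay, found_rate, peak = fringe_search(vis, freqs, times, trial_delays, trial_rates)
print(found_delay, found_rate, round(peak, 3))
```

The trial ranges are kept inside the ambiguity intervals set by the channel spacing (2 μs in delay) and the averaging interval (1 Hz in rate), so the peak is unique. With noise added, the same search would return the most probable pair, and the peak amplitude would fall below unity.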

The concept of coherence has different implications in VLBI and connected-element interferometry. In connected-element interferometry, there is generally a suitable calibration source within a few degrees of the source of interest that can be observed every few minutes. Even if the instrumental phase drifts, there is no fundamental limit on integration time, and the concept of coherence time is replaced by that of the interval between calibrations. In VLBI, the use of calibrators to extend coherence time is more difficult because the short-term phase stability (\(t < 10^{3}\) s) is worse. Atmospheric fluctuations above the stations are generally completely uncorrelated, and the frequency standards and frequency multipliers introduce phase noise in the fringes. Furthermore, a fundamental difference between connected-element interferometry and VLBI comes from the fact that there are many fewer sources that are unresolved at VLBI spacings and that can be used as calibrators. It is not always possible to find a calibrator close enough to the source under investigation to use as a phase reference. The time required to repoint the antennas and the decorrelation introduced by the atmosphere both increase with angular spacing. Thus, VLBI is subject to a fundamental coherence time that limits its sensitivity. For integration beyond the coherence time, it is necessary to average the fringe amplitudes, for which sensitivity improves only as the fourth root of the integration time (Sect. 9.3.5). It is also more difficult to calibrate phase in VLBI systems, although the situation has steadily improved as enhanced sensitivity has increased the number of sources that can be used as calibrators. Improved instrumental phase stability and more accurate modeling of the baselines, atmosphere, and similar factors have allowed the phase to be related to that of a calibrator several degrees away. Phase referencing in this manner is discussed in Sect. 12.2.3, and an example is shown in Fig. 12.1. Phase information can also be derived from phase closure analysis. In measuring positions, fringe frequency and group delay (the delay pattern effect discussed in Sects. 2.2 and 6.3.1) have also proved useful as measurement quantities.

Storage of the undetected signals before correlation presents VLBI with several problems. The average IF bandwidth is limited by the recording medium, which therefore limits the sensitivity of VLBI. The data must be stored as efficiently as possible, which requires a coarsely quantized representation of the signal, sampled at the Nyquist rate. With such a representation, the basic operations of fringe rotation and delay tracking, when performed on the recorded data, introduce significant effects that must be allowed for in deriving the visibility (Sect. 9.7).

9.2.1 The Problem of Field of View

In most VLBI applications, the ratio of the extent of the source under study to the resolution is typically less than about \(10^{2}\) (see Figs. 1.19 and 1.21). It is interesting to consider the challenge of imaging the entire primary beam of the antennas used in a VLBI observation. Consider an array of the following parameters:

Table 2

The nominal resolution is \(\lambda /D\), or 1.5 mas, and the field of view is \(\varDelta \theta \sim \lambda /d\), or 250″. Hence, the number of pixels required for an image (at 2 pixels per resolution element) of the entire primary beam is

$$\displaystyle{ N_{p} \simeq \pi \left (\frac{D} {d} \right )^{2} \simeq 5 \times 10^{11}\;. }$$
(9.8)

Note that \(N_{p}\) is independent of wavelength because the resolution and field of view both scale as wavelength.

The processing and data storage requirements are considerable because of the large range of geometric delay and fringe rate that must be covered. The geometric delay is \(\tau_{g} = D\cos \theta /c\), where θ is the angle between the baseline vector and the source direction. Thus, the range of delay over the primary beam is \(D\sin \theta \,\varDelta \theta /c\), and the maximum delay range requirement is

$$\displaystyle{ \varDelta \tau _{g,\mathrm{max}} = \frac{D} {\nu d} \;. }$$
(9.9)

At the Nyquist sampling interval of \((2\varDelta \nu )^{-1}\), the number of lags in the correlation function needed to cover this range will be

$$\displaystyle{ N_{c} = 2\left (\frac{D} {d} \right )\left (\frac{\varDelta \nu } {\nu }\right )\;, }$$
(9.10)

which is about 30,000 for our example.

The fringe rate, in Hz, \(\omega (d\tau_{g}/dt)/2\pi\), is \(D\omega_{e}\sin \theta /\lambda\), where \(\omega_{e} = 2\pi /T_{e}\) and \(T_{e}\) is the Earth’s sidereal period. This leads to a range of fringe rates of

$$\displaystyle{ \varDelta \nu _{f,\mathrm{max}} = \left (\frac{D} {d} \right )\left ( \frac{2\pi } {T_{e}}\right )\;, }$$
(9.11)

which requires a minimum sampling time of \((\varDelta \nu_{f,\mathrm{max}})^{-1}\), or about 34 ms. Thus, the number of fringe rate samples in time \(T_{\mathrm{obs}} = T_{e}/2\) is about \(2.9 \times 10^{6}\). The total amount of data in the delay–fringe rate domain on \(N(N - 1)/2 \sim N^{2}\) baselines is

$$\displaystyle{ N_{T} \simeq \pi N^{2}\left (\frac{D} {d} \right )^{2}\left (\frac{\varDelta \nu } {\nu }\right )\;. }$$
(9.12)

For our case, \(N_{T} \sim 5 \times 10^{12}\) samples. With 2 bytes/sample and complex numbers, the minimum storage requirement would be about 160 Tbytes.
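Equations (9.8)–(9.11) are simple enough to evaluate directly, as in the Python sketch below. The function name is hypothetical, and so are the input values (an 8000-km baseline, 25-m antennas, 5 GHz observing frequency, and 64 MHz bandwidth); they are not the parameters of the array tabulated above, so the printed numbers differ from those quoted in the text.

```python
import math

SIDEREAL_DAY_S = 86164.1  # Earth's sidereal rotation period T_e, in seconds


def wide_field_requirements(baseline_m, dish_m, freq_hz, bandwidth_hz):
    """Evaluate Eqs. (9.8)-(9.11) for imaging the full primary beam."""
    ratio = baseline_m / dish_m  # D/d
    return {
        "n_pixels": math.pi * ratio ** 2,                              # Eq. (9.8)
        "delay_range_s": baseline_m / (freq_hz * dish_m),              # Eq. (9.9)
        "n_lags": 2.0 * ratio * bandwidth_hz / freq_hz,                # Eq. (9.10)
        "fringe_rate_range_hz": ratio * 2.0 * math.pi / SIDEREAL_DAY_S,  # Eq. (9.11)
    }


# Hypothetical array parameters, for illustration only.
req = wide_field_requirements(8.0e6, 25.0, 5.0e9, 64.0e6)
for name, value in req.items():
    print(f"{name}: {value:.4g}")
print(f"minimum sampling time: {1.0e3 / req['fringe_rate_range_hz']:.1f} ms")
```

Only the ratio D/d and the fractional bandwidth matter for the pixel and lag counts, which is why the pixel count is independent of wavelength.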

Because of the high brightness requirement of VLBI, most of the primary beam field will be largely empty but may contain a significant number of compact sources. A simple approach would be to image these sources with separate passes through the data processing system with separate field centers for each source. The advent of software correlators has provided a more efficient approach. The data from the correlation step (correlation functions of 30,000 lags at a cadence of 34 ms, in our case) can be shifted to various phase centers and the resulting data streams reduced in volume substantially before imaging. The details of the phase center shifting, called “(u, v) shifting,” are described by Morgan et al. (2011). This process can be embedded in the software architecture without the need for intermediate storage of the entire delay–fringe rate data set. An implementation is described by Deller et al. (2011), and an example is shown in Fig. 9.1.

Fig. 9.1

An example of the multiple field center imaging technique with data from the EVN at 1.6 GHz. “P-Centre” is the pointing center of the individual antennas, and the circle shows the primary beam size (FWHM) of a 32-m-diameter antenna. The phase calibrator is J2229+0114. Fifteen other sources were detected in the field, and the images of three of them are shown in the inset panels. The contour levels start at the 3σ level and increase by factors of \(\sqrt{2}\). From H.-M. Cao et al. (2014), reproduced with permission. © ESO.

9.3 Basic Performance of a VLBI System

9.3.1 Time and Frequency Errors

A block diagram of a basic VLBI system and a generic processor configuration is shown in Fig. 9.2. The atomic frequency standards control the phases of the local oscillators and the clock pulses for sampling the data. In many VLBI applications, such as spectral line observations or astrometric programs, frequency-dependent effects must be accounted for precisely. To understand the spectral response of the system, we consider the phase shifts encountered by a single frequency component. The signals received from a plane wave are \(e^{\,j2\pi \nu t}\) at antenna 1, which we designate as the time-reference antenna, and \(e^{\,j2\pi \nu (t-\tau _{g})}\) at antenna 2, where \(\tau_{g}\) is the geometric delay. The local oscillators have phases \(2\pi \nu_{\mathrm{LO}}t +\theta_{1}\) and \(2\pi \nu_{\mathrm{LO}}t +\theta_{2}\), where \(\nu_{\mathrm{LO}}\) is the local oscillator frequency, and \(\theta_{1}\) and \(\theta_{2}\) are the slowly varying terms that represent the phase noise due to the frequency standards. To start, we consider the upper-sideband response in Fig. 9.2, for which the local oscillator frequency is below the signal frequency. Thus, the phases after mixing are

$$\displaystyle{ \begin{array}{ll} \phi _{1}^{(1)} = 2\pi &(\nu -\nu _{\mathrm{LO}})t -\theta _{1}\;, \\ \phi _{2}^{(1)} = 2\pi &(\nu -\nu _{\mathrm{LO}})t - 2\pi \nu \tau _{g} -\theta _{2}\;. \end{array} }$$
(9.13)

The recorded signals each have clock errors \(\tau_{1}\) and \(\tau_{2}\), so the phases of the recorded signals are

$$\displaystyle{ \begin{array}{ll} \phi _{1}^{(2)} = 2\pi (\nu -\nu _{\mathrm{LO}})&(t -\tau _{1}) -\theta _{1}\;, \\ \phi _{2}^{(2)} = 2\pi (\nu -\nu _{\mathrm{LO}})&(t -\tau _{2}) - 2\pi \nu \tau _{g} -\theta _{2}\;. \end{array} }$$
(9.14)

During processing, the time series of signal samples from antenna 2 is advanced by \(\tau '_{g}\), the estimate of \(\tau_{g}\), so

$$\displaystyle{ \phi _{2}^{(3)} = 2\pi (\nu -\nu _{\mathrm{ LO}})(t -\tau _{2} +\tau '_{g}) - 2\pi \nu \tau _{g} -\theta _{2}\;. }$$
(9.15)

Fig. 9.2

Block diagram of the essential elements of a VLBI system, including data acquisition and processing. The system may pass the upper, lower, or both sidebands at the mixer inputs, depending on the passband of the amplifiers. For millimeter-wavelength observations, there may be no amplifier preceding the mixer, in which case both sidebands may be accepted. Quantization and sampling of the signals occur in the format units. The processor system shown illustrates the configuration described analytically by Eqs. (9.21)–(9.26). Major variations in the processing system relate to the relative positions of the correlator, fringe rotator (see Fig. 9.21), and FFT operation in the correlator.

The output of the multidelay correlator and Fourier transform processor is the cross power spectrum. The phase at the output of the processor for the signal component at frequency ν is

$$\displaystyle\begin{array}{rcl} \phi _{12}& =& \ \phi _{1}^{(2)} -\phi _{ 2}^{(3)} \\ & =& \ 2\pi (\nu -\nu _{\mathrm{LO}})(\tau _{2} -\tau _{1}) + 2\pi (\nu \varDelta \tau _{g} +\nu _{\mathrm{LO}}\tau '_{g}) +\theta _{21} \\ & =& \ 2\pi (\nu -\nu _{\mathrm{LO}})(\tau _{e} +\varDelta \tau _{g}) + 2\pi \nu _{\mathrm{LO}}\tau _{g} +\theta _{21}\;, {}\end{array}$$
(9.16)

where \(\varDelta \tau_{g} =\tau_{g} -\tau '_{g}\) is the delay error, \(\tau_{e} =\tau_{2} -\tau_{1}\) is the clock error, and \(\theta_{21} =\theta_{2} -\theta_{1}\). Equation (9.16) applies to the upper-sideband frequency conversion in the mixers in Fig. 9.2, for which the intermediate frequency (IF), \((\nu -\nu_{\mathrm{LO}})\), is positive. For generality, we also give the lower-sideband response, for which the IF is \((\nu_{\mathrm{LO}} -\nu )\). For the lower sideband,

$$\displaystyle{ \phi _{12} = 2\pi (\nu _{\mathrm{LO}}-\nu )(\tau _{e} +\varDelta \tau _{g}) - 2\pi \nu _{\mathrm{LO}}\tau _{g} -\theta _{21}\;. }$$
(9.17)

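Note that in the ideal case where \(\tau_{1} =\tau_{2}\), \(\theta_{1} =\theta_{2}\), and \(\tau '_{g} =\tau_{g}\), Eqs. (9.16) and (9.17) reduce to \(\phi_{12} = 2\pi \nu_{\mathrm{LO}}\tau_{g}\) for the upper sideband, and \(\phi_{12} = -2\pi \nu_{\mathrm{LO}}\tau_{g}\) for the lower sideband.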
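The phase bookkeeping of Eqs. (9.14)–(9.16) can be checked numerically. The Python sketch below builds the upper-sideband phase difference step by step and compares it with the closed form in the last line of Eq. (9.16). The function names and all numerical values are hypothetical and serve only to exercise the algebra.

```python
import math

TWO_PI = 2.0 * math.pi


def phase_step_by_step(nu, nu_lo, t, tau_g, tau_g_est, tau1, tau2, theta1, theta2):
    """phi_12 formed from Eq. (9.14) for antenna 1 and Eq. (9.15) for antenna 2."""
    phi1 = TWO_PI * (nu - nu_lo) * (t - tau1) - theta1
    phi2 = (TWO_PI * (nu - nu_lo) * (t - tau2 + tau_g_est)
            - TWO_PI * nu * tau_g - theta2)
    return phi1 - phi2


def phase_closed_form(nu, nu_lo, tau_g, tau_g_est, tau1, tau2, theta1, theta2):
    """Last line of Eq. (9.16), written with the clock error and delay error."""
    tau_e = tau2 - tau1          # clock error
    dtau_g = tau_g - tau_g_est   # delay error
    return (TWO_PI * (nu - nu_lo) * (tau_e + dtau_g)
            + TWO_PI * nu_lo * tau_g + (theta2 - theta1))


# Hypothetical values: 5 GHz LO, 4 MHz IF, 20 ns delay error, 2 us clock error.
params = dict(nu=5.004e9, nu_lo=5.0e9, tau_g=0.0123, tau_g_est=0.0123 - 2.0e-8,
              tau1=1.0e-6, tau2=3.0e-6, theta1=0.3, theta2=1.1)
direct = phase_step_by_step(t=10.0, **params)
closed = phase_closed_form(**params)
print(abs(direct - closed))

# Ideal case: equal clocks, equal oscillator phases, perfect delay estimate.
ideal = phase_closed_form(5.004e9, 5.0e9, 0.0123, 0.0123, 1.0e-6, 1.0e-6, 0.3, 0.3)
```

The time t drops out of the difference, as expected, and in the ideal case only the term \(2\pi \nu_{\mathrm{LO}}\tau_{g}\) survives.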

The correlation function at the correlator output is real, but not even; thus, the cross power spectrum \(\mathcal{S}_{12}\) for a source of continuum radiation has the property

$$\displaystyle{ \mathcal{S}_{12}(\nu ') = \mathcal{S}_{12}^{{\ast}}(-\nu ')\;, }$$
(9.18)

where ν′ is the intermediate frequency \((\nu -\nu_{\mathrm{LO}})\). We assume that the filters in the electronics have identical responses and therefore do not introduce any net phase shifts. The power response function of the instrumental filters is therefore real, and in terms of the voltage response, H(ν), of the filters for the two antennas, \(\mathcal{S}(\nu ') = H_{1}(\nu ')H_{2}^{{\ast}}(\nu ')\). By combining the phase from Eq. (9.16) and the magnitude of the power response, the cross power spectrum for the upper sideband can be written

$$\displaystyle{ \mathcal{S}_{12}(\nu ') = \mathcal{S}(\nu ')\exp \left \{j\left [2\pi \nu '(\tau _{e} +\varDelta \tau _{g}) + 2\pi \nu _{\mathrm{LO}}\tau _{g} +\theta _{21}\right ]\right \}\;. }$$
(9.19)

The corresponding equation for the lower sideband can be obtained from Eq. (9.17). For the upper sideband, the cross-correlation function can be calculated from Eqs. (9.18) and (9.19) as

$$\displaystyle{ \rho _{12}(\tau ) =\int _{ -\infty }^{\infty }\mathcal{S}_{ 12}(\nu ')e^{\,j2\pi \nu '\tau }d\nu '\;. }$$
(9.20)

For either sideband, integration includes both positive and negative frequencies, and since \(\mathcal{S}_{12}\) is Hermitian and \(\mathcal{S}\) is purely real, we obtain

$$\displaystyle{ \rho _{12}(\tau ) = 2F_{1}(\tau ')\cos (2\pi \nu _{\mathrm{LO}}\tau _{g} +\theta _{21}) - 2F_{2}(\tau ')\sin (2\pi \nu _{\mathrm{LO}}\tau _{g} +\theta _{21})\;, }$$
(9.21)

where \(\tau ' =\tau +\tau_{e} +\varDelta \tau_{g}\) and

$$\displaystyle{ \begin{array}{rcl} F_{1}(\tau )& =&\int _{0}^{\infty }\mathcal{S}(\nu ')\cos (2\pi \nu '\tau )d\nu '\;, \\ F_{2}(\tau )& =&\int _{0}^{\infty }\mathcal{S}(\nu ')\sin (2\pi \nu '\tau )d\nu '\;. \end{array} }$$
(9.22)

If \(\mathcal{S}(\nu ')\) is a rectangular lowpass spectrum with bandwidth \(\varDelta \nu\), then

$$\displaystyle{ \begin{array}{l} F_{1}(\tau ) =\varDelta \nu \frac{\sin 2\pi \varDelta \nu \tau } {2\pi \varDelta \nu \tau }\;, \\ F_{2}(\tau ) =\varDelta \nu \frac{\sin ^{2}\pi \varDelta \nu \tau } {\pi \varDelta \nu \tau } \;. \end{array} }$$
(9.23)

These functions are shown in Fig. 9.3. By substituting Eq. (9.23) into Eq. (9.21), the cross-correlation function can be written

$$\displaystyle{ \rho _{12}(\tau ) = 2\varDelta \nu \cos (2\pi \nu _{\mathrm{LO}}\tau _{g} +\theta _{21} +\pi \varDelta \nu \tau ')\frac{\sin \pi \varDelta \nu \tau '} {\pi \varDelta \nu \tau '}\;. }$$
(9.24)

A similar analysis is given by Rogers (1976).
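The step from Eqs. (9.21) and (9.23) to Eq. (9.24) rests on the identity \(\sin x\cos x\cos \phi -\sin ^{2}x\sin \phi =\sin x\cos (\phi +x)\) with \(x =\pi \varDelta \nu \tau '\). The Python sketch below checks the two forms against each other numerically; the function names, the 2 MHz bandwidth, and the fringe phase of 0.9 rad are hypothetical test values.

```python
import math


def f1(tau, bandwidth):
    """F_1 of Eq. (9.23) for a rectangular lowpass spectrum."""
    x = 2.0 * math.pi * bandwidth * tau
    return bandwidth if x == 0.0 else bandwidth * math.sin(x) / x


def f2(tau, bandwidth):
    """F_2 of Eq. (9.23) for a rectangular lowpass spectrum."""
    x = math.pi * bandwidth * tau
    return 0.0 if x == 0.0 else bandwidth * math.sin(x) ** 2 / x


def rho_from_components(tau_p, bandwidth, fringe_phase):
    """Eq. (9.21), with fringe_phase = 2*pi*nu_LO*tau_g + theta_21."""
    return (2.0 * f1(tau_p, bandwidth) * math.cos(fringe_phase)
            - 2.0 * f2(tau_p, bandwidth) * math.sin(fringe_phase))


def rho_closed_form(tau_p, bandwidth, fringe_phase):
    """Eq. (9.24)."""
    x = math.pi * bandwidth * tau_p
    envelope = 1.0 if x == 0.0 else math.sin(x) / x
    return 2.0 * bandwidth * math.cos(fringe_phase + x) * envelope


# Hypothetical test grid: 2 MHz bandwidth, delays of -1 to +1 microsecond.
bandwidth = 2.0e6
delays = [k * 0.125e-6 for k in range(-8, 9)]
max_diff = max(abs(rho_from_components(t, bandwidth, 0.9)
                   - rho_closed_form(t, bandwidth, 0.9)) for t in delays)
print(max_diff)
```

The difference is at the level of floating-point rounding, which confirms that Eq. (9.24) is Eq. (9.21) rewritten with a single cosine and a sinc envelope.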

Fig. 9.3

Functions \(F_{1}(\tau )\) and \(F_{2}(\tau )\), defined in Eq. (9.23), and the quantity \(\sqrt{F_{1 }^{2 }(\tau ) + F_{2 }^{2 }(\tau )}\).

The variation of \(\tau_{g}\) with time results in fringe oscillations at the correlator output. The fringe frequency, \((1/2\pi )\,d\phi_{12}/dt\), is constant across the receiver bandwidth because the (instrumental) delay tracking removes the (geometric) delay-induced phase variation across the band. For the upper and lower sidebands, the rate of change of phase has opposite signs; note the term \(2\pi \nu_{\mathrm{LO}}\tau_{g}\) in Eqs. (9.16) and (9.17). See also Fig. 6.5 and the related discussion. In VLBI, the natural fringe frequency is fast enough that the fringes would be lost in the final averaging of the correlated data, so rotation of the phase to stop the fringes is applied before the correlator in Fig. 9.2. In a double-sideband system, if the fringes are stopped for one sideband, the fringe frequency is doubled for the other sideband. However, it is possible to obtain the data from each sideband by processing the data twice with appropriate fringe offsets each time. In VLBI, the source position and other parameters are not always known with sufficient accuracy when the observation is made, so in Fig. 9.2, the fringes are stopped after recovery of the data streams to permit trial of different fringe rotation rates. This involves applying a phase shift to the quantized signals at the correlator input or output (see Sect. 9.7.1). The effect on the cross-correlation function or the cross power spectrum can be described as multiplication by \(e^{-j2\pi \nu _{LO}\tau '_{g}}\) for the upper sideband and filtering to select the low-frequency term. This process results in a complex correlation function:

$$\displaystyle{ \rho '_{12}(\tau ) =\varDelta \nu \exp \left [j\left (2\pi \nu _{\mathrm{LO}}\varDelta \tau _{g} +\theta _{21} +\pi \varDelta \nu \tau '\right )\right ]\frac{\sin \pi \varDelta \nu \tau '} {\pi \varDelta \nu \tau '}\;. }$$
(9.25)

Note that the principal fringe term, \(2\pi \nu_{\mathrm{LO}}\tau_{g}\), has been eliminated, but residual fringes can result from terms in \(\varDelta \tau_{g}\) and \(\theta_{21}\). The resulting cross power spectrum is

$$\displaystyle{ \mathcal{S}'_{12}(\nu ') = \mathcal{S}(\nu ')\exp \left \{j\left [2\pi \nu '(\tau _{e} +\varDelta \tau _{g}) + 2\pi \nu _{\mathrm{LO}}\varDelta \tau _{g} +\theta _{21}\right ]\right \}\;. }$$
(9.26)

This applies to the upper sideband, for which the fringes have been stopped, and the correlator output for the other sideband averages to zero.

An example of \(\rho '_{12}(\tau )\) for eight values of τ is shown in Fig. 9.4. The waveforms represent the correlator output as a function of time for eight different delay offsets (lags) that differ sequentially by one Nyquist sample interval. Note that there is a phase shift of \(\pi /2\) between adjacent delay steps. The fringe phase can be recovered by a proper interpolation (see Sect. 9.7.3) to the peak of the correlation function, or from the phase of the cross power spectrum at ν′ = 0. The group delay can be derived from the position of the correlation peak or the slope of the phase of the cross power spectrum. Note that the measured delay is \((1/2\pi )\,d\phi_{12}/d\nu\) and is therefore a group delay, not a phase delay.
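As a minimal sketch of the second method, the group delay can be estimated by fitting a straight line to the phase of the cross power spectrum across the band: the slope divided by \(2\pi\) is the delay, and the intercept is the phase at ν′ = 0. The Python code below assumes noise-free phases that have already been unwrapped; the function name, channel layout, and injected values (25 ns and 0.7 rad) are hypothetical.

```python
import math


def fit_phase_slope(freqs_hz, phases_rad):
    """Unweighted least-squares line through (nu', phi); returns (slope, intercept)."""
    n = len(freqs_hz)
    mean_f = sum(freqs_hz) / n
    mean_p = sum(phases_rad) / n
    sxx = sum((f - mean_f) ** 2 for f in freqs_hz)
    sxy = sum((f - mean_f) * (p - mean_p) for f, p in zip(freqs_hz, phases_rad))
    slope = sxy / sxx
    return slope, mean_p - slope * mean_f


# Hypothetical spectrum: 16 channels of 0.5 MHz, residual delay 25 ns,
# and a phase of 0.7 rad at zero intermediate frequency.
freqs = [(k + 0.5) * 0.5e6 for k in range(16)]
true_delay, true_phase = 25.0e-9, 0.7
phases = [2.0 * math.pi * f * true_delay + true_phase for f in freqs]

slope, phase_at_zero = fit_phase_slope(freqs, phases)
group_delay = slope / (2.0 * math.pi)  # (1/2 pi) d(phi)/d(nu)
print(f"group delay = {group_delay * 1e9:.3f} ns, phase at nu' = 0: {phase_at_zero:.3f} rad")
```

With real data, the phases would need to be unwrapped and weighted by their signal-to-noise ratios before fitting, and the usable delay range would be limited by the channel spacing.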

Fig. 9.4

Each sinusoid represents the correlation function [the real part of Eq. (9.25)] vs. time for a particular delay offset (from the top: \(\frac{7} {2}\), \(\frac{5} {2}\), \(\frac{3} {2}\), \(\frac{1} {2}\), \(-\frac{1} {2}\), \(-\frac{3} {2}\), \(-\frac{5} {2}\), \(-\frac{7} {2}\) times the Nyquist interval). The oscillations result from the residual fringe frequency, which includes any offsets in the frequency standards at the two antennas. Note the progressive phase shift of 90° between values of the correlation function at successive delay offsets.

The actual local oscillator frequencies may differ from the nominal value \(\nu_{\mathrm{LO}}\) due to an intentional offset from the nominal frequency or due to an offset error in the frequency standard. We can expand the phase terms \(\theta_{1}\) and \(\theta_{2}\) to include these frequency offsets, \(\varDelta \nu_{1}\) and \(\varDelta \nu_{2}\), and zero-mean phase components, \(\theta '_{1}\) and \(\theta '_{2}\):

$$\displaystyle{ \begin{array}{rcl} \theta _{1} & =&\ 2\pi \varDelta \nu _{1}t +\theta '_{1}\;, \\ \theta _{2} & =&\ 2\pi \varDelta \nu _{2}t +\theta '_{2}\;.\end{array} }$$
(9.27)

Thus, the fringe phase from Eq. (9.26) becomes

$$\displaystyle{ \phi _{12}(\nu ') = 2\pi \left [\nu '(\tau _{e} +\varDelta \tau _{g}) +\nu _{\mathrm{LO}}\varDelta \tau _{g} +\varDelta \nu _{\mathrm{LO}}t\right ] +\theta '_{21}\;, }$$
(9.28)

where \(\varDelta \nu_{\mathrm{LO}} =\varDelta \nu_{2} -\varDelta \nu_{1}\), the difference in the local oscillator frequencies, and \(\theta '_{21} =\theta '_{2} -\theta '_{1}\). The fringe frequency \((1/2\pi )\,d\phi_{12}/dt\) contains this local oscillator difference term. If \(\varDelta \nu_{1}\) is due to an offset in a frequency standard and is not zero, the measured fringe phase is actually more complicated than shown in Eq. (9.28). The clock error changes with time because of the frequency standard offset and is

$$\displaystyle{ \tau _{1} = (\tau _{1})_{t=0} + \frac{\varDelta \nu _{1}} {\nu _{\mathrm{LO}}}t\;. }$$
(9.29)

The recovered time in the processor, based on the time of station 1, is related to the “true” time t by

$$\displaystyle{ t_{1} = (\tau _{1})_{t=0} + \left (1 + \frac{\varDelta \nu _{1}} {\nu _{\mathrm{LO}}}\right )t\;, }$$
(9.30)

so that there is a slight shift in all measured frequencies and phases. Thus, there is a fundamental asymmetry in the processing between the reference station from which time is derived and the other stations (Whitney et al. 1976).
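Equations (9.29) and (9.30) can be made concrete with a short Python sketch. The function names, the 2-μs epoch error, and the fractional frequency offset of \(3 \times 10^{-12}\) are hypothetical values chosen to fall in the range of clock behavior mentioned in Sect. 9.2.

```python
SECONDS_PER_DAY = 86400.0


def clock_error(tau0_s: float, frac_offset: float, t_s: float) -> float:
    """Eq. (9.29): clock error after time t for a fractional offset dnu_1/nu_LO."""
    return tau0_s + frac_offset * t_s


def recovered_time(tau0_s: float, frac_offset: float, t_s: float) -> float:
    """Eq. (9.30): time recovered in the processor from the station-1 clock."""
    return tau0_s + (1.0 + frac_offset) * t_s


# Hypothetical standard: 2 microsecond epoch error, fractional offset 3e-12.
tau0, offset = 2.0e-6, 3.0e-12
drift_per_day = clock_error(tau0, offset, SECONDS_PER_DAY) - tau0
print(f"clock drift = {drift_per_day * 1e6:.2f} microseconds per day")
```

A fractional offset of a few parts in \(10^{12}\) therefore produces a drift of a few tenths of a microsecond per day, which is why the instrumental delay must be tracked throughout an observation.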

For spectral line observations, the quantity \(\mathcal{S}(\nu ')\) in Eq. (9.26) is the (temporal frequency) spectrum of the visibility of the source multiplied by the bandpass response of the interferometer. The bandpass response can be obtained by observation of the cross power spectrum of a continuum source with a flat spectrum. Alternatively, if the phase responses of the interferometer elements are identical, the bandpass response can be obtained from the geometric mean of the power spectra from the individual elements. These power spectra are obtained by observing a continuum source or blank sky and measuring the autocorrelation of the waveform from each individual antenna. The frequency spectrum of the normalized visibility can be obtained by dividing the visibility spectrum by the geometric mean of the power spectra of the source as measured with each antenna. To correct for nonidentical phase responses, it is necessary to measure the complex power spectrum on a strong continuum source. Details of calibration procedures in VLBI spectral line observations are given by Moran (1973), Reid et al. (1980), Moran and Dhawan (1995), and Reid (1995, 1999).

9.3.2 Retarded Baselines

The estimate of the delay \(\tau_{g}\) must be accurate enough to ensure that the signal is within the delay and fringe-frequency ranges of the processor. The simplest approximation is

$$\displaystyle{ \tau _{g} = \frac{1} {c}\,\mathbf{D}\boldsymbol{\, \cdot \,}\mathbf{s}_{0}\;, }$$
(9.31)

where \(\mathbf{D} = \mathbf{r}_{1} -\mathbf{r}_{2}\), \(\mathbf{r}_{1}\) and \(\mathbf{r}_{2}\) are vectors from the center of the Earth to each station, and \(\mathbf{s}_{0}\) is the unit vector to the center of the field. Account must be taken of the fact that the Earth moves in the time between the arrival of a wave crest at one station and at another, since the Earth is not an inertial reference. Therefore, in calculating the delay, we should use not the instantaneous baseline but the “retarded” baseline (Cohen and Shaffer 1971). A plane wave reaches the first station at time \(t_{1}\) and the second station at a time \(t_{2}\), which satisfies the equation

$$\displaystyle{ \mathbf{k}\boldsymbol{\, \cdot \,}\mathbf{r}_{1}(t_{1}) - 2\pi \nu t_{1} = \mathbf{k}\boldsymbol{\, \cdot \,}\mathbf{r}_{2}(t_{2}) - 2\pi \nu t_{2}\;, }$$
(9.32)

where k = (2π∕λ)s 0. Now t 2 − t 1 = τ g , so

$$\displaystyle{ 2\pi \nu \tau _{g} = \mathbf{k}\boldsymbol{\, \cdot \,} [\mathbf{r}_{2}(t_{1} +\tau _{g}) -\mathbf{r}_{1}(t_{1})]\;. }$$
(9.33)

Expansion of r 2 in a Taylor series gives

$$\displaystyle{ \mathbf{r}_{2}(t_{1} +\tau _{g}) \simeq \mathbf{r}_{2}(t_{1}) + \mathbf{\dot{r}}_{2}(t_{1})\tau _{g} + \cdots \,, }$$
(9.34)

where the dot over r 2 indicates the time derivative, and

$$\displaystyle{ 2\pi \nu \tau _{g} \simeq \mathbf{k}\boldsymbol{\, \cdot \,} [\mathbf{D}(t_{1}) + \mathbf{\dot{r}}_{2}(t_{1})\tau _{g}]\;. }$$
(9.35)

Solving for τ g yields

$$\displaystyle{ \tau _{g} = \frac{\mathbf{D}\boldsymbol{\, \cdot \,}\mathbf{s}_{0}} {c} \left [1 -\frac{\mathbf{s}_{0}\boldsymbol{\, \cdot \,}\mathbf{\dot{r}}_{2}} {c} \right ]^{-1}\;, }$$
(9.36)

where all quantities are evaluated at t 1. Since \(\dot{\mathbf{r}} = \boldsymbol{\omega }_{e} \times \mathbf{r}\), where \(\boldsymbol{\omega }_{e}\) is the angular velocity vector of the Earth and × indicates the vector cross product, we can rewrite Eq. (9.36) as

$$\displaystyle{ \tau _{g} \simeq \frac{\mathbf{D}\boldsymbol{\, \cdot \,}\mathbf{s}_{0}} {c} \left [1 -\frac{\mathbf{s}_{0}\boldsymbol{\, \cdot \,} (\boldsymbol{\omega }_{e} \times \mathbf{r}_{2})} {c} \right ]^{-1}\;, }$$
(9.37)

or

$$\displaystyle{ \tau _{g} \simeq \tau _{g0}(1+\varDelta )\;, }$$
(9.38)

where τ g0 is the delay given by Eq. (9.31) and 1 +Δ is the term in brackets on the right side of Eq. (9.37), including its exponent, expanded to first order. From the w term in Eq. (4.3),

$$\displaystyle{ \tau _{g0} = \frac{D} {c} \left [\sin d\sin \delta +\cos d\cos \delta \cos (H - h)\right ]\;. }$$
(9.39)

Here (H, δ) and (h, d) are the hour angle and declination coordinates of the source and baseline, respectively, the hour angles usually being specified with respect to the Greenwich meridian in VLBI practice. Also, we have

$$\displaystyle{ \varDelta = \frac{\omega _{e}r_{2}} {c} \cos \mathcal{L}_{2}\cos \delta \sin (h_{2} - H)\;, }$$
(9.40)

where \(\mathcal{L}_{2}\), h 2, and r 2 are the latitude, hour angle, and magnitude of r 2, and ω e is the magnitude of \(\boldsymbol{\omega }_{e}\). The function Δ has a maximum value of \(1.5 \times 10^{-6}\), and τ g can differ from τ g0 by a maximum of about 0.05 μs. Note that the appropriate coordinates in Eq. (9.39) are those that are uncorrected for refraction or diurnal aberration. An equivalent way of accounting for the retarded baseline is to use Eq. (9.31) for the delay but correct h and δ for the diurnal aberration at the remote site. We introduced the concept of retarded baseline mainly for pedagogical purposes. It does not appear explicitly when interferometry variables are calculated in a heliocentric frame.
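The magnitudes quoted above can be checked with a few lines of arithmetic. The following sketch evaluates Δ from Eq. (9.40) and the corresponding change in delay from Eq. (9.38). The rotation rate and radius of the Earth are rounded standard values, and the function names and the 8000-km baseline are hypothetical choices made only for illustration.

```python
import math

C = 299_792_458.0      # speed of light, m/s
OMEGA_E = 7.2921e-5    # rotation rate of the Earth, rad/s (rounded)
R_EARTH = 6.378e6      # equatorial radius of the Earth, m (rounded)


def delta(lat2, dec, h2_minus_H, r2=R_EARTH):
    """Eq. (9.40): fractional retarded-baseline correction (angles in radians)."""
    return (OMEGA_E * r2 / C) * math.cos(lat2) * math.cos(dec) * math.sin(h2_minus_H)


def retarded_delay(tau_g0, lat2, dec, h2_minus_H, r2=R_EARTH):
    """Eq. (9.38): tau_g ~= tau_g0 * (1 + Delta)."""
    return tau_g0 * (1.0 + delta(lat2, dec, h2_minus_H, r2))


# Largest possible Delta: equatorial station, source on the celestial equator,
# hour-angle difference of 90 degrees.
delta_max = delta(0.0, 0.0, math.pi / 2)

# Hypothetical upper bound on the delay change: an 8000-km projected baseline
# combined with the maximum value of Delta.
tau_g0 = 8.0e6 / C
correction = retarded_delay(tau_g0, 0.0, 0.0, math.pi / 2) - tau_g0

print(f"maximum Delta    = {delta_max:.3e}")
print(f"delay correction = {correction * 1e6:.3f} microseconds")
```

The two printed values can be compared with the maximum Δ and the maximum delay difference stated in the text.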

There are different ways to formulate VLBI observables. One system that may be described as station-oriented is to refer the measurements to the center of the Earth, so that if recordings from two antennas are processed once and then interchanged and reprocessed, the phase obtained on the second pass will be the negative of that obtained on the first pass. This method presupposes an Earth model, since the radius vectors must be known. For applications to astrometry or geodesy, a baseline-oriented system is usually preferred, in which the observables have no dependence on a priori values of Earth parameters. A more precise discussion of VLBI observables can be found in Shapiro (1976) and Cannon (1978). For a full barycentric formulation, see Sovers et al. (1998).

9.3.3 Noise in VLBI Observations

In VLBI, it is often necessary to identify and calibrate the fringe visibility in situations of low signal-to-noise ratio and short coherence time. In such cases, a thorough understanding of the noise properties of interferometers can be very useful. The properties of the fringe amplitude and phase were briefly introduced in Sect. 6.2.4. We now develop this discussion further [see Moran (1976) and Hjellming (1992)]. The measured visibility is represented by a vector \(\mathbf{Z} =\pmb{ \mathcal{V}}+\boldsymbol{\varepsilon }\), where \(\pmb{\mathcal{V}}\) and \(\boldsymbol{\varepsilon }\) represent the true visibility (the signal) and noise components, respectively. We select coordinates with x (real) and y (imaginary) so that \(\pmb{\mathcal{V}}\) lies along the x axis, as shown in Fig. 6.8. There is no loss in generality by having \(\pmb{\mathcal{V}}\) lie along the x axis. The phase of the measured visibility resulting from the noise is a random variable denoted by ϕ. The components of \(\boldsymbol{\varepsilon }\) have independent zero-mean Gaussian probability distributions in their x and y coordinates, with an rms deviation σ given by Eq. (6.50). In polar coordinates, the amplitude of \(\boldsymbol{\varepsilon }\) has a Rayleigh probability distribution, and the phase of \(\boldsymbol{\varepsilon }\) has a uniform probability distribution. Z is therefore a random variable whose x and y components, Z x and Z y , have a probability distribution given by

$$\displaystyle{ p(Z_{x},Z_{y}) = \frac{1} {2\pi \sigma ^{2}}\exp \left [-\frac{(Z_{x} -\vert \mathcal{V}\vert )^{2} + Z_{y}^{2}} {2\sigma ^{2}} \right ]\;. }$$
(9.41)

We convert this probability distribution to polar coordinates,

$$\displaystyle\begin{array}{rcl} Z_{x} = Z\,\cos \phi & &{}\end{array}$$
(9.42a)
$$\displaystyle\begin{array}{rcl} Z_{y} = Z\,\sin \phi \;,& &{}\end{array}$$
(9.42b)

by noting that the Jacobian of the transformation is simply Z [see, e.g., Sivia (2006)] and obtain the result

$$\displaystyle{ p(Z,\phi ) = \frac{Z} {2\pi \sigma ^{2}} \exp \left [-\frac{(Z\cos \phi -\vert \mathcal{V}\vert )^{2} + Z^{2}\sin ^{2}\phi } {2\sigma ^{2}} \right ]\;, }$$
(9.43)

where \(Z =\! \sqrt{Z_{x }^{2 } + Z_{y }^{2}}\).

The marginal distribution of Z is given by

$$\displaystyle{ p(Z) =\int _{ -\pi }^{\pi }p(Z,\phi )\,d\phi \;, }$$
(9.44)

which, as in Eq. (6.63a), is

$$\displaystyle{ p(Z) = \frac{Z} {\sigma ^{2}} \exp \left (-\frac{Z^{2} + \vert \mathcal{V}\vert ^{2}} {2\sigma ^{2}} \right )I_{0}\left (\frac{Z\vert \mathcal{V}\vert } {\sigma ^{2}} \right )\;,\qquad Z > 0\;, }$$
(9.45)

where I 0 is a modified Bessel function of order zero, which is defined by

$$\displaystyle{ I_{0}(x) = \frac{1} {\pi } \int _{0}^{\pi }e^{\,x\cos \theta }\,d\theta \;. }$$
(9.46)

p(Z) is known as the Rice distribution.

The marginal distribution of ϕ is

$$\displaystyle{ p(\phi ) =\int _{ 0}^{\infty }p(Z,\phi )\,dZ\;, }$$
(9.47)

which becomes

$$\displaystyle{ \begin{array}{rcl} p(\phi )& =&\;\frac{1} {2\pi }\,\exp \left (-\frac{\vert \mathcal{V}\vert ^{2}} {2\sigma ^{2}} \right ) + \left \{ \frac{1} {\sqrt{8\pi }} \frac{\vert \mathcal{V}\vert \cos \phi } {\sigma } \;\exp \left (-\frac{\vert \mathcal{V}\vert ^{2}\sin ^{2}\phi } {2\sigma ^{2}} \right )\right. \\ & &\left.\times \left [1 +\mathrm{ erf}\left (\frac{\vert \mathcal{V}\vert \cos \phi } {\sqrt{2}\sigma }\right )\right ]\right \}\;, \end{array} }$$
(9.48)

where erf is the error function defined in Eq. (6.63c). Note that p(ϕ) is an even function of ϕ, as expected, since the phase of \(\mathcal{V}\) was set to zero. Hence, 〈ϕ〉 = 0. p(ϕ) was first derived in the interferometry literature by Vinokur (1965). Equations (9.45) and (9.48) correspond to Eqs. (6.63a) and (6.63b). However, here we have written p(ϕ) in a slightly different but equivalent form to make its asymptotic behavior more obvious. These probability distributions are plotted in Fig. 6.9.

The expectations of Z, Z 2, and Z 4 are

$$\displaystyle\begin{array}{rcl} \langle Z\rangle = \sqrt{ \frac{\pi } {2}}\sigma \exp \left (-\frac{\vert \mathcal{V}\vert ^{2}} {4\sigma ^{2}} \right )\left [\left (1 + \frac{\vert \mathcal{V}\vert ^{2}} {2\sigma ^{2}} \right )I_{0}\left (\frac{\vert \mathcal{V}\vert ^{2}} {4\sigma ^{2}} \right ) + \frac{\vert \mathcal{V}\vert ^{2}} {2\sigma ^{2}} I_{1}\left (\frac{\vert \mathcal{V}\vert ^{2}} {4\sigma ^{2}} \right )\right ]\;,& &{}\end{array}$$
(9.49)
$$\displaystyle{ \langle Z^{2}\rangle = \vert \mathcal{V}\vert ^{2} + 2\sigma ^{2}\;, }$$
(9.50)

and

$$\displaystyle{ \langle Z^{4}\rangle = \vert \mathcal{V}\vert ^{4} + 8\sigma ^{2}\vert \mathcal{V}\vert ^{2} + 8\sigma ^{4}\;, }$$
(9.51)

where I 1 is the modified Bessel function of order one, defined by

$$\displaystyle{ I_{1}(x) = \frac{1} {\pi } \int _{0}^{\pi }e^{\,x\cos \theta }\,\cos \theta \,d\theta \;. }$$
(9.52)
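The moments in Eqs. (9.49)–(9.51) can be checked against a direct simulation of the model \(\mathbf{Z} =\pmb{ \mathcal{V}}+\boldsymbol{\varepsilon }\). The sketch below is an illustration under stated assumptions rather than a reference implementation: the Bessel functions are evaluated from their integral definitions, Eqs. (9.46) and (9.52), with a simple midpoint rule, and the function names, sample count, and seed are arbitrary.

```python
import math
import random


def bessel_i(order, x, steps=2000):
    """I_0 or I_1 from Eqs. (9.46) and (9.52), using the midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h
        weight = math.cos(theta) if order == 1 else 1.0
        total += math.exp(x * math.cos(theta)) * weight
    return total * h / math.pi


def rice_mean(v, sigma):
    """Eq. (9.49): expectation of Z for true amplitude v and noise rms sigma."""
    u = v * v / (4.0 * sigma * sigma)          # |V|^2 / (4 sigma^2)
    return (math.sqrt(math.pi / 2.0) * sigma * math.exp(-u)
            * ((1.0 + 2.0 * u) * bessel_i(0, u) + 2.0 * u * bessel_i(1, u)))


def simulate_moments(v, sigma, n=200_000, seed=1):
    """Monte Carlo estimates of <Z>, <Z^2>, <Z^4> with V along the real axis."""
    rng = random.Random(seed)
    s1 = s2 = s4 = 0.0
    for _ in range(n):
        zx = v + rng.gauss(0.0, sigma)
        zy = rng.gauss(0.0, sigma)
        z2 = zx * zx + zy * zy
        s1 += math.sqrt(z2)
        s2 += z2
        s4 += z2 * z2
    return s1 / n, s2 / n, s4 / n


if __name__ == "__main__":
    v, sigma = 1.5, 1.0
    m1, m2, m4 = simulate_moments(v, sigma)
    print("simulated <Z>, <Z^2>, <Z^4>:", m1, m2, m4)
    print("Eq. (9.49):", rice_mean(v, sigma))
    print("Eq. (9.50):", v**2 + 2 * sigma**2)
    print("Eq. (9.51):", v**4 + 8 * sigma**2 * v**2 + 8 * sigma**4)
```

The simulated and analytic values should agree to within the statistical scatter of the simulation, which is largest for the fourth moment.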

Higher even-order moments of Z can be readily calculated using the moment theorem for a Gaussian random distribution. When no signal is present, i.e., when \(\vert \mathcal{V}\vert = 0\), I 0(0) = 1, and the probability distributions of Z and ϕ are those of the noise, which are Rayleigh and uniform distributions, respectively:

$$\displaystyle{ p(Z) = \frac{Z} {\sigma ^{2}} \exp \left (-\frac{Z^{2}} {2\sigma ^{2}} \right )\;,\qquad Z > 0\;, }$$
(9.53)

and

$$\displaystyle{ p(\phi ) = \frac{1} {2\pi }\;,\qquad 0 \leq \phi < 2\pi \;. }$$
(9.54)

For the no-signal case,

$$\displaystyle{ \langle Z\rangle = \sqrt{\pi /2}\,\sigma \;, }$$
(9.55)
$$\displaystyle{ \sigma _{Z} = \sqrt{\langle Z^{2 } \rangle -\langle Z\rangle ^{2}} =\sigma \sqrt{2 -\pi /2}\;, }$$
(9.56)

and

$$\displaystyle{ \sigma _{\phi } = \frac{\pi } {\sqrt{3}}\;. }$$
(9.57)

For the weak-signal case, defined as \(\vert \mathcal{V}\vert \ll \sigma\), we use the approximations \(I_{0}(x) \simeq 1 + x^{2}/4\) and \(I_{1}(x) \simeq x/2\). The probability distributions of Z and ϕ are

$$\displaystyle{ p(Z) \simeq \frac{Z} {\sigma ^{2}} \exp \left (-\frac{Z^{2}} {2\sigma ^{2}} \right )\left [1 -\frac{1} {2} \frac{\vert \mathcal{V}\vert ^{2}} {\sigma ^{2}} + \frac{1} {4}\left (\frac{Z\vert \mathcal{V}\vert } {\sigma ^{2}} \right )^{2}\right ] }$$
(9.58)

and

$$\displaystyle{ p(\phi ) \simeq \frac{1} {2\pi } + \frac{1} {\sqrt{8\pi }} \frac{\vert \mathcal{V}\vert } {\sigma } \cos \phi \;, }$$
(9.59)

to first order in \(\vert \mathcal{V}\vert /\sigma\). Thus,

$$\displaystyle{ \langle Z\rangle \simeq \sigma \sqrt{ \frac{\pi } {2}}\left (1 + \frac{\vert \mathcal{V}\vert ^{2}} {4\sigma ^{2}} \right )\;, }$$
(9.60)
$$\displaystyle{ \sigma _{Z} \simeq \sigma \sqrt{2 - \frac{\pi } {2}}\left (1 + \frac{\vert \mathcal{V}\vert ^{2}} {4\sigma ^{2}} \right )\;, }$$
(9.61)

and

$$\displaystyle{ \sigma _{\phi } \simeq \frac{\pi } {\sqrt{3}}\left (1 -\sqrt{ \frac{9} {2\pi ^{3}}} \frac{\vert \mathcal{V}\vert } {\sigma } \right )\;. }$$
(9.62)

Note that Z departs from a Rayleigh distribution slowly as \(\vert \mathcal{V}\vert /\sigma\) increases, whereas the probability distribution of ϕ is confined to a spread (full width at half-maximum) of only 110° and 70° for \(\vert \mathcal{V}\vert /\sigma\) equal to 1 and 2, respectively (see Fig. 6.9). Hence, as a practical matter, it is often easier to identify a weak signal by its phase rather than by its amplitude, as shown in Fig. 9.5.

Fig. 9.5

A simulated visibility spectrum of a source with a single spectral line with a Gaussian profile of amplitude equal to 2 and centered at 100 MHz (solid line). The spectral resolution is 1 MHz, and σ = 1 (hence \(\vert \mathcal{V}\vert /\sigma = 2\)) at line center. This demonstrates that weak signals can be more easily identified (by eye) in phase than in amplitude.

For the strong-signal case, \(\vert \mathcal{V}\vert \gg \sigma\), \(I_{0}(x) \simeq e^{x}/\sqrt{2\pi x}\). The probability functions for Z and ϕ are approximately Gaussian distributions and are given by

$$\displaystyle{ p(Z) \simeq \frac{1} {\sqrt{2\pi }\,\sigma }\sqrt{ \frac{Z} {\vert \mathcal{V}\vert }}\exp \left [-\frac{(Z -\vert \mathcal{V}\vert )^{2}} {2\sigma ^{2}} \right ] }$$
(9.63)

and

$$\displaystyle{ p(\phi ) \simeq \frac{1} {\sqrt{2\pi }} \frac{\vert \mathcal{V}\vert } {\sigma } \exp \left (-\frac{\vert \mathcal{V}\vert ^{2}\phi ^{2}} {2\sigma ^{2}} \right )\;. }$$
(9.64)

For this case,

$$\displaystyle{ \langle Z\rangle \simeq \vert \mathcal{V}\vert \left (1 + \frac{\sigma ^{2}} {2\vert \mathcal{V}\vert ^{2}}\right )\;, }$$
(9.65)
$$\displaystyle{ \sigma _{Z} \simeq \sigma \left (1 - \frac{\sigma ^{2}} {8\vert \mathcal{V}\vert ^{2}}\right )\;, }$$
(9.66)

and

$$\displaystyle{ \sigma _{\phi } \simeq \frac{\sigma } {\vert \mathcal{V}\vert }\;. }$$
(9.67)

The quantities σ Z and σ ϕ for the full range of \(\vert \mathcal{V}\vert /\sigma\) are shown in Fig. 9.6. Hence, in the strong-signal case, the statistics of Z are approximately Gaussian (see Fig. 6.9), and 〈Z〉 approaches \(\vert \mathcal{V}\vert \). In this case, N samples of Z can be averaged, and the signal-to-noise ratio improves with \(\sqrt{ N}\). In the weak-signal case, the perturbation of the Rayleigh noise distribution by the signal is small, and as we shall discuss in Sect. 9.5, it is difficult to improve the signal-to-noise ratio by averaging beyond the coherence time of the system.
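The behavior of σ ϕ between the weak- and strong-signal limits can be reproduced numerically. The sketch below integrates ϕ² p(ϕ), with p(ϕ) taken from Eq. (9.48), and compares the result with the asymptotic forms in Eqs. (9.62) and (9.67). The midpoint rule, the step count, and the function names are choices made for this illustration only.

```python
import math


def p_phi(phi, a):
    """Eq. (9.48), with a = |V|/sigma."""
    c = a * math.cos(phi)
    return (math.exp(-a * a / 2.0) / (2.0 * math.pi)
            + c / math.sqrt(8.0 * math.pi)
            * math.exp(-(a * math.sin(phi)) ** 2 / 2.0)
            * (1.0 + math.erf(c / math.sqrt(2.0))))


def sigma_phi(a, steps=20000):
    """rms phase (radians) from midpoint integration over (-pi, pi).

    The mean phase is zero because p(phi) is even, so the rms deviation is
    the square root of the second moment.
    """
    h = 2.0 * math.pi / steps
    total = 0.0
    for k in range(steps):
        phi = -math.pi + (k + 0.5) * h
        total += phi * phi * p_phi(phi, a)
    return math.sqrt(total * h)


def sigma_phi_weak(a):
    """Eq. (9.62), valid for a << 1."""
    return math.pi / math.sqrt(3.0) * (1.0 - math.sqrt(9.0 / (2.0 * math.pi ** 3)) * a)


def sigma_phi_strong(a):
    """Eq. (9.67), valid for a >> 1."""
    return 1.0 / a


if __name__ == "__main__":
    for a in (0.0, 0.05, 0.5, 1.0, 2.0, 5.0, 10.0):
        print(f"|V|/sigma = {a:5.2f}   sigma_phi = {sigma_phi(a):.4f} rad")
```

Tabulating `sigma_phi` over a range of \(\vert \mathcal{V}\vert /\sigma\) shows how quickly each asymptotic expression becomes usable.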

Fig. 9.6

The values of σ Z ∕σ and σ ϕ as a function of \(\vert \mathcal{V}\vert /\sigma\). Approximate expressions for \(\vert \mathcal{V}\vert /\sigma \ll 1\) are given in Eqs. (9.61) and (9.62) and for \(\vert \mathcal{V}\vert /\sigma \gg 1\) in Eqs. (9.66) and (9.67).

9.3.4 Probability of Error in the Signal Search

When starting a new session of VLBI observations with an ad hoc array, the first task in the processing is to search for fringes, i.e., fringe finding. This is necessary because of the uncertainties in the station clocks and their drift rates and means that the instrumental delay and fringe frequency must be found. This step is often unnecessary with a dedicated VLBI array, for which the values of fringe rate and delay are continuously updated from successive observations. A fringe search must be carried out on a large two-dimensional grid, as shown in Fig. 9.7. For example, consider an experiment in which Δ ν = 50 MHz at an observing frequency of \(10^{11}\) Hz. The delay increments are equal to the sampling interval of 0.01 μs. An instrumental delay uncertainty of ± 1 μs requires a search of 200 delay intervals. If the coherent integration time is 200 s and the frequency standards are set only to a fractional accuracy of \(10^{-11}\), then ± 1 Hz must be searched, which at an interval size of 0.005 Hz is 400 discrete frequencies. The total number of cells to be searched is 80,000. If there is no signal present, then p(Z) will be given by Eq. (9.53). The cumulative probability distribution (that is, the probability that Z is less than Z 0) in this case is the integral of Eq. (9.53) from zero to Z 0, or

$$\displaystyle{ P(Z_{0}) = 1 -\exp \left (-\frac{Z_{0}^{2}} {2\sigma ^{2}} \right )\;. }$$
(9.68)

The cumulative probability distribution for the maximum of n independent samples Z m = max \(\{Z_{1},Z_{2},\ldots,Z_{n}\}\) is

$$\displaystyle{ P(Z_{m}) = \left [1 -\exp \left (-\frac{Z_{m}^{2}} {2\sigma ^{2}} \right )\right ]^{n}\;. }$$
(9.69)

Thus, the probability of one or more samples exceeding Z m , which we call the probability of error, p e , is

$$\displaystyle{ p_{e} = 1 -\left [1 -\exp \left (-\frac{Z_{m}^{2}} {2\sigma ^{2}} \right )\right ]^{n}\;. }$$
(9.70)
Fig. 9.7

Fringe amplitude as a function of residual fringe frequency (a) and delay (b). The one-dimensional plots are the peak fringe amplitude vs. delay and fringe frequency. The probability distribution of the noise in these plots is given by Eq. (9.71) and the bias level by Eq. (9.72).

This function is shown in Fig. 9.8. The probability distribution of Z m is obtained by differentiating Eq. (9.69),

$$\displaystyle{ p(Z_{m}) = \frac{nZ_{m}} {\sigma ^{2}} \exp \left (-\frac{Z_{m}^{2}} {2\sigma ^{2}} \right )\left [1 -\exp \left (-\frac{Z_{m}^{2}} {2\sigma ^{2}} \right )\right ]^{n-1}\;. }$$
(9.71)

For large n, this probability distribution is nearly Gaussian, with mean value and standard deviation given by

$$\displaystyle{ \langle Z_{m}\rangle \simeq \sigma \sqrt{2\ln n}\;, }$$
(9.72)
$$\displaystyle{ \sigma _{m} \simeq \frac{0.77\sigma } {\sqrt{\ln n}} \;. }$$
(9.73)

Examples of p(Z m ) for various values of n are shown in Fig. 9.9. It is frequently useful to reduce a two-dimensional function, such as the one shown in Fig. 9.7 of fringe amplitude vs. fringe frequency and delay, to a one-dimensional function by searching for the maximum value of the function over one variable. This search process introduces a bias, equal to 〈Z m 〉, into the one-dimensional function. This bias increases with the number of samples and obscures weak signals.
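The search-grid example and Eqs. (9.70) and (9.72) can be combined into a short calculation of the detection threshold needed for a given probability of error. The sketch below is illustrative only. The function names are hypothetical, the fringe-rate cell size is assumed to be the reciprocal of the coherent integration time, and the cells are assumed to be statistically independent.

```python
import math


def n_cells(delay_range_us, delay_step_us, rate_range_hz, coherent_time_s):
    """Number of (delay, fringe-rate) cells in a search of +/- the given ranges."""
    n_delay = round(2.0 * delay_range_us / delay_step_us)
    n_rate = round(2.0 * rate_range_hz * coherent_time_s)   # cell size = 1/T
    return n_delay * n_rate


def prob_error(z_over_sigma, n):
    """Eq. (9.70): chance that at least one of n noise-only cells exceeds Z_m."""
    q = math.exp(-0.5 * z_over_sigma ** 2)   # exceedance probability of one cell
    if q >= 1.0:
        return 1.0
    # 1 - (1 - q)^n, written with log1p/expm1 to keep precision for tiny q
    return -math.expm1(n * math.log1p(-q))


def threshold(p_e, n):
    """Invert Eq. (9.70): Z_m / sigma that gives probability of error p_e."""
    q = -math.expm1(math.log1p(-p_e) / n)
    return math.sqrt(-2.0 * math.log(q))


def bias(n):
    """Eq. (9.72): approximate mean of the largest of n samples, in units of sigma."""
    return math.sqrt(2.0 * math.log(n))


if __name__ == "__main__":
    n = n_cells(1.0, 0.01, 1.0, 200.0)
    print("cells searched:", n)
    print("threshold for p_e = 0.1%:", threshold(1e-3, n), "sigma")
    print("bias of the maximum:", bias(n), "sigma")
```

For the grid in the text, the threshold lies a little more than one σ above the bias level, which illustrates how a large search raises the amplitude needed for a reliable detection.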

Fig. 9.8

Probability that one or more samples of the fringe amplitude will exceed the value Z m ∕σ in the absence of a signal, as given by Eq. (9.70). The curves are labeled by the number of samples measured.

Fig. 9.9

Probability distribution of the maximum of n random variables that have Rayleigh distributions, as given by Eq. (9.71).

We can also calculate the probability of misidentifying a signal. Suppose that we have measurements of fringe amplitude at two values of delay or fringe frequency with the signal present at one value. The probability that the amplitude in the channel with the signal (Z 1) is larger than the amplitude in the channel with only the noise (Z 2) is

$$\displaystyle{ p(Z_{1} > Z_{2}) =\int _{ 0}^{\infty }p(Z_{ 1})\left [\int _{0}^{Z_{1} }p(Z_{2})dZ_{2}\right ]dZ_{1}\;. }$$
(9.74)

p(Z 1) is given by Eq. (9.45), and p(Z 2) is given by Eq. (9.53). We can generalize this result for a search over n channels where the signal channel amplitude is Z s . The probability that Z s will exceed the values of Z in the other channels is, from Eqs. (9.68) and (9.74),

$$\displaystyle{ p(Z_{s} > Z_{1},\ldots,Z_{n}) =\int _{ 0}^{\infty }p(Z)\left [1 -\exp \left (-\frac{Z^{2}} {2\sigma ^{2}} \right )\right ]^{n-1}\ dZ\;, }$$
(9.75)

where p(Z) is given by Eq. (9.45). Thus, the probability of one or more samples exceeding the amplitude of the signal is

$$\displaystyle{ p'_{e} = 1 -\int _{0}^{\infty }p(Z)\left [1 -\exp \left (-\frac{Z^{2}} {2\sigma ^{2}} \right )\right ]^{n-1}\ dZ\;. }$$
(9.76)

\(p'_{e}\) is plotted in Fig. 9.10. For example, if the search is over 100 channels, a probability of misidentification of less than 0.1% requires \(\vert \mathcal{V}\vert /\sigma > 6.5\).
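Equation (9.76) has no simple closed form for general n, but it is straightforward to evaluate numerically. The sketch below, which assumes σ = 1 and uses hypothetical function names and step counts, writes the Rice distribution of Eq. (9.45) as an angular integral so that large values of I 0 never have to be formed, and then integrates over Z with the midpoint rule.

```python
import math


def rice_pdf(z, a, n_theta=400):
    """Eq. (9.45) with sigma = 1 and a = |V|/sigma.

    I_0 is replaced by its integral definition, Eq. (9.46), and combined with
    the Gaussian factor so that the exponent is never positive.
    """
    h = math.pi / n_theta
    total = 0.0
    for k in range(n_theta):
        theta = (k + 0.5) * h
        total += math.exp(-0.5 * (z * z + a * a - 2.0 * a * z * math.cos(theta)))
    return z * total * h / math.pi


def prob_misidentify(a, n, n_z=1500):
    """Eq. (9.76): chance that a noise-only channel exceeds the signal channel."""
    z_max = a + 10.0            # the integrand is negligible beyond this point
    h = z_max / n_z
    total = 0.0
    for k in range(n_z):
        z = (k + 0.5) * h
        noise_below = -math.expm1(-0.5 * z * z)      # Eq. (9.68) for one channel
        total += rice_pdf(z, a) * noise_below ** (n - 1)
    return 1.0 - total * h


if __name__ == "__main__":
    for a in (0.0, 2.0, 4.0, 6.5):
        print(f"|V|/sigma = {a:3.1f}   n = 100   p'_e = {prob_misidentify(a, 100):.2e}")
```

Two checks are available. For n = 2 the integral reduces, through a standard Bessel-function integral, to \(\frac{1}{2}\exp (-\vert \mathcal{V}\vert ^{2}/4\sigma ^{2})\), and for \(\vert \mathcal{V}\vert = 0\) the result must equal 1 − 1∕n.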

Fig. 9.10

Probability that one or more samples of fringe amplitude among the samples with no signal will exceed the fringe amplitude of the sample with the signal, vs. the signal amplitude, \(\vert \mathcal{V}\vert \), as given in Eq. (9.76). The curves are labeled according to the total number of samples n. The asymptotic value of \(p'_{e}\) as \(\vert \mathcal{V}\vert /\sigma\) goes to zero is 1 − 1∕n.

9.3.5 Coherent and Incoherent Averaging

We wish to estimate the amplitude of a barely detectable signal. We examine a time series of correlator output values in which the phase, ϕ(t), represents the effects of receiver noise, fluctuations in the frequency standards, or fluctuations in the atmospheric path. An example of phase vs. time from a VLBI measurement is shown in Fig. 9.11. The correlator output is

$$\displaystyle{ r(t) = Z(t)e^{\,j\phi (t)}\;. }$$
(9.77)

How do we estimate \(\vert \mathcal{V}\vert \) when the time range of the data exceeds the coherence time? There are two useful procedures, the first in the spectral domain and the second in the time domain. Suppose that r(t) is sampled at intervals short with respect to the coherence time, τ c , thus generating a time series of samples r n . The discrete Fourier transform (see Appendix 8.4) of r n is

$$\displaystyle{ R_{k} =\sum _{ n=0}^{N-1}r_{ n}\,e^{-j2\pi kn/N}\;, }$$
(9.78)

where R k is the N-point discrete fringe rate spectrum ranging in frequency from − N∕2τ c to N∕2τ c . Hence, from Parseval’s theorem [Eq. (8.179)],

$$\displaystyle{ \sum _{n=0}^{N-1}\vert r_{ n}\vert ^{2} = \frac{1} {N}\sum _{k=0}^{N-1}\vert R_{ k}\vert ^{2}\;. }$$
(9.79)

Using Eq. (9.50), we can write an unbiased estimator of \(\vert \mathcal{V}\vert ^{2}\), valid for large N, as

$$\displaystyle{ \vert \mathcal{V}\vert _{e}^{2} = \left ( \frac{1} {N^{2}}\sum _{k=0}^{N-1}\vert R_{ k}\vert ^{2}\right ) - 2\sigma ^{2}\;. }$$
(9.80)

When the total span of the data exceeds the coherence time of the interferometer, the fringe rate spectrum becomes complicated, but Eq. (9.80) provides a prescription for gathering all of its frequency components into an unbiased estimate of \(\vert \mathcal{V}\vert ^{2}\). See Clark (1968) and Clark et al. (1968) for applications of this method.
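The spectral-domain estimator can be illustrated with simulated data. In the sketch below, the correlator output has constant amplitude and a random-walk phase, which is a hypothetical stand-in for a wandering frequency standard or atmosphere. The DFT of Eq. (9.78) is evaluated directly, and the sum in Eq. (9.80) is taken over all N fringe-rate channels so that Parseval's theorem, Eq. (9.79), applies. All names and parameter values are assumptions made for the example.

```python
import cmath
import math
import random


def dft(samples):
    """Direct O(N^2) evaluation of Eq. (9.78); adequate for short series."""
    n = len(samples)
    return [sum(samples[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def simulate_correlator(v, sigma, n, phase_step_rms, seed=0):
    """Hypothetical data: amplitude v, random-walk phase, complex Gaussian noise."""
    rng = random.Random(seed)
    phase = 0.0
    out = []
    for _ in range(n):
        phase += rng.gauss(0.0, phase_step_rms)
        noise = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        out.append(v * cmath.exp(1j * phase) + noise)
    return out


def v2_spectral(samples, sigma):
    """Eq. (9.80): total power in the fringe-rate spectrum minus the noise bias."""
    n = len(samples)
    return sum(abs(rk) ** 2 for rk in dft(samples)) / n ** 2 - 2.0 * sigma ** 2


def v2_time(samples, sigma):
    """Time-domain equivalent: mean squared amplitude minus the noise bias."""
    return sum(abs(r) ** 2 for r in samples) / len(samples) - 2.0 * sigma ** 2


if __name__ == "__main__":
    data = simulate_correlator(v=1.0, sigma=0.5, n=256, phase_step_rms=0.3)
    coherent = abs(sum(data)) / len(data)     # vector average over the full span
    print("coherent average amplitude:", coherent)
    print("spectral estimate of |V|^2:", v2_spectral(data, 0.5))
    print("time-domain estimate      :", v2_time(data, 0.5))
```

Because the phase wanders over many radians during the simulated span, the vector average is expected to be strongly reduced, while the two power estimates agree with each other to rounding error and recover the true squared amplitude to within the noise.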

Fig. 9.11

Fringe phase vs. time from an observation of a strong source [the water vapor maser in W3 (OH)] on a three-baseline VLBI experiment at 22 GHz. Two of the stations, Haystack Observatory and the Naval Research Laboratory (Maryland Point Observatory), were equipped with hydrogen maser frequency standards, while the National Radio Astronomy Observatory used a rubidium vapor frequency standard. The phase noise in the top plot is dominated by contributions from the receivers and the atmosphere, while the phase noise in the bottom two plots is dominated by the phase noise in the rubidium frequency standard. These data were obtained in 1971 with the Mark I VLBI system.

The second method for estimating \(\vert \mathcal{V}\vert ^{2}\), based on the time series, comes directly from Eq. (9.50),

$$\displaystyle{ \vert \mathcal{V}\vert _{e}^{2} = \left ( \frac{1} {N}\sum \limits _{i=1}^{N}Z_{ i}^{2}\right ) - 2\sigma ^{2}\;. }$$
(9.81)

Imaging or model analysis is usually based on estimates of \(\vert \mathcal{V}\vert \), not \(\vert \mathcal{V}\vert ^{2}\). To obtain an unbiased estimate of \(\vert \mathcal{V}\vert \), we first examine the properties of the quantity

$$\displaystyle{ \vert \mathcal{V}\vert _{b} = \left [ \frac{1} {N}\sum \limits _{i=1}^{N}Z_{ i}^{2}\right ]^{1/2}\;. }$$
(9.82)

Recall that

$$\displaystyle{ Z_{i}^{2} = (\vert \mathcal{V}\vert +\epsilon _{ x_{i}})^{2} +\epsilon _{ y_{i}}^{2}\;, }$$
(9.83)

where \(\epsilon _{x_{i}}\) and \(\epsilon _{y_{i}}\) are Gaussian random variables with zero mean and variance σ 2. Equation (9.82) becomes

$$\displaystyle{ \vert \mathcal{V}\vert _{b} = \vert \mathcal{V}\vert \left \{1 + \frac{1} {N}\sum \limits _{i=1}^{N}\left [\frac{2\epsilon _{x_{i}}} {\vert \mathcal{V}\vert } + \frac{\epsilon _{x_{i}}^{2} +\epsilon _{ y_{i}}^{2}} {\vert \mathcal{V}\vert ^{2}} \right ]\right \}^{1/2}\;. }$$
(9.84)

We assume that the terms in the brackets are ≪ 1 and then expand Eq. (9.84) to second order, which is necessary to retain all the second-order terms involving \(\epsilon _{x_{i}}\). Then the expectation of \(\vert \mathcal{V}\vert _{b}\) becomes

$$\displaystyle{ \langle \vert \mathcal{V}\vert _{b}\rangle \simeq \vert \mathcal{V}\vert \left [1 + \frac{\sigma ^{2}} {\vert \mathcal{V}\vert ^{2}}\left (1 - \frac{1} {2N}\right )\right ]\;, }$$
(9.85)

which leads directly to an unbiased estimate of \(\vert \mathcal{V}\vert \) of

$$\displaystyle{ \vert \mathcal{V}\vert _{e} \simeq \left [ \frac{1} {N}\sum _{i=1}^{N}Z_{ i}^{2} -\sigma ^{2}\left (2 - \frac{1} {N}\right )\right ]^{1/2}\;. }$$
(9.86)

Equation (9.86) is accurate to < 5% for \(\vert \mathcal{V}\vert /\sigma > 2\) and N = 1, and \(\vert \mathcal{V}\vert /\sigma > 0.3\) and N = 100. This estimator has several interesting properties. For N ≫ 1, it leads to the result suggested by Eq. (9.81). However, for N = 1 and Z i  = Z, it leads to the result

$$\displaystyle{ \vert \mathcal{V}\vert _{e} = \left [Z^{2} -\sigma ^{2}\right ]^{1/2}\;. }$$
(9.87)

Equation (9.87) is used to determine the polarized flux from single measurements of Stokes Q and U [see Wardle and Kronberg (1974)]. For one measurement of Z, \(\vert \mathcal{V}\vert _{e}\) in Eq. (9.87) is a good approximation for the most likely value of \(\vert \mathcal{V}\vert \) given p(Z) defined in Eq. (9.45). See Johnson et al. (2015) for further discussion and applications.
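The effect of the bias correction can be seen in a small simulation. The sketch below compares the biased quantity of Eq. (9.82) with the estimator of Eq. (9.86), averaged over many independent trials. Returning zero when noise makes the bracket in Eq. (9.86) negative is a choice made here for illustration, as are the function names, trial counts, and seed.

```python
import math
import random


def biased_amplitude(z_values):
    """Eq. (9.82): root mean square of the measured amplitudes."""
    return math.sqrt(sum(z * z for z in z_values) / len(z_values))


def amplitude_estimate(z_values, sigma):
    """Eq. (9.86); clipped to zero when the bracket is negative (an assumption)."""
    n = len(z_values)
    bracket = sum(z * z for z in z_values) / n - sigma ** 2 * (2.0 - 1.0 / n)
    return math.sqrt(bracket) if bracket > 0.0 else 0.0


def mean_estimates(v, sigma, n, trials=20000, seed=2):
    """Average of both quantities over many simulated sets of n amplitudes."""
    rng = random.Random(seed)
    sum_b = sum_e = 0.0
    for _ in range(trials):
        zs = [math.hypot(v + rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
              for _ in range(n)]
        sum_b += biased_amplitude(zs)
        sum_e += amplitude_estimate(zs, sigma)
    return sum_b / trials, sum_e / trials


if __name__ == "__main__":
    for v, n in ((3.0, 1), (2.0, 25)):
        b, e = mean_estimates(v, 1.0, n, trials=4000)
        print(f"|V| = {v}, N = {n}:  <|V|_b> = {b:.3f}   <|V|_e> = {e:.3f}")
```

In both cases the corrected estimate should lie closer to the true amplitude than the uncorrected rms value.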

From Eqs. (9.50), (9.51), and (9.81), we have \(\langle \vert \mathcal{V}\vert _{e}^{2}\rangle = \vert \mathcal{V}\vert ^{2}\) and \(\langle \vert \mathcal{V}\vert _{e}^{4}\rangle = \vert \mathcal{V}\vert ^{4} + 4\sigma ^{2}(\vert \mathcal{V}\vert ^{2} +\sigma ^{2})/N\), so that the signal-to-noise ratio is

$$\displaystyle{ \mathcal{R}_{\mathrm{sn}} = \frac{\langle \vert \mathcal{V}\vert _{e}^{2}\rangle } {\sqrt{\langle \vert \mathcal{V}\vert _{e }^{4 }\rangle -\langle \vert \mathcal{V}\vert _{e }^{2 }\rangle ^{2}}} = \frac{\sqrt{N}} {2\sigma ^{2}} \vert \mathcal{V}\vert ^{2} \frac{1} {\sqrt{1 + \vert \mathcal{V}\vert ^{2 } /\sigma ^{2}}}\;. }$$
(9.88)

\(\vert \mathcal{V}\vert /\sigma\) is equal to the signal-to-noise ratio at the output of a single-multiplier correlator, as given by Eqs. (6.49) and (6.50). For VLBI observations, the quantization efficiency described in Sect. 8.3, η Q , is replaced by the general loss factor η, described in Sect. 9.7, and from Eq. (6.64), we obtain \(\vert \mathcal{V}\vert /\sigma = (T_{\!A}\eta /T_{\!S})\sqrt{2\varDelta \nu \tau _{c}}\). Equation (9.88) then becomes

$$\displaystyle{ \mathcal{R}_{\mathrm{sn}} = \frac{T_{\!A}^{2}\eta ^{2}} {T_{S}^{2}} \sqrt{ \frac{\varDelta \nu ^{2 } \tau \tau _{c } } {(1 + 2T_{A}^{2}\eta ^{2}\varDelta \nu \tau _{c}/T_{S}^{2})}}\;, }$$
(9.89)

where τ = N τ c is the total integrating time. The two limiting cases of Eq. (9.89) are

$$\displaystyle\begin{array}{rcl} \mathcal{R}_{\mathrm{sn}} \simeq \frac{\eta } {\sqrt{2}} \frac{T_{\!A}} {T_{\!S}}\sqrt{\varDelta \nu \tau }\;,T_{\!A} \gg \frac{T_{\!S}} {\sqrt{2\varDelta \nu \tau _{c}}}\;,& &{}\end{array}$$
(9.90)
$$\displaystyle\begin{array}{rcl} \mathcal{R}_{\mathrm{sn}} \simeq \left (\frac{T_{\!A}\eta } {T_{\!S}}\right )^{2}\varDelta \nu \sqrt{\tau \tau _{ c}}\;,T_{\!A} \ll \frac{T_{\!S}} {\sqrt{2\varDelta \nu \tau _{c}}}\;.& &{}\end{array}$$
(9.91)

Note that in the strong-signal case, incoherent averaging is not needed. When incoherent averaging is used, the coherent averaging time should be as long as possible without decreasing the fringe amplitude. If we assume that \(\mathcal{R}_{\mathrm{sn}} = 4\) for detection, and recall that τ = N τ c , then for the weak-signal case, the minimum detectable antenna temperature can be found from Eq. (9.91) to be

$$\displaystyle{ (T_{\!A})_{\mathrm{min}} = \frac{2T_{\!S}} {\eta N^{1/4}\sqrt{\varDelta \nu \tau _{c}}}\;. }$$
(9.92)

Thus, because of the \(N^{1/4}\) dependence in Eq. (9.92), incoherent averaging is effective only if N is very large. If the coherence time is of the order of 1∕Δ ν, then the observing system reduces to a form of incoherent, or intensity, interferometer [see Sect. 17.1 and Clark (1968)]. For the weak-signal case, Eq. (9.91) then becomes

$$\displaystyle{ \mathcal{R}_{\mathrm{sn}} \simeq \left (\frac{T_{\!A}\eta } {T_{\!S}}\right )^{2}\sqrt{\varDelta \nu \tau }\;. }$$
(9.93)
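Equations (9.89)–(9.92) translate directly into a small sensitivity calculator. The following sketch is a plain transcription of those formulas. The function names and the numerical values in the example (system temperature, loss factor, bandwidth, coherence time) are hypothetical and chosen only to exercise the two limiting cases.

```python
import math


def rsn(t_a, t_s, eta, bandwidth, tau, tau_c):
    """Eq. (9.89): signal-to-noise ratio of the incoherent average."""
    x = (t_a * eta / t_s) ** 2
    return x * math.sqrt(bandwidth ** 2 * tau * tau_c
                         / (1.0 + 2.0 * x * bandwidth * tau_c))


def rsn_strong(t_a, t_s, eta, bandwidth, tau):
    """Eq. (9.90): strong-signal limit."""
    return eta / math.sqrt(2.0) * (t_a / t_s) * math.sqrt(bandwidth * tau)


def rsn_weak(t_a, t_s, eta, bandwidth, tau, tau_c):
    """Eq. (9.91): weak-signal limit."""
    return (t_a * eta / t_s) ** 2 * bandwidth * math.sqrt(tau * tau_c)


def t_a_min(t_s, eta, bandwidth, tau_c, n):
    """Eq. (9.92): minimum detectable antenna temperature for R_sn = 4."""
    return 2.0 * t_s / (eta * n ** 0.25 * math.sqrt(bandwidth * tau_c))


if __name__ == "__main__":
    t_s, eta, bandwidth, tau_c = 100.0, 0.5, 50e6, 100.0   # hypothetical values
    for n in (1, 100, 10_000):
        print(f"N = {n:6d}   (T_A)_min = {t_a_min(t_s, eta, bandwidth, tau_c, n):.2e} K")
```

Each factor of 100 in N lowers the minimum detectable antenna temperature by only a factor of about 3.2, which is the slow \(N^{1/4}\) improvement described in the text.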

9.4 Fringe Fitting for a Multielement Array

9.4.1 Global Fringe Fitting

In Sect. 9.3, we considered the problem of searching for fringes in the output from a single baseline. For VLBI, the basic requirement in fringe fitting is to determine the fringe phase (i.e., the phase of the visibility) and the rate of change of the fringe phase, with time and with frequency or delay. Fringe rate offsets result from errors in the positions of the source or antennas as well as antenna-related effects such as frequency offsets in local oscillators. Most of these can be specified as factors that relate to individual antennas, rather than to baselines. Because of this, data from all baselines can be used simultaneously to determine the fringe rate parameters. By simultaneously using all of the data from a multielement VLBI array, it is possible to detect fringes that are too weak to be seen on a single baseline. This is particularly important for VLBI arrays with similar antennas and receivers; with an ad hoc array, a possible alternative is to use the data from the two most sensitive antennas to find the fringes and let this result constrain the solutions for other baselines.

A method of analysis that is based on simultaneous use of the complete data set from a multiantenna observation was developed by Schwab and Cotton (1983) and is referred to as global fringe fitting. Let Z m n (t) be the correlator output, that is, the measured visibility, from the baseline for antennas m and n. The complex (voltage) gain for antenna n and the associated receiving system is g n (t k , ν ℓ ), where t k represents a (coherently) time-integrated sample of the correlator output for frequency channel ν ℓ . Thus,

$$\displaystyle{ Z_{mn}(t_{k},\nu _{\ell}) = g_{m}(t_{k},\nu _{\ell})g_{n}^{{\ast}}(t_{ k},\nu _{\ell})\mathcal{V}_{mn}(t_{k},\nu _{\ell}) +\epsilon _{mnk\ell}\;, }$$
(9.94)

where \(\mathcal{V}_{mn}\) is the true visibility for baseline m n, and ε m n k ℓ represents the observational errors that result principally from noise. It should be remembered that the noise terms are present in all the measurements, but beyond this point, they will usually be omitted from the equations. The gain terms can be written as

$$\displaystyle{ g_{n}(t_{k},\nu _{\ell}) = \vert g_{n}\vert e^{\,j\psi _{n}(t_{k},\nu _{\ell})}\;. }$$
(9.95)

To simplify the situation in Eq. (9.95), we assume that the gain terms and the amplitude of the source visibility are constant over the range of (t, ν) space covered by the observation. To first order, we can then write

$$\displaystyle\begin{array}{rcl} Z_{mn}(t_{k},\nu _{\ell})& =& \ \vert g_{m}\vert \vert g_{n}\vert \vert \mathcal{V}\vert \,\mathrm{exp}\left [\,j(\psi _{m} -\psi _{n})(t_{0},\nu _{0})\right ] \\ & \ \times & \mathrm{exp}\left [\,j\;\left (\frac{\partial (\psi _{m} -\psi _{n} +\phi _{mn})} {\partial t} \bigg\vert _{(t_{0},\nu _{0})}(t_{k} - t_{0})\right.\right. \\ & & \left.\left.\ + \frac{\partial (\psi _{m} -\psi _{n} +\phi _{mn})} {\partial \nu } \bigg\vert _{(t_{0},\nu _{0})}(\nu _{\ell} -\nu _{0})\right )\right ]\;,{}\end{array}$$
(9.96)

where ϕ m n is the phase of the (true) visibility \(\mathcal{V}_{mn}\). The rates of change of the phase of the measured visibility with respect to time and frequency are the fringe rate

$$\displaystyle{ r_{mn} = \frac{\partial (\psi _{m} -\psi _{n} +\phi _{mn})} {\partial t} \bigg\vert _{(t_{0},\nu _{0})}\;, }$$
(9.97)

and the delay

$$\displaystyle{ \tau _{mn} = \frac{\partial (\psi _{m} -\psi _{n} +\phi _{mn})} {\partial \nu } \bigg\vert _{(t_{0},\nu _{0})}\;, }$$
(9.98)

for the baseline m n at time and frequency (t 0, ν 0). In terms of these quantities, we can relate the measured visibility (correlator output) to the true visibility as follows:

$$\displaystyle{ \begin{array}{rcl} Z_{mn}(t_{k},\nu _{\ell})& =&\ \vert g_{m}\vert \vert g_{n}\vert \mathcal{V}_{mn}(t_{k},\nu _{\ell})\ \mathrm{exp}\left \{\,j\left [(\psi _{m} -\psi _{n})\vert _{t=t_{0}}\right.\right. \\ & &\Big.\Big.\ + (r_{m} - r_{n})(t_{k} - t_{0}) + (\tau _{m} -\tau _{n})(\nu _{\ell} -\nu _{0})\Big]\Big\}\;.\end{array} }$$
(9.99)

For each antenna, there are four unknown parameters: the modulus of the gain, the phase of the gain, the fringe rate, and the delay. Since all of the data are in the form of relative phases of two antennas, it is necessary to designate one antenna as the reference. For this antenna, the phase, fringe rate, and delay are usually taken to be zero, leaving 4n a − 3 parameters to be determined. However, it is possible to simplify further and consider only the phase terms in the fringe fitting. The amplitudes of the antenna gains are subsequently calibrated separately. The number of parameters to be determined is thereby reduced to 3(n a − 1). Then to obtain the global fringe solution, the source visibility \(\mathcal{V}_{mn}\) is represented by a model of the source, and a least-mean-squares fit of the parameters in Eq. (9.99) to the visibility measurements is made. For details on a method for the least-mean-squares solution, see Schwab and Cotton (1983). The source model, which is a “first guess” of the true structure, could in some cases be as simple as a point source.

Another method of using the data for several baselines simultaneously in fringe fitting is an extension of the method described earlier for single baselines. The measured visibility data are required to be specified in terms of fringe frequency and delay, which can be obtained, for example, by a time-to-frequency Fourier transformation of the data from a lag correlator. Then for each antenna pair, there is a matrix of values of the interferometer response at incremental steps in the delay and fringe rate. The maximum amplitude indicates the solution for delay and fringe rate for the corresponding baseline, as illustrated in Fig. 9.7. However, the method can be extended to include the responses from a number of baselines by using the closure phase principle, which is discussed in more detail in Sect. 10.3. Because we are considering fringe fitting in phase only, the measured data are represented by \(\tilde{\phi }_{mn}\). Since \(\psi _{mn}\), the instrumental phase for baseline mn, is equal to the difference between the measured and true visibility phases, we can write

$$\displaystyle{ \psi _{mn} =\psi _{m} -\psi _{n} =\tilde{\phi } _{mn} -\phi _{mn}\;, }$$
(9.100)

where the ψ terms represent the instrumental phases, the ϕ terms represent the visibility phases, and the tilde ( ̃) indicates measured visibility phases. Now consider including a third antenna, designated p. For this combination, we can write

$$\displaystyle{ \psi _{mpn} =\psi _{mp} +\psi _{pn} = (\psi _{m} -\psi _{p}) + (\psi _{p} -\psi _{n}) =\psi _{m} -\psi _{n}\;. }$$
(9.101)

Thus, ψ m p n provides another measured value of ψ m n , equal to

$$\displaystyle{ \psi _{mp} +\psi _{pn} = (\tilde{\phi }_{mp} -\phi _{mp}) + (\tilde{\phi }_{pn} -\phi _{pn})\;. }$$
(9.102)

Similarly, for four antennas

$$\displaystyle{ \psi _{mpqn} =\psi _{m} -\psi _{n} = (\tilde{\phi }_{mp} +\tilde{\phi } _{pq} +\tilde{\phi } _{qn}) - (\phi _{mp} +\phi _{pq} +\phi _{qn})\;. }$$
(9.103)

Thus, estimated values of ψ m n can be obtained from the measurements from loops of antenna pairs, starting with antenna m and ending with antenna n. Combinations of more than three baselines (four antennas) can be expressed as combinations of smaller numbers of antennas, and the noise in such larger combinations is not independent. Loops of three and four antennas provide additional information that contributes to the sensitivity and accuracy of the fringe fitting for antennas m and n. Note, however, that the model visibilities are also required.
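The loop relations in Eqs. (9.100)–(9.103) can be illustrated with a minimal Python sketch. It assumes a noise-free case with a perfect source model; the antenna indices, phase values, and the helper names `wrap` and `loop_estimate` are hypothetical and chosen only for illustration.

```python
import math

def wrap(phase):
    """Wrap a phase in radians into the principal range [-pi, pi]."""
    return math.atan2(math.sin(phase), math.cos(phase))

def loop_estimate(phi_meas, phi_model, path):
    """Estimate psi_m - psi_n from a chain of baselines m -> ... -> n.

    Sums (measured - model) visibility phase over consecutive antenna
    pairs, as in Eqs. (9.100), (9.102), and (9.103).
    """
    total = 0.0
    for a, b in zip(path[:-1], path[1:]):
        total += phi_meas[(a, b)] - phi_model[(a, b)]
    return wrap(total)

# Hypothetical instrumental phases (rad) for four antennas, indices 0..3.
psi = {0: 0.3, 1: -1.1, 2: 2.0, 3: 0.7}
# Hypothetical model visibility phases (rad) for the baselines used below.
phi_model = {(0, 3): 0.2, (0, 1): -0.5, (1, 3): 0.9,
             (1, 2): 0.1, (2, 3): -0.3}
# Measured phase = true phase + psi_m - psi_n (no noise, perfect model).
phi_meas = {(m, n): phi + psi[m] - psi[n]
            for (m, n), phi in phi_model.items()}

direct = loop_estimate(phi_meas, phi_model, [0, 3])         # Eq. (9.100)
via_one = loop_estimate(phi_meas, phi_model, [0, 1, 3])     # Eq. (9.102)
via_two = loop_estimate(phi_meas, phi_model, [0, 1, 2, 3])  # Eq. (9.103)
print(direct, via_one, via_two)  # each is an estimate of psi_0 - psi_3
```

With noisy data the three estimates would differ, and a practical fit would weight them rather than use any one alone.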

Of the two techniques, the least-mean-squares fitting is better with respect to uniform combination of the data, but it requires a good starting estimate if it is to converge efficiently. Schwab and Cotton (1983) used the second of the two methods to provide a starting point for the full least-mean-squares solution. This procedure has subsequently become the basis of standard reduction programs for VLBI data (Walker 1989a,b).

Although global fringe fitting provides sensitivity superior to that of baseline-based fitting, in practice, some experience is needed to determine when use of the global method is appropriate. If the source under study has complicated structure, with large variations in the visibility amplitude, it will probably not be well represented by the model visibility required in the global fitting method. In such a case, it may be better to start with a smaller number of antennas in the fringe fitting or, if the source is sufficiently strong, to consider baselines separately. On the other hand, if the source contains a strong unresolved component, it may be adequate to consider smaller groups of antennas separately and thus reduce the overall computing load.

9.4.2 Relative Performance of Fringe Detection Methods

In the regime in which the phase noise limits the sensitivity, careful investigation of detection techniques is warranted. The most important of these have been examined by Rogers et al. (1995) to determine their relative performance. We assume in all cases that the visibility data from the correlator outputs have been averaged for a time equal to the coherence time, \(\tau _{c}\), discussed earlier. We have seen in Eq. (9.92) that incoherent averaging of N time segments of data reduces the level at which a signal is detectable by an amount proportional to \(N^{-1/4}\). Rogers et al. show that for a detection threshold for which the probability of a false detection is < 0.01% in a search of \(10^{6}\) values, the threshold of detection is lower than that without incoherent averaging (in effect, N = 1) by a factor \(0.53N^{-1/4}\). This result is accurate only for large N, and they find empirically that for smaller N, the detection threshold decreases in proportion to \(N^{-0.36}\); that is, the improvement with increasing N is greater when N is small. Table 9.2 includes the improvement factor \(0.53N^{-1/4}\), together with other results that are discussed below. The fourth column of Table 9.2 gives numerical examples of relative sensitivity for N = 200 time segments and \(n_{a} = 10\) antennas. Note that for lines 1–5 of Table 9.2, the criterion for detection is a probability of error of less than 1% in a search of \(10^{6}\) values of delay and fringe rate for each of \(n_{a} - 1\) elements of the array, the values for the reference antenna taken to be zero. For line 6, the search spans only the two dimensions of right ascension and declination.

Table 9.2 Relative thresholds for various detection methods

9.4.3 Triple Product, or Bispectrum

Another form of the output of a multielement array that can be considered is the triple product, or bispectrum, which is the product of the complex outputs for three baselines that form a triangle. The triple product is given by the product of measured visibilities

$$\displaystyle{ P_{3} = \vert Z_{12}\vert \vert Z_{23}\vert \vert Z_{31}\vert e^{\,j(\tilde{\phi }_{12}+\tilde{\phi }_{23}+\tilde{\phi }_{31})} = \vert Z_{ 12}\vert \vert Z_{23}\vert \vert Z_{31}\vert e^{\,j\phi _{c} }\;, }$$
(9.104)

where \(\phi _{c}\) represents the closure phase (Sect. 10.3), which is zero if the source is unresolved. We assume here that the amplitude of the measured visibility, Z, is calibrated separately, so that the moduli of the gain factors \(g_{m}\) and \(g_{n}\) in Eq. (9.94) are unity. Each of the measured visibility terms includes noise of power \(2\sigma ^{2}\), that is, the noise power in the output of a complex correlator. For the weak-signal case, the noise determines the variance of the triple product, which is

$$\displaystyle{ \langle \vert P_{3}\vert ^{2}\rangle =\langle \vert Z_{ 12}\vert ^{2}\vert Z_{ 23}\vert ^{2}\vert Z_{ 31}\vert ^{2}\rangle = 8\sigma ^{6}\;. }$$
(9.105)

For a point source, the closure phase is zero, so the signal is real and equal to \(\mathcal{V}^{3}\), while the noise power in the real part is \(\langle (\mathcal{R}eP_{3})^{2}\rangle =\langle \vert P_{3}\vert ^{2}\rangle /2 = 4\sigma ^{6}\), where \(\mathcal{R}e\) indicates the real part. The ratio of this triple product signal term to the rms noise in the real part is therefore \(\mathcal{V}^{3}/2\sigma ^{3}\). Rogers et al. (1995) also give an expression for the signal-to-noise ratio that is not restricted to the weak-signal case, and Kulkarni (1989) gives a general expression in a detailed analysis of the subject.

Now consider the incoherent average of N values of the triple product for three antennas, each of which represents an average of the correlator output over the coherence interval, τ c . We represent this average of triple products by

$$\displaystyle{ \overline{P}_{3} = \frac{1} {N}\sum _{N}\vert Z_{12}\vert \vert Z_{23}\vert \vert Z_{31}\vert e^{\,j\phi _{c} }\;. }$$
(9.106)

If the signal amplitudes are equal, the expectation of the real part of \(\overline{P}_{3}\) is

$$\displaystyle{ \langle \mathcal{R}e\overline{P}_{3}\rangle = \mathcal{V}^{3}\;, }$$
(9.107)

and the second moment of \(\mathcal{R}e\overline{P}_{3}\) is

$$\displaystyle{ \langle (\mathcal{R}e\overline{P}_{3})^{2}\rangle = \frac{1} {N}\langle \vert P_{3}\vert ^{2}\rangle \langle \cos ^{2}\phi _{ c}\rangle \;. }$$
(9.108)

In the weak-signal case, in which the value of \(\langle \vert P_{3}\vert ^{2}\rangle\) results mainly from noise and \(\langle \cos ^{2}\phi _{c}\rangle = 1/2\), the expectation of the second moment is, from Eq. (9.105), \(4\sigma ^{6}/N\). The signal-to-noise ratio is equal to the expectation of \(\overline{P}_{3}\) divided by the square root of the expectation of the second moment,

$$\displaystyle{ \mathcal{R}_{\mathrm{sn}} = \frac{\sqrt{N}\mathcal{V}^{3}} {2\sigma ^{3}} \;, }$$
(9.109)

from which

$$\displaystyle{ \mathcal{V} = (2\mathcal{R}_{\mathrm{sn}})^{1/3}\sigma N^{-1/6}\;. }$$
(9.110)

Line 3 of Table 9.2 gives the signal strength for a value of \(\mathcal{R}_{\mathrm{sn}}\) that allows detection at a level corresponding to the specified error criterion.
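Eqs. (9.109) and (9.110) are simple to evaluate numerically. The following Python sketch applies them in the weak-signal case; the function names and the sample values of \(\mathcal{R}_{\mathrm{sn}}\), σ, and N are assumptions made for illustration and are not taken from Table 9.2.

```python
import math

def triple_product_snr(v, sigma, n_segments):
    """Weak-signal signal-to-noise ratio of the averaged bispectrum, Eq. (9.109)."""
    return math.sqrt(n_segments) * v**3 / (2.0 * sigma**3)

def triple_product_threshold(r_sn, sigma, n_segments):
    """Visibility amplitude that yields a given signal-to-noise ratio, Eq. (9.110)."""
    return (2.0 * r_sn) ** (1.0 / 3.0) * sigma * n_segments ** (-1.0 / 6.0)

# Hypothetical inputs: required signal-to-noise ratio, noise level, segment count.
r_required = 7.0
sigma = 1.0
n_segments = 200

v_min = triple_product_threshold(r_required, sigma, n_segments)
r_check = triple_product_snr(v_min, sigma, n_segments)  # recovers r_required
print(v_min, r_check)
```

The \(N^{-1/6}\) dependence means that doubling the number of segments lowers the detectable visibility by only about 11%.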

9.4.4 Fringe Searching with a Multielement Array

With an array of \(n_{a}\) VLBI antennas, the amount of information gathered in a given time is greater than that with a single antenna pair by a factor \(n_{a}(n_{a} - 1)/2\). One might thus expect that the array would offer an increase in sensitivity \(\simeq [n_{a}(n_{a} - 1)/2]^{1/2}\). However, the larger number of antennas also introduces a very large increase in the parameter space to be searched. Thus, the probability of encountering high noise amplitudes within this parameter space is correspondingly greater. It is therefore necessary to increase the signal level used as a detection threshold in order to avoid increasing the probability of false detection.

Consider a two-element array in which the number of data points to be searched in the parameter space (frequency × delay) is \(n_{d}\). If a third antenna is then introduced, and correlation is measured for all baselines, the number of data points to be searched becomes \(n_{d}^{2}\). For \(n_{a}\) antennas, it becomes \(n_{d}^{(n_{a}-1)}\). The probability distribution of the maximum of n Rayleigh-distributed values of the signal plus noise, \(Z_{m}\), is given in Eq. (9.71) and for large n has a mean value of \(\sigma (2\ln n)^{1/2}\); see Eq. (9.72). Thus, for a given probability of occurrence, increasing the number of points to be searched from \(n_{d}\) to \(n_{d}^{(n_{a}-1)}\) increases the level \(Z_{m}\) from \(\sigma (2\ln n_{d})^{1/2}\) to \(\sigma [2(n_{a} - 1)\ln n_{d}]^{1/2}\); that is, the probability of finding a level \((n_{a} - 1)^{1/2}Z_{m}\) in a search of \(n_{d}^{(n_{a}-1)}\) points is the same as that of finding a level \(Z_{m}\) in a search of \(n_{d}\) points. By increasing the number of antennas from 2 to \(n_{a}\), the overall rms uncertainty in the signal level is reduced by a factor \([n_{a}(n_{a} - 1)/2]^{1/2}\), but since the detection threshold has increased by \((n_{a} - 1)^{1/2}\), the effective gain in sensitivity for detection of sources is increased by only \((n_{a}/2)^{1/2}\). Rogers (1991) and Rogers et al. (1995) consider other factors in deriving this result and show that the sensitivity increase \((n_{a}/2)^{1/2}\) should be multiplied by a factor that lies between 0.94 and 1. This factor is not included in Table 9.2.
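The scaling argument above can be checked with a short Python sketch. It uses only the large-n approximation \(\sigma (2\ln n)^{1/2}\) quoted in the text; the function names and the sample values of \(n_{a}\) and \(n_{d}\) are illustrative assumptions, and the correction factor between 0.94 and 1 is ignored.

```python
import math

def noise_peak_level(sigma, ln_n_points):
    """Approximate mean of the largest of n noise amplitudes: sigma * sqrt(2 ln n).

    The logarithm of n is passed directly so that very large search
    spaces, n_d ** (n_a - 1), never have to be formed explicitly.
    """
    return sigma * math.sqrt(2.0 * ln_n_points)

def detection_gain(n_a, n_d):
    """Return (net sensitivity gain, threshold rise) for an n_a-element array."""
    ln_nd = math.log(n_d)
    threshold_rise = (noise_peak_level(1.0, (n_a - 1) * ln_nd)
                      / noise_peak_level(1.0, ln_nd))
    noise_reduction = math.sqrt(n_a * (n_a - 1) / 2.0)
    return noise_reduction / threshold_rise, threshold_rise

# Hypothetical example: 10 antennas, 10**6 search points per baseline.
gain, rise = detection_gain(10, 10**6)
print(rise, gain)  # rise equals sqrt(n_a - 1); gain equals sqrt(n_a / 2)
```

In this approximation the result does not depend on \(n_{d}\), because the logarithm cancels in the ratio of the two thresholds.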

9.4.5 Multielement Array with Incoherent Averaging

In Table 9.2, the last two lines are concerned with incoherent averaging of data taken with a multielement array. The method on line 5 involves data that have been averaged over the coherence time and subsequently averaged incoherently before the application of a global fringe search. The relative threshold value is the product of the threshold on line 4 for a multielement global search with that on line 2 for incoherent averaging over a single baseline. The method in line 6 involves incoherent averaging over both time segments (equal to the coherence time) and baselines. The relative threshold is obtained from that in line 2 by increasing the number of data from N (the number of time segments per baseline) to N multiplied by the number of baselines.
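The combinations described for lines 5 and 6 can be sketched numerically. The Python example below is an illustration only: it takes the line 2 factor as \(0.53N^{-1/4}\) and the line 4 factor as \((2/n_{a})^{1/2}\), which is the reciprocal of the sensitivity gain derived in Sect. 9.4.4. The function names are hypothetical, and the printed values are not copied from Table 9.2.

```python
import math

def incoherent_factor(n_segments):
    """Relative threshold for incoherent averaging, 0.53 * N**(-1/4) (large N)."""
    return 0.53 * n_segments ** -0.25

def global_search_factor(n_a):
    """Relative threshold for a coherent global search, (2 / n_a)**(1/2)."""
    return math.sqrt(2.0 / n_a)

n_segments = 200   # N, time segments per baseline (example used in the text)
n_a = 10           # number of antennas (example used in the text)
n_baselines = n_a * (n_a - 1) // 2

line2 = incoherent_factor(n_segments)                # single baseline
line4 = global_search_factor(n_a)                    # multielement global search
line5 = line4 * line2                                # product of lines 4 and 2
line6 = incoherent_factor(n_segments * n_baselines)  # average over time and baselines
print(line2, line4, line5, line6)
```

Under these assumptions, averaging incoherently over both time segments and baselines gives the lowest relative threshold of the four.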

9.5 Phase Stability and Atomic Frequency Standards

Precision oscillators have been steadily improved since the 1920s, when the invention of the crystal-controlled (quartz) oscillator had immediate application to the problem of precise timekeeping. In the early 1950s, cesium-beam clocks allowed better timekeeping than could be obtained from astronomical observations. This development led to an atomic definition of time that differs from the astronomical one, and to the establishment of the definition of the second of time based on a particular transition frequency of cesium.

The mathematical theory of the interpretation of measurements of oscillator phase was systematized by an IEEE committee (Barnes et al. 1971). This paper helped standardize the approach to handling low-frequency divergence in the noise of oscillators. The physical theory of noise in oscillators was treated by Edson (1960). In this section, we develop relevant aspects of the theory and describe the operation of atomic frequency standards, with particular emphasis on the hydrogen maser. The theory and analysis of phase fluctuations are discussed in more detail by Blair (1974) and Rutman (1978).

9.5.1 Analysis of Phase Fluctuations

The desired signal from an oscillator is a pure sine wave:

$$\displaystyle{ V (t) = V _{0}\cos 2\pi \nu _{0}t\;. }$$
(9.111)

This is unobtainable since all devices have some phase noise. A more realistic model is given by

$$\displaystyle{ V (t) = V _{0}\cos \left [2\pi \nu _{0}t +\phi (t)\right ]\;, }$$
(9.112)

where ϕ(t) is a random process characterizing the phase departure from a pure sine wave. We ignore amplitude fluctuations since they do not directly affect performance in VLBI applications. The instantaneous frequency ν(t) is the derivative of the argument of Eq. (9.112) divided by 2π, that is,

$$\displaystyle{ \nu (t) =\nu _{0} +\delta \nu (t)\;, }$$
(9.113)

where

$$\displaystyle{ \delta \nu (t) = \frac{1} {2\pi } \frac{d\phi (t)} {dt} \;. }$$
(9.114)

The instantaneous fractional frequency deviation is defined as

$$\displaystyle{ y(t) = \frac{\delta \nu (t)} {\nu _{0}} = \frac{1} {2\pi \nu _{0}} \frac{d\phi (t)} {dt} \;. }$$
(9.115)

This definition allows the performance of oscillators at different frequencies to be compared. We assume that the random processes ϕ(t) and y(t) are statistically stationary, so that correlation functions can be defined. This assumption is not always valid and can cause difficulty (Rutman 1978). The autocorrelation function of y(t) is

$$\displaystyle{ R_{y}(\tau ) =\langle y(t)\,y(t+\tau )\rangle \;. }$$
(9.116)

R y (τ) is a real and even function, so \(\mathcal{S}'_{y}(\,f)\), the power spectrum of y(t), is a real and even function of frequency f. In order to prevent confusion between ν(t) and its frequency components, we use the symbol f for the frequency variable in the following spectral analysis. Following the somewhat nonstandard convention that is used in most of the literature on phase stability (Barnes et al. 1971), we replace the double-sided spectrum \(\mathcal{S}'_{y}(\,f)\) with a single-sided spectrum \(\mathcal{S}_{y}(\,f)\), where \(\mathcal{S}_{y}(\,f) = 2\mathcal{S}'_{y}(\,f)\) for f ≥ 0, and \(\mathcal{S}_{y}(\,f) = 0\) for f < 0. Since \(\mathcal{S}'_{y}(\,f)\) is even, no information is lost in this procedure. Thus, the Fourier transform relation \(R_{y}(\tau )\longleftrightarrow \mathcal{S}'_{y}(\,f)\), can also be written as

$$\displaystyle{ \begin{array}{rcl} \mathcal{S}_{y}(\,f)& =&4\!\int _{0}^{\infty }R_{y}(\tau )\cos (2\pi f\tau )\,d\tau \;, \\ R_{y}(\tau )& =&\int _{0}^{\infty }\mathcal{S}_{y}(\,f)\cos (2\pi f\tau )\,df\;. \end{array} }$$
(9.117)

Similarly, the autocorrelation function of the phase is

$$\displaystyle{ R_{\phi }(\tau ) =\langle \phi (t)\,\phi (t+\tau )\rangle \;. }$$
(9.118)

\(\mathcal{S}_{\phi }(\,f)\), the power spectrum of ϕ, and R ϕ (τ) are related by a Fourier transform. From the derivative property of Fourier transforms, the relationship between \(\mathcal{S}_{y}(\,f)\) and \(\mathcal{S}_{\phi }(\,f)\) can be shown to be

$$\displaystyle{ \mathcal{S}_{y}(\,f) = \frac{f^{2}} {\nu _{0}^{2}} \mathcal{S}_{\phi }(\,f)\;. }$$
(9.119)

\(\mathcal{S}_{y}(\,f)\) and \(\mathcal{S}_{\phi }(\,f)\) serve as primary measures of frequency stability. They both have the dimensions of Hz−1. Another commonly used specification of oscillator performance is \(\mathcal{L}(\,f)\), which is defined as the power in 1-Hz bandwidth at frequency f in one sideband of a double-sided spectrum, expressed as a fraction of the total power of the oscillator. When the phase deviation is small compared with one radian, \(\mathcal{L}(\,f) \simeq \mathcal{S}_{\phi }(\,f)/2\).
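The step from Eq. (9.115) to Eq. (9.119) can be made explicit. As a sketch, assume that ϕ(t) and y(t) have Fourier transforms Φ( f ) and Y( f ) over the record of interest; these two symbols are introduced here only for illustration. Differentiation with respect to time multiplies the transform by j2πf, so that Eq. (9.115) gives

$$\displaystyle{ Y (\,f) = \frac{1} {2\pi \nu _{0}}\,(\,j2\pi f)\,\Phi (\,f) = \frac{jf} {\nu _{0}} \Phi (\,f)\;,\qquad \vert Y (\,f)\vert ^{2} = \frac{f^{2}} {\nu _{0}^{2}} \vert \Phi (\,f)\vert ^{2}\;. }$$

Since each power spectrum is proportional to the squared modulus of the corresponding transform, the factor \(f^{2}/\nu _{0}^{2}\) carries over directly to Eq. (9.119).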

A second approach to frequency stability is based on time-domain measurements. The average fractional frequency deviation is

$$\displaystyle{ \overline{y}_{k} = \frac{1} {\tau } \int _{t_{k}}^{t_{k}+\tau }\!y(t)\,dt\;, }$$
(9.120)

which, from Eq. (9.115), becomes

$$\displaystyle{ \overline{y}_{k} = \frac{\phi (t_{k}+\tau ) -\phi (t_{k})} {2\pi \nu _{0}\tau } \;, }$$
(9.121)

where the measurements of \(\overline{y}_{k}\) are made with a repetition interval T(T ≥ τ) such that t k+1 = t k + T (see Fig. 9.12a). Measurements of \(\overline{y}_{k}\) are directly obtainable with conventional frequency counters. The measure of frequency stability is the sample variance of \(\overline{y}_{k}\), given by

$$\displaystyle{ \langle \sigma _{y}^{2}(N,T,\tau )\rangle = \frac{1} {N - 1}\left \langle \sum \limits _{n=1}^{N}\left (\overline{y}_{ n} - \frac{1} {N}\sum \limits _{k=1}^{N}\overline{y}_{ k}\right )^{2}\right \rangle \;, }$$
(9.122)

Fig. 9.12 (a) Time intervals involved in the measurement of \(\overline{y}_{k}\) as defined in Eq. (9.121). (b) Plot of a series of phase samples vs. time. The Allan variance, defined in Eq. (9.123), is the average of the square of the deviation, \((\delta \phi )^{2}\), of each sample from the mean of its two adjacent samples.

where N is the number of samples in a single estimate of \(\sigma _{y}^{2}\). In the limit as N → ∞, the quantity presented above is the true variance, which we represent as \(I^{2}(\tau )\). However, in many cases, Eq. (9.122) does not converge because of the low-frequency behavior of \(\mathcal{S}_{y}(\,f)\), and \(I^{2}(\tau )\) is then not defined. To avoid some of the convergence problems, a particular case of Eq. (9.122), the two-sample or Allan variance, \(\sigma _{y}^{2}(\tau )\), has gained wide acceptance (Allan 1966). The Allan variance, for which T = τ (no dead time between measurements) and N = 2, is defined as follows:

$$\displaystyle{ \sigma _{y}^{2}(\tau ) = \frac{\langle (\overline{y}_{k+1} -\overline{y}_{k})^{2}\rangle } {2} \;, }$$
(9.123)

or, from Eq. (9.121):

$$\displaystyle{ \sigma _{y}^{2}(\tau ) = \frac{\langle [\phi (t + 2\tau ) - 2\phi (t+\tau ) +\phi (t)]^{2}\rangle } {8\pi ^{2}\nu _{0}^{2}\tau ^{2}} \;. }$$
(9.124)

The procedure for estimating the Allan variance can be understood as follows. Take a series of phase measurements at interval T, as shown in Fig. 9.12b. For each set of three independent points, draw a straight line between the outer two and determine the deviation of the center point from the line. With m samples of \(\overline{y}\), the average of the squared deviations divided by \((2\pi \nu _{0}\tau )^{2}\) is an estimate of \(\sigma _{y}^{2}(\tau )\), denoted \(\sigma _{ye}^{2}(\tau )\), where

$$\displaystyle{ \sigma _{ye}^{2}(\tau ) = \frac{1} {2(m - 1)}\sum \limits _{k=1}^{m-1}(\overline{y}_{ k+1} -\overline{y}_{k})^{2}\;. }$$
(9.125)

The accuracy of this estimate is (Lesage and Audoin 1979)

$$\displaystyle{ \sigma (\sigma _{ye}) \simeq \frac{K} {\sqrt{m}}\sigma _{y}\;, }$$
(9.126)

where K is a constant of order unity, whose exact value depends on the power spectrum of y.
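The estimator in Eq. (9.125), together with the conversion from phase samples in Eq. (9.121), can be written as a minimal Python sketch. The simulated data are an assumption made for illustration: independent Gaussian values of \(\overline{y}_{k}\) (white-frequency noise) with a hypothetical rms of \(10^{-13}\) and a hypothetical oscillator frequency of 10 GHz.

```python
import math
import random

def fractional_frequency(phases, nu0, tau):
    """Average fractional frequency from phase samples spaced by tau, Eq. (9.121)."""
    scale = 2.0 * math.pi * nu0 * tau
    return [(phases[k + 1] - phases[k]) / scale for k in range(len(phases) - 1)]

def allan_variance(ybar):
    """Two-sample (Allan) variance estimate from m adjacent averages, Eq. (9.125)."""
    m = len(ybar)
    if m < 2:
        raise ValueError("at least two samples of ybar are required")
    return sum((ybar[k + 1] - ybar[k]) ** 2 for k in range(m - 1)) / (2.0 * (m - 1))

# Hypothetical simulation parameters.
rng = random.Random(1)
s = 1e-13          # rms of each ybar sample
nu0 = 1e10         # oscillator frequency (Hz)
tau = 1.0          # averaging time (s), with no dead time (T = tau)
ybar = [rng.gauss(0.0, s) for _ in range(20000)]

# Build the corresponding phase record and recover ybar from it.
phases = [0.0]
for y in ybar:
    phases.append(phases[-1] + 2.0 * math.pi * nu0 * tau * y)
recovered = fractional_frequency(phases, nu0, tau)

estimate = allan_variance(ybar)
print(estimate / s**2)   # close to 1 for independent samples
```

For independent samples the expected value of \((\overline{y}_{k+1} -\overline{y}_{k})^{2}\) is twice the sample variance, so the estimate approaches \(s^{2}\), with a scatter of order \(m^{-1/2}\) as indicated by Eq. (9.126).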

We can now relate the true variance and the Allan variance to the power spectrum of y or ϕ. From Eq. (9.121), the true variance is \(I^{2}(\tau ) =\langle \overline{y}_{k}^{2}\rangle\), given by

$$\displaystyle{ I^{2}(\tau ) = \frac{1} {(2\pi \nu _{0}\tau )^{2}}\left [\langle \phi ^{2}(t+\tau )\rangle - 2\langle \phi (t+\tau )\,\phi (t)\rangle +\langle \phi ^{2}(t)\rangle \right ]\;, }$$
(9.127)

which, from Eq. (9.118), is

$$\displaystyle{ I^{2}(\tau ) = \frac{1} {2(\pi \nu _{0}\tau )^{2}}\left [R_{\phi }(0) - R_{\phi }(\tau )\right ]\;. }$$
(9.128)

Then, since R ϕ (τ) is the Fourier transform of \(\mathcal{S}_{\phi }(\,f)\), by using Eq. (9.119), we obtain from Eq. (9.128) the result

$$\displaystyle{ I^{2}(\tau ) =\int _{ 0}^{\infty }\mathcal{S}_{ y}(\,f)\left (\frac{\sin \pi f\tau } {\pi f\tau }\right )^{2}df\;. }$$
(9.129)

Similarly, from Eq. (9.124), we obtain

$$\displaystyle{ \sigma _{y}^{2}(\tau ) = \frac{1} {(2\pi \nu _{0}\tau )^{2}}\left [3R_{\phi }(0) - 4R_{\phi }(\tau ) + R_{\phi }(2\tau )\right ]\:, }$$
(9.130)

and therefore,

$$\displaystyle{ \sigma _{y}^{2}(\tau ) = 2\int _{ 0}^{\infty }\mathcal{S}_{ y}(\,f)\left [ \frac{\sin ^{4}\pi f\tau } {(\pi f\tau )^{2}}\right ]df\;. }$$
(9.131)

I 2(τ) and σ y 2(τ) are dimensionless quantities, measured in rad2, but we can think of them as the power obtained after filtering y(t) with two different frequency responses, H I 2( f) and H A 2( f), respectively. These are

$$\displaystyle{ H_{I}^{2}(\,f) = \left (\frac{\sin \pi f\tau } {\pi f\tau }\right )^{2} }$$
(9.132)

and

$$\displaystyle{ H_{A}^{2}(\,f) = \frac{2\sin ^{4}\pi f\tau } {(\pi f\tau )^{2}}\;. }$$
(9.133)

The functions H I 2( f) and H A 2( f) and the corresponding impulse responses h I (t) and h A (t) are shown in Fig. 9.13. Note that I 2(τ) can be estimated from a series of measurements \(\overline{y}_{k}\) as the average of the square of \(h_{I}(t_{k}) {\ast}\overline{y}_{k}\), where the asterisk indicates convolution. Similarly, σ y 2(τ) can be estimated as the average of the square of \(h_{A}(t_{k}) {\ast}\overline{y}_{k}\). Other transfer functions could be chosen. In time-domain measurements, additional filtering with high- and low-frequency cutoffs can be performed. For example, removing a long-term trend from the frequency data is a form of highpass filtering. Clearly, measurements of \(\mathcal{S}_{y}(\,f)\) are preferable to those of σ y 2(τ), because σ y 2 can be calculated from \(\mathcal{S}_{y}\) using Eq. (9.131), but \(\mathcal{S}_{y}\) cannot be calculated from σ y 2. However, in many cases of interest, as in the power-law spectra discussed below, the form of σ y 2 is indicative of the behavior of \(\mathcal{S}_{y}\). Traditionally, it has been easier to make time-domain measurements, and most published results are given in terms of the Allan variance σ y 2.
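As a numerical check on Eqs. (9.129) and (9.131), the two transfer functions can be integrated against a flat spectrum \(\mathcal{S}_{y}(\,f) = h_{0}\) (white-frequency noise). The Python sketch below uses a simple midpoint rule with a finite upper limit, so the results are approximate. The function names, the value \(h_{0} = 1\), and the integration limits are assumptions made for illustration.

```python
import math

def h_i_sq(f, tau):
    """Transfer function for the true variance, Eq. (9.132)."""
    x = math.pi * f * tau
    return 1.0 if x == 0.0 else (math.sin(x) / x) ** 2

def h_a_sq(f, tau):
    """Transfer function for the Allan variance, Eq. (9.133)."""
    x = math.pi * f * tau
    return 0.0 if x == 0.0 else 2.0 * math.sin(x) ** 4 / x**2

def filtered_power(s_y, h_sq, tau, f_max, n_steps):
    """Midpoint-rule approximation to the integral of S_y(f) H^2(f) from 0 to f_max."""
    df = f_max / n_steps
    total = 0.0
    for i in range(n_steps):
        f = (i + 0.5) * df
        total += s_y(f) * h_sq(f, tau)
    return total * df

h0 = 1.0    # hypothetical white-frequency noise level
tau = 1.0   # averaging time (s)

def white_frequency(f):
    return h0

true_var = filtered_power(white_frequency, h_i_sq, tau, 200.0, 100000)
allan_var = filtered_power(white_frequency, h_a_sq, tau, 200.0, 100000)
print(true_var, allan_var)   # both approach h0 / (2 * tau)
```

Both integrals approach \(h_{0}/2\tau\), in agreement with Eq. (9.137) and with \(K_{0}^{2} = h_{0}/2\); the small residual comes from truncating the integrals at a finite frequency.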

Fig. 9.13 (top) The impulse response \(h_{I}(t)\) and the square of its Fourier transform, \(\vert H_{I}(\,f)\vert ^{2}\), given by Eq. (9.132), which is used to relate the power spectrum \(\mathcal{S}_{y}(\,f)\) to the true variance \(I^{2}(\tau )\), as defined in Eq. (9.129). (bottom) The impulse response \(h_{A}(t)\) and the square of its Fourier transform, \(\vert H_{A}(\,f)\vert ^{2}\), given by Eq. (9.133), which is used to relate the power spectrum \(\mathcal{S}_{y}(\,f)\) to the Allan variance \(\sigma _{y}^{2}(\tau )\), as defined in Eq. (9.131). Note that the sensitivity of the Allan variance decreases rapidly with decreasing frequency for f < 0.3∕τ.

The effect of local oscillator noise on the measured coherence of signals received at two antennas is given by Eq. (7.34) in terms of the rms deviation of the phase of the oscillator at one antenna relative to that at the other. For VLBI, this rms deviation is equal to the square root of the sum of the true variances of the local oscillators at the two antennas. In the case of a connected-element array, low-frequency components of the phase noise of the master oscillator cause similar effects in the local oscillator phase at each antenna, and therefore their contributions to the relative phase at different antennas tend to cancel. For exact cancellation, the time delay in the path of the reference signal from the master oscillator to each antenna, plus the time delay of the IF signal from the corresponding mixer to the correlator input (including the variable delay that compensates for the geometric delay), should be equal for each antenna. It is generally impractical to preserve this equality. The bandwidths of phase-locked loops in the local oscillator signals at the antennas can also limit the frequency range over which phase noise in the master oscillator is canceled. In practice, cancellation of phase noise resulting from the master oscillator should generally be effective up to a frequency f in the range of a few hundred hertz to a few hundred kilohertz, depending on the parameters of the particular system.

Fig. 9.14 (a) The idealized power spectrum \(\mathcal{S}_{y}(\,f)\) of the fractional frequency deviation y(t) [see Eq. (9.134)]. The various spectral regimes are marked by Roman numerals, and the power-law coefficients are given in parentheses. The regimes are I, white-phase noise; II, flicker-phase noise; III, white-frequency noise; IV, flicker-frequency noise; and V, random-walk-of-frequency noise. (b) Two-point rms deviation, or Allan standard deviation, vs. the time between samples. The spectral regimes are marked by the Roman numerals, and the power-law coefficients are given in parentheses.

Laboratory measurements show that \(\mathcal{S}_{y}(\,f)\) is often a combination of power-law components. A useful model, shown in Fig. 9.14, is

$$\displaystyle{ \mathcal{S}_{y}(\,f) =\sum \limits _{ \alpha =-2}^{2}h_{\alpha }f^{\alpha }\;,\qquad 0 < f < f_{ h}\;, }$$
(9.134)

where α is a power-law exponent with integer values between −2 and 2, and \(f_{h}\) is the cutoff frequency of a lowpass filter. An equation similar to Eq. (9.134) can be written for \(\mathcal{S}_{\phi }(\,f)\) using Eq. (9.119). Each term in Eq. (9.134) or the equivalent equation for \(\mathcal{S}_{\phi }(\,f)\) has a name based on traditional terminology (see Table 9.3). Noise with a power-law dependence \(f^{0}\), independent of frequency, is called “white noise”; \(f^{-1}\) is called “flicker noise,” or colloquially, “one-over-f noise”; and \(f^{-2}\) is called “random-walk noise.” The qualifier “phase” or “frequency” indicates whether the dependence refers to \(\mathcal{S}_{\phi }\) or \(\mathcal{S}_{y}\). There are well-known origins for some of these processes, which we discuss briefly [see also Vessot (1976)]. The frequency dependence given in parentheses below is for \(\mathcal{S}_{y}\).

Table 9.3 Characteristics of noise in oscillators

  1. White-phase noise (\(f^{2}\)) is usually due to additive noise outside the oscillator, for example, noise introduced by amplifiers. This process dominates at large values of f, corresponding to short averaging times.

  2. Flicker-phase noise (\(f^{1}\)) is seen in transistors and may be due to diffusion processes across junctions.

  3. White-frequency or random-walk-of-phase noise (\(f^{0}\)) is due to internal additive noise within the oscillator, such as the thermal noise inside the resonant cavity. Shot noise also has this spectral dependence.

  4. Flicker-frequency noise (\(f^{-1}\)) and random-walk-of-frequency noise (\(f^{-2}\)) are the processes that limit the long-term stability of oscillators. They are due to random changes in temperature, pressure, and magnetic field in the oscillator environment. This noise is associated with long-term drift. There is a large body of literature on flicker-frequency noise, which is encountered in many situations [see Keshner (1982) for a general discussion, Dutta and Horn (1981) for applications in solid-state physics, and Press (1978) for applications in astrophysics].

The variances \(I^{2}(\tau )\) and \(\sigma _{y}^{2}(\tau )\) can be calculated for the various types of noise described above. For α = 1 and 2, the variances converge only if a high-frequency cutoff \(f_{h}\) is specified. With this restriction, \(\sigma _{y}^{2}\) converges for all cases. \(I^{2}(\tau )\) converges only for α ≥ 0. These functions are listed in Table 9.3. Except for the logarithmic dependence in flicker-phase noise, each noise component maps into a component of Allan variance of the form \(\tau ^{\mu }\). From Table 9.3, we can write the total Allan variance as

$$\displaystyle{ \sigma _{y}^{2}(\tau ) = [K_{ 2}^{2} + K_{ 1}^{2}\ln (2\pi f_{ h}\tau )]\tau ^{-2} + K_{ 0}^{2}\tau ^{-1} + K_{ -1}^{2} + K_{ -2}^{2}\tau \;, }$$
(9.135)

where the K values are constants. The subscripts correspond to the subscripts of h (see Table 9.3). White-phase and flicker-phase noise both result in μ ≃ −2, but these two processes can be distinguished by varying f h . Note that for white-phase and white-frequency noise, the following relations hold [see Eqs. (9.129) and (9.131)]:

$$\displaystyle\begin{array}{rcl} \sigma _{y}^{2}(\tau ) = \frac{3} {2}I^{2}(\tau )\;,\qquad \qquad \alpha = 2\;,& &{}\end{array}$$
(9.136)
$$\displaystyle\begin{array}{rcl} \sigma _{y}^{2}(\tau ) = I^{2}(\tau )\;,\qquad \qquad \alpha = 0\;.& &{}\end{array}$$
(9.137)

In general, when I 2(τ) is defined, we see from Eqs. (9.128) and (9.130) that

$$\displaystyle{ \sigma _{y}^{2}(\tau ) = 2[I^{2}(\tau ) - I^{2}(2\tau )]\;. }$$
(9.138)
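Eq. (9.135) can be evaluated directly once the K coefficients are known. The Python sketch below is illustrative: the function name and all coefficient values are hypothetical, and the logarithmic term is assumed to be used only where \(2\pi f_{h}\tau > 1\). The last lines check Eq. (9.138) for white-frequency noise, using \(I^{2}(\tau ) =\sigma _{y}^{2}(\tau )\) from Eq. (9.137).

```python
import math

def allan_variance_model(tau, k2=0.0, k1=0.0, k0=0.0, km1=0.0, km2=0.0, f_h=1.0):
    """Total Allan variance of Eq. (9.135) for power-law noise components.

    k2, k1, k0, km1, km2 are the constants K_2 ... K_-2; f_h is the
    high-frequency cutoff in Hz. Intended for 2*pi*f_h*tau > 1.
    """
    phase_terms = (k2**2 + k1**2 * math.log(2.0 * math.pi * f_h * tau)) / tau**2
    return phase_terms + k0**2 / tau + km1**2 + km2**2 * tau

# Hypothetical coefficients, chosen only to show the change of slope with tau.
coeffs = dict(k2=1e-13, k0=3e-14, km1=1e-15, f_h=10.0)
for tau in (1.0, 10.0, 100.0, 1000.0, 10000.0):
    print(tau, math.sqrt(allan_variance_model(tau, **coeffs)))

# Check of Eq. (9.138) for white-frequency noise alone, where I^2 = sigma_y^2.
k0 = 3e-14
i_sq = lambda tau: allan_variance_model(tau, k0=k0)
lhs = allan_variance_model(5.0, k0=k0)
rhs = 2.0 * (i_sq(5.0) - i_sq(10.0))
print(lhs, rhs)   # the two values agree
```

With these example coefficients the Allan standard deviation falls as \(\tau ^{-1}\) at short times, then as \(\tau ^{-1/2}\), and finally flattens at the flicker-frequency floor.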

9.5.2 Oscillator Coherence Time

A quantity of special interest in VLBI is the coherence time. The approximate coherence time is that time τ c for which the rms phase error is 1 radian:

$$\displaystyle{ 2\pi \nu _{0}\tau _{c}\sigma _{y}(\tau _{c}) \simeq 1\;. }$$
(9.139)
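Eq. (9.139) is an implicit equation for \(\tau _{c}\) and can be solved numerically for any model of \(\sigma _{y}(\tau )\). The Python sketch below uses bisection in log τ and assumes that the left side of Eq. (9.139) increases monotonically with τ over the bracketing interval. The function name, the bracketing limits, and the example values of \(\nu _{0}\) and \(K_{0}\) are hypothetical.

```python
import math

def coherence_time(sigma_y, nu0, tau_lo=1e-3, tau_hi=1e6, iterations=200):
    """Solve 2*pi*nu0*tau*sigma_y(tau) = 1, Eq. (9.139), by bisection in log(tau).

    sigma_y is a function returning the Allan standard deviation at tau.
    Raises ValueError if the root is not bracketed by [tau_lo, tau_hi].
    """
    def excess(tau):
        return 2.0 * math.pi * nu0 * tau * sigma_y(tau) - 1.0

    if excess(tau_lo) > 0.0 or excess(tau_hi) < 0.0:
        raise ValueError("root of Eq. (9.139) is not bracketed")
    lo, hi = math.log(tau_lo), math.log(tau_hi)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess(math.exp(mid)) > 0.0:
            hi = mid
        else:
            lo = mid
    return math.exp(0.5 * (lo + hi))

# Hypothetical white-frequency-noise oscillator observed at 100 GHz.
nu0 = 1e11
k0 = 1e-13
tau_c = coherence_time(lambda tau: k0 / math.sqrt(tau), nu0)
closed_form = 1.0 / (4.0 * math.pi**2 * nu0**2 * k0**2)
print(tau_c, closed_form)   # the two values agree
```

For a model with \(\sigma _{y} \propto \tau ^{-1}\), the left side of Eq. (9.139) does not depend on τ, so no root exists and the function raises an error.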

Rogers and Moran (1981) calculated a more exact expression for the coherence time that they defined in terms of the coherence function

$$\displaystyle{ C(T) = \left \vert \frac{1} {T}\int _{0}^{T}e^{\,j\phi (t)}dt\,\right \vert \;, }$$
(9.140)

where ϕ(t) is the component of fringe phase of instrumental origin, and T is an arbitrary integration time. ϕ(t) includes effects that cause the fringe phase to wander, such as atmospheric irregularities and noise in frequency standards. The rms value of C(T) is a monotonically decreasing function of time with the range 1–0. The coherence time is defined as the value of T for which 〈C 2(T)〉 drops to some specified value, say, 0.5. The mean-squared value of C is

$$\displaystyle{ \langle C^{2}(T)\rangle = \frac{1} {T^{2}}\int _{0}^{T}\int _{ 0}^{T}\left \langle \exp \left \{j\left [\phi (t) -\phi (t')\right ]\right \}\right \rangle dt\,dt'\;. }$$
(9.141)

If ϕ is a Gaussian random variable, then

$$\displaystyle{ \langle C^{2}(T)\rangle = \frac{1} {T^{2}}\int _{0}^{T}\int _{ 0}^{T}\exp \left [-\frac{\sigma ^{2}(t,t')} {2} \right ]dt\,dt'\;, }$$
(9.142)

where σ 2(t, t′) is the variance 〈[ϕ(t) −ϕ(t′)]2〉, which we assume depends only on τ = t′ − t. Then from Eq. (9.118),

$$\displaystyle\begin{array}{rcl} \sigma ^{2}(t,t')& =& \,\sigma ^{2}(\tau ) \\ & =& \,\langle [\phi (t) -\phi (t')]^{2}\rangle = 2[R_{\phi }(0) - R_{\phi }(\tau )]\;.{}\end{array}$$
(9.143)

Note that σ 2(τ) is the structure function of phase and is related to I 2(τ) by Eq. (9.128):

$$\displaystyle{ \sigma ^{2}(\tau ) = 4\pi ^{2}\tau ^{2}\nu _{ 0}^{2}I^{2}(\tau )\;. }$$
(9.144)

The integral in Eq. (9.142) can be simplified by noting that the integrand is constant along diagonal lines in (t, t′) space for which t′ − t = τ. These lines have length \(\sqrt{2}(T-\tau )\) so that

$$\displaystyle{ \langle C^{2}(T)\rangle = \frac{2} {T}\int _{0}^{T}\left (1 - \frac{\tau } {T}\right )\exp \left [-\frac{\sigma ^{2}(\tau )} {2} \right ]d\tau \;. }$$
(9.145)

Thus, from Eqs. (9.129) and (9.144),

$$\displaystyle{ \langle C^{2}(T)\rangle = \frac{2} {T}\int _{0}^{T}\left (1 - \frac{\tau } {T}\right )\exp \left [-2(\pi \nu _{0}\tau )^{2}\int _{ 0}^{\infty }\mathcal{S}_{ y}(\,f)H_{I}^{2}(\,f)df\right ]d\tau \;, }$$
(9.146)

where H I 2( f) is defined in Eq. (9.132). Since \(\mathcal{S}_{y}(\,f)\) is often not available, it is useful to relate 〈C 2(T)〉 to σ y 2(τ). We can solve Eq. (9.138) for I 2(τ) by series expansion, obtaining

$$\displaystyle{ 2I^{2}(\tau ) =\sigma _{ y}^{2}(\tau ) +\sigma _{ y}^{2}(2\tau ) +\sigma _{ y}^{2}(4\tau ) +\sigma _{ y}^{2}(8\tau ) + \cdots \;, }$$
(9.147)

provided that the series converges. Therefore, from Eqs. (9.144), (9.145), and (9.147),

$$\displaystyle{ \langle C^{2}(T)\rangle = \frac{2} {T}\int _{0}^{T}\left (1 - \frac{\tau } {T}\right )\exp \left \{-\pi ^{2}\nu _{ 0}^{2}\tau ^{2}\left [\sigma _{ y}^{2}(\tau ) +\sigma _{ y}^{2}(2\tau ) + \cdots \,\right ]\right \}d\tau \;. }$$
(9.148)

This integral is readily calculable for the cases where I 2(τ) is defined.

We now consider white-phase noise and white-frequency noise, which are important processes in frequency standards on short time scales. For the case of white-phase noise, \(\sigma _{y}^{2} = K_{2}^{2}\tau ^{-2}\), where \(K_{2}^{2} = 3h_{2}\,f_{h}/4\pi ^{2}\) is the Allan variance in 1 s (Table 9.3), and the coherence function can be evaluated from Eq. (9.146) or Eq. (9.148):

$$\displaystyle{ \langle C^{2}(T)\rangle =\exp \left (-\frac{4\pi ^{2}\nu _{ 0}^{2}K_{ 2}^{2}} {3} \right ) =\exp (-h_{2}\,f_{h}\nu _{0}^{2})\;. }$$
(9.149)

For white-frequency noise, \(\sigma _{y}^{2} = K_{0}^{2}\tau ^{-1}\), where \(K_{0}^{2} = h_{0}/2\), and we obtain

$$\displaystyle{ \langle C^{2}(T)\rangle = \frac{2(e^{-aT} + aT - 1)} {a^{2}T^{2}} \;. }$$
(9.150)

Here, \(a = 2\pi ^{2}\nu _{0}^{2}K_{0}^{2} =\pi ^{2}h_{0}\nu _{0}^{2}\). The limiting cases for white-frequency noise are

$$\displaystyle\begin{array}{rcl} \begin{array}{lll} \langle C^{2}(T)\rangle & = 1 -\frac{2\pi ^{2}\nu _{ 0}^{2}K_{ 0}^{2}T} {3} \;,&\qquad \qquad 2\pi ^{2}\nu _{ 0}^{2}K_{ 0}^{2}T \ll 1\;, \\ & = \frac{1} {\pi ^{2}\nu _{0}^{2}K_{0}^{2}T}\;, &\qquad \qquad 2\pi ^{2}\nu _{ 0}^{2}K_{ 0}^{2}T \gg 1\;. \end{array} & &{}\end{array}$$
(9.151)
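
The closed form of Eq. (9.150) and its limits in Eq. (9.151) are easy to check numerically. The following is a minimal sketch; the function name and the switch to a series expansion for very small \(aT\) (to avoid round-off in the subtraction) are choices made for this example, not part of the original derivation.

```python
import math

def coherence_sq_white_freq(nu0, K0_sq, T):
    """<C^2(T)> for white-frequency noise, Eq. (9.150), with a = 2 pi^2 nu0^2 K0^2."""
    x = 2.0 * math.pi**2 * nu0**2 * K0_sq * T   # x = a T
    if x < 1e-4:
        # Leading terms of the Taylor expansion of Eq. (9.150) about x = 0
        return 1.0 - x / 3.0 + x**2 / 12.0
    return 2.0 * (math.exp(-x) + x - 1.0) / x**2

def small_loss_limit(nu0, K0_sq, T):
    """First line of Eq. (9.151), valid for a T << 1."""
    return 1.0 - 2.0 * math.pi**2 * nu0**2 * K0_sq * T / 3.0

def large_loss_limit(nu0, K0_sq, T):
    """Second line of Eq. (9.151), valid for a T >> 1."""
    return 1.0 / (math.pi**2 * nu0**2 * K0_sq * T)

# Hypothetical values: K0^2 = 1e-26 at 100 GHz, for a range of integration times
for T in (1.0, 100.0, 1e4, 1e6):
    print(T, coherence_sq_white_freq(100e9, 1e-26, T))
```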

The approximate relation for coherence time in Eq. (9.139) corresponds to rms values of the coherence function of 0.85 and 0.92 for white-phase noise and white-frequency noise, respectively. These calculations assume that one station has a perfect frequency standard. In practice, the effective Allan variance is the sum of the Allan variances of the two oscillators:

$$\displaystyle{ \sigma _{y}^{2} =\sigma _{ y1}^{2} +\sigma _{ y2}^{2}\;. }$$
(9.152)

Thus, if two stations have similar standards, the coherence loss is doubled if the loss is small. If the short-term stability is dominated by white-phase noise, which is usually the case for hydrogen masers, the coherence function is independent of time. This means there is a maximum frequency above which a particular standard will not be usable for VLBI, regardless of the integration time. This frequency is approximately \(1/(2\pi K_{2})\) Hz, which for a hydrogen maser is about 1000 GHz.
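
A short numerical sketch of this limit follows. The value of \(K_{2}\) used here (an Allan standard deviation of \(1.5 \times 10^{-13}\) in 1 s) is an assumed, illustrative number rather than one taken from the text, and the function names are hypothetical.

```python
import math

def coherence_sq_white_phase(nu0, K2_sq):
    """<C^2> for white-phase noise, Eq. (9.149); independent of integration time."""
    return math.exp(-4.0 * math.pi**2 * nu0**2 * K2_sq / 3.0)

def limiting_frequency(K2):
    """Approximate maximum usable frequency, 1/(2 pi K2), in Hz."""
    return 1.0 / (2.0 * math.pi * K2)

K2 = 1.5e-13   # assumed 1-s Allan standard deviation (illustrative only)
nu_max = limiting_frequency(K2)
rms_coherence = math.sqrt(coherence_sq_white_phase(nu_max, K2**2))
print(nu_max, rms_coherence)
```

At \(\nu _{0} = 1/(2\pi K_{2})\), the exponent in Eq. (9.149) is \(-1/3\) whatever the value of \(K_{2}\), so the rms coherence there is \(e^{-1/6} \approx 0.85\).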

In practice, the coherence C(T) is measured at the peak amplitude of the correlator output, which varies as a function of fringe frequency. This operation is equivalent to removing a constant frequency drift from the phase data and can be considered as highpass filtering of the data with a cutoff frequency of 1∕T. Modeling this operation as the response of a single-pole, highpass filter, one can show that it ensures the convergence of Eq. (9.148) for all processes for which the Allan variance exponent μ < 1. To compare the various representations of frequency stability, we show in Figs. 9.15 and 9.16 examples of the performance of a hydrogen maser given by the functions \(\sigma _{y}^{2}\), \(\mathcal{S}_{y}(\,f)\), and \(\langle C^{2}(T)\rangle ^{1/2}\).

Fig. 9.15

(a) Power spectrum of the fractional frequency deviation \(\mathcal{S}_{y}(\,f)\) for a hydrogen maser frequency standard, and (b) the normalized power spectrum of the phase noise \(\nu _{0}^{2}\mathcal{S}_{\phi }(\,f)\). \(\mathcal{S}_{y}(\,f)\) and \(\mathcal{S}_{\phi }(\,f)\) are related by Eq. (9.119). For frequencies above 10 Hz, \(\mathcal{S}_{\phi }(\,f)\) approaches the spectrum of the crystal oscillator to which the maser is locked, which declines as \(f^{-3}\). Adapted from Vessot (1979).

Fig. 9.16

(a) Allan standard deviation vs. sample time for a hydrogen maser frequency standard. Data from Vessot (1979). (b) Coherence \(\sqrt{\langle C^{2}(T)\rangle }\), defined by Eq. (9.145), for various radio frequencies based on two frequency standards with Allan standard deviations given in (a). (c) Signal-to-noise ratio, normalized to unity at one second, of the measured visibility vs. integration time for various frequencies. In a VLBI system, the coherence and signal-to-noise ratios will be further reduced by atmospheric fluctuations.

9.5.3 Precise Frequency Standards

Precise frequency standards of interest for VLBI include crystal oscillators and atomic frequency standards such as rubidium vapor cells, cesium-beam resonators, and hydrogen masers (Lewis 1991). Atomic frequency standards incorporate crystal oscillators that are phase-locked or frequency-locked to the atomic process, using loops with time constants in the range 0.1–1 s, so that short-term performance becomes that of the crystal oscillator. Details of how these loops are implemented are given by Vanier et al. (1979). The performance of the crystal oscillator is very important because unless it has high spectral purity, the phase-locked loops involved in generating the local oscillator signal from the frequency standard will not operate properly (Vessot 1976).

We first consider a frequency standard as a “black box” that puts out a stable sinusoid at a convenient frequency such as 5 MHz, or some higher frequency, at which the crystal oscillator is locked to the atomic process. The performance of various devices is shown in Fig. 9.17. These somewhat idealized plots show that the Allan variances of the standards have three regions: short-term noise dominated by either white-phase or white-frequency noise; flicker-frequency noise, which gives the lowest value of Allan variance and is therefore referred to as the “flicker floor”; and finally, for long periods, random-walk-of-frequency noise. Two other parameters can be specified, a drift rate and an accuracy. The drift rate is the linear change in frequency per unit time interval. Note that if the standard drives a clock, then a constant drift rate results in a clock error that accumulates as time squared. The accuracy refers to how well the standard can be set to its nominal frequency. The performance parameters are summarized in Table 9.4.

Fig. 9.17

Idealized performance of various frequency standards and other systems. Rubidium (1965) = Hewlett-Packard (HP) 5065; cesium (1965) = HP 5061-004; cesium (1984) = NBS Laboratory device no. 4; hydrogen (1970) = early Varian/HP hydrogen maser oscillator; hydrogen (1984) = hydrogen maser SAO VLG-11; quartz (2013) = crystal oscillator, Oscilloquartz 8607. Dots represent performance of the hydrogen maser oscillator by T4 Science, iMaser 3000; CSO (2011) = cryogenic sapphire oscillator stabilized by GPS (Doeleman et al. 2011). Millisecond pulsars are very stable clocks, and the data on one of them from Davis et al. (1985) are shown. The stability of some of them, i.e., those with small amounts of “red noise,” reaches \(10^{-15}\) on a time scale of ten years (\(3 \times 10^{8}\) s) [see Verbiest et al. (2009) and Hobbs et al. (2012)]. VLBI data, which show the effect of path length stability through the atmosphere in approximately average conditions at low-elevation sites, are from Rogers and Moran (1981).

Table 9.4 Typical performance data on available frequency standards

Atomic frequency standards are based on the detection of an atomic or molecular resonance. There are three parts to any frequency standard [e.g., Kartashoff and Barnes (1972)]: particle preparation, particle confinement, and particle interrogation. Particle preparation involves enhancing the population difference in the desired transition. This is necessary for radio transitions in a gas with temperature \(T_{g}\) for which \(h\nu /kT_{g} \ll 1\), so that the level populations are nearly equal. Preparation is usually done either by state selection in a beam passing through a magnetic or electric field, or by optical pumping. Particle confinement makes it possible to obtain narrow resonance lines from long interaction times, since according to the Heisenberg (uncertainty) principle, the line width is equal to the reciprocal of the interaction time. Particles can be confined in beams or storage cells. Storage cells either contain a buffer gas or have specially coated walls so that particle collisions do not result in phase changes. Finally, particle interrogation is the process of sensing the interaction of particles and radiation fields. Frequency standards can be either active or passive. An example of an active standard is a maser oscillator. Passive standards require an external radiation field, and transitions are observed by (1) absorption, (2) re-emission, (3) detection of particles having made the transition, or (4) indirect detection of a quantity such as a variation in the rate of optical pumping. To show how some principles are implemented in practice, we give brief descriptions of the operation of several types of standards in the next two sections.

Other types of frequency standards are under development. For a general review of types of technology, see Drullinger et al. (1996). The cryogenic sapphire oscillator has excellent short-term stability (better than that of the hydrogen maser) and may be useful for VLBI at frequencies approaching 1 THz (Doeleman et al. 2011; Rioja et al. 2012). Other laboratory devices include the laser-cooled mercury ion frequency standard (Berkeland et al. 1998) and the ultracold atomic ytterbium oscillator, whose stability approaches \(10^{-18}\) in 7 h (Hinkley et al. 2013).

9.5.4 Rubidium and Cesium Standards

Rubidium is an alkali metal with a single valence electron and thus a hydrogenlike spectrum. The electronic ground state is split into two levels, with a transition frequency of 6835 MHz. These levels correspond to the spin of the unpaired electron being parallel or antiparallel to the nuclear spin vector. A schematic diagram of the oscillator system is shown in Fig. 9.18. An RF plasma discharge in a tube containing \(^{87}\)Rb excites the gas to an electronic level about 0.8 μm above the ground state. The light from this discharge passes through a filter that removes the components involving the F = 2 level and passes the light at 0.7948 μm. This filter consists of a cell of \(^{85}\)Rb atoms whose energy levels are slightly shifted from those of the \(^{87}\)Rb atoms, such that both gases have transitions near 0.7800 μm. The filtered light passes through another cell of \(^{87}\)Rb gas inside a microwave cavity resonant at the transition frequency between the F = 2 and F = 1 levels. With no RF signal applied to the cavity, the gas is nearly transparent, and the discharge beam is unattenuated as it reaches the photodetector. The application of an RF signal at 6835 MHz stimulates transitions from the F = 2 to F = 1 level. The atoms reaching the lower level are then pumped to the excited state by the light from the filtered \(^{87}\)Rb lamp. The \(^{87}\)Rb light therefore suffers absorption. A buffer gas, consisting of inert atoms that collide elastically with the \(^{87}\)Rb atoms in the resonance cell, extends the interaction time to about \(10^{-2}\) s, the mean collision time with the cell walls, and gives an absorption resonance with a line width of about \(10^{2}\) Hz. The cavity is magnetically shielded to minimize external fields. A weak homogeneous field is applied so that only \(\varDelta M_{F} = 0\) transitions, which have zero first-order Doppler shift, are obtained. The absorption resonance has a width of \(10^{2}\)–\(10^{3}\) Hz. The shot noise of individual arriving photons leads to white-frequency noise.

Fig. 9.18

(a) Schematic diagram of a rubidium gas-cell frequency standard; (b) pump and microwave transitions; (c) magnetic sublevels of microwave transition vs. magnetic field; (d) absorption of \(^{87}\)Rb light vs. microwave frequency. Adapted from Vessot (1976).

The radio frequency signal is frequency- or phase-modulated so that the resonance line is continuously scanned. A control voltage is generated by comparing the modulation signal and the detector signal and is fed back to the slave oscillator driving the cavity to correct its frequency to the peak of the resonance.

Rubidium standards have the advantage of being small, inexpensive, and readily transportable. They are sometimes used in VLBI below 1 GHz, where the ionosphere dominates system stability. At higher frequencies, the use of rubidium standards results in degraded performance. They are useful as a backup for a primary standard and can also be used in OVLBI spacecraft to reduce the uncertainty in the timing when the radio link from the ground station is interrupted.

Cesium, like rubidium, is an alkali metal with a single valence electron. The cesium standard is important because it is used to define the standard of atomic time. The frequency of the ground-state, spin-flip transition is exactly 9192.631770 MHz, by definition of the second of atomic time. A ribbon-shaped beam of cesium gas is passed through a state-selector magnet that passes the atoms in the F = 3 level into a resonator. Cesium frequency standards are larger and substantially more expensive than rubidium standards. Because of their low signal-to-noise ratio, their short-term stability is poor. Thus, they are not used in VLBI for controlling local oscillators. However, they provide excellent long-term stability and are used to monitor time. They have also been used to verify the capability of transferring time via VLBI (Clark et al. 1979). The historical development of the cesium-beam resonator is described by Forman (1985).

9.5.5 Hydrogen Maser Frequency Standard

The hydrogen maser oscillator is the usual VLBI standard, and we discuss its operating principles in some detail. The quantum mechanical analysis of the hydrogen maser is presented in a classic paper by Kleppner et al. (1962). Fundamental principles of masers are given by Shimoda et al. (1956), and details of maser construction are given by Kleppner et al. (1965) and Vessot et al. (1976).

The hydrogen maser oscillator uses the ground-state, spin-flip transition at 1420.405 MHz, the well-known 21-cm line in radio astronomy. A schematic diagram of the oscillator is shown in Fig. 9.19. The hydrogen for the maser comes from a tank of molecular hydrogen gas that is dissociated in an RF discharge. The gas in the discharge is ionized and emits the reddish glow of the Balmer lines as the hydrogen atoms recombine and cascade to the ground state. The atomic gas flows out of the dissociator through a hexapole-magnet state selector. The inhomogeneous magnetic field separates the two upper states, \(F = 1,\ M_{F} = 1\) and \(F = 1,\ M_{F} = 0\), from the lower states, \(F = 1,\ M_{F} = -1\) and \(F = 0,\ M_{F} = 0\). The beam of atoms in the two upper states is directed into the storage bulb that is located inside a microwave cavity resonant in the TE\(_{011}\) or TE\(_{111}\) mode at 1420.405 MHz. The atoms bounce around the inside of the bulb about \(10^{5}\) times before escaping through the entrance hole. The spent atoms are evacuated from the system, which operates at low pressure, by an ion pump. The cavity is surrounded by several layers of material with high magnetic permeability that shield it from ambient magnetic fields. Inside the shield is a solenoid that creates a weak homogeneous field. This field allows the \((F = 1,\ M_{F} = 0)\)-to-\((F = 0,\ M_{F} = 0)\) transition to radiate and minimizes transitions from the \(F = 1,\ M_{F} = 1\) level. There is no first-order Zeeman effect for the \(\varDelta M_{F} = 0\) transition (see Fig. 9.19). The maser will oscillate if the cavity is tuned close to the transition frequency and the losses are small enough. In the active maser, the 1420-MHz signal is picked up by a cavity probe and used to phase-lock a crystal oscillator from which a signal at the hydrogen line frequency has been synthesized.

Fig. 9.19

(a) Schematic diagram of a hydrogen maser frequency standard. The line frequency shown is the rest frequency of the transition in free space from Hellwig et al. (1970). The actual frequency will differ typically by ∼0.1 Hz because of cavity pulling, second-order Doppler, and the wall shift. (b) Energies of magnetic sublevels vs. magnetic field for the 21-cm transition. Adapted from Vessot (1976). (c) Curves of resonance frequency \(\nu _{0}\) vs. cavity frequency \(\nu _{C}\) for two values of line width [see Eq. (9.158)]. The intersection of the curves, which can be found empirically, gives the best operating frequency.

The interaction lifetime of an atom in the bulb can be described by an exponential probability function

$$\displaystyle{ f(t) =\gamma e^{-\gamma t}\;, }$$
(9.153)

where γ is the total relaxation rate. The line has an approximately Lorentzian profile with a line width (full width at half-maximum) \(\varDelta \nu _{0}\) of \(\gamma /\pi\). The most important contribution to γ is the rate at which atoms escape through the entrance hole. This rate is

$$\displaystyle{ \gamma _{e} = \frac{v_{0}A_{h}} {6V } \;, }$$
(9.154)

where \(v_{0} = \sqrt{8kT_{g}/\pi m}\) is the average particle speed, \(T_{g}\) is the gas temperature, m is the mass of a hydrogen atom, \(A_{h}\) is the area of the entrance hole, and V is the volume of the bulb. \(\gamma _{e}\) is about 1 s\(^{-1}\). The atoms lose coherence after many wall collisions, and this leads to a loss rate \(\gamma _{w} \simeq 10^{-4}\) s\(^{-1}\). Collisions between hydrogen atoms cause spin-exchange relaxation at a rate \(\gamma _{se}\) that is proportional to the gas density and to \(v_{0}\). The net relaxation rate is approximately the sum of the three most important terms:

$$\displaystyle{ \gamma =\gamma _{e} +\gamma _{w} +\gamma _{se} =\pi \varDelta \nu _{0}\;. }$$
(9.155)

All three terms are proportional to \(v_{0}\) and thus also to \(\sqrt{T_{g}}\). Note that the random thermal motions of the atoms do not give rise to a first-order Doppler broadening of the line, because the interaction between the atoms and the RF field takes place in a resonant cavity [see Kleppner et al. (1962)].

The maser oscillator has two resonant frequencies, the line frequency \(\nu _{L}\) and the electromagnetic cavity resonance frequency \(\nu _{C}\), defined by the cavity’s dimensions. In classical oscillators, the frequency is the mean of these two, weighted by the respective Q factors, \(Q_{L}\) for the line and \(Q_{C}\) for the cavity:

$$\displaystyle{ \nu _{0} = \frac{\nu _{L}Q_{L} +\nu _{C}Q_{C}} {Q_{L} + Q_{C}} \;. }$$
(9.156)

The Q factor is defined as π times the reciprocal of the fractional loss in energy per cycle of the resonant frequency. Hence, from Eq. (9.153), \(Q_{L}\) is given by [see, e.g., Siegman (1971)]

$$\displaystyle{ Q_{L} \simeq \frac{\pi \nu _{0}} {\gamma } = \frac{\nu _{0}} {\varDelta \nu _{0}}\;. }$$
(9.157)

A typical value of \(Q_{L}\) is about \(10^{9}\). The practical value of \(Q_{C}\) for a silver-plated cavity is about \(5 \times 10^{4}\). Since \(Q_{L} \gg Q_{C}\), the resonance frequency is approximately

$$\displaystyle{ \nu _{0} \simeq \nu _{L} + \frac{Q_{C}} {Q_{L}}(\nu _{C} -\nu _{L})\;. }$$
(9.158)

Equation (9.158) describes the effect of “cavity pulling” on the resonance frequency. Temperature changes cause the size, and thus the resonant frequency, of the cavity to change. Hence, a fractional frequency stability of \(10^{-15}\) for the maser requires a fractional mechanical stability of about \(5 \times 10^{-10}\) for the cavity. The cavity dimensions therefore must be stable to about \(10^{-8}\) cm. The cavity must be made from material with a small thermal expansion coefficient or the temperature must be carefully controlled. Extreme mechanical stability is also required so that atmospheric pressure changes do not affect the frequency. The TE\(_{011}\) cavity is a cylinder about 27 cm in length and diameter, appreciably larger than the free-space wavelength because of the loading by the storage bulb. Coarse tuning is accomplished by moving the end plate of the cavity and fine tuning by a varactor diode. From Eq. (9.158), it is clear that the maser frequency is most stable when \(\nu _{C}\) is set to \(\nu _{L}\) so that \(\nu _{0}\) equals \(\nu _{L}\) regardless of the values of \(Q_{C}\) and \(Q_{L}\). This optimal tuning point of the maser can be found by making a plot of \(\nu _{0}\) vs. \(\nu _{C}\), which is a straight line with slope \(Q_{C}/Q_{L}\), according to Eq. (9.158). By varying \(Q_{L}\) (for example, by varying the gas pressure and thereby changing γ), a family of straight lines can be generated that intersect at the desired frequency \(\nu _{0} =\nu _{L} =\nu _{C}\) (see Fig. 9.19c). Servomechanisms are used in some systems to keep the maser cavity continuously tuned.
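
The tuning argument can be illustrated with a few lines of code. The sketch below works with the offset \(\nu _{0} -\nu _{L}\) rather than \(\nu _{0}\) itself, which avoids subtracting two nearly equal gigahertz-scale numbers; the function names and the detuning values are hypothetical, and the Q values are the representative ones quoted above.

```python
def pulling_offset_exact(nu_L, nu_C, Q_L, Q_C):
    """nu_0 - nu_L from the Q-weighted mean of Eq. (9.156), rearranged algebraically."""
    return Q_C * (nu_C - nu_L) / (Q_L + Q_C)

def pulling_offset_approx(nu_L, nu_C, Q_L, Q_C):
    """nu_0 - nu_L from Eq. (9.158), valid for Q_L >> Q_C."""
    return (Q_C / Q_L) * (nu_C - nu_L)

NU_L = 1420.405e6   # Hz, line frequency as quoted in the text
Q_C = 5e4           # cavity Q quoted in the text

# Family of lines nu_0 - nu_L vs. cavity detuning for two line Q values.
# Both lines pass through zero offset at zero detuning (nu_C = nu_L).
for Q_L in (1e9, 2e9):
    for detuning_hz in (-1000.0, 0.0, 1000.0):   # illustrative detunings
        offset = pulling_offset_approx(NU_L, NU_L + detuning_hz, Q_L, Q_C)
        print(Q_L, detuning_hz, offset)
```

With these numbers the exact and approximate offsets differ only by a fraction of order \(Q_{C}/Q_{L}\), which is why the linear form is adequate for locating the intersection point empirically.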

The performance of hydrogen masers is shown in Figs. 9.16 and 9.17. For periods less than \(10^{3}\) s, the performance is limited by two fundamental processes: white-frequency noise due to thermal noise generated inside the cavity and white-phase noise due to thermal noise in the external amplifier. The thermal noise generated inside the cavity produces a fractional frequency variance (Allan variance) of

$$\displaystyle{ \sigma _{yf}^{2} = \frac{1} {Q_{L}^{2}} \frac{kT_{g}} {P_{0}\tau } \;, }$$
(9.159)

where \(P_{0}\) is the power delivered by the atoms (Edson 1960; Kleppner et al. 1962). There is also shot noise in the cavity due to the discrete radiation of photons. However, this process, described by the Allan variance \(\sigma _{ys}^{2}\), is smaller than \(\sigma _{yf}^{2}\) by the ratio \(h\nu /kT_{g}\), which is \(2 \times 10^{-4}\) at room temperature. Spontaneous emission also contributes a small amount of noise, equivalent to increasing \(T_{g}\) by \(h\nu /k \simeq 0.07\) K. Finally, the maser receiver adds a noise power, \(kT_{R}\varDelta \nu\), to the signal coupled out of the cavity, where \(T_{R}\) is the receiver noise temperature and \(\varDelta \nu\) is the receiver bandwidth. This noise causes an Allan variance of (Cutler and Searle 1966)

$$\displaystyle{ \sigma _{yR}^{2} = \frac{1} {(2\pi \nu _{0}\tau )^{2}} \frac{kT_{R}\varDelta \nu } {P_{0}} \;. }$$
(9.160)

These two processes are independent, so the net Allan variance is \(\sigma _{y}^{2} =\sigma _{yf}^{2} +\sigma _{yR}^{2}\). The effects of both processes are clearly evident in the data in Fig. 9.17. Note that a flicker floor is not reached because of long-term drifts. The short-term performance can be improved by increasing the atomic flux level, which increases \(P_{0}\). However, increasing the flux increases the spin-exchange rate, which decreases \(Q_{L}\), thereby making the oscillator more susceptible to the long-term effects of cavity pulling.
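
To see how the two terms combine, Eqs. (9.159) and (9.160) can be coded directly. This is a sketch only: the function names are hypothetical, and the operating point in `PARAMS` (atomic power, receiver temperature, and receiver bandwidth in particular) consists of placeholder values chosen for illustration, not data for any particular maser.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant, J/K

def allan_var_cavity(q_line, t_gas, p_atoms, tau):
    """White-frequency term, Eq. (9.159): thermal noise inside the cavity."""
    return K_BOLTZMANN * t_gas / (q_line**2 * p_atoms * tau)

def allan_var_receiver(nu0, t_rx, bandwidth, p_atoms, tau):
    """White-phase term, Eq. (9.160): thermal noise added by the receiver."""
    return K_BOLTZMANN * t_rx * bandwidth / ((2.0 * math.pi * nu0 * tau)**2 * p_atoms)

def allan_dev_total(tau, q_line, t_gas, p_atoms, nu0, t_rx, bandwidth):
    """Allan standard deviation from the sum of the two independent variances."""
    return math.sqrt(allan_var_cavity(q_line, t_gas, p_atoms, tau)
                     + allan_var_receiver(nu0, t_rx, bandwidth, p_atoms, tau))

# Placeholder operating point (assumed values, for illustration only)
PARAMS = dict(q_line=1e9, t_gas=300.0, p_atoms=1e-13,
              nu0=1420.405e6, t_rx=300.0, bandwidth=6.0)

for tau in (1.0, 10.0, 100.0):
    print(tau, allan_dev_total(tau, **PARAMS))
```

The receiver term falls as \(\tau ^{-2}\) and the cavity term as \(\tau ^{-1}\), so the white-phase contribution matters mainly at the shortest sample times.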

The frequency of a maser is not exactly equal to the atomic transition frequency because of several effects. These effects limit the accuracy to which the frequency can be set, and because most of them are temperature dependent, they probably contribute to flicker-frequency and random-walk-of-frequency noise. Cavity pulling, which has been described already, is an important effect, and to minimize it, the cavity must be tuned carefully. The collision-induced spin-exchange process gives a frequency shift that varies with \(Q_{L}\) in the same way as the cavity pulling. Thus, the cavity-tuning procedure also eliminates this shift. Collisions with the cavity walls produce an effect called the “wall shift,” which is difficult to predict and may be the ultimate limiting factor in the absolute precision of the maser frequency (Vessot and Levine 1970). This shift depends on the temperature and wall coating material. Its fractional value is about \(10^{-11}\). The first-order Doppler effect cancels, but the second-order Doppler effect does not, because of its \(v^{2}/c^{2}\) dependence [see Kleppner et al. (1962)]. The fractional frequency shift is about equal to \(-1.4 \times 10^{-13}\,T_{g}\). Finally, there is no first-order Zeeman effect in the \((F = 1,\ M_{F} = 0)\)-to-\((F = 0,\ M_{F} = 0)\) transition. However, the second-order Zeeman fractional-frequency shift is \(2.0 \times 10^{2}\,B^{2}\), where B is the magnetic field in tesla.
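
As a rough check on the size of these shifts, the second-order Doppler coefficient can be recovered from the thermal mean-square speed. The sketch below assumes thermal equilibrium with \(\langle v^{2}\rangle = 3kT_{g}/m\), so that the fractional shift is \(-\langle v^{2}\rangle /2c^{2}\); that assumption, the function names, and the example field strength are introduced here for illustration. The Zeeman coefficient is simply the one quoted above.

```python
K_BOLTZMANN = 1.380649e-23   # J/K
M_HYDROGEN = 1.6735e-27      # kg, approximate mass of a hydrogen atom
C_LIGHT = 2.99792458e8       # m/s

def second_order_doppler_shift(t_gas):
    """Fractional shift -<v^2>/(2 c^2), assuming <v^2> = 3 k T_g / m."""
    return -3.0 * K_BOLTZMANN * t_gas / (2.0 * M_HYDROGEN * C_LIGHT**2)

def second_order_zeeman_shift(b_tesla, coefficient=2.0e2):
    """Fractional shift coefficient * B^2, B in tesla (coefficient from the text)."""
    return coefficient * b_tesla**2

print(second_order_doppler_shift(1.0))     # coefficient per kelvin
print(second_order_doppler_shift(300.0))   # shift for an assumed 300 K gas
print(second_order_zeeman_shift(1e-7))     # shift for an assumed 1e-7 T field
```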

9.5.6 Local Oscillator Stability

Local oscillator signals are generated by multiplying a signal from the locked oscillator of the frequency standard. The multipliers must have exceptional stability, as discussed in Sect. 7.2, to avoid the introduction of additional noise and drift. Imperfect multipliers are sensitive to vibration and temperature and may have modulation at harmonics of the power line frequency. In an ideal multiplier, a signal of the form of Eq. (9.112) is converted to

$$\displaystyle{ V (t) =\cos [2\pi M\nu _{0}t + M\phi (t)]\;, }$$
(9.161)

where M is the multiplication factor, \(\nu _{0}\) is the fundamental frequency, and ϕ is the random phase noise of the frequency standard. If the phase noise is small, \(M\phi (t) \ll 1\), then the single-sided power spectrum of V(t) is given by

$$\displaystyle{ \mathcal{S}_{v}(\nu ) =\delta (\nu -M\nu _{0}) + M^{2}\mathcal{S}_{\phi }(\nu -M\nu _{ 0})\;, }$$
(9.162)

where δ is a delta function representing the desired signal, and \(\mathcal{S}_{\phi }\) is the power spectrum of the phase noise. Thus, the noise power increases as the square of the multiplication factor. In the general case, \(\mathcal{S}_{v}\) can be written (Lindsey and Chie 1978)

$$\displaystyle{ \mathcal{S}_{v}(\nu ) =\delta (\nu -M\nu _{0}) +\sum \limits _{ n=1}^{\infty }\frac{M^{2n}} {n!} \left [\mathcal{S}_{\phi }(\nu -M\nu _{0}) {\ast}\mathcal{S}_{\phi }(\nu -M\nu _{0}) {\ast}\cdots \,\right ], }$$
(9.163)

where the term in brackets contains n replications of the same function convolved together. When only the leading term in the summation is retained, Eq. (9.163) reduces to Eq. (9.162). The higher-order terms in Eq. (9.163) represent a series of approximately Gaussian components because of the repeated convolutions. The rms phase deviation of the multiplier output frequency, \(M\nu _{0}\), is proportional to the rms voltage of the noise in the output bandwidth, that is, to the square root of the noise power. Thus, for the case represented by Eq. (9.162), the rms phase fluctuation is proportional to M.
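
The scaling with M in the small-phase-noise regime of Eq. (9.162) can be put in numbers with a short sketch. The function names, the 0.1-rad threshold used to flag the small-angle regime, and the example of multiplying a 5-MHz reference to 100 GHz with an rms phase of \(10^{-6}\) rad are all assumptions made for illustration.

```python
import math

def multiplied_rms_phase(phi_rms_rad, multiplier):
    """rms phase after ideal multiplication: scales linearly with M."""
    return multiplier * phi_rms_rad

def noise_power_increase_db(multiplier):
    """Increase in phase-noise sideband power, M^2, expressed in decibels."""
    return 20.0 * math.log10(multiplier)

def small_angle_regime(phi_rms_rad, multiplier, limit_rad=0.1):
    """True if M * phi_rms stays below an (arbitrary) small-angle limit."""
    return multiplied_rms_phase(phi_rms_rad, multiplier) < limit_rad

M = 100e9 / 5e6   # hypothetical multiplication from 5 MHz to 100 GHz
print(M, noise_power_increase_db(M))
print(multiplied_rms_phase(1e-6, M), small_angle_regime(1e-6, M))
```

Outside the small-angle regime the higher-order terms of Eq. (9.163) are no longer negligible, and this simple scaling does not describe the output spectrum.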

9.5.7 Phase Calibration System

One way to check the integrity of an entire VLBI system is to inject into the front end of the receiver an RF signal that is independently derived from the frequency standard. The RF test signal can be derived by driving a step-recovery diode with, say, a 1-MHz signal from the frequency standard so as to generate a pulse train with 1-μs period. Such a signal has harmonics at 1-MHz intervals throughout the microwave region, all of which have the same phase at the reference intervals. When the RF band is mixed down to baseband, one of the injected harmonics can be made to appear at a convenient frequency of order 10 kHz. This is then compared with a reference signal from the frequency standard. The phase calibration signal can be continuously injected during VLBI recording since a low enough level can be used that it can be detected only by very narrowband filtering in the processor (∼10-Hz bandwidth). The calibration allows one to compensate for variations such as those caused by thermal effects in cables (Whitney et al. 1976; Thompson and Bagri 1991; Thompson 1995). Similar methods are used in some connected-element interferometers.
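
The placement of the calibration tones in the baseband can be sketched as follows. The example assumes a single upper-sideband conversion with one local oscillator, and the LO frequency of 8212.99 MHz is a hypothetical value chosen so that a harmonic lands at 10 kHz; frequencies are kept as integers in hertz to avoid rounding.

```python
def phase_cal_tones(f_lo_hz, f_rep_hz, bandwidth_hz):
    """Baseband frequencies (Hz) of pulse-train harmonics n * f_rep above the LO.

    Assumes upper-sideband conversion: baseband frequency = RF frequency - LO.
    """
    n = -(-f_lo_hz // f_rep_hz)          # first harmonic at or above the LO (ceiling)
    tones = []
    while n * f_rep_hz - f_lo_hz < bandwidth_hz:
        tones.append(n * f_rep_hz - f_lo_hz)
        n += 1
    return tones

# Hypothetical setup: 1-MHz pulse rate, LO at 8212.99 MHz, 2-MHz baseband
print(phase_cal_tones(8_212_990_000, 1_000_000, 2_000_000))
```

Offsetting the LO by 10 kHz from a harmonic of the pulse rate is what places the lowest tone at a convenient frequency; the remaining tones follow at 1-MHz spacing across the band.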

9.5.8 Time Synchronization

The clocks at VLBI stations must be synchronized accurately enough to avoid time-consuming searches for interference fringes. Until around 1980, Loran C was widely used to monitor time at VLBI stations. Loran, an acronym for Long Range Navigation, is a system originally developed during World War II for ocean navigation (Pierce et al. 1948). The transmission frequency is 100 kHz. The relative time of arrival of signals from three stations defines the observer’s location on the Earth’s surface. For a detailed discussion of Loran C, see Frank (1983). Accuracies from a few hundred nanoseconds to a few tens of microseconds are possible, depending on the accuracy of the estimate of propagation time.

The Global Positioning System (GPS) provides higher accuracy than Loran and has been used in almost all VLBI systems since the early 1980s. In the GPS system, the user receives signals at 1.23 or 1.57 GHz from a number of satellites whose positions are known and whose clocks are synchronized to Coordinated Universal Time (UTC; see Sect. 12.3). If timing measurements from four satellites are made, and corrected for propagation effects in the atmosphere, users can determine their positions in three coordinates and their clock errors. The accuracies available to civilian users have improved over about a decade from 100 ns in time (Parkinson and Gilbert 1983; Lewandowski et al. 1999) to ∼1 ns (Rose et al. 2014), and further improvement down to 100 ps is expected (Ray and Senior 2005). An analysis of the time-transfer problem, including relativistic effects, is given by Ashby and Allan (1979). For general information on GPS usage, see, for example, Leick (1995).

For time scales of a year, the accuracy of timing from pulsar observations approaches 1 part in \(10^{14}\) (Davis et al. 1985). Ultimately, the best time transfer may be obtainable from the processed VLBI data (Counselman et al. 1977; Clark et al. 1979).

9.6 Data Storage Systems

The basic consideration for any storage system is the representation of the signal and the method of incorporating the time information. Recording can be either analog or digital, and various data storage technologies are available. Here, we discuss only digital recording since the technologies involved are well suited to VLBI and are widely used.

A basic parameter of a recording system is its data rate, \(\nu _{b}\) (bits s\(^{-1}\)). This parameter limits the number of bits that can be recorded in a given time and, thus, also the sensitivity of continuum observations in which the potential IF bandwidth is larger than \(\nu _{b}/2N_{b}\), where \(N_{b}\) is the number of bits per sample. The signal is represented by samples having Q quantization levels taken at β times the Nyquist rate. For N samples, there are \(Q^{N}\) possible data configurations, which require a minimum of \(N\log _{2}Q\) bits. Therefore, as noted in Sect. 8.4.3, the maximum RF bandwidth is

$$\displaystyle{ \varDelta \nu = \frac{\nu _{b}} {2\beta N_{b}} = \frac{\nu _{b}} {2\beta \log _{2}Q}\;. }$$
(9.164)

The signal-to-noise ratio obtained in time τ is proportional to \(\eta _{Q}\sqrt{\varDelta \nu \tau }\), where \(\eta _{Q}\) is the quantization efficiency (see Table 8.3). From Eq. (9.164),

$$\displaystyle{ \eta _{Q}\sqrt{\varDelta \nu \tau } =\eta _{Q}\sqrt{ \frac{\nu _{b } \tau } {2\beta N_{b}}}\;. }$$
(9.165)

If τ is the recording time, \(\nu _{b}\tau\) is equal to the number of recorded bits. The quantity \(\eta _{Q}/\sqrt{\beta N_{b}}\) thus provides an indication of the performance per bit, which it is desirable to maximize. For two- and four-level sampling, the obvious encoding schemes are one bit and two bits per sample, respectively. For three-level sampling, a problem arises since encoding one sample (one of three possible states) in two data bits (representing four possible states) is inefficient. Putting three samples into five bits or five samples into eight bits gives data rates of 1.67 and 1.60 bits per sample, respectively, compared with the theoretical optimum value of \(\log _{2}3 = 1.585\). The values of \(\eta _{Q}/\sqrt{\beta N_{b}}\) for various values of Q and β, and several encoding schemes, are listed in Table 9.5. The highest signal-to-noise ratio is achieved with three-level sampling at the Nyquist rate, although two- and four-level sampling give almost the same performance.
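
The bit-packing arithmetic and the performance factor can be reproduced with a short sketch. The function names are hypothetical, and the quantization efficiency must be supplied by the user (for example, from Table 8.3); no efficiency values are built into the code.

```python
import math

def packed_bits_per_sample(levels, samples_per_word):
    """Bits per sample when a group of samples shares one fixed-length word."""
    states = levels ** samples_per_word
    word_bits = (states - 1).bit_length()   # smallest word that can index every state
    return word_bits / samples_per_word

def max_bandwidth(bit_rate, beta, bits_per_sample):
    """Maximum bandwidth from Eq. (9.164), in Hz for a bit rate in bits per second."""
    return bit_rate / (2.0 * beta * bits_per_sample)

def performance_factor(eta_q, beta, bits_per_sample):
    """Sensitivity per recorded bit, eta_Q / sqrt(beta * N_b)."""
    return eta_q / math.sqrt(beta * bits_per_sample)

print(packed_bits_per_sample(3, 1))   # one three-level sample in two bits
print(packed_bits_per_sample(3, 3))   # three samples in five bits
print(packed_bits_per_sample(3, 5))   # five samples in eight bits
print(math.log2(3))                   # theoretical optimum for three levels
```

Packing more samples per word approaches \(\log _{2}3\) only slowly, which is one reason the five-samples-in-eight-bits scheme is a natural stopping point.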

Table 9.5 Performance of various signal representations as a function of number of quantization levels, sampling rate, and encoding format

In addition to the encoding schemes discussed above, in which the number of bits required for a given number of samples is constant, one can also envisage a scheme in which the number of bits depends on the sample values, that is, a variable-length code. For example, D’Addario (1984) has suggested encoding the +1, 0, and –1 values in three-level quantization as the binary numbers 11, 0, and 10, respectively. It is possible to decode such a data string uniquely, since all one-bit representations begin with 0 and all two-bit representations with 1. The average number of bits per sample depends on the amplitude probability distribution of the signal waveform and the threshold level settings. For a given number of bits, the threshold settings that maximize the signal-to-noise ratio are generally not the same as those derived in Sect. 8.3, which are optimum for a given number of bits per sample. With D’Addario’s encoding scheme, the best performance is achieved with the threshold set such that \(\eta _{Q} = 0.769\) and \(N_{b} = 1.370\) bits per sample, giving a performance factor \(\eta _{Q}/\sqrt{\beta N_{b}}\) equal to 0.657. Thus, an increase in sensitivity of about 3% compared with the use of the scheme with 1.6 bits per sample could be achieved. However, the effects of bit errors or interfering signals that change the amplitude distribution could be more serious. Finally, the data could be encoded statistically in large blocks that would allow a theoretically optimal value of \(N_{b}\) of 1.317 bits per sample, which, with \(\eta _{Q}\) of 0.769, would give a performance factor of 0.670 (D’Addario 1984).
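
The variable-length code described above is simple enough to sketch in full. The code table follows the text (+1 → 11, 0 → 0, –1 → 10); the function names, the use of character strings for the bit stream, and the error raised on a truncated stream are choices made for this illustration and do not describe any actual recorder format.

```python
CODE = {+1: "11", 0: "0", -1: "10"}   # three-level samples to variable-length bits

def encode(samples):
    """Concatenate the code words for a sequence of samples in {+1, 0, -1}."""
    return "".join(CODE[s] for s in samples)

def decode(bits):
    """Recover the samples: a leading 0 is a one-bit word, a leading 1 a two-bit word."""
    samples = []
    i = 0
    while i < len(bits):
        if bits[i] == "0":
            samples.append(0)
            i += 1
        else:
            if i + 1 >= len(bits):
                raise ValueError("bit stream ends inside a two-bit code word")
            samples.append(+1 if bits[i + 1] == "1" else -1)
            i += 2
    return samples

def mean_bits_per_sample(p_zero):
    """Average code length when a fraction p_zero of the samples are in the 0 state."""
    return 1.0 * p_zero + 2.0 * (1.0 - p_zero)

stream = encode([+1, 0, -1, 0])
print(stream, decode(stream))
print(mean_bits_per_sample(0.63))
```

With this code the average length is \(2 - p_{0}\), where \(p_{0}\) is the fraction of samples in the zero state, so a value of 1.370 bits per sample corresponds to \(p_{0} = 0.630\). The decoder also shows why bit errors are more damaging here: a single flipped bit can shift the word boundaries for the samples that follow.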

In practice, the desirability of a simple encoding scheme and other design considerations have usually resulted in the choice of two-level quantization. All five VLBI systems developed in the United States during the period 1968–1997 (Mark I, Mark II, Mark III, VLBA, and Mark IV) use two-level sampling, but for the last two of these, four-level sampling is also an option. For spectral line observations, where the bandwidth of the signal is small with respect to the bandwidth of the recording system, multilevel sampling is advantageous. Note that multilevel sampling is a more effective way of using recording capacity than sampling faster than the Nyquist rate (Table 9.5).

Each data sample must have either an implicit or explicit time tag. Although an error rate of \(10^{-3}\) in decoding the data bits is acceptable, a one-bit shift in the time axis can be a serious defect and is not acceptable. In virtually all recording systems, the data are blocked into records. Each new record begins at a precise time so that the temporal registration of the data stream can be recovered if it is lost during the previous record. These record lengths are: Mark I, 0.2 s (144,000 bits); Mark II, 16.7 ms (66,600 bits); and Mark III, 5 ms (20,000 bits). In the Mark I system, which used standard computer tape format, the accuracy of recording was very high, and the time of any bit was obtained by counting bits from the beginning of the record and counting records from the beginning of the tape. In the Mark II system, which uses video cassette recorders (VCRs), the data are recorded with a self-clocking code, while in the Mark III system, which uses instrumentation recorders, the data transitions themselves serve as the clock. The characteristics of several systems are given in Fig. 9.20 and Table 9.6. In all of these, the recording is in digital form, except for the Canadian system used during 1971–83. Wietfeldt and D’Addario (1991) discuss the compatibility of some of these systems.

Fig. 9.20 Trends in VLBI recording system data rates (circles) and storage costs (squares). (left) Data rate in Gbits s\(^{-1}\) (Gbps) vs. time for various systems. (right) Cost of data storage system in K$/Gbps. Note that before 2000, data storage was on magnetic tape and after 2000 on disk. From Whitney et al. (2013), courtesy of and © the Astronomical Society of the Pacific.

Table 9.6 Characteristics of some VLBI data storage systems

9.7 Processing Systems and Algorithms

A VLBI processor has two main functions: (1) reproduction of smooth data streams and (2) cross-correlation analysis of the data streams. Before 2000, VLBI data were stored on magnetic tape. During that period, the data stream from a tape recorder could have time-base irregularities of up to 100 μs, caused by jitter in the mechanical playback system, and could be subject to dropouts because of tape imperfections. The processor had to derive the true time base either from the encoded clock transitions, in the case of a self-clocking code, or from the data transitions themselves, when a bit synchronizer was used. There had to be enough buffer storage to handle at least the mechanical jitter. The geometric delay was corrected with minimal buffer space by shifting the playback time, thereby retaining the data on the tape until they were needed by the correlator. If the data were read in synchronism from the tapes, a buffer memory of sample capacity about \(5 \times 10^{4}\) times the clock rate in megahertz would be needed for geometric delay compensation. Even today, with disk storage or transmission of data over fiber optic networks, some buffer storage is required.

The major differences between the design of the correlation part of the processor for VLBI and for a conventional interferometer are related to the fact that in VLBI, fringe rotation and delay compensation are usually performed on the quantized and sampled signal. This leads to special problems, which we discuss here. Digitization of the signals introduces several signal-to-noise loss factors: η Q , the loss factor associated with amplitude quantization of the recorded signals, discussed in Sect. 8.3; η R , the loss factor incurred by quantizing the phase of the fringe rotation waveform; η S , the loss factor incurred by inadequate sideband rejection as a result of the limited number of delays in the correlator; and η D , the loss caused by compensating the geometric delay in discrete steps.

Fringe rotation and delay compensation can be done on the analog signals at the telescope before recording. For example, the fringe rotation can be done at the telescopes by offsetting the local oscillators, as described in Sect. 6.1.6 for a connected-element array. The advantage of this arrangement is that only a real correlation function (with both positive and negative delays) needs to be calculated (see Sects. 8.8 and 9.1). Hence, only half the correlator circuits are required. Also, the sensitivity loss from a digital fringe rotator is not incurred. A disadvantage is that the output of the correlator must be averaged over a short enough interval to accommodate the residual fringe frequency of a source anywhere in the primary beams of the antennas. The maximum residual fringe frequency of a source at the half-power point of the primary beam is \(\varDelta\nu_{f} \simeq D\omega_{e}/d\) [see Eq. (9.11)], where D is the baseline length, d is the antenna diameter, and \(\omega_{e}\) is the angular velocity of the Earth in radians per second. Hence, the averaging time of the correlator output must be less than \(1/(2\varDelta\nu_{f})\); for example, it should not exceed 30 ms for a baseline equal to the Earth’s diameter and d = 25 m. The correlation functions can be averaged further after they have been passed through a fringe rotator, which removes the residual fringe frequency. Also, the unit at the telescope that continually changes the local oscillator frequency must be carefully designed so that full phase accountability is provided for astrometric work. Further information on VLBI systems and processing algorithms can be found in Thomas (1981), Herring (1983), and Deller et al. (2007, 2011).

9.7.1 Fringe Rotation Loss (η R )

Fringe rotation is used to reduce to near zero the frequency of the fringe component of the correlated signals (see Sect. 6.1.6). Here we consider the fringe frequency to include the effect of offsets in the frequency standards. Fringe rotation in the processor can be implemented in a number of ways, as shown in Fig. 9.21. If the fringe rotator is placed after the correlator (Fig. 9.21a), then the correlation function from the correlator must be averaged over an interval short with respect to the fringe period. If the local oscillators at the antennas are offset to slow the fringes, so that only a little further adjustment is required after the correlator, then this scheme is convenient. Otherwise, the short averaging time required and the resulting high data rate from the correlator make this arrangement unattractive. Alternatively, before correlation, one of the data streams can be passed through a digital single-sideband mixer that shifts the Fourier components of the signal by the appropriate fringe frequency, as shown in Fig. 9.21b. The 90° phase shift in this mixer is difficult to implement without introducing spectral distortion, so this type of fringe rotator is rarely used (see also Sect. 8.7). The fringe rotation scheme shown in Fig. 9.21c is commonly used, but application of fringe rotation to the quantized signal introduces two complications. First, the fringe function with which the signal is multiplied must be coarsely quantized so as not to increase the number of bits per sample going to the correlator; this also applies to scheme (b). Second, the multiplication introduces an unwanted noise sideband, which is described below in Sect. 9.7.2. We now consider the first of these effects.

Fig. 9.21 Various processor configurations showing possible locations of fringe rotator. \(\mathcal{F}_{R}\) and \(\mathcal{F}_{I}\) are cosine and sine representations of the fringe function. See text for discussion of relative merits.

The data stream is multiplied by a complex function \(\mathcal{F}\) whose real and imaginary parts, \(\mathcal{F}_{R}\) and \(\mathcal{F}_{I}\), approximate \(\cos\phi\) and \(\sin\phi\), where \(\phi\) is the desired phase function. In the simplest approximation, these functions are square waves with the appropriate frequency and phases. Thus, as shown in Fig. 9.22, the quantized signal is multiplied by a fringe rotation function whose amplitude is constant but whose phase steps by 90° every quarter cycle instead of smoothly progressing. The resulting visibility function then has a phase component with a 90° sawtooth modulation at the fringe frequency. This resembles phase noise in which the phase is uniformly distributed between ±45°. Therefore, the average signal amplitude is degraded by \(\sin(\pi/4)/(\pi/4) = 0.900\). Another approach to calculating the loss in signal-to-noise ratio is to calculate the harmonics in the fringe rotation function. The first harmonic of \(\mathcal{F}_{R}\) or \(\mathcal{F}_{I}\) has an amplitude of \(4/\pi = 1.273\). Only the signal mixed with the first harmonic appears in the processor output, since the other harmonics are removed by time averaging. Thus, part of the signal is scattered out of the fringe passband. The fraction retained is the square root of the ratio of the power in the first harmonic to the total power of the fringe rotation function, which is \(\sqrt{8}/\pi = 0.900\). This represents the loss in signal-to-noise ratio. There is also a scale-factor change since the fringe amplitudes are increased by the action of the fringe rotator. Thus, the fringe amplitudes must be divided by \(4/\pi\), the relative amplitude of the first harmonic of \(\mathcal{F}_{R}\).

Fig. 9.22 (a) Mathematical model of two-level fringe rotator showing \(\mathcal{F}_{R}\) and \(\mathcal{F}_{I}\), functions that approximate \(\cos\phi\) and \(\sin\phi\) (left); the amplitude and phase representation of \(\mathcal{F}\) (center); and the phasor plot of \(\mathcal{F}\) (right). (b) Same plots for a three-level fringe rotator.

A better fringe rotation function is the three-level approximation of a sine wave (Clark et al. 1972) shown in Fig. 9.22b. When the fringe rotation function is zero, the correlator is inhibited. Since the real and imaginary parts of \(\mathcal{F}\) are never zero simultaneously, all data bits are used at least once. This fringe rotation function can be thought of as a phasor whose tip traces out a square such that it has phase jumps in 45° increments and its amplitude alternates between \(\sqrt{2}\) and 1. The resulting jitter in phase is uniformly distributed between ±22.5° and results in a loss of signal amplitude of \(\sin(\pi/8)/(\pi/8) = 0.974\). Also, the variation in the amplitude of the phasor introduces a nonuniform weighting of the signal samples. This reduces the signal-to-noise ratio by a further factor equal to \((1 + \sqrt{2})/\sqrt{6} = 0.986\). The net loss in signal-to-noise ratio is 0.960. The reduction in signal-to-noise ratio is also equal to the square root of the ratio of the power in the first harmonic to the total power in \(\mathcal{F}_{R}\). The first harmonic of \(\mathcal{F}_{R}\) is \((4/\pi)\cos(\pi/8) = 1.18\), which is the scale factor correction for the visibility. The three-level fringe function considered here is used in many VLBI processors. The fringe period is divided into 16 parts to generate \(\mathcal{F}\). The transitions in \(\mathcal{F}\), which then occur at integral multiples of 1/16 of the fringe period, are not optimally located, but this approximation results in no more than 0.1% additional loss. Note that an FX correlator can be made to accept input data with more than one or two bits per sample rather more easily than a lag correlator. With more data bits per signal sample, more accurate representations of sine and cosine functions can be used.
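The harmonic argument used for both the two-level and three-level fringe functions can be verified numerically. The sketch below is a minimal illustration with hypothetical function names. It assumes the three-level function is zero within ±π/8 of the zero crossings of \(\cos\phi\), which is the placement consistent with the 45° phase steps described above, and evaluates the first-harmonic amplitude and the loss factor by a simple midpoint sum over one fringe period.

```python
import math


def fringe_function(phi, levels):
    """Quantized approximation to cos(phi) with two or three levels."""
    c = math.cos(phi)
    if levels == 2:
        return 1.0 if c >= 0.0 else -1.0
    if levels == 3:
        # Assumed transition placement: zero state spans +/- pi/8 around
        # each zero crossing of cos(phi).
        threshold = math.cos(3.0 * math.pi / 8.0)
        if c > threshold:
            return 1.0
        if c < -threshold:
            return -1.0
        return 0.0
    raise ValueError("levels must be 2 or 3")


def first_harmonic_and_loss(levels, n=20000):
    """Return (first-harmonic amplitude, signal-to-noise loss factor)."""
    a1 = 0.0
    power = 0.0
    for i in range(n):
        phi = 2.0 * math.pi * (i + 0.5) / n
        f = fringe_function(phi, levels)
        a1 += f * math.cos(phi)
        power += f * f
    a1 *= 2.0 / n   # Fourier amplitude of the cos(phi) component
    power /= n      # mean-square value of the fringe function
    loss = math.sqrt(0.5 * a1 * a1 / power)
    return a1, loss


if __name__ == "__main__":
    for levels in (2, 3):
        amplitude, loss = first_harmonic_and_loss(levels)
        print(levels, round(amplitude, 3), round(loss, 3))
```

The first returned value is the scale factor by which the fringe amplitudes must be divided, and the second is the loss factor for a single fringe rotator.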

9.7.2 Fringe Sideband Rejection Loss (η S )

The digital fringe rotator shown in Fig. 9.21c is not a single-sideband mixer. Thus, as well as the wanted output, shifted in frequency by the fringe frequency, an unwanted component of noise corresponding to the image response of a mixer also appears. To understand the effect of this noise, consider the cross power spectrum of the correlator output. Recall that ν′ is the intermediate frequency defined following Eq. (9.18), and note that in the output of a spectral correlator, ν′ > 0 and ν′ < 0 refer to the upper and lower sidebands, respectively. For upper-sideband operation, the cross power spectrum of the signal is given by Eq. (9.26), which is nonzero only for the upper sideband. However, there will be noise at both positive and negative frequencies. Thus, the cross power spectrum of the correlator output is

$$\displaystyle{ \mathcal{S}'_{12}(\nu ') = \left \{\begin{array}{@{}l@{\quad }l@{}} \mathcal{S}(\nu ')e^{\,j\varPhi (\nu ')} + n_{u}(\nu ')\;,\quad &\qquad \nu ' > 0\;,\\ \quad \\ \quad \\ n_{\ell}(\nu ')\;, \quad &\qquad \nu ' < 0\;, \end{array} \right. }$$
(9.166)

where \(\mathcal{S}(\nu ')\) is the instrumental response defined in Eq. (9.19), \(j\varPhi(\nu ')\) is the exponent in Eq. (9.26), and \(n_{u}\) and \(n_{\ell}\) are the noise spectra for the upper- and lower-sideband responses. For observations in which a spectral line correlator is used, \(\mathcal{S}'_{12}(\nu ')\) is computed and the noise at ν′ < 0 is simply ignored. For continuum observations using a correlator with only a small number of channels (lags), the noise at ν′ < 0 contributes excess noise in the correlation function and must be removed. A straightforward way to remove the noise at ν′ < 0 is to compute \(\mathcal{S}'_{12}(\nu ')\) and multiply it by the filtering function

$$\displaystyle{ H_{F}(\nu ') = \left \{\begin{array}{@{}l@{\quad }l@{}} 1,\quad &\qquad 0 <\nu ' <\varDelta \nu \\ \quad \\ \quad \\ 0,\quad &\qquad \mathrm{elsewhere}\;. \end{array} \right. }$$
(9.167)

The resulting function, \(\mathcal{S}'_{12}(\nu ')H_{F}(\nu ')\), can be Fourier transformed back into a correlation function. Alternatively, the filtering can be applied by convolving the correlation function at the output of the correlator with the Fourier transform of \(H_{F}(\nu ')\), which is

$$\displaystyle{ h_{F}(\tau ) =\varDelta \nu e^{\,j\pi \varDelta \nu \tau }\left (\frac{\sin \pi \varDelta \nu \tau } {\pi \varDelta \nu \tau }\right )\;, }$$
(9.168)

or

$$\displaystyle{ h_{F}(\tau ) = F_{1}(\tau ) + jF_{2}(\tau )\;, }$$
(9.169)

where F 1 and F 2 are as defined in Eq. (9.23). The convolution leaves the desired signal unchanged but removes the negative (lower) sideband noise. Thus, the resulting correlation function still has the form of Eq. (9.25), plus the positive (upper) sideband noise that cannot be removed.

The role of \(h_{F}(\tau)\) can be understood in a different way. The correlation function at the output of the correlator is computed at discrete delays at intervals of \((2\varDelta\nu)^{-1}\). Therefore, the correlation function in Eq. (9.25) has a full width at half-maximum of about three delay steps. In order to estimate the amplitude and phase of the correlation function, one would like to do more than just take these values from the peak of \(\rho_{12}(\tau)\). Rather, one would like to use all the information provided by the correlation function at various delays. \(h_{F}(\tau)\) is the appropriate interpolation function that properly weights the correlation function, gathering up the power at different delays to provide an optimal estimate of the fringe amplitude, phase, and delay. Note that \(h_{F}(\tau)\) and \(\rho_{12}(\tau)\) are identical forms except for the unknown amplitude, phase, and delay. These unknown quantities can be estimated by the usual procedure of matched filtering or, equivalently, least-mean-squares analysis in which the correlation function is convolved with \(h_{F}(\tau)\). However, \(\rho_{12}(\tau)\) is measured only over a finite number of delay steps, and some information is lost, so the signal-to-noise ratio is reduced. Assume that the system lowpass response is rectangular and the delay errors \(\varDelta\tau_{g}\) and \(\tau_{e}\) are zero, so that the correlation function is centered in the delay range of the correlator. Let M be the number of delay steps (lags) in the correlator. The loss factor \(\eta_{S}\) is the signal-to-noise ratio when M values of the correlation function are available, divided by the signal-to-noise ratio when the entire function is available:

$$\displaystyle{ \eta _{S} = \sqrt{\frac{\sum \limits _{k=-M' }^{M' }\vert h_{F } (\tau _{k } )\vert ^{2 } } {\sum \limits _{k=-\infty }^{\infty }\vert h_{F}(\tau _{k})\vert ^{2}}} \;, }$$
(9.170)

where \(\tau_{k} = k/2\varDelta\nu\), \(M' = (M - 1)/2\), and M is an odd integer. The denominator in Eq. (9.170) equals \(2\varDelta\nu^{2}\), so

$$\displaystyle{ \eta _{S} = \sqrt{\frac{1} {2} +\sum \limits _{ k=1}^{M'}\left [\frac{\sin (\frac{\pi k} {2} )} {\frac{\pi k} {2} } \right ]^{2}}\;. }$$
(9.171)

For M = 1, \(\eta _{S} = 1/\sqrt{2}\), which corresponds to the case of no image rejection. M must be at least 3 to ensure that the peak of the correlation function can be determined; M ≃ 7, for which \(\eta_{S} = 0.975\), is adequate for most purposes. For large M, \(\eta_{S}\) approaches unity [see Eq. (A8.5)]. Note that because we assumed the correlation function was exactly centered, its value will be zero at delay steps 2, 4, 6, 8, and so on. This suggests, for example, that a nine-delay correlator (M′ = 4) is no better than a seven-delay correlator (M′ = 3). In practice, the nine-delay correlator is better because the correlation function is rarely aligned perfectly in the correlator. In general, \(\eta_{S}\) is slightly smaller than given in Eq. (9.171) if the correlation function is not perfectly aligned (Herring 1983).
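Equation (9.171) is easy to evaluate for any odd number of lags. The following Python sketch (the function name `eta_s` is chosen here for illustration) reproduces the values quoted in the text and shows that, for a perfectly centered correlation function, the even-numbered delay steps contribute nothing.

```python
import math


def eta_s(m):
    """Sideband rejection loss factor of Eq. (9.171) for an odd number of lags m."""
    if m < 1 or m % 2 == 0:
        raise ValueError("m must be a positive odd integer")
    m_prime = (m - 1) // 2
    total = 0.5
    for k in range(1, m_prime + 1):
        x = math.pi * k / 2.0
        total += (math.sin(x) / x) ** 2
    return math.sqrt(total)


if __name__ == "__main__":
    for m in (1, 3, 7, 9, 11, 101):
        print(m, round(eta_s(m), 4))
```

The value for M = 9 equals that for M = 7 to within rounding error, as expected from the zeros of the centered correlation function.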

9.7.3 Discrete Delay Step Loss (η D )

The delay introduced to align the bit streams is quantized at the sampling rate, which we assume to be the Nyquist rate. Thus, there is a periodic sawtooth delay error with a peak-to-peak amplitude equal to the sampling period. This effect is also known as the fractional bit-shift error. The delay error gives rise to a periodic phase shift that is a function of the baseband frequency, as shown in Fig. 9.23. The phase error has a peak-to-peak value of

$$\displaystyle{ \phi _{\mathrm{pp}} = \frac{\pi \nu '} {\varDelta \nu }\;, }$$
(9.172)

and the sawtooth frequency is proportional to the fringe frequency and has a maximum value of

$$\displaystyle{ \nu _{\mathrm{ds(max)}} = \frac{2\varDelta \nu D\omega _{e}} {c} \;(\mathrm{delay\ steps\ per\ second})\;, }$$
(9.173)

where D is the baseline length and ω e is the angular velocity of the Earth’s rotation in radians per second. If nothing is done to correct for this effect and the fringe amplitude is averaged over many times 1∕ν ds, then the phase at any frequency ν′ is uniformly distributed over ϕ pp. The amplitude loss as a function of baseband frequency is

$$\displaystyle{ L(\nu ') = \frac{\int _{0}^{\phi _{\mathrm{pp}}/2}\cos \phi \,d\phi } {\int _{0}^{\phi _{\mathrm{pp}}/2}d\phi } = \frac{\sin (\phi _{\mathrm{pp}}/2)} {\phi _{\mathrm{pp}}/2} \;, }$$
(9.174)

and the net signal-to-noise reduction over a baseband response of width Δ ν is, using Eqs. (9.172) and (9.174),

$$\displaystyle{ \eta _{D} = \frac{1} {\varDelta \nu } \int _{0}^{\varDelta \nu }\frac{\sin (\pi \nu '/2\varDelta \nu )} {\pi \nu '/2\varDelta \nu } d\nu ' = 0.873\;. }$$
(9.175)
Fig. 9.23 Discrete delay step effect. Case (a) applies when the fringe rotator corrects the phase for zero baseband frequency, and case (b) applies when the fringe rotator also inserts a \(\pi/2\) phase shift when the delay changes by one Nyquist sample. The top plots show the phase vs. time at baseband frequency ν′. The middle plots show the phase across the baseband at three different times denoted by 1, 2, and 3. The bottom plots show the average amplitude across the baseband.

Unless the fringe amplitude averaging is done over an integral number of fringe periods, there is also a residual phase error, the amplitude of which decreases with the number of periods. When the fringe frequency is near zero, this phase error can be significant.

The effect of the discrete delay step can be compensated for, and no sensitivity loss need occur. The delay error caused by delay quantization is a known quantity that introduces a phase slope in the cross power spectrum. Therefore, if the cross power spectra are calculated on a period short with respect to 1∕ν ds, which can be as small as 20 ms on a 5000-km baseline with Δ ν = 20 MHz [see Eq. (9.173)], then the effect of the discrete delay step can be removed by adjusting the slope of the phase of the cross power spectrum. This correction is easily done in spectral line work where spectra are calculated anyway. Note that if this correction is not made, the sensitivity loss factor is 0.64 at the high-frequency edge of the band, as given by Eq. (9.174). In this case, the amplitude response should be compensated for by dividing the cross power spectra by L(ν′). In continuum work, the correction is sometimes omitted because of the need to Fourier transform to the frequency domain and then back to cross-correlation.
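The 20-ms figure quoted above follows directly from Eq. (9.173). As a rough check, the sketch below evaluates the maximum delay-step rate; the function name is hypothetical, and the numerical values of the Earth's rotation rate and the speed of light are standard constants supplied here rather than taken from the text.

```python
OMEGA_E = 7.292115e-5   # Earth's angular velocity in rad/s (standard value)
C = 299792458.0         # speed of light in m/s


def max_delay_step_rate(bandwidth_hz, baseline_m):
    """Maximum number of delay steps per second, Eq. (9.173)."""
    return 2.0 * bandwidth_hz * baseline_m * OMEGA_E / C


if __name__ == "__main__":
    rate = max_delay_step_rate(20e6, 5000e3)
    print(rate, "steps per second; shortest step period", 1.0 / rate, "s")
```

For a 5000-km baseline and a 20-MHz bandwidth, the rate is close to 49 steps per second, so the spectra must be formed on a timescale shorter than about 20 ms.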

A way to compensate partially for the effect of discrete delay steps is to move the frequency at which the phase is unperturbed from zero to Δ ν∕2, the baseband center. The phase of the fringe rotator is increased by π Δ ν Δ τ s , where Δ τ s is the delay error. Thus, when the delay changes by one sampling interval, a phase jump of π∕2 is inserted in the fringe rotator. The resulting loss at the band edges is then only 0.90. The average loss over the band is given by an equation similar to Eq. (9.175), but with the upper limit of integration changed to Δ ν∕2, and equals 0.966. Also, for a symmetrical bandpass response, the residual phase error is zero because the net phase shift over the band at any instant is zero.
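The loss factors for the two cases can be reproduced by integrating Eq. (9.174) numerically across the band. This is a minimal sketch with hypothetical function names; it uses a midpoint sum and treats the band-center case as an average over offsets from 0 to Δν/2, following the description above.

```python
import math


def amplitude_loss(phi_pp):
    """L = sin(phi_pp/2) / (phi_pp/2), Eq. (9.174)."""
    half = phi_pp / 2.0
    return 1.0 if half == 0.0 else math.sin(half) / half


def eta_d(upper_fraction=1.0, n=10000):
    """Average of L over frequency offsets from 0 to upper_fraction * bandwidth.

    upper_fraction = 1.0 gives Eq. (9.175); 0.5 gives the band-center case.
    """
    total = 0.0
    for i in range(n):
        x = upper_fraction * (i + 0.5) / n   # offset in units of the bandwidth
        total += amplitude_loss(math.pi * x)  # phi_pp = pi * nu' / bandwidth
    return total / n


if __name__ == "__main__":
    print("edge loss, zero-frequency reference:", round(amplitude_loss(math.pi), 3))
    print("edge loss, band-center reference:", round(amplitude_loss(math.pi / 2.0), 3))
    print("eta_D, zero-frequency reference:", round(eta_d(1.0), 3))
    print("eta_D, band-center reference:", round(eta_d(0.5), 3))
```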

9.7.4 Summary of Processing Losses

The loss factors we have considered are all multiplicative, so the total loss is given by the equation

$$\displaystyle{ \eta =\eta _{Q}\,\eta _{R}\,\eta _{S}\,\eta _{D}\;, }$$
(9.176)

where η Q = quantization loss, η R = fringe rotation loss, η S = fringe sideband rejection loss, and η D = discrete delay step loss.

If there are fringe rotators in each signal path to the correlator, the fringe rotation loss will be \(\eta_{R}^{2}\) because the fringe rotator phases will be uncorrelated. A summary of the loss factors is given in Table 9.7. As an example, a processor might have two-level sampling (\(\eta_{Q} = 0.637\)), three-level fringe rotators in each signal path (\(\eta_{R} = 0.922\)), 11-channel correlation function (\(\eta_{S} = 0.983\)), and band-center delay compensation (\(\eta_{D} = 0.966\)), giving a net loss of 0.558. Thus, the sensitivity is worse than that of an ideal analog system with the same bandwidth by a factor of about 2.
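The worked example above is a straightforward product of the individual factors in Eq. (9.176). A short sketch, with a hypothetical function name, makes the arithmetic explicit; the value 0.922 is the fringe rotation factor for two three-level rotators as quoted in the text.

```python
import math


def total_loss(factors):
    """Product of multiplicative signal-to-noise loss factors, Eq. (9.176)."""
    return math.prod(factors)


if __name__ == "__main__":
    example = {
        "eta_Q (two-level sampling)": 0.637,
        "eta_R (three-level rotator in each path)": 0.922,
        "eta_S (11-channel correlation function)": 0.983,
        "eta_D (band-center delay compensation)": 0.966,
    }
    eta = total_loss(example.values())
    print("net loss factor:", round(eta, 3))
    print("sensitivity penalty relative to ideal analog system:", round(1.0 / eta, 2))
```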

There are other loss factors we have not discussed here. The passband will not in reality be perfectly flat, or the response zero, for frequencies above half the Nyquist sampling frequency. These imperfections introduce loss, which for an ideal nine-pole Butterworth filter amounts to 2% (Rogers 1980). The frequency responses will not be perfectly matched for different antennas (see Sect. 7.3). The phase settings of the fringe rotator may be calculated exactly at convenient intervals and extrapolated by Taylor series; this approximation will introduce periodic phase jumps. The local oscillators may have power-line harmonic and noise sidebands that put some fringe power outside the usual fringe filter passband. Empirical values of η typical of the first decade of VLBI development were about 0.4 (Cohen 1973).

The η values refer to loss in signal-to-noise ratio. The fringe amplitudes must also be corrected for scale changes due to signal quantization and fringe rotation. We summarize the multiplicative normalization factors to be applied to the fringe amplitudes in Table 9.8.

Table 9.7 Signal-to-noise loss factors
Table 9.8 Normalization factors

9.8 Bandwidth Synthesis

For geodetic and astrometric purposes, it is useful to measure the geometric group delay

$$\displaystyle{ \tau _{g} = \frac{1} {2\pi }\ \frac{\partial \phi } {\partial \nu } }$$
(9.177)

as accurately as possible. With a single RF band, the delay can be found by fitting a straight line to the phase vs. frequency of the cross power spectrum. The uncertainty in this delay, from the usual application of least-mean-squares analysis, is

$$\displaystyle{ \sigma _{\tau } = \frac{\sigma _{\phi }} {2\pi \varDelta \nu _{\mathrm{rms}}}\;, }$$
(9.178)

where \(\sigma_{\phi}\) is the rms phase noise for a bandwidth \(\varDelta\nu\) and \(\varDelta\nu_{\mathrm{rms}}\) is the rms bandwidth, which for a single band of width \(\varDelta\nu\) is equal to \(\varDelta \nu /(2\sqrt{3})\) [see discussion following Eq. (A12.28) in Appendix 12.1]. \(\sigma_{\phi}\) can be obtained from Eq. (6.64), and if processing losses are neglected, Eq. (9.178) becomes

$$\displaystyle{ \sigma _{\tau } = \frac{T_{\!S}} {\zeta \,T_{\!A}\sqrt{\varDelta \nu _{\mathrm{rms } }^{3}\tau }}\;, }$$
(9.179)

where ζ is a constant equal to \(\pi(768)^{1/4} \simeq 16.5\) [see derivation of Eq. (A12.33)], and \(T_{S}\) and \(T_{A}\) are the geometric mean system and antenna temperatures. A much higher value of \(\varDelta\nu_{\mathrm{rms}}\) can be realized by observing at several different radio frequencies. This can be accomplished by switching the local oscillator of a single-band system sequentially in time among N frequencies, or by dividing up the recorded signal into N simultaneous RF bands (channels), which are spread over a wide frequency interval. The temporal switching method has the disadvantage that phase changes during the switching cycle degrade or bias the delay estimate. These methods are commonly referred to as bandwidth synthesis (Rogers 1970, 1976).
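The constant ζ can be checked by comparing Eq. (9.179) with the general form of Eq. (9.178). The sketch below is illustrative only: the function names and the sample temperatures are hypothetical, and the phase noise is assumed to have the form \(\sigma_{\phi} = T_{S}/(T_{A}\sqrt{2\varDelta\nu\tau})\), which is the form of Eq. (6.64) that reproduces the quoted value of ζ.

```python
import math


def sigma_phi(t_s, t_a, bandwidth_hz, tau_s):
    """Assumed form of the rms phase noise (taken to be Eq. 6.64)."""
    return t_s / (t_a * math.sqrt(2.0 * bandwidth_hz * tau_s))


def sigma_tau_general(t_s, t_a, bandwidth_hz, tau_s, rms_bandwidth_hz):
    """Delay uncertainty from Eq. (9.178)."""
    return sigma_phi(t_s, t_a, bandwidth_hz, tau_s) / (2.0 * math.pi * rms_bandwidth_hz)


def sigma_tau_single_band(t_s, t_a, bandwidth_hz, tau_s):
    """Delay uncertainty from Eq. (9.179) for one contiguous band."""
    zeta = math.pi * 768.0 ** 0.25
    rms_bandwidth = bandwidth_hz / (2.0 * math.sqrt(3.0))
    return t_s / (zeta * t_a * math.sqrt(rms_bandwidth ** 3 * tau_s))


if __name__ == "__main__":
    # Hypothetical values: T_S = 50 K, T_A = 0.5 K, 2-MHz band, 100-s integration.
    t_s, t_a, bw, tau = 50.0, 0.5, 2e6, 100.0
    print(sigma_tau_single_band(t_s, t_a, bw, tau))
    print(sigma_tau_general(t_s, t_a, bw, tau, bw / (2.0 * math.sqrt(3.0))))
    # Spreading the same recorded bandwidth over a wider interval raises the
    # rms bandwidth and lowers the delay uncertainty in proportion.
    print(sigma_tau_general(t_s, t_a, bw, tau, 100e6))
```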

In a practical system, signals from a small number of RF bands (about ten) are recorded. The problem of determining the optimum distribution of these bands in frequency is similar to the problem of finding a minimum-redundancy distribution of antenna spacings in a linear array, as discussed in Sect. 5.5. However, here we do not need to have all multiples of the unit (frequency) spacing up to the maximum value, and some gaps are not necessarily detrimental. From the spectral point of view, we wish to have the bands placed in some geometric sequence of increasing separation so that phase can be extrapolated from one band to the next, as shown in Fig. 9.24, without having any 2π ambiguities in the phase connection process. The rms bandwidth depends critically on the unit spacing, which depends on the minimum signal-to-noise ratio. The delay accuracy for a multiband system is obtained from Eq. (9.178) in the same way as for Eq. (9.179) but without the condition \(\varDelta \nu _{\mathrm{rms}} =\varDelta \nu /(2\sqrt{3})\). Thus, we obtain

$$\displaystyle{ \sigma _{\tau } = \frac{T_{\!S}} {2\sqrt{2}\pi T_{\!A}\sqrt{\varDelta \nu \tau }\varDelta \nu _{\mathrm{rms}}}\;, }$$
(9.180)
Fig. 9.24 Fringe phase vs. frequency for a bandwidth synthesis system. The phase is measured over discrete bands (crosshatched) spaced at multiples of the fundamental band separation frequency \(\nu_{s}\). The turn ambiguities give rise to sidelobes in the delay resolution function defined in Eq. (9.181) and shown in Fig. 9.25.

where Δ ν rms for a typical bandwidth synthesis system is approximately 40% of the total frequency interval spanned, Δ ν is the total bandwidth, and τ is the integration time for each band. To avoid explicitly the problem of phase connection, we can form an equivalent delay function from the cross power spectra [see Eq. (9.26)] of the various bands observed:

$$\displaystyle{ D_{R}(\tau ) =\sum \limits _{ i=1}^{N}\int _{ 0}^{\varDelta \nu }\mathcal{S}_{ 12i}(\nu -\nu _{i})e^{\,j2\pi \nu \tau }d\nu \;, }$$
(9.181)

where the \(\nu_{i}\) are the local oscillator frequencies relative to the lowest one, and \(\nu - \nu_{i}\) is the baseband frequency. The maximum of \(\vert D_{R}(\tau)\vert\) gives the maximum-likelihood estimate of the interferometer delay (Rogers 1970). The a priori normalized delay resolution function, obtained from Eq. (9.181) by setting \(\mathcal{S}_{12} = 1\) at frequencies where it is measured and \(\mathcal{S}_{12} = 0\) otherwise, is

$$\displaystyle{ \vert D_{R}(\tau )\vert =\varDelta \nu \frac{\sin \pi \varDelta \nu \tau } {\pi \varDelta \nu \tau }\left \vert \sum \limits _{i=1}^{N}e^{\,j2\pi \nu _{i}\tau }\right \vert \;. }$$
(9.182)

The sinc-function envelope is the delay resolution function for a single channel. The frequencies ν i should be chosen to minimize the width of D R (τ) while not allowing any subsidiary maximum to rise above a level such that it could be confused with the principal peak. In situations with low signal-to-noise ratio, the minimum unit spacing should be about four times the bandwidth of a single channel. The delay resolution function for a five-channel system is shown in Fig. 9.25.

Fig. 9.25 Delay resolution function for a five-channel system with a unit spacing \(\nu_{s} = 4\varDelta\nu\) and spacing of 0, 1, 3, 7, and 15 \(\nu_{s}\), as shown in part in Fig. 9.24. The “grating” lobe at \(\tau\varDelta\nu = 0.25\) need only be reduced sufficiently below unity to avoid delay ambiguity.
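The delay resolution function of Eq. (9.182) for this five-channel arrangement can be evaluated directly. In the following sketch the function name and argument names are hypothetical, the result is normalized to unity at zero delay, and the delay is expressed as the dimensionless product \(\tau\varDelta\nu\).

```python
import cmath
import math


def delay_resolution(tau_dnu, spacings=(0, 1, 3, 7, 15), unit_spacing=4.0):
    """Normalized |D_R| of Eq. (9.182).

    tau_dnu      -- delay multiplied by the single-channel bandwidth
    spacings     -- channel offsets in units of the unit spacing nu_s
    unit_spacing -- nu_s divided by the single-channel bandwidth
    """
    x = math.pi * tau_dnu
    envelope = 1.0 if x == 0.0 else math.sin(x) / x
    phasor_sum = sum(
        cmath.exp(2j * math.pi * k * unit_spacing * tau_dnu) for k in spacings
    )
    return abs(envelope) * abs(phasor_sum) / len(spacings)


if __name__ == "__main__":
    for tau_dnu in (0.0, 0.05, 0.125, 0.25):
        print(tau_dnu, round(delay_resolution(tau_dnu), 3))
```

At \(\tau\varDelta\nu = 0.25\) all five phasors realign, so the grating lobe is limited only by the single-channel sinc envelope.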

9.8.1 Burst Mode Observing

For certain observations, there are advantages in limiting the observing time to short bursts, during which the bit rate can be much higher than the mean data acquisition rate as limited by the recording technology [see, e.g., Wietfeldt and Frail (1991)]. In pulsar observations, the duration of the pulsed emission is typically ∼ 3% of the total time, so by recording data taken only during pulse-on time, the bandwidth can be increased by a factor of ∼ 33 over the maximum bandwidth for continuous observation. This technique requires the use of a high-speed sampler, high-speed memory, and pulse-timing circuitry at each antenna. During the pulse, the data are stored in the memory at the high rate and then read out continuously at a lower rate. If the ratio of these two rates is a factor w, then the bandwidth can be increased by the same factor over constant-rate observing. For pulsars, this results in an increase in sensitivity by a factor w, of which \(\sqrt{w}\) can be attributed to the increased bandwidth, and \(\sqrt{w}\) to the fact that noise is not being recorded during the pulse-off time. The second of these \(\sqrt{w}\) factors can be obtained without an increase in the data rate by simply deleting data during the pulse-off periods. Burst mode observing is also useful for astrometry and geodesy because it increases the accuracy of measurement of the geometric delay, and it has been used for this purpose in observations of continuum sources at millimeter wavelengths.
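The sensitivity bookkeeping for pulsar observations can be sketched as follows. The function name is hypothetical, and the sketch assumes that the ratio w of the burst rate to the mean recording rate equals the reciprocal of the pulse duty cycle, as in the 3% example above.

```python
import math


def burst_mode_gains(duty_cycle):
    """Return the rate ratio w and the sensitivity gains it provides."""
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in (0, 1]")
    w = 1.0 / duty_cycle
    return {
        "w": w,
        "bandwidth_gain": math.sqrt(w),  # from the wider recorded bandwidth
        "gating_gain": math.sqrt(w),     # from not recording pulse-off noise
        "total_gain": w,
    }


if __name__ == "__main__":
    print(burst_mode_gains(0.03))
```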

9.9 Phased Arrays as VLBI Elements

A phased array is a series of antennas for which the received signals are combined, as indicated in Fig. 5.4. Such systems could be used to form multiple beams, but we describe only the case of forming a single beam. The phase and delay of the signal from each antenna can be adjusted so that the signals from a particular direction in the sky combine in phase, thereby maximizing the sensitivity. It is important to consider the use of phased arrays as VLBI elements for two reasons. First, the elements of a connected-element synthesis array can be combined to form a phased array, thus improving the signal-to-noise ratio of a very-long-baseline interferometer in which they participate as a single station. Second, if elements with very large collecting area are desired to achieve a high signal-to-noise ratio on each baseline, it may be advantageous to build phased arrays rather than monolithic antennas because the cost of a parabolic reflector antenna increases approximately as the diameter to the power 2.7 (Meinel 1979).

Synthesis arrays such as the Westerbork Array, the VLA, the SMA, the Plateau de Bure interferometer, and ALMA are also used as phased arrays to provide a large collecting area for one element in a VLBI system or other applications. Phasing the array consists of adjusting the phase and delay of the signal from each antenna so as to compensate for the different geometric paths for a wavefront from the desired direction. These corrections are easily made through the delay and fringe rotation systems that are used for synthesis imaging. The signals are then summed and go to a VLBI recorder.

We can analyze the performance of a phased array that is used to simulate a single large antenna. Consider an array of \(n_a\) identical antennas for which the system temperature is \(T_S\) and the antenna temperature for a given source that is unresolved by the longest spacings in the array is \(T_A\). The output of the summing port is

$$\displaystyle{ V _{\mathrm{sum}} =\sum _{i}(s_{i} +\epsilon _{i})\;, }$$
(9.183)

where \(s_i\) and \(\epsilon_i\) represent the random signal and random noise voltages, respectively, from antenna i. Now \(\langle s_i\rangle = \langle \epsilon_i\rangle = 0\) and, omitting constant gain factors, we can write \(\langle s_i^2\rangle = T_A\) and \(\langle \epsilon_i^2\rangle = T_S\). The power level of the combined signals is represented as the average squared value of Eq. (9.183),

$$\displaystyle{ \langle V _{\mathrm{sum}}^{2}\rangle =\sum _{ i,j}\ [\langle s_{i}s_{j}\rangle +\langle s_{i}\epsilon _{j}\rangle +\langle s_{j}\epsilon _{i}\rangle +\langle \epsilon _{i}\epsilon _{j}\rangle ]\;. }$$
(9.184)

If the array is accurately phased, \(s_i = s_j\), and since we are considering an unresolved source, \(\langle s_i s_j\rangle = T_A\) for all i and j. If the array is unphased, that is, if the signal phases at the combination point are random, then \(\langle s_i s_j\rangle = T_A\) only for i = j and is otherwise zero. In either case, \(\langle s_i \epsilon_j\rangle = 0\), and \(\langle \epsilon_i \epsilon_j\rangle = 0\) for i ≠ j, while \(\langle \epsilon_i^2\rangle = T_S\). Thus, Eq. (9.184) can be reduced to

$$\displaystyle\begin{array}{rcl} \langle V _{\mathrm{sum}}^{2}\rangle = n_{ a}^{2}T_{\! A} + n_{a}T_{\!S}\qquad (\mathrm{array\ phased})& &{}\end{array}$$
(9.185)
$$\displaystyle\begin{array}{rcl} \langle V _{\mathrm{sum}}^{2}\rangle = n_{ a}T_{\!A} + n_{a}T_{\!S}\qquad (\mathrm{array\ unphased})\;,& &{}\end{array}$$
(9.186)

where the first term on the right side represents the signal and the second term represents the noise. When the array is phased, the signal-to-noise (power) ratio is \(n_a T_A/T_S\), and when it is unphased, it is \(T_A/T_S\). Thus, the collecting area of the phased array is equal to the sum of the collecting areas of the individual antennas, but when it is unphased, it is, on average, equal to that of a single antenna.
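Equations (9.185) and (9.186) can be checked numerically with a small Monte Carlo sketch. It is an illustration under stated assumptions, not part of the original analysis: voltages are modeled as complex Gaussian samples (the text treats real voltages), the unphased case draws a fresh random phase for every antenna and sample so that the average over phases is taken, and all function names are hypothetical.

```python
import cmath
import math
import random


def complex_gauss(rng, power):
    """Zero-mean complex Gaussian sample whose mean squared modulus is `power`."""
    sigma = math.sqrt(power / 2.0)
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


def mean_sum_power(n_a, t_a, t_s, phased, n_samples=20000, seed=1):
    """Estimate <|V_sum|^2> for n_a identical antennas.

    Assumption: the same source voltage s reaches every antenna; when the
    array is unphased, each antenna applies an independent random phase.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        s = complex_gauss(rng, t_a)  # common signal, <|s|^2> = T_A
        v_sum = 0j
        for _antenna in range(n_a):
            phase = 0.0 if phased else rng.uniform(0.0, 2.0 * math.pi)
            noise = complex_gauss(rng, t_s)  # independent noise, <|e|^2> = T_S
            v_sum += s * cmath.exp(1j * phase) + noise
        total += abs(v_sum) ** 2
    return total / n_samples


if __name__ == "__main__":
    n_a, t_a, t_s = 8, 1.0, 10.0
    print("phased   simulated:", mean_sum_power(n_a, t_a, t_s, True),
          " Eq. (9.185):", n_a**2 * t_a + n_a * t_s)
    print("unphased simulated:", mean_sum_power(n_a, t_a, t_s, False),
          " Eq. (9.186):", n_a * t_a + n_a * t_s)
```

The simulated powers should scatter around the analytic values by roughly one percent for the default number of samples.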

A question of interest concerns the case in which the antennas have different sensitivities resulting from different effective collecting areas and/or system temperatures. This is a matter of practical importance even for nominally uniform arrays, since maintenance or upgrading programs can result in differences in sensitivity. Consider a phased array in which the individual system temperatures and antenna temperatures are represented by \(T_{Si}\) and \(T_{Ai}\), respectively. Here, \(T_{Ai}\) is defined as the signal from a point source of unit flux density, so \(T_{Ai}\) is a characteristic of the antenna alone and is proportional to the collecting area. We consider only the weak-signal case for which \(T_A \ll T_S\). For antenna i, the output voltage from a source of flux density S is \(V_i = s_i + \epsilon_i\), and we can write \(\langle s_i^2\rangle = S\,T_{Ai}\) and \(\langle \epsilon_i^2\rangle = T_{Si}\).

It is convenient to think of the output of each antenna as providing a measure of the flux density of the source, which is equal to \(V_i^2/T_{Ai}\). The expectation of the measured value of S should be the same for each antenna. The corresponding voltages are \(\sqrt{ S} = V _{i}/\!\!\sqrt{T_{Ai}}\) for the signal and \(\epsilon _{i}/\!\!\sqrt{T_{Ai}}\) for the noise. In the cross-correlation of the array output with another VLBI antenna, the signal-to-noise ratio at the correlator output is proportional to the signal-to-noise voltage ratio of the signal from the array. Thus, in combining the signal voltages in the array, we are, in effect, interested in maximizing the signal-to-noise ratio in an estimate of \(\sqrt{ S}\). Because the array antennas are not identical, we should use weighting factors \(w_i\) in combining their signals. The weights should be chosen to maximize the signal-to-noise ratio of the combined array signals which, in voltage, is

$$\displaystyle{ \mathcal{R}_{\mathrm{sn}} =\sum _{i} \frac{w_{i}V _{i}} {\sqrt{T_{Ai}}}\Biggm /\sqrt{\sum _{i } \frac{w_{i }^{2 }T_{\!Si } } {T_{Ai}}} \;. }$$
(9.187)

Note that we add the signal voltages and the squares of the rms noise voltages. Selecting the weights to provide the best signal-to-noise ratio for \(V _{i}/\!\!\sqrt{T_{Ai}}\) is mathematically equivalent to the general problem of obtaining the best estimate of a measured quantity from a series of measurements for which the rms error levels are different but are known. The optimum procedure is to take a mean in which the weight of each measurement is inversely proportional to the variance of the error of that measurement [see Eq. (A12.6)]. The variance of \(V_i\) is proportional to \(T_{Si}\), and thus the variance of \(V _{i}/\!\!\sqrt{T_{Ai}}\) is \(T_{Si}/T_{Ai}\). Thus, we insert \(w_i = T_{Ai}/T_{Si}\) in Eq. (9.187) and obtain

$$\displaystyle\begin{array}{rcl} \mathcal{R}_{\mathrm{sn1}}& =& \sum _{i} \frac{V _{i}} {\sqrt{T_{Ai}}} \frac{T_{Ai}} {T_{\!Si}}\Biggm /\sqrt{\sum _{i } \frac{T_{\!Si } } {T_{Ai}}\left (\frac{T_{Ai}} {T_{\!Si}}\right )^{2}} \\ & =& \sum _{i}\frac{V _{i}\sqrt{T_{Ai}}} {T_{\!Si}} \Bigg/\sqrt{\sum _{i } \frac{T_{Ai } } {T_{\!Si}}}\;. {}\end{array}$$
(9.188)

Note that in the numerator, \(V_i\) is multiplied by \(\sqrt{T_{Ai}}/T_{\!Si}\), which is therefore the (voltage) weighting factor for optimum sensitivity in the signal combination. This conclusion is in agreement with an analysis by Dewey (1994). (Note that the weighting factors for the signal voltages at the combination point are not \(w_i\) but \(w_{i}/\!\!\sqrt{T_{Ai}}\).) The corresponding weighting of the signal power at the combination point is proportional to \(T_{Ai}/T_{Si}^2\).

In synthesis arrays such as the VLA, the IF signals from the antennas are each delivered to a digital sampler at the same power level (of signal plus noise), and the signals are combined after that point so that the time delays required can be inserted digitally. Thus, to avoid modifying the receiving system (which is designed for synthesis imaging), the signals are combined with equal powers when the array is used in the phased mode. For the case of \(T_A \ll T_S\) that we are considering, the corresponding weighting is \(w_{i} = 1/\!\!\sqrt{T_{\!Si}}\), and the signal-to-noise ratio becomes

$$\displaystyle{ \mathcal{R}_{\mathrm{sn2}} =\sum _{i} \frac{V _{i}} {\sqrt{T_{Ai } T_{\!Si}}}\Bigg/\sqrt{\sum _{i } \frac{1} {T_{Ai}}}\;. }$$
(9.189)

Equal-power weighting usually provides sensitivity within a few percent of optimum weighting.
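The comparison between optimum and equal-power weighting can be made concrete with a minimal sketch that evaluates Eq. (9.187) for a chosen set of weights. Here \(V_i\) is replaced by its expected signal amplitude \(\sqrt{S\,T_{Ai}}\); the function names and the example temperatures are hypothetical values chosen only for illustration.

```python
import math


def snr_voltage(weights, t_a, t_s, flux=1.0):
    """Eq. (9.187) with V_i set to the signal amplitude sqrt(S * T_Ai)."""
    numerator = sum(
        w * math.sqrt(flux * ta) / math.sqrt(ta) for w, ta in zip(weights, t_a)
    )
    noise_variance = sum(w * w * ts / ta for w, ta, ts in zip(weights, t_a, t_s))
    return numerator / math.sqrt(noise_variance)


def optimum_weights(t_a, t_s):
    """w_i = T_Ai / T_Si, the choice that leads to Eq. (9.188)."""
    return [ta / ts for ta, ts in zip(t_a, t_s)]


def equal_power_weights(t_s):
    """w_i = 1 / sqrt(T_Si), the choice that leads to Eq. (9.189)."""
    return [1.0 / math.sqrt(ts) for ts in t_s]


if __name__ == "__main__":
    # Hypothetical array: three similar antennas and one less sensitive one.
    t_a = [1.0, 1.0, 1.0, 0.8]      # antenna temperature per unit flux density
    t_s = [30.0, 30.0, 30.0, 45.0]  # system temperatures
    r_opt = snr_voltage(optimum_weights(t_a, t_s), t_a, t_s)
    r_eq = snr_voltage(equal_power_weights(t_s), t_a, t_s)
    print(f"optimum weighting:     {r_opt:.4f}")
    print(f"equal-power weighting: {r_eq:.4f}")
    print(f"ratio:                 {r_eq / r_opt:.4f}")
```

For unit flux density the optimum result reduces to \(\sqrt{\sum_i T_{Ai}/T_{Si}}\), which gives a quick check on the implementation.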

With optimum weighting in the signal combination, all antennas make some contribution to increasing the signal-to-noise ratio. With other weighting, the overall sensitivity may be improved by omitting antennas with poor performance. Moran (1989) has investigated this effect for equal-power weighting. To simplify the situation, it was assumed that \(T_A\) is the same for all antennas and only \(T_S\) varies. Consider an array undergoing an upgrade of the receiver input stages, in which a fraction \(n_1/n_a\) have been refitted with new input stages that reduce the system temperature from \(T_S\) to \(T_S/\xi\). After a certain fraction of the antennas have been refitted, the array sensitivity is improved by omitting the unimproved antennas because their input stages are noisier. When \(T_A\) does not vary, we can represent the signal voltage received by each antenna by V, and Eq. (9.189) for equal-power weighting becomes

$$\displaystyle{ \mathcal{R}_{\mathrm{sn2}} = \frac{V } {\sqrt{N}}\sum _{i} \frac{1} {\sqrt{T_{\!Si}}}\;. }$$
(9.190)

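where N is the number of antennas whose signals are combined (\(n_1\) or \(n_a\) in the cases considered here). Thus, we can write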

$$\displaystyle{ \frac{\mathcal{R}_{\mathrm{sn2}}(n_{1}\mathrm{\ refits\ only})} {\mathcal{R}_{\mathrm{sn2}}(\mathrm{all}\ n_{a}\ \mathrm{antennas})} = \frac{1} {\sqrt{n_{1}}}\left ( \frac{n_{1}\sqrt{\xi }} {\sqrt{T_{\!S}}}\right )\Biggm / \frac{1} {\sqrt{n_{a}}}\left ( \frac{n_{1}\sqrt{\xi }} {\sqrt{T_{\!S}}} + \frac{n_{a} - n_{1}} {\sqrt{T_{\!S}}} \right )\;. }$$
(9.191)

The unimproved antennas should be omitted if the expression above is greater than unity, which occurs for

$$\displaystyle{ \frac{n_{1}} {n_{a}} > \left (\frac{\sqrt{\xi }} {2} + \sqrt{1 - \sqrt{ \xi } + \frac{\xi } {4}}\right )^{-2}\;. }$$
(9.192)

Figure 9.26 shows \(n_1/n_a\) as a function of ξ. Thus, for example, if the refitting reduces \(T_S\) by a factor of six, then when about half the antennas have been refitted, the others should be omitted. However, unless ξ > 4, all antennas should be retained. In practice, a factor of four would be an unusually big improvement, so it can be concluded that omitting antennas is rarely useful. A similar analysis based on Eq. (9.188) shows that with optimum weighting, the sensitivity is never improved by omitting antennas.
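The threshold in Eq. (9.192) and the ratio in Eq. (9.191) are easy to evaluate numerically. The following is a minimal sketch with hypothetical function names; the 27-antenna array in the usage lines is only an illustrative choice.

```python
import math


def refit_fraction_threshold(xi):
    """Right-hand side of Eq. (9.192): the fraction n_1/n_a above which the
    unimproved antennas should be omitted under equal-power weighting."""
    root = math.sqrt(xi)
    # 1 - sqrt(xi) + xi/4 equals (sqrt(xi)/2 - 1)^2; clamp rounding noise at zero.
    inner = max(0.0, 1.0 - root + xi / 4.0)
    return (root / 2.0 + math.sqrt(inner)) ** -2


def snr_ratio(n_1, n_a, xi):
    """Eq. (9.191): SNR with the n_1 refitted antennas only, relative to the
    SNR with all n_a antennas (the common factor 1/sqrt(T_S) cancels)."""
    refits_only = n_1 * math.sqrt(xi) / math.sqrt(n_1)
    all_antennas = (n_1 * math.sqrt(xi) + (n_a - n_1)) / math.sqrt(n_a)
    return refits_only / all_antennas


if __name__ == "__main__":
    for xi in (2.0, 4.0, 6.0, 10.0):
        print(f"xi = {xi:4.1f}  threshold n_1/n_a = {refit_fraction_threshold(xi):.3f}")
    # Hypothetical 27-antenna array with xi = 6.
    for n_1 in (12, 14):
        print(f"n_1 = {n_1}: ratio = {snr_ratio(n_1, 27, 6.0):.4f}")
```

For ξ > 4 the threshold simplifies to \((\sqrt{\xi} - 1)^{-2}\), and for ξ ≤ 4 it equals unity, so the condition can never be satisfied by a proper fraction of the array.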

Fig. 9.26

The fraction of antennas, \(n_1/n_a\), in a phased array with equal-power weighting, for which the system temperature must be reduced by a factor ξ before the remaining antennas should be omitted. From Moran (1989), © Kluwer Academic Publishers. With kind permission from Springer Science and Business Media.

For VLBI, the output of a phased array is usually requantized to fit the recording format. The first quantization of the signals, before they are combined, introduces quantization noise that, after combination, has a probability distribution that tends to Gaussian as the number of antennas becomes large. Thus, for such arrays, the additional loss in sensitivity in requantizing is close to the values of \(\eta_Q\) derived in Chap. 8, for which Gaussian noise is assumed. For other cases, see Kokkeler et al. (2001).

The phasing of the SMA is described by Young et al. (2016) and of ALMA by Baudry et al. (2012). An example of a phased array in operation is shown in Fig. 9.27.

Fig. 9.27

Examples of the phased-array operation of the SMA at 280 GHz on the source 3C354.3, which had a flux density of 10 Jy. The phasing efficiency is the ratio of the vector sum of the pairwise visibilities to their scalar sum. Seven antennas of the SMA were used in the extended configuration with baselines between 44 and 226 m. The weather conditions were: clear sky, 1.3 mm of precipitable water vapor, and wind speed of 2 m/s. The elevation angle range was 44–50° in the left panel and 65–71° in the right panel. The improving atmospheric stability with increasing elevation angle and time since sunset is evident. Adapted from Young et al. (2016).

9.10 Orbiting VLBI (OVLBI)

The basic requirements for a VLBI station, whether orbiting or terrestrial, include a timing system so that the time associated with each digital sample of the received signal is recoverable, and a position for the antenna known with sufficient accuracy that the fringe frequency (but not necessarily the fringe phase) can be determined. The timing system must be stable to a fraction of the period of the received signal frequency over a coherence time of tens or hundreds of seconds. If it is not possible to put a precise frequency standard on a satellite, then a timing link of equivalent stability must be implemented. Establishing this timing system, which provides the local oscillators and the sampling clock at the satellite, is a major technical challenge in OVLBI. The radial motion of the satellite introduces Doppler shifts, and the tangential motion causes the link path to move relative to the atmospheric irregularities. One or more reference frequencies are transmitted to the satellite over a radio link. The position of the satellite at any time is known from standard orbit-tracking procedures to an accuracy of some tens of meters. This is sufficient to determine the (u, v) coordinates of the baseline but not sufficient for the timing accuracy required. To solve the timing problem, a round-trip phase system implemented by radio link is required. This is identical in principle to the round-trip systems for cables discussed in Sect. 7.2. A discussion of the basic requirements of the timing system is given by D’Addario (1991).

Figure 9.28 shows a simplified example of a system at the satellite and Earth station, which illustrates the essential functions. In this case, a frequency standard is not included in the satellite. A frequency standard in the Earth station provides a reference frequency to synthesizer \(S_x\), from which a signal is transmitted to the satellite. This signal provides a reference for synthesizers \(S_y\), \(S_L\), and \(S_s\) that produce signals for the round-trip phase measurement, the local oscillator (LO) of the radio astronomy receiver, and the sampling clock, respectively. The signal from \(S_y\) is radiated to the Earth station, where its phase is compared in a correlator with a locally generated signal at the same frequency. The correlator output is a measure of \(\varDelta \tau\), the change in the time delay of the round-trip path. The signal from the radio telescope on the spacecraft goes to a low-noise amplifier, a filter, and a mixer in which it is converted to intermediate frequency (IF) by the LO signal from \(S_L\). The IF signal then goes to an IF amplifier, a sampler (represented by a switch), and a quantizer, Q(x). The counter n is driven by the sampler clock signal from synthesizer \(S_s\) and provides timing signals. These provide a record of when each data point was taken, information for formatting the data, and other timing functions required on the satellite. The counter \(n_g\) provides timing at the ground location. Some complications with the operation of the scheme just outlined are:

  1. The round-trip phase measures the length of the round-trip path with an ambiguity of an integral number of wavelengths. It provides a measure of changes in path length that are continuous.

  2. Unless the frequencies generated by the three synthesizers at the satellite are harmonics of one or more reference frequencies supplied (so that no frequency division is necessary in the synthesizers), then the phases of the frequencies will be ambiguous.

  3. The transmission times for the reference frequencies and the data may differ because of dispersion in the path or differences in the electronics.

Fig. 9.28

Simplified block diagram of the basic signal transmission and processing required on an OVLBI spacecraft and at the Earth station. See text for further explanation. © 1991 IEEE. Reprinted, with permission, from L. R. D’Addario (1991).

These limitations cause problems when there are discontinuities in the link contact between the satellite and the Earth station. If there is continuous contact during an observing period, then once fringes are found, the combined effect of the ambiguities is determined. The continuous monitoring of the variation of the path enables the solution to be extended throughout the observing period. However, if signal contact is lost due to interference, atmospheric effects, or equipment problems, phase-locked loops in the synthesizers lose lock, and a phase discontinuity will result when the signals are regained. If the round-trip tracking is interrupted for a long period, another fringe search of the data may be required.

For any round-trip measurement, use of the same frequency in both directions would simplify the determination of the one-way propagation time, since the effects of dispersion would be largely eliminated. This would be technically feasible with time sharing or a very small frequency offset to allow signals in the two directions to be separated. However, the international radio regulations usually allocate different frequency bands for the two directions of transmission. Measurement of the round-trip path at two frequencies is therefore important in determining the relative contributions of the neutral and ionized media to the propagation time. If a high-stability frequency standard is included on a satellite, it could serve as the primary clock or as a backup to a radio-link timing system to help keep time at the satellite during link dropouts. Relativistic effects are a complication in the use of an onboard clock, causing its time to vary with respect to Earth-station clocks as the satellite moves through regions of differing strength of the Earth’s gravitational field (Ashby and Allan 1979; Vessot 1991).

The first OVLBI experiment was carried out with a satellite in the NASA Tracking and Data Relay Satellite System (TDRSS), which was adapted for VLBI use (Levy et al. 1986, 1989). The purpose of this geostationary satellite was to relay signals from low Earth orbit to ground stations. It was equipped with two 4.9-m-diameter antennas, both with receivers at 2.3 and 15 GHz, and an up–down link communication system at 15.0 and 13.7 GHz. One of the 4.9-m-diameter antennas was used to receive the astronomical signals. The experiment provided limited astronomical data [see Linfield et al. (1989, 1990)] but proved to be an invaluable test bed for time and phase transfer techniques as well as data recovery and processing methods. It was necessary to time-tag the data at the ground station, and hence, the satellite range was part of the interferometer delay. The onboard oscillators were phase-locked via the timing link, much as described in the previous paragraph. However, the coherence of the interferometer was greatly improved by using the second 4.9-m-diameter antenna as part of a separate two-way link at 2.278 GHz. The coherence of the interferometer at 2.3 GHz was found to be 0.98, 0.95, and 0.94 for integration times of 100, 200, and 700 s, respectively. This implies an effective Allan variance for the whole interferometer system of better than \(10^{-13}\) (see discussion in Sect. 9.5.2).

The first satellite specifically designed for use as an orbiting element in a VLBI array was the HALCA satellite (VSOP project), launched in 1997, followed by the Spektr-R satellite (RadioAstron project), launched in 2011. Some of the key specifications of these satellites are listed in Table 9.9. Typical (u, v) plane tracks are shown in Fig. 5.22. RadioAstron has an onboard hydrogen maser frequency standard so that the timing transfer link is not required to synchronize the local oscillator. However, the search for fringes can be a significant task. With orbit position and velocity uncertainty of ± 500 m and 20 mm/s, the delay uncertainty is about 30 ns, or the equivalent of about ± 2000 delay steps, and the fringe rate uncertainty is ± 3 Hz at 6-cm wavelength. The processing must also include a fringe acceleration term. A description of a lag-type correlator designed specifically to include OVLBI stations is given by Carlson et al. (1999).

Table 9.9 Parameters of orbiting VLBI stations

9.11 Satellite Positioning

The three-dimensional locations of geostationary satellites can be determined with a VLBI array because they lie within its near field (see Section 15.1.3). To understand the sensitivity of a VLBI array to the range, or altitude, of a satellite, consider the simplified geometry shown in Fig. 9.29. In this exercise, the satellite is directly overhead at station 1 of a three-station linear array whose baseline is normal to the direction to the satellite. As a result of being in the near field, the curvature of the spherical wavefront of a broadband-transmitted signal from the satellite can be measured. Note that at least three stations are required, because with only two stations, wavefront tilt cannot be distinguished from wavefront curvature. For the purpose of this exercise, we assume that the bandwidth of the transmitted signal is broad enough that delays can be measured accurately since the signal-to-noise ratio can be expected to be very high. The accuracy of the measurement of R will be limited by the effects of the atmosphere and ionosphere. Phase measurement could also be used but might be subject to phase ambiguities.

Fig. 9.29

Simplified geometry for tracking a geostationary satellite with a three-station VLBI array. Because the satellite will be in the near field of the array, i.e., \(R \ll D^2/\lambda\) for typical values of D and λ, the wavefront curvature can be measured.

The excess geometric path length, x, to station 2 or 3 with respect to station 1 is determined by the relation \((R + x)^2 = R^2 + D^2\). To first order (that is, for x ≪ R), \(x \simeq D^2/2R\), and the delay is \(\tau = x/c = D^2/2Rc\).

Taking the differential of this expression shows that the change in delay, \(\varDelta \tau\), produced by a change in range, \(\varDelta R\), is, in magnitude,

$$\displaystyle{ \varDelta \tau = \frac{1} {2c}\left (\frac{D} {R}\right )^{2}\varDelta R\;. }$$
(9.193)

This expression can be used to determine the range accuracy, given the accuracy of the delay measurement.

Now consider the limitations imposed by the atmosphere. Normal astrometric positioning can be done to an accuracy of \(\sigma_\theta\), which implies an uncertainty in delay of \(\sigma_\tau \simeq D\sigma_\theta/c\). Hence, the uncertainty in range \(\sigma_R\) is given by

$$\displaystyle{ \sigma _{R} = \frac{2R^{2}} {D} \sigma _{\theta }\;, }$$
(9.194)

while the uncertainty in the transverse direction, \(\sigma _{R_{T}}\), is

$$\displaystyle{ \sigma _{R_{T}} = R\sigma _{\theta }\;. }$$
(9.195)

Hence, the relative accuracy of the longitudinal and transverse position is

$$\displaystyle{ \frac{\sigma _{R}} {\sigma _{R_{T}}} \simeq 2\left (\frac{R} {D}\right )\;. }$$
(9.196)

Consider the following example, with parameters R = 36,000 km, D = 3600 km, λ = 3 cm (a typical wavelength for geostationary satellites), and \(\sigma_\theta = 100\) μas. From the above equations, we obtain \(\sigma_\tau \simeq 0.006\) ns (which corresponds to an rms phase uncertainty of about 20°), \(\sigma_R \simeq 40\) cm, \(\sigma _{R_{T}} \simeq 2\) cm, and \(\sigma _{R}/\sigma _{R_{T}} = 20\). The position can be determined from a single short observation without reliance on Earth rotation. The velocity of the satellite can be determined from the rate of change of position parameters. For an operational system, at least four stations are required since three are needed to define a reference plane. The earliest attempt to measure a satellite position with VLBI was done by Preston et al. (1972).
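The quantities in this example follow directly from \(\sigma_\tau \simeq D\sigma_\theta/c\) and Eqs. (9.194) to (9.196), and can be recomputed with a short sketch. The function and key names are hypothetical, and the phase uncertainty is taken, as an assumption, to be 360° times the delay uncertainty expressed in periods of the carrier at wavelength λ.

```python
import math

C = 299_792_458.0                                # speed of light, m/s
MUAS_TO_RAD = math.pi / (180.0 * 3600.0 * 1e6)   # microarcseconds to radians


def satellite_position_errors(r_m, d_m, sigma_theta_rad, wavelength_m):
    """Evaluate the delay, range, and transverse uncertainties.

    r_m: range to the satellite (m); d_m: station spacing D (m);
    sigma_theta_rad: astrometric accuracy (rad); wavelength_m: wavelength (m).
    """
    sigma_tau = d_m * sigma_theta_rad / C                 # delay uncertainty, s
    sigma_range = 2.0 * r_m**2 / d_m * sigma_theta_rad    # Eq. (9.194), m
    sigma_transverse = r_m * sigma_theta_rad              # Eq. (9.195), m
    # Assumed conversion: delay uncertainty in carrier periods times 360 deg.
    phase_deg = 360.0 * sigma_tau * C / wavelength_m
    return {
        "sigma_tau_s": sigma_tau,
        "phase_deg": phase_deg,
        "sigma_range_m": sigma_range,
        "sigma_transverse_m": sigma_transverse,
        "ratio": sigma_range / sigma_transverse,          # Eq. (9.196)
    }


if __name__ == "__main__":
    result = satellite_position_errors(
        r_m=36_000e3, d_m=3_600e3,
        sigma_theta_rad=100.0 * MUAS_TO_RAD, wavelength_m=0.03,
    )
    for name, value in result.items():
        print(f"{name}: {value:.4g}")
```

Because both \(\sigma_R\) and \(\sigma_{R_T}\) scale linearly with \(\sigma_\theta\), their ratio depends only on the geometry, 2R/D.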