Improvements and comparison of VLBI INT2 and INT3 session performance

One of the core products of VLBI is the rapid determination of the Earth rotation parameter, expressed through dUT1. Multiple so-called Intensive observing programs exist that observe dUT1 on a regular basis. Within this work, a detailed overview of the last five years of the VLBI Intensive observing programs INT2 and INT3 is provided. INT2 sessions are typically observed with a single baseline using a recording rate of 256 Mbps, while INT3 sessions are multi-baseline Intensives with up to five stations and a recording rate of 1 Gbps. The median dUT1 precision estimated from INT2 sessions is 10.5 μs, while it is 5.9 μs for INT3 sessions.
The best performing INT2 baseline is between stations MK-VLBA and WETTZELL, with a median dUT1 formal error of 6.4 μs and only a small bias of -2.5 μs w.r.t. the JPL EOP2 series. Starting in 2019, the scheduling strategy of the INT3 sessions was significantly changed, leading to a reduction in the estimated average dUT1 formal errors by 25 % for 4-station sessions and 45 % for 5-station sessions. In mid-2020, the same change was made for INT2 sessions, leading to a reduction in the average dUT1 mean formal error of up to 44 % for the baseline between MK-VLBA and WETTZELL. It is further revealed that the precision of single-baseline INT3 analysis is not significantly better than its INT2 counterpart, although a four times higher data rate is used. This is due to differences in scheduling optimization. On average, the mean formal error of the best single-baseline INT3 analysis is 50 % higher compared to utilizing the whole network. Besides analyzing dUT1 formal errors, the latency of the dUT1 results is compared for three analysis centers. For INT3 sessions, results are typically available within 24 hours, while it takes two to three days for INT2 sessions because the observations occur on weekends. Overall, this work provides detailed insight into the INT2 and INT3 session performance while revealing a strong positive trend in the precision of dUT1 measurements over the last years due to changes in the scheduling strategy.


Introduction
The rapid determination of the Earth rotation parameter is one of the core objectives of geodetic VLBI. Therefore, specifically designed so-called Intensive sessions are observed every day (Robertson et al. 1985). These sessions aim to determine dUT1, the difference between Universal Time (UT1), defined by the Earth's rotation angle w.r.t. a celestial reference frame, and the Coordinated Universal Time (UTC), defined by a network of atomic clocks. Typically, Intensive sessions are one hour long and are observed on a single baseline with a low recording rate to reduce the amount of data that has to be transferred and processed, ensuring a quick release of dUT1 estimates. Due to the short duration and small network size, only a limited number of parameters can be estimated during the analysis. Depending on the session, analysis center (AC) and software, these typically include the dUT1 offset, an offset or linear drift of the tropospheric zenith wet delay, and a linear or quadratic clock model.
Within the International VLBI Service for Geodesy and Astronomy (IVS) (Nothnagel et al. 2017), several Intensive observing programs exist.
From Monday to Friday, INT1 sessions are observed using a 128 Mbps observing mode. Over the last years, several improvements for INT1 sessions have been published. For example, Baver et al. (2012) and Gipson (2014, 2020) studied improvements gained by using different radio source lists. In the latest work, a source list called "Balanced 50" is proposed that improves the weighted dUT1 formal error by 2.6 μs compared to previous source lists.
On Saturday and Sunday, INT2 sessions are observed using a 256 Mbps observing mode. While most IVS observing modes typically distribute the observed bandwidth over 10 channels in X-band and 6 channels in S-band with one-bit sampling, observations on the baseline MK-VLBA (USA) and WETTZELL (Germany) use 4 channels in X-band and 4 channels in S-band with two-bit quantization. With the release of a new VLBI scheduling software called VieSched++, the INT2 Intensive scheduling strategy was changed in 2020, leading to significant improvements, as will be reported in this work.
Additionally, every Monday INT3 sessions are performed using a 1 Gbps observing mode. INT3 sessions are typically observed with a larger network of three to five stations, leading to three to ten baselines, although not all of these baselines are sensitive to dUT1 changes. Besides estimating dUT1 for the full network simultaneously, some analysis centers also provide single-baseline dUT1 estimates from INT3 sessions. In 2019, the scheduling strategy of INT3 sessions was also changed, similar to the change made for INT2 sessions.
Additionally, so-called INT9 sessions between Wettzell and AGGO (Argentina) are observed, utilizing a 1 Gbps observing mode (Plötz et al. 2019). Although the baseline is not necessarily optimal for Intensive observations, and station AGGO has a lower sensitivity compared to other telescopes, it was proposed that a mean formal error of 30 μs might be feasible with an increased session duration of two hours.
Currently, almost all Intensive sessions are observed using telescopes located in the Northern Hemisphere. The only exception is INT9 sessions, with a baseline between Germany and Argentina. In 2020, a Southern Hemisphere Intensive program was newly established. These Intensive sessions are now observed weekly, between station HART15M (South Africa), and the AuScope (Lovell et al. 2013) stations HOBART12 (Australia), and YARRA12M (Australia).
Besides that, several other national Intensive programs exist. For example, the Very Long Baseline Array (VLBA) has observed Intensive sessions since 2011 (Geiger et al. 2019), while the Russian Quasar network (Shuygina et al. 2019) has observed daily Intensive sessions since 2012 (Kurdubov and Melnikov 2014).
With the development of the VLBI Global Observing System (VGOS) (Petrachenko et al. 2012; Niell et al. 2018), new Intensive baselines and observing programs were created, observing with 8 Gbps distributed over four bands.
The first VGOS Intensive program was scheduled from December 2019 until March 2020. These so-called VGOS-B sessions were observed between the Onsala twin telescopes (Sweden) and ISHIOKA (Japan) (Haas et al. 2021). They derived dUT1 results with a mean formal error of 4.5 μs and showed good agreement w.r.t. the IERS Bulletin B series. Based on their analysis, the VGOS Intensives achieved three to four times lower formal uncertainties in dUT1 compared to the legacy S/X-band sessions that were observed simultaneously. From November 2020 until May 2021, further VGOS-B sessions were performed at the same baseline and even extended by VGOS-C sessions.
Another VGOS Intensive program, called V2, started in March 2020 and was observed between station KOKEE12M (USA) and WETTZ13S (Germany). Preliminary analysis results presented by Mondal et al. (2021) depict a dUT1 mean formal error of about 5.9 μs with a μs-level bias between VGOS and S/X dUT1 estimates, highlighting good agreement between the different networks. However, they reported that a further investigation of the differences between VGOS and S/X is needed.
Over the last years, conceptual improvements of Intensive sessions were studied as well. While some studies focused on improving dUT1 accuracy by tagging along a third station (Kareinen et al. 2017), other research focused on improving Intensive sessions through a special scheduling approach for twin telescopes (Leek et al. 2015). A recent study identified the optimal VLBI Intensive baseline geometry based on large-scale Monte-Carlo simulations of artificial VGOS telescopes located on a regular latitude-longitude grid.
The objective of this paper is to provide a detailed overview of the current status of the INT2 and INT3 Intensive programs operated by the IVS Operation Center DACH, since this has not been studied in previous literature, as well as to report on improvements gained by changing the scheduling strategy. By investigating analysis results of 621 INT2 and INT3 sessions from several IVS ACs, comparisons between mean formal errors of individual baselines and networks are highlighted in Sect. 3.1. Additionally, INT3 single-baseline analysis results are compared in Sect. 3.2. Comparisons w.r.t. JPL EOP2 and the IERS C04 series are presented in Sect. 3.3, and an evaluation of the latency between observation start and published analysis report is performed in Sect. 3.4. Finally, some station-, baseline- and source-based statistics are presented in Sect. 3.5, Sect. 3.6, and Sect. 3.7, respectively.
A special emphasis is laid on comparisons of improvements gained by changing the scheduling strategy over the last years, as discussed in Sect. 4, while Sect. 5 concludes the work.

Data
The analysis of the INT2 and INT3 session performance is based on the publicly available files at the IVS data centers (see footnote 1). This work focuses on the performance of all sessions after the replacement of station TSUKUB32 with ISHIOKA at the end of 2016 until sessions Q21226 (INT2) and Q21214 (INT3) in August 2021, which were the most recent sessions at the time the manuscript was written. In total, there have been 623 Intensive sessions, 446 INT2 and 177 INT3. While the INT2 sessions are mainly observed on a single baseline, the INT3 sessions are mostly observed using an extended network of up to five stations. Figure 1 depicts a map of the stations that most frequently participate in INT2 and INT3 sessions, namely ISHIOKA (Is), KOKEE (Kk), MK-VLBA (Mk), NYALES20 (Ny), SESHAN25 (Sh), WETTZ13N (Wn), and WETTZELL (Wz). All of these stations are located in the northern hemisphere.
The distributed schedule files were used to gain information about the scheduled station network. Based on the schedule files, operation notes files were generated using VieSched++ and parsed to calculate statistics regarding the number of observations, the number of observed sources, the sky-coverage scores and other metrics.
The estimated dUT1 precision, defined through the dUT1 formal errors, was taken from the EOP-I files provided by various IVS ACs. The number of INT2 and INT3 sessions included in the EOP-I files varies between ACs:
- BKG (bkg2020a) (Engelhardt et al. 2021)
While all EOP-I files are available at the IVS data centers, the VIE solution was downloaded from the VIE webpage since VIE does not yet upload its solution to the IVS servers. In the case of multi-baseline Intensives, some ACs such as GSF and USN provide solutions for the full network as well as single-baseline solutions. Within this work, the full network solution is investigated, except for Sect. 3.2, where the single-baseline solutions are compared with the full network solution.
Footnote 1: https://ivscc.gsfc.nasa.gov/products-data/data.html
Finally, statistics regarding the station, source, and baseline performance were extracted from the so-called spoolfiles that are provided by GSF, USN and BKG.

dUT1 precision
The main objective of VLBI Intensive sessions is to obtain highly accurate dUT1 measurements. To quantify the dUT1 session precision, the formal errors listed within the EOP-I files of various ACs were compared.
While the correlation of the estimated dUT1 values is close to 1.00 between almost all ACs, Fig. 2 depicts the correlation of the reported dUT1 formal errors. The left plot displays the correlation coefficient while the right plot lists the number of corresponding sessions analyzed by both ACs. High correlation is visible between GSF and USN (0.91) and between GSF and BKG (0.74). This is not surprising since all three analysis centers use the same software, namely Calc/Solve (Ma 1978; Bolotin et al. 2014), in the analysis, while VIE uses VieVS (Böhm et al. 2018), GSI uses C5++ (Hobiger et al. 2010), IAA uses OCCAM/GROSS (Malkin and Skurikhina 2005), and OPA uses Calc/Solve. It should also be noted that the formal errors reported by IAA do not agree well with those of the other analysis centers, which has to be investigated further. OPA provides the fewest analyzed sessions, less than half compared to the other ACs. GSI only provides INT2 results. Later, in Section 3.3, correlations between the differences of the estimated dUT1 values and the JPL EOP2 and IERS C04 time series are provided.
For a more in-depth analysis of the session performance, the GSF solution was selected since it agrees well with other solutions while providing the most complete data set. Figure 3 depicts the dUT1 formal error, σ dUT1 , from the gsf2020a solution. In the left figure, the INT2 result is visualized. Overall, the mean σ dUT1 for INT2 sessions is 12.9 μs with a minimum of 3.8 μs for session Q21052 (MkWz) and a median of 10.5 μs. It can be seen that the precision is quite homogeneous for the IsWz baseline and the KkWz baseline. Starting in mid-2020, many sessions were observed using MkWz. Most of these sessions have lower dUT1 formal errors and less scatter in the dUT1 formal errors. The mean σ dUT1 for MkWz is 8.5 ± 5.7 μs, with ± denoting one standard deviation, while it is 12.1 ± 6.1 μs for IsWz and 14.6 ± 6.9 μs for KkWz. The median σ dUT1 for MkWz is 6.4 μs, while it is 10.7 μs for IsWz and 13.7 μs for KkWz. As a reference, the mean formal error of the roughly one thousand INT1 KkWz dUT1 estimates over the same period is at a comparable level, with 15.5 ± 8.8 μs after removing some obvious outliers.
In a previous simulation study, the theoretical Intensive baseline performance w.r.t. its geometry was analyzed using VGOS-style telescopes. In that work, the baselines KkWz and MkWz performed equally well, while IsWz showed 25 % lower performance. Compared to the actual results, with IsWz performing 40 % worse than MkWz and KkWz performing 70 % worse than MkWz, it is obvious that the different baseline geometries cannot explain the differences in the obtained dUT1 precision. The remaining difference can be explained by the antenna characteristics. The sensitivity of Mk is four times higher in X-band compared to Is and Kk. In S-band, it is two times higher compared to Kk and four times higher compared to Is. Reasons for the increased sensitivity are the larger dish size of 25 meters for Mk compared to 20 meters for Kk and 13.2 meters for Is, as well as different receiver electronics used at the stations. Although Mk has the lowest slew speed, the higher sensitivity and the resulting shorter integration times result in a larger total number of observations per session, as can be seen later in Fig. 11, resulting in a higher dUT1 precision.
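The relative performance figures quoted above follow directly from the mean formal errors of the three INT2 baselines. A minimal sketch of that arithmetic is given below; the helper name `percent_worse` is introduced here for illustration only, and the input values are the mean σ dUT1 values reported in the text.

```python
# Mean dUT1 formal errors in microseconds, as quoted in the text for INT2.
mean_sigma_us = {"MkWz": 8.5, "IsWz": 12.1, "KkWz": 14.6}


def percent_worse(sigma, reference):
    """Relative increase of a formal error w.r.t. a reference value, in percent."""
    return 100.0 * (sigma / reference - 1.0)


reference = mean_sigma_us["MkWz"]
for baseline in ("IsWz", "KkWz"):
    ratio = percent_worse(mean_sigma_us[baseline], reference)
    print(f"{baseline}: {ratio:.0f} % worse than MkWz")
```

The computed values are about 42 % and 72 %, which correspond to the rounded 40 % and 70 % stated in the text.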
The right plot in Fig. 3 depicts the comparison of formal errors for the INT3 network. It can be seen that in general, these results are significantly more precise compared to the INT2 results. The mean σ dUT1 for INT3 sessions is 7.3 μs, with a minimum of 2.6 μs for session Q19315 (IsNyWnWz) and a median of 5.9 μs. However, one has to keep in mind that they are observed with a bigger network and higher data rate. One can also see that there might be a seasonal signal within the σ dUT1 values. The reason for this seasonal signal is unclear and has to be studied further. It might be explained by the different tropospheric conditions, especially w.r.t. water vapor. Another possible explanation would be a poorer visibility of strong, compact radio sources during these times of the year.

INT3 single baseline results
As already noted, the INT3 sessions are typically observed using a network with more than two stations and thus multiple baselines. In contrast to the full session analysis, some ACs, such as USN and GSF, provide single baseline solutions as well. They are used to gain a consistent dUT1 estimation time series per baseline. Figure 4 depicts the estimated average dUT1 formal errors from various INT3 single baseline solutions based on the GSF solution.
The left figure compares the average performance of the IsWz baseline from INT3 sessions with the average performance of the same baseline from INT2 sessions. Although INT3 is observed with a four times higher data rate, the average dUT1 formal errors are only slightly improved. The mean σ dUT1 is 12.1 μs for INT3 sessions and 13.5 μs for INT2 sessions. The main reason why the INT3 IsWz results are not significantly better than the INT2 results is the difference in scheduling. The INT3 sessions are nowadays optimized for an analysis using the full network. This means that all stations always observe all scans together, leading to the highest possible number of observations.
It is known that for dUT1 estimation, observations of sources located in the cusps of the mutually visible sky are most valuable for achieving the highest quality results (Nothnagel and Campbell 1991; Uunila et al. 2012; Baver and Gipson 2015). In practice, this means that observations conducted at low elevation for all stations are preferable. In the INT3 sessions, the mutually visible sky is not only restricted by two stations, as is the case for INT2 sessions, but by all participating stations. This leads to a higher minimum elevation that can be observed and thus to observations located further away from the cusps of the mutually visible sky of the single baseline. Additionally, in the INT3 sessions, all stations always have to wait for the slowest station to finish slewing before the next scan can start, leading to a lower total number of scans if slower slewing telescopes participate. Some INT2 and INT3 stations have significantly lower slew rates than others. For example, station Sh has a slew rate of only 56 degrees per minute in azimuth and 30 degrees per minute in elevation, while station Wz achieves 240 degrees per minute in azimuth and 90 degrees per minute in elevation. Other stations such as Wn and Is are VGOS-style radio telescopes with a slew rate of 720 degrees per minute in azimuth and 360 (Wn) or 180 (Is) degrees per minute in elevation.
Overall, this leads to INT3 sessions that are less optimized for single baseline analysis compared to INT2 sessions, explaining the results of Fig. 4. However, it is to note that the full network analysis results still outperform the INT2 precision by far, as can be seen in Fig. 3.
The middle plot in Fig. 4 depicts the results of INT3 single baseline analysis grouped by the individual baselines. On average, the IsWn and IsWz baselines perform best, followed by ShWn and ShWz.
The right plot in Fig. 4 compares the best performing single baseline per INT3 session with the full network results. While the best performing single baseline result has an average σ dUT1 of 11.1 μs, the average full network INT3 σ dUT1 is 7.3 μs. Thus, the best single baseline precision is on average 50 % worse compared to the full network analysis.

Comparison with JPL EOP2 and IERS C04
To further assess the quality of the INT2 and INT3 results, Fig. 5 depicts the difference between the reported dUT1 estimates from the GSF Intensive analysis results and the dUT1 values of the JPL EOP2 solution. To interpolate the daily JPL EOP2 solution to the Intensive reference epochs, tidal effects with periods < 35 days, provided within the IERS conventions (Petit and Luzum 2010, Chapter 8), were first subtracted, followed by a Lagrangian interpolation of order four to the Intensive reference epochs and re-adding of the previously subtracted tidal effects. Based on this analysis, it can be seen that most of the reported Intensive dUT1 results agree at the level of ±50 μs with the JPL EOP2 solution. Table 1 lists the biases and standard deviations (std) of the estimated dUT1 values w.r.t. JPL EOP2, grouped by individual baselines. Regarding std, it can be seen that for all ACs, baseline MkWz has the best performance, further confirming that this baseline performs best.
For the INT3 sessions, there is not enough data to reliably estimate the bias or std. However, based on the GSF solution, it can be stated that most networks have a small negative bias with a magnitude below 10 μs, except for the network NyShWnWz with a positive bias of 13.8 μs. The std is between 8.2 μs and 16.2 μs for the networks including station Is, while it is significantly higher (≈ 35 μs) for networks without station Is.
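The interpolation of the daily reference series to the Intensive reference epochs described above (remove the short-period tidal signal, interpolate, restore the tidal signal) can be sketched as follows. This is an illustration under stated assumptions and not the processing code used in the study: "order four" is interpreted here as a four-node Lagrange polynomial, and `tidal_model` is a hypothetical callable standing in for the IERS Conventions Chapter 8 model; the demo uses a toy sinusoid instead of the real model.

```python
import numpy as np


def lagrange_interpolate(t_nodes, y_nodes, t):
    """Evaluate the Lagrange polynomial through (t_nodes, y_nodes) at epoch t."""
    total = 0.0
    for i, (ti, yi) in enumerate(zip(t_nodes, y_nodes)):
        weight = 1.0
        for j, tj in enumerate(t_nodes):
            if j != i:
                weight *= (t - tj) / (ti - tj)
        total += yi * weight
    return total


def interpolate_dut1(mjd_daily, dut1_daily, mjd_target, tidal_model, n_points=4):
    """Interpolate a daily dUT1 series to mjd_target with tidal regularization."""
    # Step 1: subtract the tidal signal so that the series is smooth.
    regularized = dut1_daily - tidal_model(mjd_daily)
    # Step 2: select n_points consecutive nodes centered on the target epoch.
    idx = np.searchsorted(mjd_daily, mjd_target)
    start = int(np.clip(idx - n_points // 2, 0, len(mjd_daily) - n_points))
    nodes = slice(start, start + n_points)
    value = lagrange_interpolate(mjd_daily[nodes], regularized[nodes], mjd_target)
    # Step 3: re-add the tidal signal evaluated at the target epoch.
    return value + tidal_model(mjd_target)


def toy_tide(mjd):
    """Toy stand-in for the tidal model (assumption, NOT the IERS model)."""
    return 1e-4 * np.sin(2.0 * np.pi * np.asarray(mjd) / 13.66)


# Synthetic daily series: linear trend plus the toy tidal signal.
mjd = np.arange(59000.0, 59010.0)
series = 0.001 * (mjd - 59000.0) + toy_tide(mjd)
print(interpolate_dut1(mjd, series, 59003.3, toy_tide))
```

Removing the tidal signal before interpolating matters because the tidal variations are too rapid to be represented well by a low-order polynomial through daily samples.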
A similar analysis was performed using the IERS EOP 14 C04 series (Bizouard et al. 2018) as reference. However, the INT2 session analysis results of GSF during 2017 displayed a linear trend w.r.t. IERS C04, leading to a negative bias. In 2021, a positive bias is visible. Additionally, the dUT1 mean formal errors reported in the IERS C04 series show several jumps as well as time-dependent changes. For example, the reported uncertainties before the end of 2017 are very noisy compared to the reported values after 2018.
Nevertheless, Table 2 lists the biases and std w.r.t. IERS C04. Here, especially baseline KkWz has a very low bias in the solutions of all ACs. The std values are at the same level compared to the comparisons w.r.t. JPL EOP2, except for baseline MkWz, which shows significantly higher std w.r.t. IERS C04.
Finally, Fig. 7 depicts the weighted average correlation coefficients between the difference of the estimated INT2 dUT1 values and the JPL EOP2 solution (left) and IERS C04 (right) between various ACs. First, the difference between the estimated dUT1 value and JPL EOP2 and IERS C04 is calculated. Next, the correlation coefficient is calculated for each of the three INT2 baselines individually. Finally, the average correlation coefficient is calculated, weighted by the number of sessions observed on this baseline and analyzed by both ACs.
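The three steps above (differences w.r.t. the reference series, one correlation coefficient per baseline, session-weighted average) can be sketched as follows for a pair of ACs. The data layout (nested dictionaries keyed by baseline and session name) and the minimum of three common sessions are assumptions made for this illustration.

```python
import numpy as np


def weighted_mean_correlation(diff_a, diff_b, min_sessions=3):
    """Session-weighted mean correlation between two ACs.

    diff_a, diff_b: dict baseline -> dict session -> (dUT1 - reference) in μs.
    Only sessions analyzed by both ACs enter the calculation.
    """
    coeffs, weights = [], []
    for baseline in sorted(diff_a.keys() & diff_b.keys()):
        common = sorted(diff_a[baseline].keys() & diff_b[baseline].keys())
        if len(common) < min_sessions:
            continue  # too few common sessions for a meaningful coefficient
        x = np.array([diff_a[baseline][s] for s in common])
        y = np.array([diff_b[baseline][s] for s in common])
        coeffs.append(np.corrcoef(x, y)[0, 1])
        weights.append(len(common))
    if not coeffs:
        raise ValueError("no baseline with enough common sessions")
    return float(np.average(coeffs, weights=weights))


# Synthetic example: perfect agreement on one baseline (6 sessions),
# perfect anti-correlation on another (3 sessions).
ac_a = {
    "IsWz": {f"s{i}": float(i) for i in range(6)},
    "KkWz": {f"s{i}": float(i) for i in range(3)},
}
ac_b = {
    "IsWz": {f"s{i}": 2.0 * i for i in range(6)},
    "KkWz": {f"s{i}": -float(i) for i in range(3)},
}
print(weighted_mean_correlation(ac_a, ac_b))
```

In the synthetic example, the weighted average is (6 · 1 + 3 · (-1)) / 9 = 1/3.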
It can be seen that all ACs agree well with each other except for IAA. The reason why the correlation coefficients for IAA are lower has to be investigated further. In general, correlations w.r.t. IERS C04 are slightly higher compared to correlations w.r.t. JPL EOP2. However, it has to be noted that IERS C04 was likely selected as the a priori EOP series in the analysis, which might have a positive effect. It should also be noted that some ACs, like OPA, submitted significantly fewer solutions, as can be seen in Fig. 2.

Latency
One of the key requirements of VLBI Intensive sessions is a short latency between observations and analysis results. Therefore, the sessions are usually only observed over one hour using a small network to ensure a fast turnaround time. Figure 8 depicts the latency between the session start time and the release of the analysis report. The release time of the analysis report was taken from the spoolfiles uploaded to the IVS data centers. Within the spoolfiles, the local time is reported, which was converted to UTC. The spoolfiles are only provided by three ACs, namely GSF, USN and BKG. At the IVS data centers, the GSF and USN analysis centers started uploading spoolfiles at the end of 2017. In total, GSF uploaded 484 spoolfiles while USN uploaded 404 spoolfiles. The BKG analysis center also started uploading spoolfiles during the study period.
Most of the INT3 sessions (right column) are analyzed within the first 24 hours, while most of the INT2 sessions (left column) take two to three days until the analysis is performed. The fast turnaround time for INT3 is especially noteworthy since these sessions are observed with a high data rate and a large number of stations, resulting in more data that has to be transferred and processed. Here, the cooperation of the station personnel, the correlator in Bonn (Bernhart et al. 2021) and the ACs works very well.
The delay in releasing the INT2 results can be explained by the session observing time. INT2 sessions are observed on Saturdays and Sundays, while the analysis is typically performed on Mondays. It can clearly be seen that the majority of sessions with a latency of 24-48 hours are sessions observed on Sundays.
In Fig. 8, it is also visible that the solution of BKG tends to have a lower latency compared to GSF and USN. This might be explained by the EU and US time difference and the resulting different UTC working hours.
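The latency calculation described above, including the conversion of the locally reported release time to UTC, can be sketched as follows. The function name, the way the UTC offset is passed, and the example time stamps are hypothetical; a real implementation would have to derive the offset (including daylight saving time) from the location of the AC.

```python
from datetime import datetime, timedelta, timezone


def latency_hours(session_start_utc, report_local, utc_offset_hours):
    """Hours between session start (UTC) and report release (local time).

    utc_offset_hours is the offset of the AC's local time from UTC and is
    assumed to be known for the release date (e.g. -4 for US Eastern in summer).
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    start = session_start_utc.replace(tzinfo=timezone.utc)
    release = report_local.replace(tzinfo=local_tz).astimezone(timezone.utc)
    return (release - start).total_seconds() / 3600.0


# Hypothetical example: a session starting on a Sunday at 07:30 UTC whose
# analysis report is released on Monday at 10:00 local time (UTC-4).
start = datetime(2021, 8, 15, 7, 30)
release = datetime(2021, 8, 16, 10, 0)
print(latency_hours(start, release, -4))
```

In this hypothetical case, the report is released at 14:00 UTC on Monday, giving a latency of 30.5 hours, i.e. within the 24-48 hour bin.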
Finally, it is to note that this latency investigation only considers ACs submitting spoolfiles to the IVS. The latency of other ACs can be different. One noteworthy case is the GSI AC, which is responsible for the submission of the INT2 sessions. The GSI AC developed a fully automated pipeline for correlation and analysis of the INT2 sessions. The correlation is automatically performed using the "rapid_" programs (Takagi and Hayashi 2021b). The automated analysis of the INT2 sessions is performed using "rapid_c5pp," which runs c5++ (Hobiger et al. 2010) to estimate dUT1 with a latency of approximately one to two hours only (Takagi and Hayashi 2021a).

Station performance
In Table 3, statistics of station performances are collected.
The statistics are extracted from the spoolfiles uploaded to the IVS data centers with the exception of the number of scheduled observations, which is extracted from the schedule files directly since the reported values in the spoolfiles only contain the number of successfully observed and correlated observations and not the number of scheduled observations. Within the spoolfiles, the weighted root mean squared (wrms) error per station is provided. In Table 3, column "mean wrms [ps]" lists the average value of the station-wrms error from all sessions, while column "std wrms [ps]" lists their standard deviation.
It should be noted that not all sessions are represented within this analysis because spoolfiles are not available for all sessions. In particular, there are no spoolfiles for the earliest sessions included in this study. If multiple analysis centers uploaded spoolfiles, priority was given to reports provided by GSF, followed by USN, followed by BKG, to ensure that only one spoolfile per session is used. If multiple spoolfiles were uploaded by one AC, the latest version was used. In total, there are spoolfiles for 514 different sessions.
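The selection of one spoolfile per session (AC priority GSF, USN, BKG, then the latest version) can be sketched as follows. Representing a spoolfile as a `(session, ac, version)` tuple with an integer version counter is an assumption made for illustration; the actual file naming at the IVS data centers may differ.

```python
# Lower number means higher priority, following the order given in the text.
AC_PRIORITY = {"GSF": 0, "USN": 1, "BKG": 2}


def select_spoolfiles(spoolfiles):
    """Pick one spoolfile per session.

    spoolfiles: iterable of (session, ac, version) tuples, where a higher
    version means a later upload (hypothetical representation).
    Returns a dict mapping session -> selected (session, ac, version).
    """
    best = {}
    for session, ac, version in spoolfiles:
        if ac not in AC_PRIORITY:
            continue  # AC does not provide spoolfiles considered here
        # Sort key: preferred AC first, then the latest version.
        key = (AC_PRIORITY[ac], -version)
        if session not in best or key < best[session][0]:
            best[session] = (key, (session, ac, version))
    return {session: entry for session, (_, entry) in best.items()}


files = [
    ("Q21226", "USN", 1),
    ("Q21226", "GSF", 1),
    ("Q21226", "GSF", 2),
    ("Q21214", "BKG", 3),
]
print(select_spoolfiles(files))
```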
Based on this investigation, it can be seen that the lowest mean wrms was obtained by station Wz, followed by Mk and Ny. Station Wz also participated in the most sessions and obtained the highest number of observations. Station Sh has a high wrms as well as a high scatter in its performance. It can also be seen that observations with Wn provided the lowest fraction of usable observations. However, this can mostly be explained by the local baseline WnWz (see Sect. 3.6), which was removed in the correlation and analysis in many sessions.

Baseline performance
Similar to Table 3, Table 4 lists baseline-based statistics obtained by analyzing the provided spoolfiles.
It can be seen that the baselines NyWz and MkWz have the lowest mean wrms, followed by IsWz. It can also be seen that, by far, the most observations were conducted on the baseline IsWz due to the high number of INT2 sessions on this baseline.
The local baseline WnWz has the lowest fraction of observations used within the analysis with respect to scheduled observations. In early sessions, problems with the phase calibration signal during correlation led to a deselection of this baseline. This was later resolved by applying a notch-filter on the affected frequencies. For the determination of dUT1, the loss of the local baseline is not critical, since it does not contribute to dUT1 measurements. Fortunately, the baselines with the highest fraction of usable observations are the main Intensive baselines IsWz, KkWz and MkWz.
It is to note that the numbers in Table 3 and Table 4 do not match 100 % since sometimes stations were added in tagalong mode without changing the official schedule. However, the differences are minimal and do not affect the result.

Source performance
Finally, some source-based statistics are provided within this section. In total, 108 different sources were observed in the INT2 and INT3 sessions. Figure 9 depicts the source distribution in right ascension and declination.
In the left plot, the marker size corresponds to the number of sessions in which the source is observed, while the number of observations is color-coded. The majority of observations focus on high-declination sources, which is unsurprising given that all stations are located in the northern hemisphere, as can be seen in Fig. 1. The right plot presents the fraction of usable observations per source, indicated by the marker size, with the mean wrms color-coded. Of the 108 observed sources, only three yield less than 50 % usable observations. On average, per source, 72 % of the observations are usable in the analysis. Table 5 lists the ten sources most and least often successfully observed within the INT2 and INT3 observing programs.
Here, the imbalance of the observations per source is visible. The most observed source is 1803+784, followed by 0059+581. Both sources have a reasonably low mean wrms of below 40 ps. On the other hand, some sources that are only rarely observed have a high mean wrms or a low fraction of usable observations. For example, source 0256-005 was scheduled in 89 observations, out of which only nine were finally used in the analysis. Two further tables, in which the sources are sorted by the mean wrms and by the fraction of used observations, can be found in the appendix.
Table 3 Station-based statistics. Column "mean wrms" lists the mean weighted root mean squared (wrms) error per station of individual INT2 and INT3 sessions. Column "std wrms" lists their standard deviation. Column "#sessions" lists the number of sessions in which the station participated. Column "#obs" lists the number of observations used within the analysis. Column "#sched" lists the number of scheduled observations. Column "frac" denotes the ratio of the number of observations used in the analysis and the number of scheduled observations.

Comparison of scheduling approaches
Originally, the INT2 and INT3 sessions were scheduled using sked (Gipson 2016). Since the end of January 2019 (session Q19021), the INT3 sessions have been scheduled using VieSched++. Since the end of March 2020 (session Q20081), the INT2 sessions have been scheduled using VieSched++, except for baseline MkWz, which started using VieSched++ in mid-September 2020 (session Q20256). From mid-2020 onward, all INT2 and INT3 sessions were scheduled fully automatically using an AI-based scheduling parameter selection. Starting in the year 2021, the INT2 and INT3 sessions have been officially assigned to the newly created IVS operations center DACH. Overall, 405 sessions were scheduled using sked and, so far, 221 sessions have been scheduled using VieSched++.
With the change of the scheduling software also came a change in the scheduling strategy. The new VieSched++ schedules are generated using a special Intensive scheduling algorithm, described in , Appendix A. This algorithm favors observations of sources located in the cusps of the mutually visible sky, as already discussed in Sect. 3.2. Additionally, for every session, not just one schedule but over one thousand schedules are generated. Every schedule is further simulated one thousand times. From this pool of possible schedules, the best one is selected by comparing the simulated precision of dUT1 based on its mean dUT1 formal error and its repeatability value, as well as the number of scheduled observations and the average station sky-coverage score.
The sky-coverage score is a metric used to quantify how equally distributed the observations are w.r.t. their azimuth and elevation angles. During the analysis, equally distributed observations, especially at different elevation angles, help to separate, e.g., delays caused by tropospheric influences from delays caused by clock drifts (Pany et al. 2011; Nothnagel et al. 2002). The sky-coverage score is calculated as discussed in Schartner and Böhm (2020). First, the visible sky above each station is divided into 37 areas based on azimuth and elevation using two different approaches. Next, the first 30 minutes of observations are selected and the number of areas in which observations occur is counted. This is repeated for observations between minutes 15 and 45 as well as 30 and 60. Finally, the number of observed areas is counted and normalized by the total number of areas. Thus, the sky-coverage score is between zero and one, with one indicating a perfect sky-coverage where every tested time interval includes observations within every area.
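The steps above can be sketched in a few lines of Python. This is a simplified illustration and not the implementation of Schartner and Böhm (2020): it uses a single, hypothetical partition of the sky into 37 cells (18 + 12 + 6 azimuth sectors in three elevation bands plus one zenith cap, with assumed band limits) instead of the two partitions mentioned in the text, and it assumes that the per-window fractions are averaged. The input format, a list of `(minute, azimuth, elevation)` tuples for one station, is likewise an assumption.

```python
from typing import Iterable, Tuple

# Hypothetical partition: (lower elevation limit in degrees, azimuth sectors).
# 18 + 12 + 6 + 1 = 37 cells; the band limits are an assumption.
BANDS = [(0.0, 18), (22.5, 12), (45.0, 6), (67.5, 1)]
N_AREAS = sum(n for _, n in BANDS)

# Overlapping time windows in minutes since session start, as in the text.
WINDOWS = [(0.0, 30.0), (15.0, 45.0), (30.0, 60.0)]


def area_index(az_deg: float, el_deg: float) -> int:
    """Map an azimuth/elevation direction to one of the N_AREAS cells."""
    if not 0.0 <= el_deg <= 90.0:
        raise ValueError("elevation must be within [0, 90] degrees")
    offset = 0
    for i, (_, n_sectors) in enumerate(BANDS):
        is_last = i == len(BANDS) - 1
        if is_last or el_deg < BANDS[i + 1][0]:
            sector = int((az_deg % 360.0) // (360.0 / n_sectors))
            return offset + min(sector, n_sectors - 1)  # guard rounding at 360
        offset += n_sectors
    raise AssertionError("unreachable")


def sky_coverage_score(obs: Iterable[Tuple[float, float, float]]) -> float:
    """Average fraction of cells observed per time window for one station."""
    obs = list(obs)
    fractions = []
    for start, end in WINDOWS:
        cells = {area_index(az, el) for t, az, el in obs if start <= t <= end}
        fractions.append(len(cells) / N_AREAS)
    return sum(fractions) / len(fractions)
```

A score of one is only reached if all 37 cells are observed in each of the three windows, which matches the interpretation of a perfect sky-coverage given above.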
In the next subsections, comparisons between the performance of sessions scheduled with sked and sessions scheduled with VieSched++ using the new scheduling strategy are discussed.

INT2
Within the INT2 observing program, the most observed networks are IsWz, KkWz, and MkWz. All of these baselines were scheduled at least 20 times using both sked and VieSched++. This offers the possibility to compare the session performance based on scheduling strategies. Figure 10 depicts the average σ dUT1 per baseline as reported in the GSF EOP-I file.
In all cases, the sessions scheduled with VieSched++ using the new strategy provided higher precision of the dUT1 estimates compared to the sessions scheduled with sked. For the IsWz baseline, the σ dUT1 was reduced by 11 %, from 12.4 ± 6.3 μs to 11.0 ± 4.3 μs. In case of the KkWz baseline, the improvement is 32 %, from 17.2 ± 7.7 μs to 11.7 ± 4.6 μs, while it is 44 % for baseline MkWz, from 11.7 ± 5.7 μs to 6.6 ± 4.7 μs. Figure 11 illustrates how these improvements were achieved.
The top row depicts the number of observed sources per session, the middle row depicts the number of observations per session, and the bottom row depicts the sky-coverage score per session, as discussed previously. For a better visual comparison, normalized histograms are displayed.
The first column depicts the results of the IsWz baseline. On average, the number of different sources that are observed per session is increased by 34 %. This is advantageous since it means that unmodeled effects caused by source structure are better averaged out and a session is less sensitive to source-losses in the analysis. Additionally, the number of observations is on average increased by 26 % while the sky-coverage score is improved by 28 %.
For the KkWz baseline (second column in Fig. 11), the main difference is the number of observations per session, which was improved by a factor of two. Additionally, the number of different observed sources per session as well as the sky-coverage score were slightly increased. The significant difference in the number of observations might be explained by the different source selection of the two scheduling software packages, leading to different observing durations per scan, as well as the different sequence of scans, leading to different slew times between scans.
For the MkWz baseline (third column in Fig. 11), the situation is similar. Here, the improvement in terms of number of observations per session is 71 %, while the number of observed sources and the sky-coverage score are not increased.
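The relative reductions quoted for the three INT2 baselines follow directly from the reported mean σ dUT1 values. The short check below recomputes them; the function name `relative_reduction` is chosen here for illustration only, and the input values are the rounded means given in the text.

```python
def relative_reduction(old: float, new: float) -> float:
    """Relative reduction in percent when going from `old` to `new`."""
    return 100.0 * (old - new) / old


# Mean sigma dUT1 in microseconds: (sked, VieSched++), as quoted in the text.
reported = {
    "IsWz": (12.4, 11.0),
    "KkWz": (17.2, 11.7),
    "MkWz": (11.7, 6.6),
}

for baseline, (sked, viesched) in reported.items():
    print(f"{baseline}: {relative_reduction(sked, viesched):.0f} %")
# IsWz: 11 %
# KkWz: 32 %
# MkWz: 44 %
```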

INT3
Within the INT3 observing program, the station network changes frequently. Based on the GSF EOP-I record, there are three networks that were scheduled at least 5 times with VieSched++ and sked, namely IsNyWz, IsNyWnWz, and IsNyShWnWz. Figure 12 depicts the dUT1 precision of these three networks.
Based on this analysis, the reported σ dUT1 for IsNyWnWz sessions could be reduced by 25 %, from 6.1 ± 2.0 μs to 4.5 ± 1.0 μs. For the network IsNyShWnWz, the improvement is 45 %, from 6.3 ± 1.7 μs to 3.5 ± 0.5 μs. However, it should be noted that the sample size for the IsNyShWnWz network is relatively small, with fewer than ten sessions each.
In case of the three-station network IsNyWz, the result is 12 % worse using the new scheduling strategy: the average mean formal error increased from 6.4 ± 1.9 μs to 7.2 ± 2.3 μs. It should be noted that, for the sked sessions, only three out of the seven IsNyWz sessions were originally scheduled with these three stations, and for the VieSched++ sessions only four out of twelve. The remaining sessions were scheduled with a larger network from which some stations dropped out, typically due to technical problems. In all of these cases, station Wn was also scheduled but failed to observe successfully. The scheduling was therefore based on more stations, leading to sub-optimal results during the analysis. This means that the results of the IsNyWz network do not represent the scheduling performance adequately.
To better understand the origin of the improvements, Fig. 13 depicts the number of different sources per session, the number of observations per session, and the sky-coverage score per session, similar to Fig. 11.
It can be seen that especially the number of observations per session (middle row) was increased substantially. The improvement is 34 % for IsNyWz, 53 % for IsNyWnWz, and 86 % for IsNyShWnWz. The big improvement w.r.t. number of observations for the IsNyShWnWz network can be explained by the number of stations participating per scan, as depicted in Fig. 14.
For the VieSched++ schedules, almost all scans were observed with all five stations. In the sked schedules, 60 % of the scans were observed with only two or three stations. Note that sked could also have been forced to schedule all scans with all stations together. Thus, the difference seen here is mostly a difference in the scheduling strategy and not in the scheduling software.
Surprisingly, observing all scans with five stations does not lead to a decrease in the sky-coverage score, as one would expect. As can be seen in Fig. 13, the average sky-coverage score is even slightly improved by 4 %. Keeping in mind that the reported σ dUT1 values are 82 % higher using the sked scheduling approach compared to the VieSched++ scheduling approach, one can conclude that observing all scans with all stations leads to better dUT1 estimates in general.
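The 45 % reduction quoted for IsNyShWnWz and the 82 % figure above describe the same change from opposite directions, assuming both derive from the same pair of mean values. A relative reduction r from the old to the new value corresponds to the old value being r / (1 - r) higher than the new one. The helper name below is hypothetical.

```python
def increase_from_reduction(reduction: float) -> float:
    """Convert a relative reduction (old -> new) into the relative excess
    of the old value over the new value. Both are given as fractions."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be within [0, 1)")
    return reduction / (1.0 - reduction)


print(f"{100 * increase_from_reduction(0.45):.0f} %")  # prints "82 %"
```

Using the rounded means of 6.3 μs and 3.5 μs directly gives 6.3 / 3.5 - 1 = 80 %. The small difference is presumably a rounding effect.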

Fig. 11
Normalized histograms of number of scheduled sources, number of scheduled observations, and sky-coverage score for selected INT2 baselines grouped by schedules generated using sked and schedules generated using VieSched++

Conclusion
Within this work, a detailed progress report on improving the dUT1 estimates from INT2 and INT3 sessions is provided. The average dUT1 mean formal error as reported in the GSF EOP-I file (gsf2020a) is 12.9 μs for INT2 sessions, while it is 7.3 μs for INT3 sessions.
More detailed analysis revealed that, for INT2 sessions, especially observations on the baseline MkWz lead to highly precise dUT1 measurements, with a mean dUT1 formal error of 8.5 μs and a median of 6.4 μs based on the GSF results. In comparison to the JPL EOP2 series, the estimated dUT1 values agree well and only a small bias at the μs level is present. While it is evident that INT3 sessions outperform INT2 sessions w.r.t. the estimated dUT1 formal errors, the biases and scatter between the estimated dUT1 values and the JPL EOP2 reference series are at the same level. It is also revealed that for the INT2 baselines, different biases exist between the different ACs. Furthermore, these biases are also different w.r.t. IERS C04.
Regarding the latency of the released results, it is revealed that the INT3 result is typically available within a few hours. The low latency for INT3 is especially noteworthy since these sessions are observed with a larger network and using a 1 Gbps observing mode. This highlights that the processing chain from observation and correlation to analysis works very well for INT3 sessions. For INT2 sessions, the latency from the AC GSI is reported to be around one to two hours (Takagi and Hayashi 2021a, b) since a fully automated analysis pipeline is used. For ACs where a manual analysis is conducted, the latency is two to three days. The longer delay in INT2 results is explained by the fact that the sessions are observed on weekends, while mostly being analyzed on Mondays. Improved automation in the VLBI analysis might help to reduce the INT2 latency by allowing the analysis to be carried out without the need for human interaction.

Fig. 12
Reported dUT1 precision for different INT3 networks (observed in at least 5 sessions) scheduled with sked and VieSched++ as well as the number of observed sessions. The dashed areas mark sessions that were originally scheduled with a larger network but only analyzed using the listed stations

Fig. 13
Normalized histograms of number of scheduled sources, number of scheduled observations, and sky-coverage score for selected INT3 networks grouped by schedules generated using sked and schedules generated using VieSched++

Analysis of the single-baseline results for INT3 sessions reveals that no significant improvement in dUT1 precision is gained for baseline IsWz when comparing INT2 and INT3, although a four times higher data rate is used. This can be explained by the differences in scheduling strategies: for the INT3 sessions, all participating stations restrict the area of the commonly visible sky and the slew times, while for INT2, the schedule is optimized particularly for the IsWz baseline.
Furthermore, some station-, source-, and baseline-dependent statistics are provided. It is highlighted that the main Intensive baselines IsWz, KkWz, and MkWz perform well. The local baseline WnWz was often deselected in the analysis, leading to a low fraction of used observations. In the future, the Intensive source list can be revisited and potentially improved based on the source-based statistics by removing some of the poorly performing sources in favor of others.
Finally, improvements regarding the scheduling strategy are highlighted. For almost all networks and baselines, the new scheduling strategy outperforms the old one and leads to significantly better mean formal errors during the analysis. For the INT2 baseline IsWz, the mean formal error was reduced by 11 %, while it was reduced by 32 % for KkWz and 44 % for MkWz. For the INT3 networks, IsNyWnWz showed an improvement of 25 %, while IsNyShWnWz improved by 45 %. Only the mean formal errors of IsNyWz increased, by a total of 12 %, although it should be noted that the majority of these sessions were scheduled with an additional fourth station and thus the schedules were not optimized for the three-station analysis. Additionally, the number of observations could be doubled in some cases, while more different sources are observed and the observations are better distributed over the sky.
Overall, this has led to a significant improvement in the dUT1 estimates from INT2 and INT3 sessions over the last years and highlights the good cooperation between the VLBI operation center, the correlator, and the analysis centers.