Introduction

Shallow nitrogen-vacancy (NV) centres in diamond have been shown to be useful as sensors of weak fluctuating magnetic signals [5, 8, 16, 18, 20, 23, 25, 35, 38] and as a potential vehicle for enabling hyperpolarisation of nuclear spins external to the diamond [5, 9, 16, 34]. Much work to date has focused on the use (and production) of near-surface single NVs that can sometimes exist in the required negative charge state within a few nanometres of the surface in spite of the unfavourable local Fermi-level position [7]. Increasingly, however, applications such as scaled-up hyperpolarisation [13, 32] and imaging of AC magnetic fields [4, 8, 22, 36, 40, 45] demand high-density ensembles of stable near-surface NVs. When sampling a large number of NVs, the impact of surface-induced band bending becomes clear, with the NV depth distribution cutting off at 6–7 nm from the surface [6, 13]. Combined with the expectation that the vacancies produced by near-surface implants, which are required to form NV centres during a subsequent annealing process, will tend to out-diffuse to the surface [28, 30], this cut-off means that NV yields in such ensembles are much lower than in their bulk counterparts.

In the bulk-like regime (mean ensemble depth \(d_{\textrm{NV}}\) of order 100 nm or more), where the creation of high-density NV ensembles is comparatively well developed, it has been shown that starting with N-rich (type Ib) diamond grown by the high-pressure high-temperature (HPHT) method and implanting with arbitrary ions successfully produces a well-localised sensing layer with quantum properties that compete well with other methods based on high-quality chemical vapour deposition (CVD) growth [10, 11, 12, 14]. Extending these results to the near-surface regime is attractive owing to the relative cost-efficiency and accessibility of this technique. One may also wonder whether the high nitrogen density of the bulk crystal is effective in combating near-surface band bending. However, vacancy diffusion during annealing looms as an impediment to high-yield ensemble formation: at typical annealing temperatures (800–900 °C), various studies have shown that the vacancy diffusion length may be as large as 300 nm [1, 26, 30]. Using a random walk model, Räcke et al. [30] showed that the reduced near-surface NV yield typically observed is largely explained by vacancy out-diffusion, even without taking the surface to be a vacancy attractor. Diffusion into the diamond is also problematic, as the envisaged applications require NVs to be confined within \(\approx 10\) nm of the surface, although in this case we would hope that the actual diffusion length falls well short of the theoretical upper bound owing to substitutional N acting as an efficient vacancy sink.

In this study, we examine the merits of creating near-surface NV ensembles through ion implantation of commercially sourced type Ib HPHT diamond, in view of the factors outlined above. By implanting diamonds containing distinct growth sectors (each with a characteristic native N density), we are able to control for the bulk N density and determine the role it plays in ensemble surface proximity and yield. We also implant at multiple depths (set by the implant energies) and with two levels of vacancy production (set by the implantation dose and atomic species) to assess the practical role of vacancy diffusion in the high-N regime. The quality of the ensembles produced is assessed through measurements of the NV yield and quantum coherence. We conclude with a discussion of the limitations of the study and the prospects for future work.

Results

A series of type Ib HPHT diamond substrates (purchased from Delaware Diamond Knives) containing sectors with varying levels of native nitrogen were subjected to ion implantation to form NV ensembles. To control for as many variables as possible and ensure that comparisons between sectors are valid, only two different diamonds were used (initial size \(4\times 4\times 0.1\) mm). These diamonds were then laser cut into smaller pieces to undergo different preparations. The implant parameters were chosen to create vacancy profiles peaking at approximately 3, 4, and 5 nm from the diamond surface. The vacancy profiles were predicted using Stopping and Range of Ions in Matter (SRIM) simulations [44], shown in Fig. 1(a). Neglecting charge-state and vacancy-diffusion considerations, we expect to produce a uniform NV layer of width \(w_{\textrm{SRIM}}\), extending from the surface to the depth at which the vacancy production falls below 50 ppm [dotted line in Fig. 1(a)], an approximate NV creation saturation threshold previously identified in the bulk regime [12]. The first set of implants chosen to meet this criterion were \(^{16}\)O at a dose of \(1\times 10^{12}\) cm\(^{-2}\) and energies of 2.5, 4, and 6 keV. A second set of implants, designed to produce an order of magnitude more vacancies with similar depth profiles, were \(^{31}\)P at a dose of \(5\times 10^{12}\) cm\(^{-2}\) and energies of 4, 7, and 11 keV (SRIM simulations in Fig. 2). In both cases, the implant species were chosen to be electron donors in the diamond lattice in an attempt to further offset the band bending from the surface, although these donors will be far fewer in number than the nitrogen atoms in high-N sectors.
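The conversion from a SRIM vacancy profile to the ppm scale used for the 50 ppm threshold, and the resulting layer width \(w_{\textrm{SRIM}}\), can be sketched in Python as follows. This is a minimal illustration, assuming the profile is tabulated in vacancies per ångström per ion (as in SRIM's vacancy output) and taking a diamond atomic density of \(1.76\times 10^{23}\) cm\(^{-3}\). The Gaussian profile is a hypothetical stand-in, not one of the simulated profiles in Fig. 1(a).

```python
import math

DIAMOND_ATOMS_PER_CM3 = 1.76e23  # carbon atomic density of diamond
ANGSTROM_PER_CM = 1e8


def vacancy_ppm(vac_per_angstrom_ion, dose_per_cm2):
    """Convert a SRIM-style vacancy profile (vacancies per angstrom per ion)
    into a vacancy concentration in ppm of lattice sites for a given dose."""
    return [
        v * ANGSTROM_PER_CM * dose_per_cm2 / DIAMOND_ATOMS_PER_CM3 * 1e6
        for v in vac_per_angstrom_ion
    ]


def layer_width_nm(depths_nm, ppm, threshold_ppm=50.0):
    """Width from the surface down to the deepest sampled depth at which the
    vacancy concentration still reaches the threshold (0 if it never does)."""
    above = [d for d, c in zip(depths_nm, ppm) if c >= threshold_ppm]
    return max(above) if above else 0.0


# Hypothetical profile: Gaussian peaking at 3 nm (NOT real SRIM output).
depths = [float(i) for i in range(21)]  # 0-20 nm in 1-nm bins
profile = [0.5 * math.exp(-((d - 3.0) ** 2) / (2 * 2.0 ** 2)) for d in depths]
ppm = vacancy_ppm(profile, dose_per_cm2=1e12)
w_srim = layer_width_nm(depths, ppm)
print(f"peak {max(ppm):.0f} ppm, w_SRIM = {w_srim:.0f} nm")
```

With a real SRIM table, only the `depths` and `profile` lists change; the threshold logic stays the same.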

For the first set of implants we also controlled for the surface preparation. The as-purchased diamonds arrived with a polished surface finish (Ra \(<5\) nm) and an oxygen reactive ion etching (RIE) process can be used to remove polishing damage. For the O implants we only performed RIE on some of the substrates to see if the process made a difference to NV yield or quantum properties. The diamonds were then annealed to form NV ensembles and acid cleaned prior to measurement (see Methods for additional sample preparation details).

NV yield

To determine the NV yield in our samples, we used a confocal microscope to measure the photoluminescence (PL) count rate per unit area (filtered with a 660–735-nm band-pass filter) and converted this to an areal NV density \(\sigma _{\textrm{NV}}\) by dividing by the PL of a single NV centre under the same excitation and collection conditions. We then consider two yield metrics: the conversion of native nitrogen and of created vacancies to NV centres (dubbed N-to-NV and V-to-NV yields, respectively). The N-to-NV yield is given by the ratio [NV]/[N], where [NV]\(=\sigma _{\textrm{NV}} / w_{\textrm{SRIM}}\) and [N] is the native N density of a given growth sector. [N] was deduced from the measured Hahn echo \(T_2\) using the relationship determined by Bauch et al. [3]; \(T_2\) was measured away from the influence of the surface where possible, and we assume that sectors with the same [N] can be identified through their bulk PL [see, e.g., Fig. 1(b)]. We note that [N] could be overestimated if the nitrogen bath is not the dominant source of decoherence (most relevant for the less dense sectors) and that \(\sigma _{\textrm{NV}}\) could be overestimated owing to background fluorescence or PL from the neutral NV charge state.
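The unit conversions behind the N-to-NV yield can be sketched as below. This is a minimal illustration under stated assumptions: count rates are per µm², \(w_{\textrm{SRIM}}\) is in nm, and [N] is inferred by assuming an inverse-proportional N-limited echo, \(T_2 = k/[\mathrm{N}]\). The coefficient `k_us_ppm` is a placeholder argument standing in for the relationship of Bauch et al. [3], which is not reproduced here, and all numbers in the usage lines are hypothetical.

```python
DIAMOND_ATOMS_PER_CM3 = 1.76e23


def areal_nv_density_per_um2(pl_rate_per_um2, single_nv_rate):
    """sigma_NV: ensemble PL count rate per um^2 divided by the count rate of a
    single NV measured under identical excitation and collection conditions."""
    return pl_rate_per_um2 / single_nv_rate


def nv_ppm(sigma_nv_per_um2, w_srim_nm):
    """[NV] = sigma_NV / w_SRIM, expressed in ppm of carbon atoms."""
    per_cm2 = sigma_nv_per_um2 * 1e8          # 1 um^2 = 1e-8 cm^2
    per_cm3 = per_cm2 / (w_srim_nm * 1e-7)    # 1 nm = 1e-7 cm
    return per_cm3 / DIAMOND_ATOMS_PER_CM3 * 1e6


def n_ppm_from_t2(t2_us, k_us_ppm):
    """Native [N] from an N-limited Hahn echo T2, ASSUMING T2 = k / [N]."""
    return k_us_ppm / t2_us


def n_to_nv_yield(sigma_nv_per_um2, w_srim_nm, n_ppm):
    """N-to-NV yield [NV]/[N]."""
    return nv_ppm(sigma_nv_per_um2, w_srim_nm) / n_ppm


# Hypothetical example values, for illustration only.
sigma = areal_nv_density_per_um2(pl_rate_per_um2=2.0e7, single_nv_rate=2.0e4)
print(f"sigma_NV = {sigma:.0f} /um^2, "
      f"N-to-NV yield = {n_to_nv_yield(sigma, 5.0, 50.0):.2%}")
```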

An example xz confocal scan is shown in Fig. 1(b). A well-defined NV layer is present at the diamond surface, although the resolution of the scan is not high enough to determine whether the layer’s extent matches the vacancy distribution predicted by SRIM. In this image we can see two growth sectors containing different amounts of nitrogen: the right-hand sector (estimated nitrogen density [N] = 50 ppm, compared with 8 ppm for the left-hand sector) has significant background PL away from the surface, and the PL of the near-surface sensing layer also varies with the native nitrogen density. In both cases, however, the PL of the near-surface sensing layer greatly exceeds the background for a given sector, indicating locally increased NV conversion, as expected.

Figure 1

Creating shallow NV layers in type Ib diamond. (a) SRIM simulations of the oxygen implants conducted, taking a 7\(^\circ\) angle of incidence. Phosphorus implants were also conducted at energies chosen to approximately match the expected vacancy depth profile, but creating an order of magnitude more vacancies. (b) Confocal xz scan of one diamond sample, showing an NV layer localised at the surface. Two sectors are visible, with the left- and right-hand regions’ nitrogen content estimated at 8 and 50 ppm, respectively.

Figure 2(a) shows the vacancy distributions predicted by SRIM for the phosphorus implants, which are similar to those of the corresponding \(^{16}\)O implants but with an order of magnitude higher vacancy production, allowing us to probe the dependence of NV yield on vacancy density in the shallow regime. Figure 2(b) shows the simulated implanted-ion distributions for these \(5 \times 10^{12}\) cm\(^{-2}\) \(^{31}\)P implants, together with the corresponding integrated (from the surface to a given depth) electron donor density, taking a native nitrogen density of 90 ppm, the densest sector probed. This quantity, displayed on the right axis, corresponds to the number of electrons donated to the diamond lattice (from N and P atoms) that may be available to compensate the surface acceptor defects responsible for band bending (and hence allow NV\(^-\) to be stable closer to the surface). Compared with the native nitrogen, the implanted \(^{31}\)P ions are expected to have a relatively minor impact. The nitrogen donors are, however, abundant enough in this case to contribute \(\approx 10^{13}\) e/cm\(^2\) within 5 nm of the surface, which is in principle enough to compensate the lower end of the typical range of surface acceptor densities, \(10^{13}\)–\(10^{14}\) cm\(^{-2}\) [37].
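The \(\approx 10^{13}\) e/cm\(^2\) estimate follows from integrating a uniform native N concentration over the top 5 nm. The short Python check below reproduces it, assuming that every substitutional N within that depth donates one electron and taking a diamond atomic density of \(1.76\times 10^{23}\) cm\(^{-3}\); the implanted-donor term is an optional input that would have to be integrated separately from the SRIM ion profile.

```python
DIAMOND_ATOMS_PER_CM3 = 1.76e23
SURFACE_ACCEPTOR_RANGE_CM2 = (1e13, 1e14)  # typical range quoted in the text [37]


def integrated_donors_per_cm2(n_ppm, depth_nm, implanted_donors_per_cm2=0.0):
    """Electrons per cm^2 from uniform native N between the surface and
    depth_nm (one electron per N assumed), plus any implanted donors that lie
    within the same depth."""
    native = n_ppm * 1e-6 * DIAMOND_ATOMS_PER_CM3 * depth_nm * 1e-7  # nm -> cm
    return native + implanted_donors_per_cm2


low_bound, _ = SURFACE_ACCEPTOR_RANGE_CM2
for n in (8, 50, 90):
    donors = integrated_donors_per_cm2(n, depth_nm=5.0)
    print(f"{n:>3} ppm N: {donors:.1e} e/cm^2 "
          f"({donors / low_bound:.2f} x lower acceptor bound)")
```

For the 90 ppm sector this gives about \(8\times 10^{12}\) e/cm\(^2\), i.e. of order \(10^{13}\), while the 8 ppm sector supplies roughly an order of magnitude less.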

Figure 2

Phosphorus implant simulations. (a) SRIM simulations of the vacancy distributions created by the \(^{31}\)P implants conducted. (b) Implanted ion distributions predicted by the same simulations, with the total integrated electron donor densities overlaid (right axis), assuming a native nitrogen concentration of 90 ppm. The grey line shows the no-implant case, where only the native nitrogen contributes.

Figure 3

NV yield measurements. (a) NV yield (N-to-NV) estimated as described in the text. (b) Plot of NVs created per vacancy, taking the NV yield as in (a) and comparing against the total vacancy production predicted by SRIM, again plotted versus nitrogen concentration. Data points in (a) and (b) outlined in black are the phosphorus implants described in the text.

Figure 3(a) shows the computed N-to-NV yields plotted against the inferred native N density of a given diamond sector, with the marker colour indicating the implant energy. For simplicity, in the presentation of these results as well as those that follow, we group the 2.5-keV \(^{16}\)O and 4-keV \(^{31}\)P implants as “shallow” implants, the 4-keV \(^{16}\)O and 7-keV \(^{31}\)P implants as the “middle” set, and the 6-keV \(^{16}\)O and 11-keV \(^{31}\)P implants as “deeper” implants.

The highest yields are close to 2.5%; however, most regions have yields of less than 1%, particularly for higher-N sectors and shallower implants. As we filter the PL for the negatively charged NV centre, these yields do not necessarily reflect the total NV conversion but rather the conversion to the charge state useful for sensing and hyperpolarisation applications. The reduced yields compared with deeper implants [12] could therefore be induced by band bending or be due to a reduced creation efficiency independent of charge state. The yields in these samples are comparable to those of typical N implants [13] but do not appear to offer an advantage in general.

Looking at the V-to-NV yield (“vacancy yield”), plotted in Fig. 3(b), may give a clue as to the origin of the poor conversion. Taking the vacancy creation predicted by SRIM for each implant, we find that around \(10^{-3}\) NVs are created per vacancy in most cases, consistent with the modelling of Räcke et al. [30] for the case of the diamond surface acting as a vacancy sink. The spread in vacancy yield may be due to variable surface termination, motivating further study into maintaining high-quality surface termination during annealing so as to keep more vacancies within the diamond. However, no obvious trends between the two surface preparations were present in our data.
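The V-to-NV yield is the areal NV density divided by the areal density of vacancies created by the implant. A minimal Python sketch, assuming the total number of vacancies per ion is read from the SRIM output; the 20 vacancies per ion and the NV density used below are hypothetical values, not results from this work.

```python
def v_to_nv_yield(sigma_nv_per_um2, dose_per_cm2, vacancies_per_ion):
    """NV centres formed per vacancy created: sigma_NV / (dose * vacancies/ion)."""
    sigma_per_cm2 = sigma_nv_per_um2 * 1e8     # 1 um^2 = 1e-8 cm^2
    vacancies_per_cm2 = dose_per_cm2 * vacancies_per_ion
    return sigma_per_cm2 / vacancies_per_cm2


# Hypothetical inputs: 1000 NV/um^2 from a 1e12 cm^-2 implant at 20 vacancies/ion.
print(f"V-to-NV yield = {v_to_nv_yield(1000.0, 1e12, 20.0):.1e}")
```

Because the denominator scales with dose, an implant producing more vacancies but the same \(\sigma_{\textrm{NV}}\) shows up directly as a proportionally lower vacancy yield.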

The lacklustre vacancy conversion observed for the oxygen implants motivated additional implants, using \(5\times 10^{12}\) cm\(^{-2}\) \(^{31}\)P at energies designed to match the vacancy production profiles of the oxygen implants (data points marked with black outlines). These phosphorus implants are expected to have produced an order of magnitude more vacancies; however, we find that the N-to-NV yield is not improved, suggesting that the useful vacancy creation threshold of around 50 ppm identified in previous work [12] is retained in this near-surface regime, despite the overall lower NV creation efficiency. One interpretation is that in this high-vacancy-production regime the formation of multi-vacancy clusters is more prevalent; these clusters either anneal out or add a source of spin noise [42], so the number of vacancies available to form NVs is not much greater.

NV depth

The mean depths of the ensembles created can be measured by taking NV nuclear magnetic resonance (NMR) measurements of a hydrogen target deposited on the diamond surface (in this case viscous immersion oil) [29]; see the example spectrum in Fig. 4(a) showing the appearance of the hydrogen (\(^1\)H) resonance. For these measurements (and all to follow), we use a widefield microscope optimised for high-sensitivity NV ensemble measurements [5, 42], except where background fluorescence was problematic (in which case the confocal system was used). A permanent magnet was used to set a magnetic field of 45 mT, aligned with one set of NV axes.
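For context, NV-NMR depth measurements of this kind are commonly analysed by converting the fitted proton field \(B_{\textrm{RMS}}\) into a depth through an expression of the form \(B_{\textrm{RMS}}^2 = \rho\,(\mu_0\hbar\gamma_{\mathrm{H}}/4\pi)^2\,5\pi/(96\,d_{\textrm{NV}}^3)\), which assumes a (100)-oriented diamond (NV axis at 54.7° to the surface normal) beneath a semi-infinite layer of proton density \(\rho\). The Python sketch below inverts that expression. The formula is quoted here as an assumption rather than taken from this text, and the proton density of 60 nm\(^{-3}\) for immersion oil is a hypothetical placeholder.

```python
import math

HBAR = 1.054571817e-34        # reduced Planck constant, J s
GAMMA_H = 2.6752218744e8      # 1H gyromagnetic ratio, rad s^-1 T^-1
MU0_OVER_4PI = 1e-7           # mu0 / 4 pi in T m A^-1 (to ~1 part in 1e9)


def _prefactor(proton_density_per_nm3):
    """rho * (mu0 hbar gamma / 4 pi)^2 * 5 pi / 96 in T^2 m^3 (assumed (100) form)."""
    rho = proton_density_per_nm3 * 1e27   # nm^-3 -> m^-3
    return rho * (MU0_OVER_4PI * HBAR * GAMMA_H) ** 2 * 5 * math.pi / 96


def b_rms_tesla(depth_nm, proton_density_per_nm3):
    """RMS proton field at the NV: B_RMS = sqrt(prefactor / d^3)."""
    return math.sqrt(_prefactor(proton_density_per_nm3) / (depth_nm * 1e-9) ** 3)


def depth_nm_from_b_rms(b_rms_t, proton_density_per_nm3):
    """Inverse relation d = (prefactor / B_RMS^2)^(1/3), returned in nm."""
    return (_prefactor(proton_density_per_nm3) / b_rms_t ** 2) ** (1 / 3) * 1e9


rho_oil = 60.0  # protons per nm^3: hypothetical placeholder for immersion oil
b = b_rms_tesla(7.0, rho_oil)
print(f"B_RMS at 7 nm: {b * 1e9:.0f} nT; "
      f"recovered depth: {depth_nm_from_b_rms(b, rho_oil):.2f} nm")
```

The steep \(d^{-3/2}\) dependence of \(B_{\textrm{RMS}}\) is what makes the hydrogen signal a sensitive probe of layer depth: doubling the depth reduces the field by a factor of about 2.8.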

All samples studied contained natural \(^{13}\)C abundance (1.1%), making accurate NV depth determination using XY8 sequences difficult owing to the copresence of a \(^{13}\)C harmonic with the fundamental \(^1\)H resonance [16]. Where possible, we use the XY16 sequence, as it is less sensitive to the problematic fourth \(^{13}\)C harmonic [17]. Even XY16 retains some sensitivity to this harmonic, so all depths quoted should be interpreted as lower bounds on the true mean depth of the ensembles. Correlation spectroscopy [39] verified that the \(^{13}\)C harmonic was a relatively minor component of the resonance fit for the shallowest implants (see the FFT in the inset of Fig. 4(a)), but it was more significant for some of the deeper implants.
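The overlap arises because the \(^1\)H and \(^{13}\)C gyromagnetic ratios differ by a factor of almost exactly 4, so the fourth harmonic of the \(^{13}\)C Larmor frequency lies within about 1% of the \(^1\)H Larmor frequency at any field. A short Python check at the 45 mT used here, assuming the standard gyromagnetic ratios and taking the XY-sequence resonance condition to be an interpulse spacing \(\tau = 1/(2f)\):

```python
GAMMA_H_MHZ_PER_T = 42.577478     # 1H gyromagnetic ratio / 2 pi
GAMMA_C13_MHZ_PER_T = 10.7084     # 13C gyromagnetic ratio / 2 pi
B0_T = 0.045                      # bias field used in this work


def larmor_mhz(gamma_mhz_per_t, b_t):
    """Larmor frequency in MHz."""
    return gamma_mhz_per_t * b_t


def resonant_tau_ns(freq_mhz):
    """Interpulse spacing tau = 1/(2 f) at which an XY-type sequence is resonant."""
    return 1e3 / (2 * freq_mhz)


f_h = larmor_mhz(GAMMA_H_MHZ_PER_T, B0_T)
f_c4 = 4 * larmor_mhz(GAMMA_C13_MHZ_PER_T, B0_T)  # apparent frequency of 4th 13C harmonic
print(f"1H:      {f_h:.4f} MHz (tau = {resonant_tau_ns(f_h):.1f} ns)")
print(f"4 x 13C: {f_c4:.4f} MHz (tau = {resonant_tau_ns(f_c4):.1f} ns)")
print(f"relative offset: {(f_c4 - f_h) / f_h:.2%}")
```

Since the offset is a fixed fraction of the field, changing \(B_0\) does not separate the two features in relative terms, which is why sequence choice (XY16) and correlation spectroscopy are used instead.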

Figure 4

NV ensemble depth measurements. (a) Example spin decoherence data obtained with an XY16-64 sequence (black dots) and fit (blue line) showing the hydrogen resonance. Inset: FFT of a correlation spectroscopy signal taken on-resonance, showing that the hydrogen signal is dominant over the \(^{13}\)C harmonic. (b) Plot of mean ensemble NV depth \(d_{\textrm{NV}}\) versus nitrogen concentration, using the same colour coding as in Fig. 3. Depths quoted are measured using XY16-64 sequences, and the error bars denote either the standard error from the fits or the spread in fit depths given by sequences ranging from 48 to 128 pulses, whichever is larger for a given data point. Note that most but not all samples studied were able to detect a hydrogen signal; those that could not are not included in this plot.

In almost all cases, it was possible to detect a hydrogen signal from immersion oil placed on the diamond surface using the created layers; however, the layers in two of the 16 implanted samples could not, indicating that a shallow layer had not been successfully created. This failure could be due to vacancy diffusion into the diamond, as increased PL was still observed. It is also possible that, in these samples, the yield enhancement from the implantation process was too poor for the hydrogen signal detected by the shallowest NVs to rise above the noise and background from deeper NVs. Nevertheless, the fact that the majority of samples detect a strong hydrogen signal indicates that the sensing layers are confined close to the surface. From this observation we infer that vacancy diffusion into the diamond under the chosen annealing conditions is not a major factor: substitutional nitrogen is an efficient enough vacancy attractor to reduce the vacancy diffusion length during annealing dramatically, to within the \(<10\) nm extent of the created layers, which is important for the success of implantation into type Ib diamond as a method for creating shallow NV layers. Hydrogen signals were detected over the full range of nitrogen densities probed. The results are summarised in Fig. 4(b), with mean ensemble depths ranging from 7 to 11 nm. The depths quoted are given by a 64-pulse sequence in each case, which we take to be a measure of the peak of the NV depth distribution [13]. The error bars represent the larger of the uncertainty from the fit and the spread in depth given by measurements with different numbers of pulses (from 48 to 128). Errors due to the copresence of the \(^{13}\)C harmonic resonance (particularly for deeper implants) and contributions from bulk NV fluorescence (for high-N sectors) are not accounted for; these would cause the actual depth to be underestimated and overestimated, respectively.
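The error-bar rule described above can be stated compactly. A minimal Python sketch, assuming the depth fits for the different pulse numbers are available as a dictionary and interpreting the "spread" as half the range of those depths (the text does not specify which spread statistic is used); the fit values are hypothetical.

```python
def depth_with_error(fits, reference_pulses=64):
    """fits maps number of pulses -> (fitted depth in nm, standard error in nm).

    Returns the depth from the reference (64-pulse) sequence and an error bar
    equal to the larger of its fit error and the half-range of depths across
    all pulse numbers (half-range is an assumed definition of 'spread')."""
    depth, fit_err = fits[reference_pulses]
    depths = [d for d, _ in fits.values()]
    spread = (max(depths) - min(depths)) / 2
    return depth, max(fit_err, spread)


# Hypothetical fits for sequences with 48-128 pulses.
example = {48: (7.6, 0.2), 64: (7.9, 0.3), 96: (8.3, 0.2), 128: (8.4, 0.3)}
d, e = depth_with_error(example)
print(f"depth = {d:.1f} nm +/- {e:.1f} nm")
```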

The shallowest implants (2.5-keV \(^{16}\)O and 4-keV \(^{31}\)P, represented by the burgundy points), with peak vacancy production predicted below 3 nm from the diamond surface, were measured to have depths between 6.5 and 8 nm, consistent with high-quality N implants of a similar energy [13, 29]. This result illustrates two things. Firstly, vacancy diffusion into the diamond is much less than the order of 100 nm observed in the bulk [1, 26], which would give a much deeper mean ensemble depth and preclude detection of the hydrogen signal. Instead, these depths are consistent with the distribution predicted by the SRIM simulation, up to a cut-off introduced by band bending (the same interpretation as for N implants [13]). Secondly, the fact that these ensembles are (at best) only as shallow as N-implanted ensembles (i.e. not shallower) suggests that the high bulk N density does not significantly alter the band bending.

The 4-keV \(^{16}\)O and 7-keV \(^{31}\)P implants (vacancy distribution peaking at 4 nm, yellow points) have deeper depth distributions, with most fit depths ranging from 8 to 9 nm. The deepest set of implants, 6-keV \(^{16}\)O and 11-keV \(^{31}\)P (lavender points), had measured depths similar to those of the middle implants, between 8 and 11 nm. As the deeper implants resulted in higher yields on average, these depths may still be in a useful regime, and in practice depth and yield should be weighed together when determining which implant is appropriate for a particular application.

PL spectra

Figure 5

PL spectra. (a) Comparison of PL spectra obtained for a 2.5-keV \(^{16}\)O implant in two sectors, containing approximately 8 and 80 ppm N. A strong NV\(^0\) character is evident in the spectra (note the strong zero-phonon line at 575 nm), especially for the lower-N sector. Note that in the higher-N sector the contribution from the bulk background is more significant. (b) As (a) but for a 6-keV \(^{16}\)O implant. A stronger NV\(^-\) character is evident in both spectra compared with those in (a), supporting the Fermi-level cut-off interpretation.

Differences between the ensembles produced can also be seen in PL spectra, from which the relative abundance of the NV\(^-\) and NV\(^0\) charge states can be inferred. In Fig. 5 we show spectra obtained for representative samples under typical widefield illumination conditions. Figure 5(a) shows spectra obtained for a shallow (2.5-keV \(^{16}\)O) implant, for high- (80 ppm) and low-nitrogen (8 ppm) sectors. A significant NV\(^0\) signal is evident in both cases, with a strong zero-phonon line visible at 575 nm. The higher-N sector emits more strongly in general, but particularly in the NV\(^-\) band, indicating that the higher N density does lead to better NV\(^-\) charge stability.

Figure 5(b) shows spectra obtained for a deeper (6-keV \(^{16}\)O) implant. The PL signal is stronger in general, and for both N densities the NV\(^0\) signal is less significant compared with NV\(^-\). This result corroborates the expectation that band bending leads to a cut-off depth for NV\(^-\) charge stability. We can now consider whether NV charge ratios could fully explain the observed reductions in N-to-NV conversion yields for shallow implants, given that we filtered for the negative charge state in our previous measurements. The spectra in Fig. 5(a) for the 8-ppm sector have the highest proportion of NV\(^0\). Here, we estimate an NV\(^-\):NV\(^0\) ratio slightly below 1 (taking NV\(^0\) to fluoresce 0.75 times as brightly as NV\(^-\)). This could account for a twofold increase in N-to-NV yield when compared with samples containing predominantly NV\(^-\) [such as in Fig. 5(b)]. Whilst this is sufficient to explain the variations observed between the shallow implants plotted in Fig. 3(b), the largest N-to-NV yields would then be estimated to be near 3%, still well short of the values observed in bulk samples, and the V-to-NV yields would still be well short of the reflecting-surface limit [30]. Therefore, for the surface not to affect the N-to-NV yield, a significant proportion of the NV ensemble would have to exist in the dark NV\(^+\) charge state. Whilst the vacancy profiles calculated by SRIM could be consistent with an NV\(^+\) contribution [21], we note that deeper implants in high-N sectors such as in Fig. 5(b) have a strong NV\(^-\) character and still have low yields compared with bulk samples. Therefore, it appears likely that NV creation, independent of charge state, is indeed less efficient in the near-surface regime.
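The twofold correction quoted above follows from converting PL weights into populations. A minimal Python sketch, assuming the NV\(^-\) and NV\(^0\) PL contributions have already been separated from the spectrum (the spectral decomposition itself is not shown) and taking NV\(^0\) to be 0.75 times as bright as NV\(^-\), as in the text; the dark NV\(^+\) state is not captured by this estimate.

```python
NV0_RELATIVE_BRIGHTNESS = 0.75  # NV0 brightness relative to NV-, as assumed in the text


def charge_ratio(pl_nv_minus, pl_nv0, nv0_brightness=NV0_RELATIVE_BRIGHTNESS):
    """NV-:NV0 population ratio from the separated PL contributions."""
    return pl_nv_minus / (pl_nv0 / nv0_brightness)


def total_yield_correction(ratio):
    """Factor by which an NV--only yield underestimates the (NV- + NV0) yield:
    (n_minus + n_zero) / n_minus = 1 + 1 / ratio. NV+ is dark and not included."""
    return 1 + 1 / ratio


# Equal populations give NV0 PL equal to 0.75 of the NV- PL.
r = charge_ratio(pl_nv_minus=1.0, pl_nv0=0.75)
print(f"NV-:NV0 = {r:.2f}, yield correction x{total_yield_correction(r):.2f}")
```

A ratio slightly below 1 therefore implies a correction slightly above 2, matching the estimate in the text.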

Ensemble sensitivity

Figure 6

Assessing NV ensemble quality. (a) Plot of Hahn echo \(T_2\) values for the shallow ensembles measured versus the nitrogen content of the sectors. Error bars are the standard errors from the fits to measured decay curves. The solid black line gives the N-limited \(T_2\) value given by the equation of Bauch et al. [3]. The right-hand panel is a zoom-in on the region shown in the left panel, with some horizontal jitter added to data points for easier viewing. (b) Plot of the figure of merit \(T_2 d_{\textrm{NV}}^{-3}\sqrt{\alpha \mathcal R}\) (see text) versus nitrogen content. Error bars are dominated by the uncertainties in \(T_2\) and \(d_{\textrm{NV}}\).

Although the suitability of a sample to perform a given application will ultimately be heavily dependent on the precise nature of the measurement to take place, we can consider some general figures of merit to gauge the success of the approach. Since most applications of shallow NV ensembles will be concerned with AC signals whose detection can be in principle enhanced through dynamical decoupling, we first measure the Hahn echo \(T_2\) of the shallow ensembles, summarised in Fig. 6(a). We see some evidence for surface-induced decoherence in the shallower implants, with the low-N sector \(T_2\) values being longer for deeper implants. At the highest N densities, the various samples are more tightly grouped, implying a \(T_2\) close to the N-limited value. The N-limited \(T_2\) curve determined by Bauch et al. [3] is included as the black line in Fig. 6(a) to highlight the apparent impact of the surface; however, again we stress that the determination of sector [N] may be imperfect and we assume that all sectors in a “group” (identified through bulk PL) have the same N density.

To gauge the overall sensing performance of an NV ensemble (crucially also taking into account the ensemble fluorescence, which scales with [NV]), a common figure of merit is the photon-shot-noise-limited magnetic sensitivity, which for AC fields depends on \(T_2\) [2, 33, 41]. As we are concerned here with the detection of signals that decay rapidly, as the inverse cube of the distance between NV and target (e.g. a magnetic noise \(B_{\textrm{RMS}}^2 \propto d_{\textrm{NV}}^{-3}\) [29]), and our ensembles feature different mean depths, we consider instead the minimal figure of merit \(T_2 d_{\textrm{NV}}^{-3}\sqrt{\alpha {\mathcal {R}}}\), which is proportional to the signal-to-noise ratio of a measurement for a given acquisition time. Here \({\mathcal {R}} \propto\) [NV] is the photon count rate under continuous laser illumination and \(\alpha\) is the laser duty cycle for a measurement of the optimal duration \(T_2\); both are setup-dependent quantities (in this case we use a widefield microscope optimised for NV ensemble measurements as a benchmark, as in Ref. [12]). We plot this quantity versus [N] in Fig. 6(b), finding that the spread is partly within error but with an overall tendency for lower-N sectors to perform better. The good performance of low-N sectors, buoyed by their longer \(T_2\), is partly a consequence of considering a widefield measurement requiring a long laser pulse (5 \(\mu\)s here), in contrast to confocal microscopy, for which \(\alpha \propto 1/T_2\) [12]. However, it also reflects the low yields obtained in higher-N sectors and confirms that a high bulk N concentration does not appear to aid near-surface NV properties by compensating for electron traps at the surface. The shallowest implants perform best on average, despite being the most affected by imperfections in the surface preparation, which further motivates the pursuit of shallower, stable NV ensembles.
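The figure of merit can be evaluated as below. This Python sketch assumes, as a simplification not stated in the text, that the duty cycle is \(\alpha = t_{\textrm{L}}/(t_{\textrm{L}} + T_2)\) for a laser pulse of duration \(t_{\textrm{L}}\) followed by a free-evolution period \(T_2\), ignoring any other dead time; units are arbitrary, so only ratios between samples are meaningful, and the 0.3 µs confocal pulse is a hypothetical value. It illustrates why a long (5 µs) widefield pulse favours longer \(T_2\): in the limit \(t_{\textrm{L}} \gg T_2\) the figure of merit grows linearly with \(T_2\), whereas for \(t_{\textrm{L}} \ll T_2\) it grows only as \(\sqrt{T_2}\).

```python
import math


def duty_cycle(t_laser_us, t2_us):
    """Laser duty cycle, ASSUMING one laser pulse per free-evolution time T2."""
    return t_laser_us / (t_laser_us + t2_us)


def figure_of_merit(t2_us, depth_nm, count_rate, t_laser_us):
    """T2 * d^-3 * sqrt(alpha * R) in arbitrary units (compare ratios only)."""
    alpha = duty_cycle(t_laser_us, t2_us)
    return t2_us * depth_nm ** -3 * math.sqrt(alpha * count_rate)


# Gain from doubling T2 (1 -> 2 us) at fixed depth and count rate.
for label, t_laser in (("widefield, 5 us pulse", 5.0),
                       ("confocal, 0.3 us pulse (hypothetical)", 0.3)):
    gain = (figure_of_merit(2.0, 8.0, 1.0, t_laser)
            / figure_of_merit(1.0, 8.0, 1.0, t_laser))
    print(f"{label}: x{gain:.2f}")
```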
We note also that the motivating application of NV-based hyperpolarisation does not rely on shot-noise-limited readout, so the relevant figure of merit scales with [NV] rather than \(\sqrt{\mathrm{[NV]}}\) [13] and hence favours the use of denser ensembles.

Discussion

The main limitations of N-implantation for the creation of thick sensing layers are the vacancy overproduction (e.g. peak vacancy production for a 100-keV N implant exceeds the number of implanted ions by a factor of up to 100 [12]) compared with lower-dose implants into (for example) N-rich HPHT diamond, and the inability to create layers of arbitrary thickness with a single implantation stage. In the shallow regime, neither of these issues is relevant, as the nitrogen depth profiles optimal for the applications discussed are easily attainable with N-implantation and the vacancy yield is generally low. Indeed, localising vacancy production to the implanted ions may be beneficial for curbing diffusion to the surface by converting vacancies to NV centres most efficiently, although the formation of multi-vacancy clusters may still be problematic [42]. In view of the above, it would appear that N-implantation is the most suitable technique for creating near-surface NV ensembles. Beginning with a high-quality CVD diamond also carries the benefit of allowing refined doping of the crystal to promote NV formation through mechanisms such as vacancy charging as well as Fermi-level control [19], although these techniques have yet to be applied in the high-nitrogen, near-surface regime.

Nevertheless, in this work we have demonstrated that well-confined sensing layers can be produced within 15 nm of the diamond surface via implantation of HPHT diamond. This result shows that vacancy diffusion into the diamond bulk is not the limiting factor for N-to-NV yield near the diamond surface. The low yields may then instead be understood as vacancy diffusion to the nearby surface boundary that acts as a sink. If the surface can be engineered to be vacancy reflecting during annealing (for instance, through vacancy charging [19]), in line with the simulations of Räcke et al. [30] N-to-NV yields towards the bulk values of near 10% may be achievable.
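The contrast between an absorbing and a reflecting surface can be illustrated with a toy one-dimensional random walk. This is not the model of Räcke et al. [30]; it is a minimal Python sketch in which each vacancy starts at a fixed depth, hops ±1 step of a hypothetical 0.2 nm, and before each hop is captured by nitrogen with a hypothetical fixed probability of 0.5% per step, while the surface either removes vacancies or mirrors them back into the crystal.

```python
import random


def capture_fraction(start_nm, surface, step_nm=0.2, p_capture=0.005,
                     n_vacancies=2000, max_steps=5000, seed=1):
    """Fraction of vacancies captured by nitrogen (forming NV) before being lost.

    surface: "absorbing" (vacancy is lost on crossing z < 0) or "reflecting"
    (vacancy is mirrored back into the crystal). All parameters are hypothetical.
    """
    if surface not in ("absorbing", "reflecting"):
        raise ValueError("surface must be 'absorbing' or 'reflecting'")
    rng = random.Random(seed)
    captured = 0
    for _ in range(n_vacancies):
        z = start_nm
        for _ in range(max_steps):
            if rng.random() < p_capture:     # trapped by a substitutional N
                captured += 1
                break
            z += step_nm if rng.random() < 0.5 else -step_nm
            if z < 0:
                if surface == "absorbing":
                    break                    # vacancy lost at the surface
                z = -z                       # reflected back into the diamond
    return captured / n_vacancies


for surface in ("absorbing", "reflecting"):
    print(surface, capture_fraction(start_nm=3.0, surface=surface))
```

Even this crude picture shows that vacancies created within a few nanometres of an absorbing surface have a substantial chance of being lost before reaching a nitrogen trap, whereas a reflecting surface returns essentially all of them to the crystal.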

We note that the surface termination needs to be maintained throughout the annealing process, with maximum temperatures typically ranging from 800 to 1100 °C [42]. These temperatures overlap the removal temperatures of common termination species, with oxygen removed above 600 °C [27] and hydrogen above 900 °C [31]. We chose a maximum temperature of 800 °C to mitigate these effects; however, even at this temperature and under high-vacuum conditions of \(\sim 1\times 10^{-6}\) hPa, small amounts of oxygen present in the chamber could disrupt the surface termination. Annealing at higher temperatures brings the benefit of improved spin properties [42]; however, the surface termination will be even less well controlled and we can expect vacancy diffusion into the diamond to be more significant in this regime, motivating further studies in this area.

The depths and yields measured in this work are broadly similar to typical values for shallow ensembles created by N-implantation, indicating that implantation of type Ib diamond could be a cost-effective method of creating shallow NV layers. However, the high bulk nitrogen density present in some sectors does not appear to significantly combat surface-induced band bending, meaning that the method does not provide any advantage over N-implantation in this regime. N-implantation of electronic-grade diamond is naturally well suited to creating well-defined shallow NV layers and will not suffer from vacancy diffusion into the diamond, although our results suggest that this is not a major concern for type Ib diamond either. This result also indicates that the surface acceptor density is too great to be fully compensated by ultra-near-surface N atoms at densities below 100 ppm, and so a more promising immediate route to improving near-surface NV charge stability is to optimise the preparation of the diamond surface.

Conclusion

This work has shown that it is possible to create dense, well-confined, shallow NV ensembles via the ion implantation of type Ib HPHT diamond, with yields in the range of those typically achieved using N-implantation. Although we did not find strong evidence for high bulk nitrogen density improving near-surface NV charge stability, these results do show that economical production of shallow ensembles is possible using this method. Along with near-surface band bending, vacancy diffusion to the surface is likely limiting the yield by reducing NV formation efficiency and we speculate that the large spread in measured yields is due to variable surface termination in the diamond samples. A simple oxygen RIE process prior to implantation was not found to dramatically change results by itself and so focusing on achieving high-quality surface termination during annealing is a logical next step.

The annealing processes conducted here are not expected to be optimal, and the largely unknown role they have played motivates more systematic studies that could allow improved near-surface ensemble properties. For instance, a higher-temperature anneal has previously been shown to improve the spin properties of shallow ensembles [42], and the N-to-NV yield could be improved through greater control over the diamond surface termination. Annealing during the implantation process is also an appealing option that has been shown to improve NV yields in the bulk [15]. Charging vacancies during annealing by introducing shallow electron donors to the diamond crystal may also improve the vacancy yield by limiting the formation of multi-vacancy clusters, and perhaps out-diffusion to the surface on electrostatic grounds [19], although in this case we note that using a diamond with high levels of pre-existing nitrogen may not be beneficial owing to the requirement for activating electrical donors prior to NV creation. The areas for improvement identified in this work will hopefully make it feasible to create shallow ensembles with yields approaching bulk values through ion implantation of both electronic-grade and type Ib diamond in the future.

Methods

Ion implantation at the doses and energies described in the main text was carried out by a commercial provider (InnovION) with a sample tilt of 7° to minimise ion channelling and thus produce the best agreement with the SRIM simulations, which assume an amorphous substrate.

We used an ICP-RIE system (Plasmalab100 ICP380) for the oxygen RIE process. A recipe containing 50-sccm oxygen flow rate, 10-W bias (RF) power, and 600-W ICP power at a 10-mTorr chamber pressure was used for etching the samples.

Following implantation, all samples were annealed in a vacuum furnace (pressure held below \(10^{-5}\) hPa) using a ramp sequence that culminated in one hour at 800 °C (2-h ramp to 400 °C, 3 h at 400 °C, 3-h ramp to 800 °C, 1 h at 800 °C, 2-h ramp to room temperature). The one-hour plateau was chosen in an attempt to maximise NV yield whilst minimising vacancy diffusion into the diamond; in practice, a trade-off between these two factors is expected. The diamonds were then cleaned in a boiling (heated on a hot plate at \(\approx 450 ^\circ\)C) mixture of sulphuric and nitric acid (approximately 1:1 ratio) for 30 min to achieve a standardised, oxygen-terminated surface.
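The annealing programme can be written as a piecewise-linear setpoint schedule, a convenient form for checking ramp rates or entering it into a furnace controller. A minimal Python sketch, assuming a starting and final room temperature of 20 °C (not stated in the text):

```python
ROOM_TEMP_C = 20.0  # assumed start/end temperature; not given in the text

# (segment duration in hours, setpoint at the END of the segment in deg C)
SCHEDULE = [
    (2.0, 400.0),        # ramp to 400 C
    (3.0, 400.0),        # hold at 400 C
    (3.0, 800.0),        # ramp to 800 C
    (1.0, 800.0),        # hold at 800 C
    (2.0, ROOM_TEMP_C),  # ramp back to room temperature
]


def setpoint_c(t_h, schedule=SCHEDULE, start_c=ROOM_TEMP_C):
    """Linearly interpolated furnace setpoint (deg C) at time t_h hours."""
    if t_h < 0:
        raise ValueError("time must be non-negative")
    t0, temp0 = 0.0, start_c
    for duration_h, temp1 in schedule:
        if t_h <= t0 + duration_h:
            return temp0 + (temp1 - temp0) * (t_h - t0) / duration_h
        t0, temp0 = t0 + duration_h, temp1
    return temp0  # after the programme ends, stay at the final setpoint


def ramp_rates_c_per_h(schedule=SCHEDULE, start_c=ROOM_TEMP_C):
    """Heating (+) or cooling (-) rate of each segment; 0 for holds."""
    rates, temp0 = [], start_c
    for duration_h, temp1 in schedule:
        rates.append((temp1 - temp0) / duration_h)
        temp0 = temp1
    return rates


total_h = sum(d for d, _ in SCHEDULE)
print(f"total programme: {total_h:.0f} h")
print("ramp rates (C/h):", [round(r, 1) for r in ramp_rates_c_per_h()])
```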

NV measurements were carried out on both confocal and widefield systems. The widefield system features an sCMOS camera (Andor Zyla), an immersion oil objective (Nikon CFI S Fluor \(40 \times\), NA = 1.30) and \(\approx 200\) mW laser illumination (532 nm, Laser Quantum Opus) focused to a spot of diameter \(\approx 50~\mu\)m. Laser pulsing is achieved using an acousto-optic modulator (AA Opto-Electronic MQ180-A0, 25-VIS), and MW delivery is from a Rohde & Schwarz SMBV100A signal generator, IQ gated by an arbitrary waveform generator (Keysight P9336). A PulseBlasterESR-pro card provides TTL signals to sequence the laser and MW pulses. The confocal system uses a higher-magnification immersion oil objective (Olympus UPlanSApo \(100 \times\), NA = 1.4), and PL is collected onto a fibre-coupled avalanche photodiode (Excelitas SPCM-AQRH-14-FC). Input laser power was \(\approx 1\) mW, focused to a diffraction-limited spot.

PL spectra were collected using a StellarNet GREEN-Wave spectrometer under typical widefield illumination conditions.