Introduction

Petroleum compositional analysis has been the subject of industrial research for many decades. Advances in analytical techniques continue to push the envelope of the compositional and structural detail available for petroleum and have enabled many R&D and business applications across the upstream, downstream, and chemicals sectors of the industry. For example, structure-oriented lumping (SOL) compositional modeling [1] is based on petroleum composition described as a mixture of hydrocarbon molecules. These molecules are arranged by molecular class and homologue series derived directly from analytical measurements. Petroleum geological biomarkers, derived from their biological precursors during geological processes, have been used extensively to assess reservoir conditions, oil migration, maturity, and quality [2].

One of the active research areas is comprehensive two-dimensional gas chromatography (GC × GC or 2DGC) [3, 4]. The petroleum compositional space has been greatly expanded by the simultaneous separation of petroleum molecules by boiling point and polarity. In upstream applications, 2DGC has been successfully used to detect and quantify geological biomarkers for petroleum exploration [5,6,7,8,9]. In downstream applications, 2DGC is now used to provide full molecular compositions for lower boiling fractions, such as gasoline, jet fuel, and diesel [10,11,12]. It becomes increasingly difficult to resolve individual components as boiling point increases. For higher boiling fractions, such as whole crude, lube base stocks, and vacuum gas oils (VGO), 2DGC has typically been used to generate lumped chemical type information [13,14,15,16,17]. By combining 2DGC with element-selective detectors, such as sulfur and nitrogen chemiluminescence detectors (SCD/NCD), certain heteroatom compound types can also be resolved and quantified [11, 12, 18]. In general, 2DGC applications in petroleum analysis are limited to samples with a nominal boiling point below 1000 °F (or ~ 537 °C).

As boiling point increases, complete resolution of petroleum molecules becomes challenging. Coupling 2DGC with mass spectrometry (2DGC-MS) has proved to be a powerful way to enhance compositional detail [4, 7, 19,20,21,22,23,24,25,26,27,28,29]. Most 2DGC-MS characterization has been done with electron impact ionization (EI). EI fragmentation is very useful for identifying the chemical structures of the separated species and for resolving isomers/isobars. However, the extensive fragmentation produced by EI also makes it very challenging to quantify hydrocarbon molecules for compositional analysis. Soft ionization, such as field ionization [25, 30,31,32,33,34] and single-photon ionization [35,36,37], minimizes fragmentation and has been explored and successfully used in petroleum analysis by GC-ToF MS. Because of the lack of fragmentation, mass spectrometry further separates chemical components by molecular weight, which serves as another dimension of separation. When combined with GC or 2DGC, two- and three-dimensional separations have been achieved and illustrated in recent publications [32, 34, 37, 38]. Recently, a combination of FI and EI was used for universal biomarker analysis in geochemistry applications [23, 24]. Most recently, 2DGC was also coupled with multiple ionization methods, such as electron ionization, photon ionization, and chemical ionization, for petroleum base oil analysis [26]. Most of this research focuses on the analysis of selected chemical types of interest. A complete accounting of all molecules in heavy petroleum distillates is challenging and needs a new data analysis approach.

The work described here uses nominal mass classes, accurate mass analysis, and Kendrick mass defect (KMD) in conjunction with 2DGC separation to determine the full chemical composition of vacuum gas oil distillates at the molecular level. In this approach, all masses were first separated into nominal mass classes. KMD plots (KMD versus molecular weight) within each nominal mass class were generated for easy recognition of homologue series. KMD windows of the identified homologues were then imposed on the 2DGC data for complete resolution and a full accounting of petroleum molecules (petroleum types and carbon numbers). Byer et al. [39] had shown that a difficult-to-resolve isobaric mass overlap (e.g., C3/SH4, ΔM ~ 3.4 mDa) can be resolved by 2DGC with moderate mass resolution. Here we demonstrate resolution of hydrocarbons and sulfur-containing molecules across the whole carbon number range of VGO without using ultra-high-resolution mass spectrometry such as Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS) [40,41,42]. The latter is more difficult to couple with GC or 2DGC for high-speed analysis. In addition, attempts to combine field ionization with FTICR-MS for saturate hydrocarbon analysis have not been successful because of the short life span of parent ions generated under FI conditions [42].

Experimental

Instrumentation

The instrument used in this work has been discussed in previous publications [34, 38]. In brief, the GC × GC TOF MS system consists of an Agilent 7890 gas chromatograph (Agilent Technology, Wilmington, DE) with a flame ionization detector (FID) and a split/splitless inlet system. The two-dimensional capillary column system is a combination of a low-polarity first column (BPX-5, 30 m, 0.25 mm i.d., 1.0 μm film) and a mid-polarity second column (BPX-50, 3 m, 0.10 mm i.d., 0.10 μm film) (SGE Analytical Science, Austin, TX). A ZX1 looped jet thermal modulation assembly (Zoex Corp., Houston, TX) is installed between the two columns. In this work, the effluent of the GC × GC is split into two streams, one connected to an FID and the other connected to the ion source of a JEOL MS via a transfer line.

A 0.2-μL sample was injected with a 50:1 split at 300 °C in constant flow mode at 2.0 mL per minute of helium. The oven was programmed from 45 to 315 °C at 3 °C/min for a total run time of 90 min. The hot jet was kept 120 °C above the oven temperature until it reached 390 °C and was then held constant at 390 °C. The MS transfer line and ion source were set at 350 °C and 150 °C, respectively. The modulation period was 10 s.

The MS system is a JMS-T100GCV 4G (JEOL, Tokyo, Japan) time-of-flight mass spectrometer (TOFMS) with a field ionization (FI) source. The JMS system has an average mass resolution (R) of ~ 5000–10,000 and a mass accuracy of ~ 5–10 ppm. Here mass resolution is defined as R = M/ΔM, where ΔM is the peak width at full width at half maximum (FWHM). This resolution enables differentiation of some isobaric compound types by accurate mass. TOF MS with moderate mass resolution (R ~ 5–10 K) can differentiate H12 from C (ΔM ~ 94 mDa) and S from C2H8 (∆M ~ 90 mDa). However, it cannot resolve C3 from SH4 (∆M ~ 3.4 mDa). Since the members of a petroleum homologue series (separated by 14 Da, the nominal mass of a CH2 unit) have identical KMDs (see Table 1), the KMD can be used to separate and identify petroleum compound types.
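The mass splits quoted above can be checked with a short calculation. The following Python sketch is only an illustration: the monoisotopic masses are standard values supplied here rather than taken from this work, the function names are hypothetical, and the resolving power estimate R = M/ΔM at m/z 300 is a rough guide that ignores peak shape and relative abundance.

```python
# Monoisotopic masses in Da (standard values, supplied here as an assumption).
MONO = {"C": 12.0, "H": 1.00782503, "S": 31.97207117}

def mass(formula):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())

def split_mda(a, b):
    """Absolute mass difference between two compositions, in mDa."""
    return abs(mass(a) - mass(b)) * 1000.0

def required_resolution(m, delta_mda):
    """Rough resolving power R = M / dM needed to split a doublet at mass m."""
    return m / (delta_mda / 1000.0)

h12_vs_c = split_mda({"H": 12}, {"C": 1})
s_vs_c2h8 = split_mda({"S": 1}, {"C": 2, "H": 8})
c3_vs_sh4 = split_mda({"C": 3}, {"S": 1, "H": 4})

for label, delta in (("H12 vs C", h12_vs_c),
                     ("S vs C2H8", s_vs_c2h8),
                     ("C3 vs SH4", c3_vs_sh4)):
    r = required_resolution(300.0, delta)
    print(f"{label}: {delta:.1f} mDa, R needed at m/z 300 ~ {r:,.0f}")
```

Under these assumptions, the two larger splits need a resolving power of only a few thousand at m/z 300, whereas the C3/SH4 split needs far more than 10,000, which is why a second separation dimension is required for that overlap.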

Table 1 Nominal Mass Class (Z*), Hydrogen Deficiency (Z), Double Bond Equivalent (DBE), and Kendrick Mass Defects (KMD) of Representative Petroleum Compound Types and Core Structures

A mixture of compounds containing heptacosa, pentafluorobenzene, hexafluorobenzene, pentafluoro-iodobenzene, pentafluorochlorobenzene, perfluorotrimethylcyclohexane, xylene, and acetone was used to calibrate the mass range from 50 to 800 Da. Once a homologue series is identified, an internal calibration can be performed to further improve mass accuracy.

Data Analysis

In hydrocarbon analysis, it is convenient to convert mass scale from IUPAC (C12 = 12.0000) to a Kendrick mass scale [43,44,45] (CH2 = 14.0000) because petroleum contains homologues with a repeating unit of CH2. In Kendrick scale, members of a homologue series are forced to have an identical mass defect. Equations (1) and (2) are used to convert mass scale from IUPAC to Kendrick mass scale and to calculate the Kendrick mass defects (KMD).

$$ \mathsf{Kendrick}\ \mathsf{Mass}=\mathsf{IUPAC}\ \mathsf{Mass}\times \mathsf{14}/\mathsf{14.01565} $$
(1)
$$ \mathsf{KMD}=\left(\mathsf{Nominal}\ \mathsf{Mass}-\mathsf{Kendrick}\ \mathsf{Mass}\right)\times \mathsf{1000} $$
(2)

Here the nominal mass is taken as the integer value of the mono-isotopic mass.
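Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the routine used in this work: the function names are hypothetical, the monoisotopic masses are standard values supplied here, and the nominal mass is assumed to be the sum of integer atomic masses (C = 12, H = 1, S = 32).

```python
# Monoisotopic and integer (nominal) atomic masses; standard values,
# supplied here as assumptions rather than taken from the paper.
MONO = {"C": 12.0, "H": 1.00782503, "S": 31.97207117}
NOMINAL = {"C": 12, "H": 1, "S": 32}
CH2_IUPAC = 14.01565  # IUPAC mass of CH2, as used in Equation (1)

def iupac_mass(formula):
    """Monoisotopic (IUPAC scale) mass of {element: count}."""
    return sum(MONO[el] * n for el, n in formula.items())

def nominal_mass(formula):
    """Integer formula weight of {element: count}."""
    return sum(NOMINAL[el] * n for el, n in formula.items())

def kendrick_mass(mass):
    """Equation (1): rescale so that CH2 weighs exactly 14."""
    return mass * 14.0 / CH2_IUPAC

def kmd(formula):
    """Equation (2): Kendrick mass defect in mDa-like units (x 1000)."""
    return (nominal_mass(formula) - kendrick_mass(iupac_mass(formula))) * 1000.0

c20 = {"C": 20, "H": 42}  # a C20 paraffin
c30 = {"C": 30, "H": 62}  # a C30 paraffin, ten CH2 units heavier
for name, f in (("C20H42", c20), ("C30H62", c30)):
    km = kendrick_mass(iupac_mass(f))
    print(f"{name}: Kendrick mass = {km:.4f}, KMD = {kmd(f):.1f}")
```

Both paraffins give essentially the same KMD (about − 13.4 with these inputs), which illustrates why members of a homologue series line up horizontally in a KMD plot.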

A new data analysis strategy was implemented for compositional analysis. The 2DGC-FI-TOF MS data were imported and processed by an internally developed data analysis and visualization routine. The routine calculates the first and second dimension retention times, the nominal mass classes (Z*), the Kendrick mass, and the Kendrick mass defect (KMD) for all components detected. Resolution, identification, and quantification of petroleum molecules were performed in multiple steps. First, all masses were separated into nominal mass classes (see discussion later). This separation allows easy recognition and identification of homologue series via KMD plots. Second, KMD windows were used to isolate the homologue series. Finally, any mass-overlapping isobaric homologues (e.g., C3/SH4 overlap) were separated by their 2DGC retention times. The intensities of fully resolved components were summed to generate normalized concentrations (total ion intensities normalized to 100%) and carbon number distributions of the identified homologues.
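The steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the internally developed routine: the `Peak` record, the function names, the synthetic peak list, and the second-dimension boundary `RT2_CUT` are all hypothetical, and the nominal mass is taken as the nearest integer of the measured mass, which holds only while the mass defect stays below 0.5 Da.

```python
from dataclasses import dataclass

CH2_IUPAC = 14.01565  # IUPAC mass of CH2 used in Equation (1)

@dataclass
class Peak:
    mz: float         # measured m/z (singly charged FI molecular ion)
    rt1: float        # first-dimension retention time, min
    rt2: float        # second-dimension retention time, s
    intensity: float  # ion intensity, arbitrary units

def nominal_mass(mz):
    # Assumption: nearest integer; valid only for mass defects < 0.5 Da.
    return int(round(mz))

def z_star(nominal):
    m = nominal % 14
    return m - 14 if m >= 4 else m

def kmd(mz):
    return (nominal_mass(mz) - mz * 14.0 / CH2_IUPAC) * 1000.0

def in_window(peak, zstar, kmd_center, half_width=20.0):
    """True for an even-mass peak in the given Z* class and KMD window."""
    nom = nominal_mass(peak.mz)
    return (nom % 2 == 0 and z_star(nom) == zstar
            and abs(kmd(peak.mz) - kmd_center) <= half_width)

# Synthetic peaks (illustrative values only, not measured data).
peaks = [
    Peak(282.3287, 40.0, 1.5, 50.0),  # C20H42, paraffin
    Peak(282.2381, 42.0, 3.0, 20.0),  # C18H34S, dicyclic sulfide
    Peak(282.2348, 45.0, 5.0, 30.0),  # C21H30, alkylnaphthalene
]
RT2_CUT = 4.0  # hypothetical second-dimension boundary, s

# Steps 1 and 2: nominal mass class, then KMD window.
paraffins = [p for p in peaks if in_window(p, 2, -13.0)]
overlap = [p for p in peaks if in_window(p, 2, 80.0)]
# Step 3: split the isobaric overlap by second-dimension retention time.
sulfides = [p for p in overlap if p.rt2 < RT2_CUT]
naphthalenes = [p for p in overlap if p.rt2 >= RT2_CUT]

# Normalize summed intensities to 100%.
total = sum(p.intensity for p in peaks)
percent = {name: 100.0 * sum(p.intensity for p in group) / total
           for name, group in (("paraffins", paraffins),
                               ("sulfides", sulfides),
                               ("naphthalenes", naphthalenes))}
print(percent)
```

In this toy example the sulfide and naphthalene peaks share one KMD window and are separated only by the retention-time cut, mirroring the role the 2DGC separation plays for the C3/SH4 overlap.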

Two-dimensional GC-FID images were processed with Photoshop (Adobe Systems Inc., San Jose, CA).

Samples

Crude oil is commonly distilled into various boiling point fractions, such as naphtha, middle distillate, and vacuum gas oil for detailed chemical and compositional analysis. Distillation reduces the molecular weight range and the variety of chemical types in the sample. In this study, VGO was further distilled into narrow cuts with a nominal 50 °F window. Three VGO cuts with nominal boiling points of 650–702 °F, 702–752 °F, and 752–777 °F were selected and analyzed.

Results and Discussions

2DGC-FID and 2DGC-FI-TOF MS Images

The 2DGC-FID and 2DGC-FI-TOF MS images of the three VGO distillation cuts are compared in Figure 1. The 1st dimension GC separates petroleum molecules roughly by boiling point or carbon number, and the 2nd dimension separates them by polarity. In petroleum applications, the 2nd dimension separation is dominated by aromatic ring type (or degree of unsaturation). The higher the aromatic ring class, the longer the 2nd dimension retention time. Figure 1 shows that as the boiling points of the cuts increase, the 2DGC images move toward the upper right and broaden, indicating an increase in molecular size, polarity, and compositional complexity. VGO molecules elute very closely in the 2DGC space (both 1st and 2nd dimension), making it difficult to resolve individual components by 2DGC-FID alone. As will be shown in later sections, soft ionization mass spectrometry can help to resolve the overlaps and greatly improve the resolution of petroleum compounds. Although the 2DGC-FID image pattern tracks that of TOF MS, we did notice that the 2DGC-TOF MS image becomes narrower as boiling point increases. We attribute this to the limited dynamic range of MS relative to that of FID and to lower FI sensitivity for higher boiling point fractions.

Figure 1

2DGC images of three vacuum gas oil distillation cuts: 650–702 °F, 702–752 °F, and 752–777 °F. Top row: Data obtained by 2DGC-FID. Bottom row: Data obtained by 2DGC-FI-TOF MS

Divide 2DGC-MS Data by Nominal Mass Classes

Before we analyze the 2DGC-MS data, it is useful to introduce the concept of nominal mass classes. Petroleum molecules are commonly expressed by the general chemical formula CnH2n+ZX, where n is the carbon number and X is a heteroatom or a combination of heteroatoms. Z (the Z-number) is typically referred to as the hydrogen deficiency. Nominal mass is defined as the integer formula weight calculated using C (12), H (1), S (32), N (14), and O (16). All nominal masses of petroleum molecules can be grouped into 14 nominal mass classes (or Z* classes). Mathematically, Z* can be calculated from the nominal mass as follows:

$$ \mathit{\mathsf{M}}=\mathsf{modulus}\ \left(\mathsf{nominal}\ \mathsf{mass}/\mathsf{14}\right) $$
(3)
$$ \mathit{\mathsf{Z}}\ast =\mathit{\mathsf{M}}-\mathsf{14}\ \mathsf{if}\ \mathit{\mathsf{M}}\ge \mathsf{4}\ \mathsf{and} $$
(4)
$$ \mathit{\mathsf{Z}}\ast =\mathit{\mathsf{M}}\ \mathsf{if}\ \mathit{\mathsf{M}}<\mathsf{4} $$
(5)

In this definition, Z* equals the hydrogen deficiency (Z-number) for the first 7 hydrocarbon classes (Z = 2, 0, − 2, − 4, − 6, − 8, − 10), as shown in Table 1. Z* wraps around when Z is below − 10. Z* and Z also differ for heteroatom-containing molecules. Consequently, each Z* class can contain multiple homologue series with different Z-numbers. For example, the Z* = 2 class contains paraffins (Z = 2), alkylated naphthalenes (Z = − 12), dicyclic sulfides (Z = − 2 S), and dibenzothiophenes (Z = − 16 S).
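Equations (3) to (5) and the wrap-around behavior can be illustrated with a short Python sketch. The function names and the example carbon numbers are chosen here for illustration and are not taken from the data set.

```python
def z_star(nominal_mass):
    """Nominal mass class from Equations (3)-(5)."""
    m = nominal_mass % 14            # Equation (3)
    return m - 14 if m >= 4 else m   # Equations (4) and (5)

def nominal_mass(n_carbon, z, n_sulfur=0):
    """Nominal mass of CnH(2n+Z)Ss using C = 12, H = 1, S = 32."""
    return 12 * n_carbon + (2 * n_carbon + z) + 32 * n_sulfur

# Four series named in the text as members of the Z* = 2 class.
examples = {
    "paraffin C20H42 (Z = 2)": nominal_mass(20, 2),
    "alkylnaphthalene C20H28 (Z = -12)": nominal_mass(20, -12),
    "dicyclic sulfide C18H34S (Z = -2 S)": nominal_mass(18, -2, 1),
    "dibenzothiophene C18H20S (Z = -16 S)": nominal_mass(18, -16, 1),
}
for label, nom in examples.items():
    print(f"{label}: nominal mass {nom}, Z* = {z_star(nom)}")

# The seven even-mass classes obtained from all even residues mod 14.
even_classes = sorted({z_star(m) for m in range(0, 14, 2)}, reverse=True)
print(even_classes)
```

All four example molecules fall in Z* = 2 even though their Z-numbers differ, and the even residues map onto the seven classes from 2 down to − 10.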

FI is a soft ionization method that generates mono-isotopic (12C) molecule ions and the corresponding 13C isotope ions of the analytes. The molecule ions of hydrocarbons and sulfur-containing compounds appear at even masses. Their mono 13C isotopes and nitrogen-containing compounds appear at odd masses. Only even mass ions were analyzed in this work because the samples contain only trace levels of nitrogen compounds. The even nominal masses (mono-isotopic molecular ions) are grouped into 7 nominal mass classes (Z* = 2, 0, − 2, − 4, − 6, − 8, and − 10, respectively).

The pre-separation of the masses by Z* simplified the 2DGC-MS data analysis and visualization. Figure 2 shows the 2DGC-MS plots of the 650–702 °F cut for the individual nominal mass classes. To generate these plots, components with the same Z* number were isolated and plotted by their 1st and 2nd dimension retention times. The chemical compound classes in the full 2DGC-MS image (Figure 1) cannot be easily analyzed because of the large number of components and the significant overlap in their elution. Displaying the 2DGC-MS image by nominal mass class revealed multiple clusters of species. These are petroleum homologues of different core structures. For example, in the Z* = 2 group, normal paraffins can be easily recognized because of their distinctive pattern (single mono-isotopic molecule ions). In addition to n-paraffins, three additional compound groups with higher degrees of unsaturation (eluting at longer 2nd dimension retention times) can be recognized. As discussed before, these are due to the presence of multiple homologues that share the same nominal mass series (or the same Z*). Similarly, one can plot the 2DGC-MS data for all the nominal mass classes, resulting in the separation of 20+ clusters of molecule ions. Although an experienced analytical chemist may be able to guess the identities of these clusters based on the 2nd dimension elution and their nominal mass class, positive identification requires accurate mass analysis.

Figure 2

2DGC-FI-TOF MS images of the 650–702 °F cut separated by nominal mass classes: Z* = 2, − 2, − 4, − 6, − 8, and − 10, respectively

Recognition of Petroleum Homologues by Kendrick Mass Defect Analysis

The mass spectral profiles of the three VGO distillation cuts can be found in the supporting information, Figure S1. As expected, FI generates molecular ion mass spectra (top row) with no detectable fragmentation. As boiling point increases, the mass distribution shifts slightly to the right. The mass spectra are very complicated, as illustrated in the KMD plots (bottom row), which plot the KMD of all species against the corresponding m/z. There are severe overlaps in KMDs, which range from ~ − 30 to ~ 250, indicating the presence of a wide range of chemical compounds with different degrees of unsaturation and heteroatom contents.

Pre-separation of mass peaks into nominal mass classes enabled easy recognition of homologues with unique KMD values. Figure 3 shows KMD plots of the 650–702 °F cut, where KMDs are plotted against nominal masses (or molecular weights, because all species are singly charged in FI) for the seven nominal mass classes. Each nominal mass class box contains multiple homologue series. For example, in the Z* = 2 box, there are three homologues with KMD values around − 13, 80, and 173, respectively. About 19 homologue series in total can be recognized. The inset in the bottom right corner illustrates the KMD distributions of the three homologues in the Z* = 2 box. The peak widths indicate the spread of KMD values. The ∆KMD at full width at half maximum is about ± 10 KMD units. In this work, a KMD window of ± 20 KMD units is used to select mass peaks for further 2DGC analysis.

Figure 3

Kendrick mass defect plots of the 650–702 °F cut separated by nominal mass classes: Z* = 2, − 2, − 4, − 6, − 8, and − 10, respectively. Inset shows KMD distributions of Z* = 2 box

A general chemical formula (CnH2n+ZSs) may be assigned by matching the theoretical KMDs of petroleum compound types. Table 1 lists the nominal mass class (Z*), hydrogen deficiency (Z), double bond equivalent (DBE), and Kendrick mass defect (KMD) of representative petroleum compound types and core structures. For convenience, we normally abbreviate petroleum types by their Z-number and heteroatoms. For example, alkylated benzenes (a homologue series) are written as “− 6 HC” and alkylated benzothiophenes as “− 10 S.” Only a few hydrocarbon (HC) compound types (with Z ranging from + 2 to − 6) can be assigned without ambiguity by TOF MS. Many compound types cannot be uniquely assigned because of the C3/SH4 overlap (∆M ~ 3.4 mDa). For example, the mass peaks with KMD ~ 80 in the Z* = 2 group can be assigned either as naphthalenes (− 12 HC, KMD = 80.4) or as dicyclic sulfides (− 2 S, KMD = 77), following the Z-number given for dicyclic sulfides in the Z* = 2 class above. TOF MS alone cannot resolve these two homologues.
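The matching of observed KMDs against theoretical values, and the resulting ambiguity, can be sketched in Python. This is an illustrative sketch only: Table 1 is not reproduced here, so the core formulas below are assumed representative members of each series, the monoisotopic masses are standard values, and the function names are hypothetical.

```python
MONO = {"C": 12.0, "H": 1.00782503, "S": 31.97207117}
NOMINAL = {"C": 12, "H": 1, "S": 32}
CH2_IUPAC = 14.01565

def theoretical(formula):
    """Return (Z*, KMD) for a composition given as {element: count}."""
    nominal = sum(NOMINAL[el] * n for el, n in formula.items())
    exact = sum(MONO[el] * n for el, n in formula.items())
    m = nominal % 14
    zstar = m - 14 if m >= 4 else m
    return zstar, (nominal - exact * 14.0 / CH2_IUPAC) * 1000.0

# Assumed representative members of each homologue series.
CORES = {
    "paraffins (CnH2n+2)": {"C": 1, "H": 4},
    "alkylbenzenes (CnH2n-6)": {"C": 6, "H": 6},
    "naphthalenes (CnH2n-12)": {"C": 10, "H": 8},
    "dicyclic sulfides (CnH2n-2S)": {"C": 8, "H": 14, "S": 1},
    "benzothiophenes (CnH2n-10S)": {"C": 8, "H": 6, "S": 1},
    "dibenzothiophenes (CnH2n-16S)": {"C": 12, "H": 8, "S": 1},
}

def candidates(zstar, kmd_observed, half_width=20.0):
    """Series in the given Z* class whose KMD lies within the window."""
    hits = []
    for name, formula in CORES.items():
        z, k = theoretical(formula)
        if z == zstar and abs(k - kmd_observed) <= half_width:
            hits.append(name)
    return hits

for name, formula in CORES.items():
    z, k = theoretical(formula)
    print(f"{name}: Z* = {z}, KMD = {k:.1f}")

print(candidates(2, 80.0))    # two series share this window
print(candidates(-6, 40.0))   # a single series in this window
```

With these inputs, the window around KMD 80 in the Z* = 2 class returns both naphthalenes and dicyclic sulfides, so the mass data alone leave the assignment open, whereas the window around KMD 40 in the Z* = − 6 class returns only the CnH2n−6 formula.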

Full Composition Analysis by Combining 2DGC and KMD Window Selection

More compositional information is revealed when one imposes selected KMD windows in the data analysis. Figure 4 shows the image plot of the Z* = − 6 class, which was further resolved by selective display of KMDs around 40, 130, and 220, respectively. At a KMD of 40 ± 20, the general chemical formula CnH2n−6 (or “− 6 HC”) was assigned. Two clusters of molecules were revealed by the 2DGC separation. The bottom cluster is mainly attributed to alkyl benzenes. The top cluster consists of the sterane biomarkers, which exhibit a scattered distribution due to the unique structures retained from their biological precursors. Although the two groups of molecules have the same degree of unsaturation (identical chemical formula), the biomarkers always elute later than the non-biomarker molecules due to their bulky conformations. It should be noted that these compounds cannot be easily recognized without the KMD separation (as shown in Figure 2, Z* = − 6 box).

Figure 4

Z* = − 6 nominal mass class further separated by KMDs of 40 ± 20, 130 ± 20, and 220 ± 20, respectively

Two more groups of compounds were observed in the 2DGC separation of the KMD 130 window. These are identified as benzothiophenes, “− 10 S,” and acephenanthrenes, “− 20 HC.” The two groups were well resolved by the polarity separation because acephenanthrenes are 3-ring aromatics, which elute much later than benzothiophenes (2-ring aromatics) in the second dimension. Note that the acephenanthrenes wrap around in the second dimension separation, appearing at both the top and the bottom of the image.

In the KMD of 220 ± 20 window, we observed another group of compounds eluting in the 3-aromatic ring region. They were identified as dithiophenobenzenes (− 14 S2). It should be noted that these species would co-elute with the acephenanthrenes (− 20 HC) shown in the KMD 130 window.

The same approach can be repeated for all nominal mass classes to resolve petroleum types. The results are shown in Figure 5. More than 30 molecular types were recognized. They were divided into hydrocarbon types (Z = 2 to − 22) and sulfur compound types (Z = 0 to − 16). As Z decreases, molecules become more unsaturated and elute progressively later in the second dimension GC separation.

Figure 5

2DGC-FI-TOF MS images of the 650–702 °F cut of seven nominal mass classes separated by KMDs, using a process similar to that in Figure 4

Starting from Z = − 10, both sulfur and hydrocarbon molecules were observed. Sulfur molecules always elute earlier in the 2nd dimension because they are more saturated than their isobaric hydrocarbon counterparts. These molecules could not be resolved in one-dimensional GC-FI-TOF MS analysis [31] when the mass is greater than 300. They were completely resolved by the 2DGC separation. It is interesting to note that there are three clusters of molecule ions in the Z = − 14/− 4 S box. The two “− 4 S” isomers were attributed to thiophenes and tricyclic sulfides.

Compositions of VGO Distillation Cuts

The normalized concentrations of the three VGO cuts (shown in Figure 1) from the 2DGC-FI-TOF MS measurements are summarized in Table 2. Since the ionization efficiencies of petroleum compounds vary with core structure and molecular weight, the numbers do not represent absolute composition. Quantification may be achieved by applying response factor corrections and by integration with 2DGC-FID results. A total of 16 hydrocarbon types and 14 sulfur types are listed and compared for the three boiling point cuts. Hydrocarbon types include paraffins, 1–3 ring cyclic paraffins, and 1–4 ring aromatics. Sulfur compounds include 1–3 ring cyclic sulfides and 1–3 ring aromatic thiophenes. As expected, the relative concentrations of sulfur compounds and condensed aromatic species increase with the boiling point of the cuts.

Table 2 Petroleum Compound Types in the Three VGO Distillation Cuts (Abundances of All Compound Types Normalized to 100%)

Molecular weight or carbon number distributions of the petroleum compound types can be obtained by summation of the separated 2DGC-MS data shown in Figure 5. Figure 6 shows the MW distributions of the “− 4 HC” and “− 6 HC” petroleum compound types, respectively. The top distributions are the non-biomarker molecules and the bottom distributions are the biomarker molecules of the same general chemical formula. The mass distribution of the non-biomarker molecules is relatively smooth (close to Gaussian), as expected for petroleum distillation cuts. The distribution also shifts toward higher mass as the boiling point of the cut increases. The carbon numbers peaked at C21, C22, and C23 for the three boiling point cuts, respectively.
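Because each resolved homologue series has a known general formula CnH2n+ZSs, the carbon number follows directly from the nominal mass as n = (nominal mass − Z − 32s)/14. The Python sketch below illustrates how a carbon number distribution could be accumulated; the function names are hypothetical and the peak list is synthetic, not measured data.

```python
from collections import defaultdict

def carbon_number(nominal_mass, z, n_sulfur=0):
    """Carbon number n of CnH(2n+Z)Ss from its nominal mass."""
    remainder = nominal_mass - z - 32 * n_sulfur
    if remainder <= 0 or remainder % 14 != 0:
        raise ValueError("nominal mass is inconsistent with this series")
    return remainder // 14

def carbon_distribution(peaks, z, n_sulfur=0):
    """Sum intensities per carbon number for (nominal mass, intensity) pairs."""
    dist = defaultdict(float)
    for nominal_mass, intensity in peaks:
        dist[carbon_number(nominal_mass, z, n_sulfur)] += intensity
    return dict(sorted(dist.items()))

# Synthetic "-6 HC" peaks (illustrative values only); isomers of the same
# carbon number appear as separate 2DGC peaks with the same nominal mass.
peaks_minus6_hc = [(274, 10.0), (288, 25.0), (288, 5.0), (302, 15.0)]
dist = carbon_distribution(peaks_minus6_hc, z=-6)
print(dist)

# A sulfur-containing example: a C12 dibenzothiophene (Z = -16, one S).
print(carbon_number(184, z=-16, n_sulfur=1))
```

Summing over isomers in this way collapses the 2DGC peaks of one series into a single intensity per carbon number, which is the form of distribution plotted for each compound type.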

Figure 6

Molecular weight distributions of Z = − 4 and − 6 hydrocarbon classes in the 650–702 °F cut. Top: non-biomarker molecules. Bottom: biomarker molecules

The mass distributions of the biomarker molecules (bottom chart) are very spiky (with high intensities at certain carbon numbers), indicating favorable skeleton structures retained from their biological precursors. We observed tricyclic and tetracyclic (sterane) core skeleton structures. Both start at a carbon number of 19. Tricyclic biomarker distributions peaked around C21, C23, C24, C28, and C29. Steranes peaked around C21, C22, C27, and C29. C27–C30 steranes are commonly found in crude oil and have been extensively applied in biomarker analysis for geochemistry applications [24]. In this work, they were found to be concentrated in the higher boiling cuts. It is interesting to note that abundant C19, C21, and C22 steranes were also observed. They are more concentrated in the lower boiling cuts. These short side chain biomarkers could originate from cracking of the original biomarkers (e.g., C27–C29 steranes) during diagenesis.

Conclusions

We have shown that the combination of two-dimensional GC separation, soft ionization, and high-resolution mass spectrometry greatly enhances the compositional analysis of heavy petroleum fractions. A new data analysis strategy was implemented for compositional analysis. By combining nominal mass class separation, selective KMD windows, and 2DGC separation, 30+ petroleum homologue series were fully resolved. Difficult-to-resolve mass overlaps (e.g., C3/SH4, ΔM ~ 3.4 mDa) were clearly separated, revealing sulfide and aromatic sulfur compositions that are hard to separate from hydrocarbon signals in one-dimensional GC-TOF MS analysis. Two series of biological markers with unique carbon number distributions were also revealed.