Saybolt color prediction for condensates and light crude oils

Leam, Jia Jia; Khor, Cheng Seong; Dass, Sarat C.

doi:10.1007/s13202-020-01031-y

Original Paper-Production Engineering
Open access
Published: 12 November 2020

Volume 11, pages 253–268, (2021)

Journal of Petroleum Exploration and Production

Abstract

Saybolt color determination is one of the techniques used to evaluate the quality of petroleum products as an indicator of the degree of refinement. Because color is a property readily observed by operators, conventional procedures require operators to determine Saybolt color either by direct visual observation or with Saybolt chromometers. These methods are subjective because color perception varies across observers, and they may be influenced by external factors such as the level of illuminance. Digital oil color analyzers, on the other hand, cost almost four times as much as Saybolt chromometers. An alternative approach to color measurement is to develop a correlation model between Saybolt color and the physical and chemical properties of condensates and light crude oils from Malaysian oil and gas fields. This work applies several multiple linear regression techniques (such as stepwise regression), performed both manually and with the R software (version 3.6.1), to obtain statistically significant results. The step, regsubsets and glmulti functions in R are used to develop a correlation model that predicts Saybolt color from only the identified key properties, overcoming possible drawbacks of conventional laboratory analysis. The models developed with these techniques are analyzed and compared using the coefficient of multiple determination, R², and F-tests to identify suitable regression approaches. For models both with and without interaction terms, the regression methods give deviations of less than 5% for 75% of the samples used for validation.

Introduction

Color observations of petroleum products are standardized through two international standards developed by the American Society for Testing and Materials (ASTM), namely ASTM D 156 and ASTM D 1500, which cover different ranges of color. Highly refined petroleum products use the ASTM D 156 scale, also known as the Saybolt color scale, which ranges from −16 (darkest) to +30 (lightest) (ASTM International 2003). For colors darker than −16 on the ASTM D 156 scale, the ASTM D 1500 scale is used, which ranges from 0.5 (lightest) to 8 (darkest) (ASTM International 2008). Petroleum products whose colors fall outside the established range are deemed contaminated. Color is conventionally measured by direct or indirect visual observation: direct observation compares the color of oil samples with color standards, whereas indirect observation uses chromometers (Khor et al. 2020).
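
The two scales run in opposite directions (a larger Saybolt number is lighter, a larger ASTM D 1500 number is darker), which is easy to get wrong when readings from both standards are stored side by side. The following is a minimal Python sketch that assumes only the ranges quoted above; the `SCALES` table and the helpers `on_scale` and `is_lighter` are hypothetical names for illustration and are not part of either ASTM standard.

```python
# Hypothetical lookup table built only from the ranges quoted in the text.
# Each entry: (lowest value, highest value, whether a larger number means lighter).
SCALES = {
    "ASTM D 156": (-16.0, 30.0, True),   # Saybolt: -16 darkest, +30 lightest
    "ASTM D 1500": (0.5, 8.0, False),    # 0.5 lightest, 8 darkest
}


def on_scale(value: float, standard: str) -> bool:
    """Return True if value lies within the range of the given standard."""
    low, high, _ = SCALES[standard]
    return low <= value <= high


def is_lighter(a: float, b: float, standard: str) -> bool:
    """Return True if reading a denotes a lighter color than reading b."""
    _, _, larger_is_lighter = SCALES[standard]
    return a > b if larger_is_lighter else a < b


print(on_scale(31, "ASTM D 156"))           # False: above the Saybolt maximum
print(is_lighter(1.0, 3.5, "ASTM D 1500"))  # True: smaller D 1500 value is lighter
```

Keeping the direction flag next to the range avoids scattering sign conventions through downstream code.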

Saybolt color is measured with a Saybolt chromometer under a constant light source. The oil level in the sample tube is adjusted so that the short-wavelength (violet) portion of the light energy reaching the eye equals that passing through the standard disk and the empty tube. Since the surface tension, refractive index and specific dispersion of the oil determine the angle at which light hits the wall from the oil surface, these attributes directly affect Saybolt color (Diller et al. 1943).

The American Petroleum Institute (API) gravity expresses the density of petroleum liquids relative to water; a high API gravity represents a low density. While condensates and light oils have low viscosity and high API gravity, degraded oils are heavier and more viscous. As crude oils get heavier, API gravity decreases, the absorption spectra shift toward the red region, and fluorescence emissions become weaker (Hagemann and Hollerbach 1986). Furthermore, different types of hydrocarbons behave differently: aromatics absorb visible or near-infrared light, while aliphatic compounds are excited only by high-energy ultraviolet light. Hence, light hydrocarbons are colorless because they do not absorb light in the visible spectrum. Heavier or degraded crude oils with high concentrations of complex aromatic molecules are distinctly darker because they absorb light effectively in the visible region (Steffens et al. 2011).
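
The inverse relation between API gravity and density follows from the standard definition °API = 141.5/SG − 131.5, where SG is the specific gravity relative to water at 60 °F. The paper does not state this formula; the sketch below is a minimal illustration of the standard definition, and the function names are hypothetical.

```python
def api_gravity(specific_gravity: float) -> float:
    """Degrees API from specific gravity (relative to water at 60 degF)."""
    if specific_gravity <= 0:
        raise ValueError("specific gravity must be positive")
    return 141.5 / specific_gravity - 131.5


def specific_gravity_from_api(api: float) -> float:
    """Inverse of api_gravity."""
    return 141.5 / (api + 131.5)


# Water (SG = 1.0) sits at 10 degrees API; lighter liquids score higher.
for sg in (1.0, 0.85, 0.75):
    print(f"SG {sg:.2f} -> {api_gravity(sg):.1f} degAPI")
```

Because SG appears in the denominator, every decrease in density raises API gravity, which is why condensates sit at the high end of the scale.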

Regression modeling has been applied in the petroleum industry to develop correlations and propose models that predict physical properties; see, for example, Tomren and Barth (2014) and Douglas et al. (2018). The former work (Tomren and Barth 2014) formulates partial least squares calibration models that estimate properties such as viscosity, acid number and asphaltene content of crude oils and condensates from gas chromatography (GC) and infrared spectroscopy (IR) data. However, these models may have a limited range of applicability and may not extend readily to a wide range of petroleum sources. The latter work (Douglas et al. 2018) predicts hydrocarbon concentrations in contaminated soil and compares different regression techniques. Because soil spectral responses are nonlinear, the random forest machine learning technique gives higher prediction accuracy than partial least squares regression.

A main contribution of this work is a Saybolt color correlation model that offers a fast and potentially cost-efficient alternative to laboratory measurement of color. To the best of our knowledge, no correlation model has been developed for automated determination of Saybolt color, and the practice still depends on laboratory analysis. Arguably, the novelty of the paper lies in its attempt to develop such an empirical correlation model for Saybolt color to support measurement of this physical property as a standard quality indicator in the oil and gas industry. This color property has become more prominent in recent years due to increased interest in petroleum condensates as refinery feedstock resulting from shale gas extraction activities (IHS Markit 2018). Since previous studies have reported correlations between color and petroleum product properties, this work aims to demonstrate that regression modeling and analysis can be used to develop such a correlation model for predictive purposes.

Problem statement

Direct visual observation for determining the color of petroleum is reported to be highly subjective because color perception varies across observers (Rodriguez et al. 2017). Measurements with Saybolt chromometers, in turn, are affected by environmental factors: illuminance varies with the light source, for example fluorescent versus halogen lamps. Moreover, compounds such as olefins in crude oils and condensates are prone to oxidation, which darkens and ages samples and thereby affects Saybolt color analysis (Rodriguez et al. 2017; Speight 2015). Digital oil color analyzers, meanwhile, cost almost four times as much as Saybolt chromometers (Clarkson Laboratory and Supply Inc 2019; IndiaMart 2019). Hence, an alternative approach is to rely on mathematical models to determine Saybolt color.

Best subset regression methodology

The aim of this paper is to develop the best regression models for Saybolt color based on four properties of oil samples, namely (1) refractive index (R), (2) density (D), (3) kinematic viscosity at 75 °C (V₁) and (4) kinematic viscosity at 100 °C (V₂). To achieve this, we use best subset regression, in which multiple statistical hypothesis tests are performed to add regressor terms to, or remove them from, a full model. The full model consists of all regressor terms constructed from the powers and interactions (i.e., main effects and higher-order interactions) of the four original attributes {R, D, V₁, V₂}. The general mathematical formulation of a full model with a response variable y and m = 4 explanatory variables, x₁, x₂, …, x_m, is given by:

$$y=\beta_0+\sum_{p=1}^{M}\sum_{i=1}^{m}\beta_{ip}x_i^p+\sum_{p,q=1}^{M}\sum_{\substack{i,j=1\\ i<j}}^{m}\beta_{ijpq}\,x_i^p x_j^q+\varepsilon,$$

(1)

where y is the observed Saybolt color; $x_i^p$ is regressor i (i ∈ {R, D, V₁, V₂}) raised to the power p, with 1 ≤ p ≤ M, where M is the highest power considered; $x_j^q$ is defined analogously, with j ∈ {R, D, V₁, V₂} and 1 ≤ q ≤ M; β₀ is the constant intercept term; β_ip and β_ijpq are the regression coefficients corresponding to $x_i^p$ and $x_i^p x_j^q$, respectively; and ε is the random error, assumed to be normally distributed with mean zero and constant but unknown variance σ². Best subset regression analysis is performed on a dataset of n oil samples, $\left(x_1^{(k)},x_2^{(k)},x_3^{(k)},x_4^{(k)},y^{(k)}\right)$, k = 1, 2, …, n. We develop two main full models in this paper, both with M = 2: one without and one with pairwise interactions between the explanatory variables.

To arrive at the best subset regression model, we implement stepwise regression, which adds regressors to or removes them from the current model one at a time and tests the significance of the added or removed term (Montgomery et al. 2012). The technique can be classified into forward selection, backward elimination and bidirectional elimination (Rawlings et al. 2006). In forward selection, the initial model contains no regressors. At each step, the remaining regressor terms from the full model in Eq. (1) are each tried in the current model, and the one that best correlates with Saybolt color, given the terms already included, is added to the regression model. The forward selection procedure is shown as a flowchart in Fig. 1. Backward elimination works in the opposite direction: starting from the full model (1), a regressor is removed if its test statistic falls below a pre-specified threshold, as shown in Fig. 2. Bidirectional elimination combines the forward and backward methods. Two F-tests, namely the partial and overall F-tests, are used to evaluate the significance of the added or removed terms (Draper and Smith 1998). The flowcharts of both bidirectional elimination procedures are presented in Figs. 3 and 4.
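
Forward selection driven by a partial F-test can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' Excel or R workflow: the entry threshold `f_in = 4.0` is a common rule of thumb rather than a value given in the paper, the data are synthetic, and the column names merely reuse the paper's symbols. For one added term, the partial F-statistic is (RSS_old − RSS_new)/(RSS_new/(n − p − 1)), where p is the number of regressors after the addition.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit with intercept."""
    A = np.hstack([np.ones((len(y), 1)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def forward_selection(X, y, names, f_in=4.0):
    """Forward selection: at each step add the candidate with the largest
    partial F-statistic, stopping when no candidate exceeds f_in."""
    n = len(y)
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        rss_old = rss(X[:, np.array(selected, dtype=int)], y)
        best_f, best_j = -np.inf, None
        for j in remaining:
            cols = selected + [j]
            df_resid = n - len(cols) - 1          # minus slopes and intercept
            if df_resid <= 0:
                continue                          # no residual df left to test
            rss_new = rss(X[:, np.array(cols, dtype=int)], y)
            f = np.inf if rss_new == 0 else (rss_old - rss_new) / (rss_new / df_resid)
            if f > best_f:
                best_f, best_j = f, j
        if best_j is None or best_f < f_in:
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return [names[j] for j in selected]


# Synthetic data only: y depends on columns 0 and 2 plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=30)
chosen = forward_selection(X, y, ["R", "D", "V1", "V2"])
print(chosen)
```

Backward elimination is the mirror image: start from all columns and drop the term with the smallest partial F while it stays below an exit threshold.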

We first perform forward selection, backward elimination and bidirectional elimination manually, creating columns in Excel for all the regressor terms in Eq. (1). The second approach is more automated: similar procedures are carried out with functions from statistical packages in R (version 3.6.1) (R Core Team 2013). The R function step from the stats package uses the Akaike information criterion (AIC) to estimate the relative quality of the model fit to the dataset for a given set of regressors. The forward selection and backward elimination procedures using step are shown in Figs. 5 and 6, respectively. The regsubsets function from the leaps package, by contrast, evaluates all possible subsets of a given set of regressors, and the model with the highest adjusted R² is selected from its results (Lumley 2014). Another set of functions (or classes) used in this work comes from the glmulti package, which automatically considers all generalized linear models arising from all possible subsets of a given collection of regressors from the full model. As an exhaustive screening tool, glmulti ranks the subset regression models according to an information criterion, with three choices: AIC, the Bayesian information criterion (BIC) and the corrected AIC (AICc). The first-ranked (i.e., highest-ranked) model has the lowest value of the chosen criterion (Calcagno and de Mazancourt 2010).
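
An exhaustive search ranked by adjusted R² or by an information criterion, in the spirit of regsubsets and glmulti, can be sketched in Python. This does not reproduce the API of either R package: the function names are hypothetical, the data are synthetic, and AIC and BIC are written in the n·log(RSS/n) form that R's extractAIC uses for linear models. glmulti computes its criteria from the model log-likelihood, so absolute values differ, and AICc rankings may also differ slightly because of how parameters are counted.

```python
import itertools
import math

import numpy as np


def fit_rss(X, y):
    """OLS fit with intercept; return (residual sum of squares, number of coefficients)."""
    A = np.hstack([np.ones((len(y), 1)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid), A.shape[1]


def criteria(rss, k, n, tss):
    """Criteria for a Gaussian linear model with k coefficients (intercept included).
    AIC/BIC use the n*log(RSS/n) form, which drops constants shared by all models."""
    aic = n * math.log(rss / n) + 2 * k
    return {
        "AIC": aic,
        "BIC": n * math.log(rss / n) + math.log(n) * k,
        "AICc": aic + 2 * k * (k + 1) / (n - k - 1),
        "adjR2": 1 - (rss / (n - k)) / (tss / (n - 1)),
    }


def all_subsets(X, y, names):
    """Fit every non-empty subset of columns and score it."""
    n = len(y)
    tss = float(((y - y.mean()) ** 2).sum())
    rows = []
    for size in range(1, X.shape[1] + 1):
        for cols in itertools.combinations(range(X.shape[1]), size):
            rss, k = fit_rss(X[:, list(cols)], y)
            if n - k - 1 <= 0:
                continue  # AICc (and any test) needs positive residual df
            rows.append(([names[c] for c in cols], criteria(rss, k, n, tss)))
    return rows


def best_by(rows, key):
    """Lowest AIC/BIC/AICc wins; highest adjusted R^2 wins."""
    pick = max if key == "adjR2" else min
    return pick(rows, key=lambda row: row[1][key])[0]


# Synthetic data only: y depends on columns 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=30)
rows = all_subsets(X, y, ["R", "D", "V1", "V2"])
for key in ("adjR2", "AIC", "BIC", "AICc"):
    print(key, best_by(rows, key))
```

With m candidate terms the search fits 2^m − 1 models, which is cheap for the four base properties but grows quickly once powers and interactions are added.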

Results and analysis

Regression modeling based on physical properties

The four properties, refractive index (R), density (D), kinematic viscosity at 75 °C (V₁) and kinematic viscosity at 100 °C (V₂), as well as Saybolt color measurements, are obtained from assay reports for the whole (i.e., bulk) oils and product fractions (i.e., cuts) of condensates and light crude oils from Malaysian oil and gas fields located mainly offshore Sabah and Sarawak (e.g., the types named Kimanis, Marjoram, Bintulu and Kawasari). The dataset consists of n = 15 samples. The scatterplots of pairwise variable relationships in Fig. 7 show no linear trend between Saybolt color and any of the four variables, which suggests strongly nonlinear relationships between Saybolt color and the potential regressors. We therefore consider higher-order powers and interaction terms of the four variables. The scatterplots of R vs. D and of V₁ vs. V₂ show strong linear relationships (high collinearity). Thus, we expect the best subset regression model in our experiments to select either R or D, but not both, and similarly for V₁ and V₂.
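
Collinearity of the kind seen between R and D, and between V₁ and V₂, can be screened numerically before model selection. A minimal NumPy sketch, assuming a Pearson-correlation cut-off of 0.9 (an illustrative choice, not a value from the paper) and adding variance inflation factors as a complementary check; the data are synthetic and only imitate the collinearity pattern described above.

```python
import itertools

import numpy as np


def collinear_pairs(X, names, threshold=0.9):
    """Pairs of columns whose absolute Pearson correlation exceeds threshold."""
    corr = np.corrcoef(X, rowvar=False)
    return [
        (names[i], names[j], float(corr[i, j]))
        for i, j in itertools.combinations(range(len(names)), 2)
        if abs(corr[i, j]) > threshold
    ]


def vif(X):
    """Variance inflation factor 1 / (1 - R_j^2) of each column regressed on the others."""
    n, p = X.shape
    out = []
    for j in range(p):
        A = np.hstack([np.ones((n, 1)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ beta
        r2 = 1 - (resid @ resid) / ((X[:, j] - X[:, j].mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return out


# Synthetic data imitating the described pattern: D tracks R, V2 tracks V1.
rng = np.random.default_rng(1)
R = rng.normal(size=15)
V1 = rng.normal(size=15)
D = 2.0 * R + 0.01 * rng.normal(size=15)
V2 = V1 + 0.01 * rng.normal(size=15)
X = np.column_stack([R, D, V1, V2])
pairs = collinear_pairs(X, ["R", "D", "V1", "V2"])
print(pairs)
print([round(v, 1) for v in vif(X)])
```

A VIF above roughly 10 is a common warning sign; here every column exceeds it, consistent with keeping only one member of each collinear pair.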

The two full models considered with M = 2 have the following explicit forms:

$$S={\beta}_0+{\beta}_1R+{\beta}_2D+{\beta}_3{V}_1+{\beta}_4{V}_2+{\beta}_5{R}^2+{\beta}_6{D}^2+{\beta}_7{V}_1^2+{\beta}_8{V}_2^2+\varepsilon$$

(2)

as the full model without interaction terms and

$$\begin{aligned}S={}&\beta_0+\beta_1R+\beta_2D+\beta_3V_1+\beta_4V_2+\beta_5R^2+\beta_6D^2+\beta_7V_1^2+\beta_8V_2^2+\beta_9RD\\&+\beta_{10}RV_1+\beta_{11}RV_2+\beta_{12}DV_1+\beta_{13}DV_2+\beta_{14}V_1V_2+\varepsilon\end{aligned}$$

(3)

as the model with all pairwise interaction terms. Note that interaction terms involving higher powers of the explanatory variables (e.g., R²D) are deliberately omitted from Eq. (3) because of the small sample size (n = 15) of the dataset used in this study. Even so, the full model (3) has 15 unknown parameters and therefore zero residual degrees of freedom (15 − 15 = 0), so it cannot be tested for statistical significance. We use the full model (3) only as an upper bound on the pairwise interaction terms considered in the subset regression models. We find that the best regression model typically includes only one or two of the pairwise interaction terms of this model.
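
The parameter count behind this argument can be checked by enumerating the regressors of Eqs. (2) and (3). A small Python sketch; the helper names and the string notation for terms are hypothetical, and the term order follows the coefficient numbering β₁ to β₁₄ above.

```python
from itertools import combinations


def full_model_terms(variables, max_power=2, interactions=False):
    """Regressor names for a full model in the form of Eq. (2) or Eq. (3)."""
    # Main effects and their powers: R, D, V1, V2, R^2, D^2, ...
    terms = [v if p == 1 else f"{v}^{p}"
             for p in range(1, max_power + 1) for v in variables]
    if interactions:
        # First-order pairwise products only, as in Eq. (3); no R^2*D-type terms.
        terms += [f"{a}*{b}" for a, b in combinations(variables, 2)]
    return terms


def residual_df(n, terms):
    """Residual degrees of freedom: n minus (regressors + intercept)."""
    return n - (len(terms) + 1)


variables = ["R", "D", "V1", "V2"]
eq2 = full_model_terms(variables)
eq3 = full_model_terms(variables, interactions=True)
print(len(eq2) + 1, residual_df(15, eq2))  # 9 6: parameters and residual df, Eq. (2)
print(len(eq3) + 1, residual_df(15, eq3))  # 15 0: parameters and residual df, Eq. (3)
```

Adding even one R²D-type term would push the parameter count past n = 15, leaving an underdetermined fit.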

The stepwise regression results are summarized in Table 1 for both full models (2) and (3). For the latter model, zero residual degrees of freedom do not permit backward elimination to be applied (Hu 2016). For both full models, the overall F-statistic and adjusted R² values obtained with forward selection are higher than those obtained with the other techniques. This outcome is supported by Berk (1978) and Dempster et al. (1977): forward selection produces model subsets with smaller residual variances than backward and bidirectional elimination because only regressors that improve the model significantly are added. When the sample size is small, backward elimination, which starts by fitting all candidate regressors, can overfit and incur large rounding errors (Draper and Smith 1998).
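
The two comparison statistics can be written out explicitly. A minimal sketch using the textbook formulas, where k is the number of regressors excluding the intercept; the R² values are illustrative and not taken from Table 1.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for k regressors plus an intercept fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def overall_f(r2, n, k):
    """Overall F-statistic for H0: all k slope coefficients are zero.
    Compare with an F distribution on (k, n - k - 1) degrees of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Illustrative values only (not from Table 1): same R^2, n = 15, more regressors.
for k in (2, 4):
    print(k, round(adjusted_r2(0.9, 15, k), 4), round(overall_f(0.9, 15, k), 2))
```

With R² held at 0.9, moving from two to four regressors lowers adjusted R² from about 0.883 to 0.86 and the overall F from 54 to 22.5, so both statistics penalize extra regressors that do not raise R².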

Table 1 Evaluation of best models with and without pairwise interaction developed using stepwise regression
