
Machine learning approaches to identify lithium concentration in petroleum produced waters

  • Original Paper
Mineral Economics

Abstract

Prices for battery-grade lithium have increased substantially since 2020, which is propelling the search for additional sources of this important element. Battery-grade lithium is predominantly recovered from continental brines. Most crude oil and natural gas wells recover briny formation water, which may represent an additional source. Chemical analysis of these waters has been shown to indicate the presence of varying concentrations of lithium and related elements. This paper briefly reviews developments and literature supporting the presence of lithium in petroleum reservoir brines. It also describes the coverage and distribution of lithium data analyses in the United States Geological Survey National Produced Waters Geochemical Database (PWGD). It then addresses whether lithium concentration can be accurately predicted from the ion chemistry of produced brines from specific geologic formations. Four machine learning algorithms are employed to classify the commercial potential of lithium in oil field brines using data from oil wells recovering formation water from the Smackover Formation. The calibrated classification models are further applied to new (out-of-sample) data from the Marcellus Formation in the Appalachian Basin. Among the approaches considered, the predictive performance and wider applicability of the gradient boosted tree and the deep neural network models are determined to be the most promising. Finally, we discuss how the calibrated models could be applied to assure the quality of the data reported from chemical laboratory analysis and for imputation when lithium values are missing.




Data Availability

Data are accessible at the United States Geological Survey National Produced Waters Geochemical Database. See the following data release: Blondes, M. S., Gans, K. D., Engle, M. A., Kharaka, Y. K., Reidy, M. E., Saraswathula, V., Thordsen, J. J., Rowan, E. L., and Morrissey, E. A. (2018). U.S. Geological Survey National Produced Waters Geochemical Database (ver. 2.3, January 2018): U.S. Geological Survey data release, https://doi.org/10.5066/F7J964W8.

Notes

  1. A resource whose location, grade, quality, and quantity are known or can be estimated from specific geologic evidence.

  2. The provinces are based on the American Association of Petroleum Geologists’ classification scheme (Meyer et al. 1991).

  3. For the 112 Smackover Formation samples used in this analysis, each from a unique well and with complete predictor records, the mean lithium concentration is 178 mg/l.

  4. It was reported by the Wall Street Journal (Morene and Eaton 2023) that ExxonMobil had spent $100 million for Smackover Formation mineral rights for 120,000 acres near the town of Magnolia, Arkansas, for a lithium recovery project.

  5. Smackover brine wells associated with a bromine recovery project had median monthly production of about 470 hundred thousand barrels (S&P Global 2023).

  6. Additional predictors of calcium and magnesium concentration were examined but added no new information to these five predictors.

  7. The theory that the lithium was sourced from rocks of Alleghenian origin and released to the Smackover via the Norphlet Formation is reported in Daitch (2018).

  8. The classification problem was also analyzed with a logistic regression model. The classification performance was inferior to all machine learning algorithms examined in this paper.

  9. An activation or neuron in a neural network is a mathematical function that collects and classifies information according to a specific architecture. Activation functions determine whether a neuron should be activated by computing the weighted sum of the input values (from the adjacent layer) and adding a bias term. The effect is to introduce non-linearity into the neuron output (James et al. 2021).

  10. For cross-validation, the training data are divided into multiple folds. One fold is designated as the validation set, and the model is trained on the remaining folds. This process is repeated so that each fold serves once as the validation set. The performance measures from the validation folds are averaged to estimate the model’s predictive performance on new data.

  11. The term “maximum accuracy” is used here to align with how other researchers use it and how software reports it. If c is the number of samples correctly classified (as either true or false) and n is the sample size, then maximum accuracy in percent = 100 × c/n. The average error rate is the average of two rates: the number of true samples classified as false (ft) divided by the total number of true samples (nt), and the number of false samples classified as true (tf) divided by the total number of false samples (nf). The average error rate in percent therefore equals 100 × 0.5 × [(ft/nt) + (tf/nf)].

  12. The specificity measure is important when one is concerned with detecting samples below the cutoff and there is a high cost associated with misclassifying a sample at or above the cutoff. Sensitivity is important when the cost of a false positive (labeling a sample as positive when it is not) is low.
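The measures described in Notes 10 to 12 can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: the function names, the interleaved fold assignment, and the example labels are assumptions introduced here, and "positive" is taken to mean a sample at or above the lithium cutoff.

```python
from typing import List, Sequence, Tuple


def confusion_counts(actual: Sequence[bool], predicted: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Count (true positives, false negatives, false positives, true negatives)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)       # "ft" in Note 11: true classified as false
    fp = sum(1 for a, p in pairs if not a and p)       # "tf" in Note 11: false classified as true
    tn = sum(1 for a, p in pairs if not a and not p)
    return tp, fn, fp, tn


def accuracy_percent(actual: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Accuracy in percent = 100 * c / n (Note 11)."""
    tp, fn, fp, tn = confusion_counts(actual, predicted)
    return 100.0 * (tp + tn) / len(actual)


def average_error_rate_percent(actual: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Average error rate in percent = 100 * 0.5 * [(ft/nt) + (tf/nf)] (Note 11)."""
    tp, fn, fp, tn = confusion_counts(actual, predicted)
    nt, nf = tp + fn, fp + tn          # total true and total false samples
    return 100.0 * 0.5 * (fn / nt + fp / nf)


def sensitivity(actual: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Share of true samples that were classified as true."""
    tp, fn, _, _ = confusion_counts(actual, predicted)
    return tp / (tp + fn)


def specificity(actual: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Share of false samples that were classified as false."""
    _, _, fp, tn = confusion_counts(actual, predicted)
    return tn / (tn + fp)


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, validation) index splits; each fold is validation once (Note 10)."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    folds = [list(range(i, n, k)) for i in range(k)]   # interleaved assignment (an assumption)
    splits = []
    for i, validation in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, validation))
    return splits


# Hypothetical labels: True means lithium at or above the cutoff.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True, True]

print(accuracy_percent(actual, predicted))
print(average_error_rate_percent(actual, predicted))
print(sensitivity(actual, predicted), specificity(actual, predicted))
```

In a cross-validation run, these measures would be computed on the validation indices of each split returned by `k_fold_indices` and then averaged across the k folds.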


Funding

This work was funded by the U.S. Geological Survey Energy Resources Program. Research and preparation of the paper were done as part of official duties assigned by the U.S. Geological Survey, U.S. Government. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Author information


Contributions

Attanasi and Freeman participated in formulation of the problem and data preparation. Attanasi, Coburn, and Freeman completed the data analysis and the writing and editing of the manuscript.

Corresponding author

Correspondence to E. D. Attanasi.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Conflicts of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix


Fig. 9 Box plots of the concentration of predictor variables associated with the Marcellus Formation lithium concentration data. F denotes samples where the lithium concentration was below 200 mg/l, and T denotes samples where the lithium concentration was at least 200 mg/l

Fig. 10 Box plots of the concentration of predictor variables associated with the Marcellus Formation lithium concentration data. F denotes samples where the lithium concentration was below 70 mg/l, and T denotes samples where the lithium concentration was at least 70 mg/l
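The F and T grouping used in these box plots amounts to thresholding the measured lithium concentration at a cutoff. The sketch below shows one way to assign the labels and summarize a predictor by group. The function names and the sample concentrations are hypothetical and are not taken from the PWGD.

```python
from statistics import median
from typing import Dict, List, Sequence


def label_by_cutoff(lithium_mg_per_l: Sequence[float], cutoff: float) -> List[str]:
    """Label a sample 'T' if lithium is at least the cutoff, otherwise 'F'."""
    return ["T" if value >= cutoff else "F" for value in lithium_mg_per_l]


def group_medians(predictor: Sequence[float], labels: Sequence[str]) -> Dict[str, float]:
    """Median of a predictor concentration within each label group."""
    groups: Dict[str, List[float]] = {}
    for value, label in zip(predictor, labels):
        groups.setdefault(label, []).append(value)
    return {label: median(values) for label, values in groups.items()}


# Hypothetical concentrations in mg/l, for illustration only.
lithium = [35.0, 70.0, 150.0, 200.0, 310.0]
bromine = [400.0, 900.0, 1500.0, 2100.0, 3000.0]

labels_200 = label_by_cutoff(lithium, 200.0)   # ['F', 'F', 'F', 'T', 'T']
labels_70 = label_by_cutoff(lithium, 70.0)     # ['F', 'T', 'T', 'T', 'T']
print(group_medians(bromine, labels_200))      # {'F': 900.0, 'T': 2550.0}
```

A sample exactly at the cutoff is labeled T, which matches the "at least" wording of the captions.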

Table 7 Pearson product-moment correlations among lithium (Li), boron (B), bromine (Br), chloride (Cl), potassium (K), and strontium (Sr) concentrations in the Smackover and Marcellus produced water samples

Table 8 Values of hyperparameters used in the machine learning models

Table 9 Algorithm for random forest (RF) for regression (after Hastie et al. 2009)

Table 10 Algorithm for gradient boosting tree (GBT) with regression squared error loss (after Hastie et al. 2009; Malohlava and Candel 2022)

Table 11 Algorithm for extreme gradient boosting (XGBoost) (after Nielsen 2016; Hastie et al. 2009)

Table 12 Descriptive algorithm for deep neural network (DNN) (description relies heavily on Stanford University (n.d.) Multilayer neural network at http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/ and James et al. 2021)
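As a reminder of what the Pearson product-moment correlation in Table 7 measures, the following sketch computes it pairwise for a set of concentration columns. It is a generic illustration under stated assumptions: the helper names and the numbers are hypothetical, and it does not reproduce the values reported in the table.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlations for named columns (for example Li, B, Br)."""
    return {
        a: {b: pearson_r(columns[a], columns[b]) for b in columns}
        for a in columns
    }


# Hypothetical concentrations in mg/l, for illustration only.
data = {
    "Li": [35.0, 70.0, 150.0, 200.0, 310.0],
    "Br": [400.0, 900.0, 1500.0, 2100.0, 3000.0],
    "Cl": [90000.0, 120000.0, 110000.0, 150000.0, 170000.0],
}
for name, row in correlation_matrix(data).items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

The matrix is symmetric with ones on the diagonal, so only the upper or lower triangle needs to be reported.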


About this article


Cite this article

Attanasi, E.D., Coburn, T.C. & Freeman, P.A. Machine learning approaches to identify lithium concentration in petroleum produced waters. Miner Econ (2024). https://doi.org/10.1007/s13563-023-00409-8


  • DOI: https://doi.org/10.1007/s13563-023-00409-8
