GEOstats: an excel-based data analysis program applying basic principles of statistics for geological studies

  • Software Article
Earth Science Informatics

Abstract

Statistical evaluation of data collected in the field and in the laboratory is an important task in the earth sciences. A geological study aims to draw scientific conclusions about the Earth, but, as in the other natural and engineering sciences, those conclusions must rest on analytical inference. The importance of data analysis therefore grows as the technological methods used in geology improve. The purpose of data analysis in geology is to examine the degree to which a feature varies within a population. Geological data may be lithological, textural (e.g. grain size and shape), structural (e.g. bedding, faults, foliation, joints or lineation) or chemical (e.g. major and trace elements, isotope ratios) measurements of rock, mineral, fossil, soil or water specimens. GEOstats is an Excel-based data analysis program that provides graphical and numerical results, together with data simulation and statistical modeling (e.g. simple regression analysis, box plot, Q–Q plot, XYZ plot, sample distribution and classification), for samples representing a population; it is intended for geologists and other researchers. The program can perform both simple data analysis (e.g. basic statistical calculations) and more complex multivariate analysis such as cluster analysis, principal component analysis and common factor analysis. GEOstats can also compute error bars from relative standard deviation values for different confidence intervals.
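To make the error-bar idea concrete, the following is a minimal Python sketch, not the GEOstats implementation (which is written for Excel). It assumes that an error bar is obtained by scaling a value by the relative standard deviation (RSD) and a normal-distribution factor for the chosen confidence level. The function names, the factors in `Z_VALUES` and the replicate data are all illustrative assumptions.

```python
import statistics

# Two-sided normal factors per confidence level.
# Assumption for illustration; GEOstats may use different factors.
Z_VALUES = {0.68: 1.0, 0.95: 1.96, 0.99: 2.576}


def relative_std_dev(values):
    """Return the relative standard deviation (RSD) in percent."""
    mean = statistics.mean(values)
    return 100.0 * statistics.stdev(values) / abs(mean)


def error_bar(value, rsd_percent, confidence=0.95):
    """Return (lower, upper) bounds for one measurement given an RSD in percent."""
    half_width = Z_VALUES[confidence] * abs(value) * rsd_percent / 100.0
    return value - half_width, value + half_width


# Hypothetical replicate analyses of a standard (e.g. SiO2 in wt%).
replicates = [49.8, 50.1, 50.4, 49.9, 50.3]
rsd = relative_std_dev(replicates)

# Apply the RSD to a hypothetical sample value of 52.0 wt%.
low, high = error_bar(52.0, rsd, confidence=0.95)
print(f"RSD = {rsd:.2f}%, 95% error bar = [{low:.2f}, {high:.2f}]")
```

Keeping the RSD as a percentage lets a single value, estimated once from replicates, be reused for every sample measured under the same conditions.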

Data availability

GEOstats can be downloaded from GitHub (https://github.com/mstgndz/GEOstats). It occupies approximately 14.5 MB of disk space and runs on Microsoft Windows with Excel installed.

Code availability

The third-party code used in GEOstats is available from two public GitHub repositories: (1) eigenvalues for a symmetric matrix (https://github.com/YoichiroUrita/Math) and (2) the k-means algorithm for clustering (https://github.com/gpolic/kmeans-excel). These VBA codes (3.36 and 16.4 KB, respectively) were released as open access by their authors in 2018.
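For readers unfamiliar with k-means clustering, the sketch below shows the standard Lloyd iteration in Python. It is an illustration only, not a port of the VBA code in the repository above. The function name `kmeans`, its parameters and the sample points are assumptions made for this example.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids


# Hypothetical two-variable measurements forming two well-separated groups.
samples = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.8)]
labels, centroids = kmeans(samples, k=2)
print(labels, centroids)
```

The result depends on the initial centroids, so in practice the algorithm is often run from several starting points and the partition with the smallest within-cluster sum of squares is kept.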

Acknowledgements

The authors thank the editor, Dr. Hassan A. Babaie, and two anonymous reviewers for their critical and constructive comments, which improved the quality of this paper and the software. The first author also thanks Campus France for the Scholarships to Foreign Students in France and Turkey’s Council of Higher Education for the 100/2000 PhD Scholarship.

Author information

Contributions

M.G.: Software and Programming, Methodology, Conceptualization, Writing. K.A.: Methodology, Conceptualization, Writing, Supervision.

Corresponding author

Correspondence to Mesut Gündüz.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Communicated by: H. Babaie

Cite this article

Gündüz, M., Asan, K. GEOstats: an excel-based data analysis program applying basic principles of statistics for geological studies. Earth Sci Inform 15, 705–712 (2022). https://doi.org/10.1007/s12145-021-00710-6
