Compositional Data Analysis in E-Tourism Research

Abstract

Compositional Data (CoDa) carry information about the relative importance of the parts of a whole, which the researcher deems more interesting than overall size or volume. In web mining, for instance, the relative frequency of a term is normally given more importance than its absolute frequency, which mostly reflects the sheer volume of online content. Many research questions in e-tourism concern either the distribution of a whole or relative importance: How do the most salient contents in hotel Facebook accounts relate to hotel characteristics? What are the dominant topics in TripAdvisor comments about fish freshness in seafood restaurants? How does the relative popularity of search terms in Google relate to destination market share?

In CoDa, most of the basic statistical notions, such as center, variation, association, and distance, are flawed unless they are re-expressed by means of logarithms of ratios. The appeal of log-ratios is that, once they are computed, standard statistical methods can be applied to them. At the same time, because one part can only increase in relative terms if at least one other part decreases, the analysis needs to be multivariate.
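To make the role of log-ratios concrete, the following minimal sketch computes the centered log-ratio (clr) transformation and the resulting Euclidean distance between clr coordinates, usually called the Aitchison distance. The topic counts and the function names are hypothetical and chosen only for illustration; they are not taken from the chapter.

```python
import math

def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(parts):
    """Centered log-ratio: log of each part over the geometric mean of all parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr coordinates of two compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))

# Hypothetical counts of three complaint topics (illustrative values only).
small_hotel = [10, 30, 60]
large_hotel = [100, 300, 600]  # ten times the volume, same relative profile
other_hotel = [40, 30, 30]     # a different relative profile

d_same = aitchison_distance(small_hotel, large_hotel)
d_diff = aitchison_distance(small_hotel, other_hotel)
```

Because the clr only uses ratios between parts, `d_same` is zero up to floating-point error even though the two hotels differ tenfold in volume, whereas `d_diff` is clearly positive. The clr coordinates of any composition sum to zero, which is one reason the parts cannot be analyzed one at a time.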

This chapter uses an example based on TripAdvisor reviews of hotels in Barcelona, one of the most visited cities worldwide, focusing on what users complain about. The example illustrates the main multivariate exploratory and descriptive tools in CoDa: imputation of zeros prior to computing the log-ratios, multivariate outlier detection, principal component analysis, cluster analysis, and multivariate data visualization. The use of CoDaPack, a popular freeware package for CoDa, is described step by step.
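As a rough illustration of why zeros must be handled before log-ratios are computed, the sketch below applies a simple multiplicative zero replacement, then the clr transformation, and finally measures how far each composition lies from the center in clr space. This is an assumption-laden sketch and not the chapter's procedure: the replacement value `delta`, the counts, and the function names are hypothetical, and the distance-to-center screen is only a crude stand-in for the robust multivariate outlier detection methods the chapter refers to.

```python
import math

def multiplicative_replacement(parts, delta=0.005):
    """Close the parts to sum 1, replace zeros by delta, and shrink the
    non-zero parts so the result still sums to 1 and the ratios among
    non-zero parts are unchanged. delta is an assumed illustrative value."""
    total = sum(parts)
    closed = [p / total for p in parts]
    n_zeros = sum(1 for p in closed if p == 0)
    return [delta if p == 0 else p * (1 - n_zeros * delta) for p in closed]

def clr(parts):
    """Centered log-ratio; requires strictly positive parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical complaint counts: hotels in rows, four topics in columns.
counts = [
    [12, 0, 5, 3],
    [30, 4, 10, 6],
    [2, 1, 0, 17],
]

imputed = [multiplicative_replacement(row) for row in counts]
clr_rows = [clr(row) for row in imputed]

# Center in clr space: column-wise arithmetic mean of the clr coordinates.
center = [sum(col) / len(col) for col in zip(*clr_rows)]

# Euclidean distance of each hotel to the center in clr space.
dist_to_center = [
    math.sqrt(sum((v - c) ** 2 for v, c in zip(row, center)))
    for row in clr_rows
]
```

Without the replacement step, `math.log(0)` would raise an error, so no log-ratio involving the zero part could be computed. The clr rows produced here are also the kind of input on which standard principal component or cluster analysis routines can then be run.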

Keywords

  • Compositional data
  • CoDa
  • Content analysis
  • TripAdvisor reviews
  • Cluster analysis
  • Biplot

To appear in: Xiang Z, Fuchs M, Gretzel U, Höpken W (eds) Handbook of e-Tourism. Springer Nature.


Acknowledgements

The authors acknowledge the support of the Catalan Government through the accreditation as Consolidated Research Groups of TURESCO (2017 SGR 49) and COSDA (2017 SGR 656). This work was supported by the Spanish Ministry of Economy, Industry and Competitiveness (Grant id.: TURCOLAB ECO2017-88984-R) and by the Spanish Ministry of Science, Innovation and Universities and FEDER (Grant id.: CODAMET RTI2018-095518-B-C21). Finally, the authors also acknowledge the comments by Santi Thió-Henestrosa.

Author information

Corresponding author

Correspondence to Berta Ferrer-Rosell.

Copyright information

© 2022 Springer Nature Switzerland AG

About this entry

Cite this entry

Ferrer-Rosell, B., Coenders, G., Martin-Fuentes, E. (2022). Compositional Data Analysis in E-Tourism Research. In: Xiang, Z., Fuchs, M., Gretzel, U., Höpken, W. (eds) Handbook of e-Tourism. Springer, Cham. https://doi.org/10.1007/978-3-030-05324-6_136-1

  • DOI: https://doi.org/10.1007/978-3-030-05324-6_136-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05324-6

  • Online ISBN: 978-3-030-05324-6

  • eBook Packages: Springer Reference Business and Management, Reference Module Humanities and Social Sciences