Abstract
Our primary objective is to evaluate the quality of a process, which we address through semantic mapping of that process. This is complementary to evaluation that gives primacy to output results or products. We use goal-oriented discourse as a case study, drawing on what the social and political theorist Jürgen Habermas termed “communicative action”. The orientation in Habermas’s work that we adopt is the analysis of communication or discourse, for which we take Twitter social media as our source. In our case study, we map the discourse semantically, using the correspondence analysis platform for such latent semantic analysis. This permits both qualitative and quantitative analytics. The case study is a set of eight carefully planned Twitter campaigns relating to environmental issues, whose aim was to increase environmental awareness and to encourage environmental behaviour. Each campaign was launched by an initiating tweet. Using the data gathered in these Twitter campaigns, we sought to map them, and hence to track the flow of the Twitter discourse. This mapping was achieved through semantic embedding. The semantic distance between an initiating act and the aggregate semantic outcome is used as a measure of process effectiveness.
References
Barker, M., Barker, D.I., Bormann, N.F., Neher, K.E.: Social Media Marketing. A Strategic Approach. Cengage Learning, Andover (2012)
Bakliwal, A., Foster, J., van der Puil, J., O’Brien, R., Tounsi, L., Hughes, M.: Sentiment analysis of political tweets: towards an accurate classifier. In: Proceedings of the Workshop on Language in Social Media, LASM 2013, Association for Computational Linguistics, pp. 49–58 (2013)
Benzécri, J.P.: L’Analyse des Données, Tome I Taxinomie, Tome II Correspondances, 2nd edn. Dunod, Paris (1979)
Benzécri, J.P.: Correspondence Analysis Handbook. Dekker, Basel (1994)
Blake, J.: Overcoming the “Value-Action Gap” in environmental policy: tensions between national policy and local experience. Local Environ. 4(3), 257–278 (1999)
Blasius, J., Greenacre, M. (Eds.): Visualization and Verbalization of Data. Chapman & Hall/CRC Press, Boca Raton, FL (2014)
Bull, R., Petts, J., Evans, J.: Social learning from public engagement: dreaming the impossible? J. Environ. Plann. Manag. 51(5), 701–716 (2008)
Bull, R., Petts, J., Evans, J.: The importance of context for effective public engagement: learning from the governance of waste. J. Environ. Plann. Manag. 53(8), 991–1009 (2010)
Chew, C., Eysenbach, G.: Pandemics in the age of Twitter: content analysis of tweets during the 2009 H1N1 outbreak. PLoS One 5(11), e14118 (2010)
Collins, J., Thomas, G., Willis, R., Wilsdon, J.: Carrots, sticks and sermons: influencing public behaviour for environmental goals. A Demos/Green Alliance report produced for DEFRA, Demos/Green Alliance, pp. 55 www.demos.co.uk/files/CarrotsSticksSermons.pdf (2003). Retrieved 13 April 2014
Finnis, J., Chan, S., Clements, R.: Let’s get real. How to evaluate online success? Report from the culture 24 action research project. Brighton, pp. 40 http://www.keepandshare.com/doc/3148918/culture24howtoevaluateonlinesuccess2pdfseptember1920111115am25meg?da=y (2011). Retrieved 13 April 2014
Futerra: The Rules of the Game: The principles of climate change communication. Futerra Sustainability Communications Ltd., London, pp. 5 www.futerra.co.uk/downloads/RulesOfTheGame.pdf (2005). Retrieved 13 April 2014
Howto.gov: Social Media Metrics for Federal Agencies, U.S. General Services Administration. http://www.howto.gov/socialmedia/usingsocialmediaingovernment/metricsforfederalagencies (2013). Retrieved 13 April 2014
Hutto, C.J., Gilbert, E.: VADER: a parsimonious rule-based model for sentiment analysis of social media text. In: ICWSM 2014, Proceedings of the Eighth International Conference on Weblogs and Social Media. Ann Arbor, Michigan, June 1–4 (2014)
Le Roux, B., Rouanet, H.: Geometric Data Analysis: From Correspondence Analysis to Structured Data Analysis. Kluwer Academic, Dordrecht (2004)
Lorenzoni, I., Nicholson-Cole, S., Whitmarsh, L.: Barriers perceived to engaging with climate change among the UK public and their policy implications. Glob. Environ. Change 17(3–4), 445–459 (2007)
Matušík, M.B.: Jürgen Habermas, philosophy and social theory. http://www.britannica.com/EBchecked/topic/250787/JurgenHabermas/281673/Philosophyandsocialtheory. Retrieved 21 March 2014
McKee, R.: Story: Substance, Structure, Style, and the Principles of Screenwriting. Methuen, London (1999)
Murtagh, F.: Correspondence Analysis and Data Coding with R and Java. Chapman & Hall/CRC, Boca Raton, FL (2005)
Murtagh, F., Ganz, A.: Pattern recognition in narrative: tracking emotional expression in context. Preprint, http://arxiv.org/abs/1405.3539 (2015)
Murtagh, F., Contreras, P.: Big data scaling through metric mapping: exploiting the remarkable simplicity of very high dimensional spaces using Correspondence Analysis, in preparation (2015)
Pearce, W., Holmberg, K., Hellsten, I., Nerlich, B.: Climate change on Twitter: topics, communities and conversations about the 2013 IPCC Working Group 1 report. PLoS One 9(4), e94785 (2014)
Pennebaker, J.W.: The Secret Life of Pronouns: What Our Words Say About Us. Bloomsbury Press, New York (2011)
Pianosi, M., Bull, R., Rieser, M.: Impact, influence and reach: lessons in measuring the impact of social media, preprint, pp. 36 (2013)
Séguéla, J., Saporta, G.: A comparison between latent semantic analysis and correspondence analysis. Presentation, CARME, Correspondence Analysis and Related Methods Conference, Rennes (2011). http://carme2011.agrocampusouest.fr/slides/Seguela_Saporta.pdf
Shove, E.: Beyond the ABC: climate change policy and theories of social change. Environ. Plann. 42, 1273–1285 (2010)
Verplanken, B., Walker, I., Daves, A., Jurasek, M.: Context change and travel mode choice: combining the habit discontinuity and self-activation hypotheses. J. Environ. Plann. Manag. 53(8), 991–1009 (2008)
Appendices
Appendix 1: Our 8 campaign initiating tweets
The following are the campaign initiating tweets, in full. For campaign 4, the two initiating tweets were merged together. DMU stands for De Montfort University.

Campaign 1:
Introducing #climatechange! Is the climate changing? What are the observed changes? Are humans causing it? Discuss http://t.co/cMUOmbEt #dmuCC

Campaign 2:
Do you feel #climatechange is a distant issue? Read and listen to the climate witnesses in the UK http://t.co/FLWaTqTb

Campaign 3:
Goodmorning #DMU!! How was your weekend? Did you participate in the #marathon? We are talking about electricity this week! #dmuelectricity

Campaign 4:
Goodmorning #DMU!! How was your weekend? We are talking about gas and heating this week! #dmuenergy Wishing you all a nice #ecomonday!

Campaign 4:
Connect with us to discover what #DMU is already doing to cut its #gas use and tell us what you think we could all do to make it better!

Campaign 5:
Goodmorning #DMU!! We talk about #sustainable food this week. We have a question for you! What do you think does Sustainable Food mean?

Campaign 6:
Here I am, fueled with caffeine! This week we will be talking in particular of #transport. How do you get from home to #DMU? #dmutransport

Campaign 7:
New post! #Sustainable #Water  Are you familiar with the concept of #WaterSecurity? http://t.co/T9QYVlTJ #DMU #climate #sustainabledmu

Campaign 8:
@SustainableDMU #MeatFreeMonday seems to have latched itself into my brain! Not a big meat eater but like having a dedicated veggie day!
As discussed in Sects. 4.1 and 4.2, a set of 339 terms was ultimately selected as the set of all employable words used in the discourse. The terms retained for these particular initiating tweets, with frequency of occurrence, are as follows. For campaigns 1 through 8, we see that we have, respectively, summed frequencies of occurrence of terms: 4, 4, 7, 14, 10, 6, 7, 5.

Campaign 1:
climate climatechange dmucc http [1 occurrence each]

Campaign 2:
climate climatechange http read [1 occurrence each]

Campaign 3:
dmu electricity goodmorning participate talking week weekend [1 occurrence each]

Campaign 4:
cut dmu dmuenergy ecomonday gas goodmorning heating nice talking tell week weekend [dmu, gas: 2 occurrences; otherwise 1 occurrence]

Campaign 5:
dmu food goodmorning mean question sustainable talk week [food, sustainable: 2 occurrences; otherwise 1 occurrence]

Campaign 6:
dmu dmutransport home talking transport week [1 occurrence each]

Campaign 7:
climate dmu http post sustainable sustainabledmu water [1 occurrence each]

Campaign 8:
day meat meatfreemonday sustainabledmu veggie [1 occurrence each]
The campaign 4 tweet was a merged one (from original tweets 303, 304). In campaign 4, the term “gas” is both word and hashtag. It is easy to go back to the original tweets and see the hashtags, or the tweeters. We keep the “http” part of the URL since it informs us that a web address is in the tweet.
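The summed frequencies quoted above can be checked directly from the retained term lists. The following is a minimal sketch only: the dictionary layout and the helper name `summed_frequency` are illustrative assumptions, not part of the original analysis pipeline.

```python
# Retained terms per campaign, transcribed from the lists in this appendix.
terms = {
    1: "climate climatechange dmucc http".split(),
    2: "climate climatechange http read".split(),
    3: "dmu electricity goodmorning participate talking week weekend".split(),
    4: ("cut dmu dmuenergy ecomonday gas goodmorning heating nice "
        "talking tell week weekend").split(),
    5: "dmu food goodmorning mean question sustainable talk week".split(),
    6: "dmu dmutransport home talking transport week".split(),
    7: "climate dmu http post sustainable sustainabledmu water".split(),
    8: "day meat meatfreemonday sustainabledmu veggie".split(),
}

# Terms noted as occurring twice; every other term occurs once.
doubled = {4: {"dmu", "gas"}, 5: {"food", "sustainable"}}


def summed_frequency(campaign):
    """Sum the occurrence counts of the retained terms for one campaign."""
    twice = doubled.get(campaign, set())
    return sum(2 if term in twice else 1 for term in terms[campaign])


totals = [summed_frequency(campaign) for campaign in sorted(terms)]
print(totals)  # [4, 4, 7, 14, 10, 6, 7, 5]
```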
Appendix 2: Correspondence analysis
Correspondence Analysis provides access to the semantics of information expressed by the data. It does this by defining each observation (here, a tweet), or row vector, semantically as the average of all attributes (here, terms) that are related to it. Similarly, it defines each attribute, or column vector, semantically as the average of all observations that are related to it.
This semantic mapping analysis is as follows:

1. The starting point is an observations-by-attributes matrix that cross-tabulates the dependencies, e.g. frequencies of joint occurrence.

2. By endowing the cross-tabulation matrix with the \(\chi ^2\) (chi squared) metric on both observation set (rows) and attribute set (columns), we can map observations and attributes into the same space, endowed with the Euclidean metric.

3. Interpretation is through (i) projections of observations and attributes onto factors; (ii) contributions by observations and attributes to the inertia of the factors; and (iii) correlations of observations and attributes with the factors. The factors are ordered by decreasing importance.
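The three steps above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the software used in the study: the function name `correspondence_analysis`, the toy table, and the choice to obtain the factors from a singular value decomposition of the standardized residuals are all assumptions made for the example.

```python
import numpy as np


def correspondence_analysis(counts):
    """Return row coords, column coords, eigenvalues, row masses, column masses."""
    N = np.asarray(counts, dtype=float)
    F = N / N.sum()                          # step 1: relative frequencies f_ij
    r, c = F.sum(axis=1), F.sum(axis=0)      # row masses and column masses
    expected = np.outer(r, c)
    # Step 2: chi-squared standardization of the departure from independence.
    S = (F - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    keep = sv > 1e-12                        # drop numerically null factors
    U, sv, Vt = U[:, keep], sv[keep], Vt[keep]
    # Step 3: projections on the factors, ordered by decreasing inertia sv**2.
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv ** 2, r, c


# Hypothetical toy table: 4 tweets (rows) crossed by 3 terms (columns).
toy = [[4, 1, 0], [2, 3, 1], [0, 2, 5], [1, 1, 1]]
rows, cols, eigenvalues, r, c = correspondence_analysis(toy)
print(eigenvalues)          # inertia carried by each factor
print(rows[:, :2])          # tweet coordinates in the principal factor plane
```

Rows and columns receive coordinates in the same factor space, which is what allows tweets and terms to be displayed together.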
Correspondence analysis is not unlike principal components analysis in its underlying geometrical bases. While principal components analysis is particularly suitable for quantitative data, correspondence analysis is appropriate for the following types of (nonnegative valued) input data: frequencies, contingency tables, probabilities, categorical data, and mixed qualitative/categorical data.
The factors are defined by a new orthogonal coordinate system endowed with the Euclidean distance. The factors are determined from the eigenvectors of a positive semidefinite matrix (hence with nonnegative eigenvalues). This matrix, which is diagonalized (i.e. subjected to singular value decomposition), encapsulates the requirement for the new coordinates to successively best fit the given data.
The “standardizing” inherent in correspondence analysis (a consequence of the \(\chi ^2\) distance) treats rows and columns in a symmetric manner. One byproduct is that the row and column projections in the new space may both be plotted on the same output graphic presentations (the principal factor plane given by the factor 1 and factor 2 coordinates; or other pairs of factors).
From frequencies of occurrence to clouds of profiles, each profile with an associated mass
From the initial frequencies data matrix, a set of probability data, \(f_{ij}\), is defined by dividing each value by the grand total of all elements in the matrix. In Correspondence Analysis, each row (or column) point is considered to have an associated weight. The row weight is the row sum, divided by the overall data matrix total. The column weight is the column sum, divided by the overall data matrix total.
Next, row profiles are defined as the row frequencies divided by the row weight (also termed the mass). Column profiles are defined similarly. The \(\chi ^2\) distance between profiles is a weighted Euclidean distance. It is an appropriate distance for what are, initially here, categorical data.
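These definitions can be written compactly as follows. Only \(f_{ij}\) appears in the text; the remaining notation (\(n_{ij}\) for the raw counts, \(f_i\) and \(f_j\) for the row and column masses) is assumed here for illustration, following common usage in the correspondence analysis literature.

\[
f_{ij} = \frac{n_{ij}}{\sum_{k,l} n_{kl}}, \qquad
f_i = \sum_j f_{ij}, \qquad
f_j = \sum_i f_{ij}
\]

The profile of row \(i\) is the vector with components \(f_{ij}/f_i\), and the squared \(\chi ^2\) distance between the profiles of rows \(i\) and \(i'\) is

\[
d^2(i, i') = \sum_j \frac{1}{f_j} \left( \frac{f_{ij}}{f_i} - \frac{f_{i'j}}{f_{i'}} \right)^2
\]

The weights \(1/f_j\) are what make this a weighted Euclidean distance; the column case is obtained by exchanging the roles of \(i\) and \(j\).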
We thus look on our row points (our tweets) as a cloud of points endowed with the \(\chi ^2\) distance. Similarly our column points (our words) are a cloud of points that are also endowed with the \(\chi ^2\) distance.
Just as in classical mechanics, we consider the inertia of these clouds. To begin with, we have their total inertia, that is, the inertia about their centre of gravity. The centre of gravity is the weighted mean. The cloud of row points and the cloud of column points are defined in such a way that the inertias of the two clouds are identical.
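The equality of the two inertias can be checked numerically. The sketch below uses a hypothetical toy table (4 tweets by 3 terms) and illustrative variable names; it is meant only to make the definitions concrete.

```python
import numpy as np

# Hypothetical toy table: 4 tweets (rows) crossed by 3 terms (columns).
N = np.array([[4, 1, 0], [2, 3, 1], [0, 2, 5], [1, 1, 1]], dtype=float)
F = N / N.sum()
r, c = F.sum(axis=1), F.sum(axis=0)      # row masses and column masses

row_profiles = F / r[:, None]            # each row sums to 1
col_profiles = F / c[None, :]            # each column sums to 1

# Centre of gravity of the row cloud: mass-weighted mean of the row profiles.
row_centre = r @ row_profiles            # equals the column masses c

# Inertia: sum over points of mass times squared chi-squared distance to centre.
row_inertia = np.sum(r[:, None] * (row_profiles - c) ** 2 / c)
col_inertia = np.sum(c[None, :] * (col_profiles - r[:, None]) ** 2 / r[:, None])

print(row_inertia, col_inertia)          # the two values coincide
```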
Output: cloud of points endowed with the Euclidean metric in factor space
Decomposing the moment of inertia of the cloud of row points (the cloud of tweets) and the cloud of column points (the cloud of words) furnishes the principal axes of inertia, defined from a singular value decomposition. The inertia about the principal axes is given by the eigenvalues. The principal axes themselves are defined from the eigenvectors. The principal axes are termed factors. Latent variables, or latent semantic axes, are also terms that can be used.
There is the following invariance relationship. The \(\chi ^2\) distance between two rows (two tweets), or between two columns (two words), is identical to the Euclidean distance between the two rows, or respectively the two columns, in the factor space. The latter, the factor space, allows us to display the data.
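This invariance can be verified on a small example. The sketch below assumes a hypothetical toy table and obtains the factor coordinates from a singular value decomposition of the standardized residuals. The equality is exact only when all factors are retained; a display restricted to the first two factors gives an approximation.

```python
import numpy as np

# Hypothetical toy table: 4 tweets (rows) crossed by 3 terms (columns).
N = np.array([[4, 1, 0], [2, 3, 1], [0, 2, 5], [1, 1, 1]], dtype=float)
F = N / N.sum()
r, c = F.sum(axis=1), F.sum(axis=0)
profiles = F / r[:, None]

# Chi-squared distance between the profiles of rows 0 and 1.
chi2_dist = np.sqrt(np.sum((profiles[0] - profiles[1]) ** 2 / c))

# Factor coordinates of the rows, keeping every factor.
S = (F - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, _ = np.linalg.svd(S, full_matrices=False)
row_coords = (U * sv) / np.sqrt(r)[:, None]

# Ordinary Euclidean distance between the same two rows in factor space.
euclid_dist = np.linalg.norm(row_coords[0] - row_coords[1])

print(chi2_dist, euclid_dist)            # the two distances agree
```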
The projections of row points and column points on the factors express the information. The total information content of either row set, or column set, is the cloud inertia. Associated with the factors is the information in our data, arranged by decreasing importance. The information importance is measured by inertia about the axes, or factors.
In addition to projections on the factorial axes, in Correspondence Analysis, we also consider the contributions to the inertia, and the correlations (of rows, or of columns, with the factors).
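For reference, contributions and correlations are commonly defined as follows. The notation is assumed here rather than taken from the text: \(F_\alpha (i)\) is the projection of row \(i\) on factor \(\alpha\), \(f_i\) is the mass of row \(i\), \(\lambda _\alpha\) is the eigenvalue (inertia) of factor \(\alpha\), and \(\rho ^2(i) = \sum _\alpha F_\alpha (i)^2\) is the squared distance of row \(i\) from the centre of gravity.

\[
\mathrm{CTR}_\alpha (i) = \frac{f_i \, F_\alpha (i)^2}{\lambda _\alpha}, \qquad
\mathrm{COR}_\alpha (i) = \frac{F_\alpha (i)^2}{\rho ^2(i)}
\]

With these definitions, the contributions to a given factor sum to 1 over all rows, and the correlations of a given row sum to 1 over all factors. The column case is analogous.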
Analysis of the dual spaces, and supplementary elements
The factors in the two spaces, of rows/observations and of columns/attributes, are inherently related. Each row (tweet) coordinate in the factor space is defined by the barycentre (or centre of gravity) of the column (word) coordinates; and vice versa. Not only can we pass from one cloud to the other, but the two clouds (of rows, and of columns) are displayable on the same graphic output. This is because the two clouds, which are endowed with the \(\chi ^2\) distance to start with, are projected into (or embedded in) the factor space. The factor space, as noted above, is endowed with the Euclidean distance. The Euclidean distance is particularly appropriate for display or visualization.
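The barycentric relationship is usually expressed through the transition formulas. In the notation assumed here (not given in the text), \(F_\alpha (i)\) and \(G_\alpha (j)\) are the projections of row \(i\) and column \(j\) on factor \(\alpha\), \(\lambda _\alpha\) is the associated eigenvalue, and \(f_i\), \(f_j\) are the row and column masses.

\[
F_\alpha (i) = \lambda _\alpha ^{-1/2} \sum _j \frac{f_{ij}}{f_i} \, G_\alpha (j), \qquad
G_\alpha (j) = \lambda _\alpha ^{-1/2} \sum _i \frac{f_{ij}}{f_j} \, F_\alpha (i)
\]

Each row coordinate is thus the profile-weighted mean of the column coordinates, rescaled by \(\lambda _\alpha ^{-1/2}\), and conversely.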
Qualitatively different elements (i.e. row or column profiles), or ancillary characterization or descriptive elements may be placed as supplementary elements. This means that they are given zero mass in the analysis, and their projections are determined using the transition formulas. This amounts to carrying out a correspondence analysis first, without these elements, and then projecting them into the factor space following the determination of all properties of this space.
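Projecting a supplementary element can be sketched as follows. The toy table, the supplementary counts, and the helper name `project_supplementary_row` are hypothetical; the projection applies the usual transition formula (profile-weighted mean of the column coordinates, divided by the singular value of each factor).

```python
import numpy as np

# Hypothetical toy table of active elements: 4 tweets by 3 terms.
N = np.array([[4, 1, 0], [2, 3, 1], [0, 2, 5], [1, 1, 1]], dtype=float)
F = N / N.sum()
r, c = F.sum(axis=1), F.sum(axis=0)

# Correspondence analysis of the active elements only.
S = (F - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
keep = sv > 1e-12                        # drop numerically null factors
U, sv, Vt = U[:, keep], sv[keep], Vt[keep]
row_coords = (U * sv) / np.sqrt(r)[:, None]
col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]


def project_supplementary_row(counts, col_coords, sv):
    """Place a zero-mass row in the factor space via the transition formula."""
    profile = np.asarray(counts, dtype=float)
    profile = profile / profile.sum()
    return (profile @ col_coords) / sv


# A hypothetical supplementary tweet: it does not influence the factors.
supp = project_supplementary_row([3, 0, 2], col_coords, sv)

# Sanity check: an active row projected this way lands on its own coordinates.
check = project_supplementary_row(N[1], col_coords, sv)
print(supp)
print(np.allclose(check, row_coords[1]))
```

Because the supplementary row has zero mass, the eigenvalues and the coordinates of the active rows and columns are unchanged by its presence.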
In summary
Correspondence analysis is thus the inertial decomposition of the dual clouds of weighted points. It is a latent semantic decomposition, in which the role played by the term frequency and inverse document frequency (TF-IDF) weighting scheme is instead fulfilled by (i) profiles and masses, together with (ii) the \(\chi ^2\) distance. See Séguéla and Saporta (2011) for a discussion of both methods, correspondence analysis and latent semantic indexing. Further background on correspondence analysis can be found in Benzécri (1979, 1994), Le Roux and Rouanet (2004) and Murtagh (2005).
Cite this article
Murtagh, F., Pianosi, M. & Bull, R. Semantic mapping of discourse and activity, using Habermas’s theory of communicative action to analyze process. Qual Quant 50, 1675–1694 (2016). https://doi.org/10.1007/s1113501502287
DOI: https://doi.org/10.1007/s1113501502287