Abstract
The chapters of this book are concerned with learning about the evolution of ideas (theories, concepts, methods, and application domains) and the history of a discipline by means of the temporal evolution of word occurrences in papers published in scientific journals. The work carried out for each of the areas involved in the project (philosophy, sociology, psychology, linguistics, statistics) pursued different objectives: to obtain a first overview of the relationship between time and contents in order to observe latent temporal patterns; to identify relevant keywords; to cluster keywords portraying similar temporal patterns; to identify latent dynamics of keyword clusters; and to identify relevant topics as groups of related words. The contributions identified and analysed the main subject matters that, at the time of publication, were considered relevant by mainstream journals, and they offer new viewpoints from which to read and understand the evolution of a discipline. The interdisciplinary debate triggered by this research work is innovative because quantitative methods for text analysis have been applied in areas of the human and social sciences that are traditionally studied through qualitative approaches. It also represents a positive experience, since new paths have been explored by pooling the qualitative and quantitative research methods, traditions, and expertise of different disciplines.
Acknowledgements
To the members of the research team and co-authors of this book, which I had the honour to lead and coordinate, go all my respect and gratitude for having chosen to follow me in this challenging adventure and to join the small group of brave researchers who have shared my interest in this matter for some time. I would like to acknowledge the open minds of our most senior colleagues, and their vision and willingness to engage on truly exceptional, unfamiliar terrain. I am also very pleased with the work of my younger colleagues: for the desire to learn they have shown, for the great enthusiasm they dedicated to the project, and for having become the real "research engine" of the group.
Appendix
1.1.1 A Brief Overview of Correspondence Analysis
Correspondence Analysis (CA) is an exploratory data analysis (EDA) technique that has proven useful in studying the joint distribution of two (or more) categorical variables. CA portrays the structure of association between the variables by means of simple plots that position the categories of the variables on a plane.
The quantitative perspective adopted by the contributions of this volume is based on words and word counts, i.e. on the observation of occurrences of relevant keywords over time. In this perspective, CA can be exploited to achieve a content mapping, as it is useful for representing the system of relationships among years (e.g. volumes of the journals), among words (e.g. relevant keywords), and between years and words. Although CA cannot describe all the relevant linguistic features of a set of texts, it helps to highlight latent patterns. For example, in our case, it makes it possible to verify whether the volumes of a journal expressed a clear temporal pattern in their main contents.
In its simplest version, CA works on a two-way contingency table in which the rows represent keywords (e.g. m word-types w1, …, wm) and the columns represent the volumes of the journal (e.g. p time-points t1, …, tp). Each cell of this (lexical) contingency table contains the number nij of occurrences of the i-th keyword (the i-th row) in the volume published at the j-th time-point (the j-th column) (Table 1.1).
CA provides the best simultaneous representation of row profiles and column profiles on each axis (and on each plane generated by a pair of axes). The purpose of CA is to translate the similarities between categories (words and volumes) into a graph in which the most similar categories are placed in adjacent positions in the space defined by the Cartesian axes. Looking at the words, it is fairly intuitive that the similarity between two words depends on how much the occurrences in the two rows of the table resemble each other, that is, how similar they are in terms of presence, absence, or frequency of occurrence across the journal volumes: if two words tend to be used in the same volumes and with similar frequency, they have a similar profile over time. Two words with an identical profile will have zero distance between them, that is, they will be represented on the graph as two overlapping points.
The intuitive notion of similarity between the profiles of two words wi and wk is translated into a distance (chi-square distance) that can be calculated for each pair of words:
\( {d}_{ik}^2=\sum \limits_{j=1}^p\frac{n}{n_{.j}}{\left(\frac{n_{ij}}{n_{i.}}-\frac{n_{kj}}{n_{k.}}\right)}^2 \)
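As an illustrative sketch (the function name and the use of a NumPy array `N` of counts, with words as rows and volumes as columns, are our own choices, not the chapter's), this distance can be computed directly from the contingency table:

```python
import numpy as np

def chi_square_distance(N, i, k):
    """Squared chi-square distance d2_ik between the profiles of
    words i and k, where N is the words-by-volumes table of counts n_ij."""
    n = N.sum()              # grand total of occurrences in the corpus
    n_i = N.sum(axis=1)      # n_i. : total occurrences of each word
    n_j = N.sum(axis=0)      # n_.j : size of each volume
    diff = N[i] / n_i[i] - N[k] / n_i[k]   # difference of row profiles
    return float(np.sum(n / n_j * diff ** 2))
```

Two words with identical profiles (including rows that are exact multiples of one another) get distance zero, matching the observation above that such words would be plotted as overlapping points.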
The same reasoning can be repeated for the similarity between pairs of volumes by considering the profiles of the two columns. Two volumes of the journal (time-points tj and tk) resemble each other if they have a similar lexical profile, i.e. if they include the same words with similar relative frequencies (Fig. 1.1).
The distance between two time-points tj and tk is given as:
\( {d}_{jk}^2=\sum \limits_{i=1}^m\frac{n}{n_{i.}}{\left(\frac{n_{ij}}{n_{.j}}-\frac{n_{ik}}{n_{.k}}\right)}^2 \)
From another viewpoint, the rows and the columns of this matrix are considered as vectors, i.e. as points in a multidimensional space, and the distance between two vectors is measured through a weighted Euclidean distance that compares the corresponding lexical profiles, taking into account the size of the subcorpora (volumes) at each time-point and the occurrences of each word in the corpus as a whole.
Following the calculation of the pairwise distances for words and for volumes, the next step is to transform the space generated by the original variables into a Euclidean space generated by new orthogonal variables (components or axes). The multidimensional space generated by the matrix is reduced to orthogonal dimensions (axes) that are displayed as Cartesian axes. The number of dimensions of this new space (i.e. the number of orthogonal axes) is equal to the number of linearly independent variables (the rank of the matrix), which, in our context, is the number of time-points minus one (p − 1; more generally, min(m, p) − 1).
The starting points of this transformation are the m × m square matrix containing the pairwise distances between words and the p × p square matrix containing the pairwise distances between volumes. The calculation of the coordinates on each axis is based on the singular value decomposition (SVD). The orthogonal factorial axes are sorted according to the amount of inertia they collect (i.e. the degree of association), so they come in order of relevance: the first axis is the most important and collects the largest portion of the information contained in the contingency table, the second axis collects the largest portion of information not explained by the first, and so on. The Cartesian plane constructed with the first two factorial axes is thus the two-dimensional space that best represents, in a low-dimensional Euclidean space, the structure of association shown in the contingency table.
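The SVD step described in this paragraph can be written compactly. The following is a minimal, generic sketch (NumPy-based; variable and function names are our own, not the chapter's): CA is obtained from the SVD of the matrix of standardized residuals, which yields the principal coordinates of rows and columns and the inertia collected by each axis:

```python
import numpy as np

def correspondence_analysis(N):
    """Simple CA of a contingency table N via SVD of the standardized
    residuals. Returns row/column principal coordinates and the
    proportion of inertia collected by each axis."""
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    # standardized residuals: (p_ij - r_i c_j) / sqrt(r_i c_j)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * d) / np.sqrt(r)[:, None]     # row principal coordinates
    cols = (Vt.T * d) / np.sqrt(c)[:, None]  # column principal coordinates
    # one singular value is structurally (near) zero, so the number of
    # meaningful axes is at most min(m, p) - 1
    inertia = d ** 2 / (d ** 2).sum()        # explained inertia per axis
    return rows, cols, inertia
```

A useful property to check: Euclidean distances between row principal coordinates reproduce the chi-square distances between the corresponding word profiles, so two words with proportional rows coincide in the plot.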
Unlike other analyses that start from a cases × variables matrix, in CA the contingency table can be read in two ways: as m row vectors in the (p − 1)-dimensional space generated by the columns, i.e. m words in the space of the p time-points (volumes), and as p column vectors in the (m − 1)-dimensional space generated by the rows, i.e. p time-points in the space of the m words. From this observation follows the immediate possibility of obtaining two separate graphs: one with the words and one with the volumes. Owing to the geometric properties of the two spaces (duality), the dimensions are the same and the two graphs can be overlapped. This makes it possible to observe the system of relations between all the categories in play, although we must be very careful in interpreting the joint graphical representation of the two variables. To briefly summarize how to read the graphs obtained from CA, we should remember that the position of a word or a volume assumes a role only in the global context created by the graph: it has no meaning by itself, but acquires meaning in comparison with the positions taken by all the other points of the solution with respect to the barycentre at the origin of the axes. If two words are close on the graph, they have similar profiles; analogously, if two volumes are close, they have similar lexical profiles. The mutual position of a word and a volume cannot be evaluated directly and must be assessed with reference to the positions of all the other elements. In this sense, it is useful to consider the quadrants of the Cartesian plane: thanks to the axes, proximity can be evaluated by taking into account the angles formed with the axes (the more similar the angles formed with the axes, the more two points can be considered associated).
The words or volumes that contribute the most to the solution, and which can therefore be considered the most important in the context reconstructed by the graph, are those far from the origin of the axes. A densification of categories in an area of the graph that stands out from the rest as a cluster may be interpreted as a semantic area, and for this purpose one often chooses to partition the points into clusters. The clusters of words or volumes should be as homogeneous as possible within groups and as heterogeneous as possible between groups. In the analysis of the lexical contingency table, a cluster analysis based on the CA groups the volumes together on the basis of lexical similarity (which is usually also visible in terms of proximity of the points on the graph).
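To sketch this clustering step (the count table below is hypothetical, and SciPy's agglomerative clustering stands in for whatever software one actually uses), volumes can be grouped directly from their pairwise chi-square distances:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical words-by-volumes table: volumes 0-1 share a lexical
# profile that differs clearly from volumes 2-3.
N = np.array([[4., 5., 0., 1.],
              [3., 4., 1., 0.],
              [0., 1., 5., 4.],
              [1., 0., 4., 5.]])
n = N.sum()
row_tot = N.sum(axis=1)            # n_i.
prof = N / N.sum(axis=0)           # column (volume) profiles n_ij / n_.j
p = N.shape[1]
D = np.zeros((p, p))               # pairwise chi-square distances d_jk
for j in range(p):
    for k in range(p):
        D[j, k] = np.sqrt(np.sum(n / row_tot * (prof[:, j] - prof[:, k]) ** 2))

# average-linkage clustering on the condensed distance matrix
Z = linkage(squareform(D, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
```

Cutting the dendrogram into two groups recovers the two lexically homogeneous sets of volumes, mirroring the clusters one would read off the factorial plane.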
1.1.2 An Example
To understand how CA works, an example based on a very simplified fictional corpus may be useful. Suppose we have 11 texts that cover the topics of a journal in the statistical field and constitute a small text corpus:
- text01 regression analysis; linear regression
- text02 regression model; linear and non-linear model
- text03 generalized linear model; parameter estimation
- text04 sampling methods; random sampling; survey design and sampling methods
- text05 survey design; finite populations
- text06 methods for sampling elusive populations
- text07 Normal distribution
- text08 z-scores and Normal distribution
- text09 Gamma distribution
- text10 p-value: Normal distribution and Gamma–exponential family
- text11 regression analysis; Normal distribution
There are 53 word-tokens and 25 word-types in the corpus. Taking into account only the words that occur at least twice, namely distribution (5 occurrences); and, linear, Normal, regression, and sampling (4 each); methods and model (3 each); and analysis, design, Gamma, populations, and survey (2 each), we can construct a words × texts contingency table (Table 1.2), in which we see, for example, that the word survey was used once each in texts 04 and 05.
The CA of the contingency table results in 10 factorial axes. The first two axes collect 55% of the information (explained inertia) and the first factorial plane is shown in Fig. 1.2.
Figure 1.2 clearly shows the three latent patterns present in the texts, which refer to linear models (regression, analysis), sampling methods (survey design, populations), and distributions (Normal, Gamma). Texts 01, 02, and 03 are found together in the area of linear models (second quadrant, upper left), while texts 07, 08, 09, and 10 lie in the area of distributions (third quadrant, bottom left). Text 11 sits between the linear models and distributions areas because it includes both topics. In the area of sampling methods (first quadrant, on the left) are texts 04, 05, and 06. It is interesting to note that the conjunction and is found near the origin of the axes because it has been used in different contexts (though slightly more often by those who talked about distributions).
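The toy corpus is small enough to reproduce by hand. The following sketch rebuilds the words × texts count matrix from the word and text lists given above (as an assumption consistent with the reported totals, non-linear in text02 is taken to contribute one occurrence of linear) and counts the factorial axes via the SVD of the standardized residuals:

```python
import numpy as np

# Rows: the 13 words occurring at least twice; columns: texts 01-11.
words = ["distribution", "and", "linear", "Normal", "regression",
         "sampling", "methods", "model", "analysis", "design",
         "Gamma", "populations", "survey"]
N = np.array([
    # 01 02 03 04 05 06 07 08 09 10 11
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],   # distribution
    [0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0],   # and
    [1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0],   # linear
    [0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1],   # Normal
    [2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1],   # regression
    [0, 0, 0, 3, 0, 1, 0, 0, 0, 0, 0],   # sampling
    [0, 0, 0, 2, 0, 1, 0, 0, 0, 0, 0],   # methods
    [0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0],   # model
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],   # analysis
    [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0],   # design
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0],   # Gamma
    [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0],   # populations
    [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0],   # survey
], dtype=float)

# CA via SVD of the standardized residuals
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
d = np.linalg.svd(S, compute_uv=False)

axes = int((d > 1e-10).sum())                # number of factorial axes
plane = (d[:2] ** 2).sum() / (d ** 2).sum()  # inertia on the first plane
```

With this table the decomposition yields the 10 factorial axes mentioned above, and `plane` gives the share of inertia collected by the first factorial plane.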
Copyright information
Ā© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Tuzzi, A. (2018). Introduction: Tracing the History of a Discipline Through Quantitative and Qualitative Analyses of Scientific Literature. In: Tuzzi, A. (eds) Tracing the Life Cycle of Ideas in the Humanities and Social Sciences. Quantitative Methods in the Humanities and Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-97064-6_1
DOI: https://doi.org/10.1007/978-3-319-97064-6_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97063-9
Online ISBN: 978-3-319-97064-6
eBook Packages: Mathematics and Statistics (R0)