Abstract
Text data constitute a significant part of all data generated on the Internet, including comments and posts by social network users. Each website offers its users different functionality: LinkedIn focuses mainly on the labor market and on professional and business contacts; Facebook lets users create groups and share photos and messages with friends; Twitter allows users to post and follow short text messages. One type of information researchers would like to obtain about the users of these portals is their age, which is crucial for marketing, social, and economic research. Each social network, however, has different privacy rules regarding the publication of users’ dates of birth, which poses a problem for researchers who would like to obtain such information. The aim of the research presented here is to characterize the words typically used in messages published by Twitter users. Twitter was chosen because its data can be downloaded without additional user consent. The research used text mining methods and techniques and focused mainly on the analysis of individual words and collocations occurring in users’ tweets.
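As a rough illustration of what analysing individual words and collocations can involve, the sketch below counts single words and adjacent word pairs in a handful of messages. It is a minimal example under stated assumptions, not the procedure used in the chapter: the sample messages, the `tokenize` and `count_words_and_bigrams` helpers, and the use of raw adjacent-pair counts as a stand-in for collocations are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample messages; these are not data from the study.
tweets = [
    "Good morning, good people of Twitter!",
    "Good morning from the office",
]


def tokenize(text):
    """Lowercase the text and return runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def count_words_and_bigrams(messages):
    """Count single words and adjacent word pairs across all messages."""
    words, bigrams = Counter(), Counter()
    for message in messages:
        tokens = tokenize(message)
        words.update(tokens)
        # Pair each token with its successor; pairs never span two messages.
        bigrams.update(zip(tokens, tokens[1:]))
    return words, bigrams


words, bigrams = count_words_and_bigrams(tweets)
print(words.most_common(3))
print(bigrams.most_common(2))
```

Raw pair counts favour frequent function words, so a fuller analysis would typically remove stop words or score pairs with an association measure before treating them as collocations.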
Notes
1. The term “Silent Generation” first appeared on November 5, 1951, in Time magazine, in the article “The Younger Generation”.
2. X is the number of years declared by the users.
3. The difference in the number of users results from the interval between downloading the tweets and downloading the metadata. During that time, usernames could have been changed and user accounts deleted or blocked, which reduced the number of users in the database.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Majkowska, A., Migdał-Najman, K., Najman, K., Raca, K. (2021). Identification of the Words Most Frequently Used by Different Generations of Twitter Users. In: Jajuga, K., Najman, K., Walesiak, M. (eds) Data Analysis and Classification. SKAD 2020. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-030-75190-6_3
DOI: https://doi.org/10.1007/978-3-030-75190-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75189-0
Online ISBN: 978-3-030-75190-6
eBook Packages: Mathematics and Statistics (R0)