The emergence of digital social networks has transformed how society, social groups, and institutions communicate and express their opinions. Determining how language variation allows communities to be detected in digital assemblages, together with the relevance of a specific vocabulary (the one proposed by the National Council of Accreditation of Colombia (Consejo Nacional de Acreditación, CNA) to define the quality evaluation parameters for universities in Colombia), could lead to a better understanding of their dynamics and social foundations, and thus to better communication policies and intervention where necessary. The approach presented in this paper aims to determine the semantic spaces (sociolinguistic features) shared by social groups in digital social networks. It comprises five layers based on Design Science Research, which are integrated with Natural Language Processing (NLP), Computational Linguistics (CL), and Artificial Intelligence (AI) techniques. The approach is validated through a case study in which the semantic values of a series of institutional Twitter accounts belonging to Colombian universities are analyzed in terms of the 12 quality factors established by the CNA. The topics and the sociolect used by different actors in the university communities are also analyzed. The approach makes it possible to determine the sociolinguistic features of social groups in digital social networks and to detect the words or concepts to which each actor of a social group (university) gives the most importance in its vocabulary.
We would like to thank the Center for Excellence and Appropriation in Big Data and Data Analytics (CAOBA), Pontificia Universidad Javeriana, and the Ministry of Information Technologies and Telecommunications of the Republic of Colombia (MinTIC). The models and results presented in this challenge contributed to building the research capabilities of CAOBA. The author Edwin Puertas also thanks the Universidad Tecnológica de Bolívar.
Conflicts of Interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent was obtained from all individual participants included in the study.
About this article
Cite this article
Puertas, E., Moreno-Sandoval, L.G., Redondo, J. et al. Detection of Sociolinguistic Features in Digital Social Networks for the Detection of Communities. Cogn Comput 13, 518–537 (2021). https://doi.org/10.1007/s12559-021-09818-9
- Community discovery
- Natural language processing
- Social networks
- Community detection