Encyclopedia of Social Network Analysis and Mining

2018 Edition
| Editors: Reda Alhajj, Jon Rokne

Microtext Processing

  • Richard Khoury
  • Raphaël Khoury
  • Abdelwahab Hamou-Lhadj
Reference work entry
DOI: https://doi.org/10.1007/978-1-4939-7131-2_353




Natural Language Processing


The term “microtext” was proposed by US Navy researchers (Dela Rosa and Ellen 2009) to describe a type of written text document that has three characteristics: (a) it is very short, typically one or two sentences, and possibly as little as a single word; (b) it is written in an informal manner and unedited for quality and thus may use loose grammar, a conversational tone, vocabulary errors, and uncommon abbreviations and acronyms; and (c) it is semi-structured in the NLP sense, in that it includes some metadata such as a time stamp, an author, or the name of a field it was entered into. Microtexts have become omnipresent in today’s world: they are notably found in online chat discussions; online forum posts; user comments posted on online material such as videos, pictures, and news stories; Facebook newsfeeds and Twitter updates; Internet...



  1. Baldwin T, Chai JY (2011) Beyond normalization: pragmatics of word form in text messages. In: 5th international joint conference on natural language processing, Chiang Mai, 8–13 Nov 2011
  2. Barbosa L, Feng J (2010) Robust sentiment detection on Twitter from biased and noisy data. In: Proceedings of the 23rd international conference on computational linguistics, Beijing, pp 36–44
  3. Baron N, Ling R (2007) Text messaging and IM: linguistic comparison of American college data. J Lang Psychol Stud 26:291–298
  4. Chen L, Wang W, Sheth AP (2012) Are Twitter users equal in predicting elections? A study of user groups in predicting 2012 U.S. Republican Presidential Primaries. In: SocInfo 2012, Lausanne. LNCS 7710. Springer, pp 379–392
  5. Cormack GV, Gómez Hidalgo JM, Puertas Sánz E (2007) Spam filtering for short messages. In: Proceedings of the 16th ACM conference on information and knowledge management (ACM CIKM’07), Lisbon, pp 313–320
  6. Cvijikj IP, Michahelles F (2011) Monitoring trends on Facebook. In: Ninth IEEE international conference on dependable, autonomic and secure computing, Zurich, 12–14 Dec 2011, pp 895–902
  7. Dela Rosa K, Ellen J (2009) Text classification methodologies applied to micro-text in military chat. In: Proceedings of the international conference on machine learning and applications, Miami Beach, pp 710–714
  8. Dong H, Hui SC, He Y (2006) Structural analysis of chat messages for topic detection. Online Inf Rev 30(5):496–516
  9. Ellen J (2011) All about microtext: a working definition and a survey of current microtext research within artificial intelligence and natural language processing. In: ICAART (1), Rome, pp 329–336
  10. Ferrara K, Brunner H, Whittemore G (1991) Interactive written discourse as an emergent register. Writ Commun 8:8–34
  11. Go A, Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision. Technical report, Stanford
  12. Healy M, Delany S, Zamolotskikh A (2005) An assessment of case-based reasoning for short text messages. In: Creaney N (ed) Proceedings of the 16th Irish conference on artificial intelligence and cognitive science, pp 257–266
  13. Kobus C, Yvon F, Damnati G (2008) Normalizing SMS: are two metaphors better than one? In: Proceedings of the 22nd international conference on computational linguistics, vol 1. Association for Computational Linguistics, pp 441–448
  14. Kolenda T, Hansen LK, Larsen J (2001) Signal detection using ICA: application to chat room topic spotting. In: Proceedings of the third international conference on independent component analysis and blind source separation, San Diego, pp 540–545
  15. Liu F, Weng F, Wang B, Liu Y (2011) Insertion, deletion, or substitution?: normalizing text messages without pre-categorization nor supervision. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, vol 2. Portland, pp 71–76
  16. Pak A, Paroubek P (2010) Twitter as a corpus for sentiment analysis and opinion mining. In: Proceedings of the seventh conference on international language resources and evaluation, Valletta. European Language
  17. Paolillo JC (1999) The virtual speech community: social network and language variation on IRC. In: Proceedings of the 32nd annual Hawaii international conference on system sciences, Maui
  18. Petrovic S, Osborne M, Lavrenko V (2010) The Edinburgh Twitter corpus. In: Proceedings of the NAACL HLT workshop on computational linguistics in a world of social media, Los Angeles, pp 25–26
  19. Ritterman J, Osborne M, Klein E (2009) Using prediction markets and Twitter to predict a swine flu pandemic. In: 1st international workshop on mining social media – 13th conference of the Spanish association for artificial intelligence
  20. Takahashi T, Tomioka R, Yamanishi K (2011) Discovering emerging topics in social streams via link anomaly detection. In: 11th IEEE international conference on data mining, Tokyo, 11–14 Dec 2011, pp 1230–1235
  21. Wang AH (2010) Don’t follow me – spam detection in Twitter. In: Proceedings of the international conference on security and cryptography (SECRYPT 2010), Athens, pp 142–151
  22. Wu T, Khan FM, Fisher TA, Shuler LA, Pottenger WM (2002) Posting act tagging using transformation-based learning. In: The proceedings of the workshop on foundations of data mining and discovery, IEEE international conference on data mining (ICDM’02), Dec 2002

Copyright information

© Springer Science+Business Media LLC, part of Springer Nature 2018

Authors and Affiliations

  • Richard Khoury (1)
  • Raphaël Khoury (2)
  • Abdelwahab Hamou-Lhadj (3)

  1. Department of Computer Science and Software Engineering, Université Laval, Québec City, Canada
  2. Department of Computer Science and Mathematics, Université du Québec à Chicoutimi, Chicoutimi, Canada
  3. Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada

Section editors and affiliations

  • Fakhreddine Karray (1)

  1. Department of Electrical and Computer Engineering, Centre for Pattern Analysis and Machine Intelligence (CPAMI), University of Waterloo, Waterloo, Canada