Content Mining of Microblogs

  • M. Özgür Cingiz
  • Banu Diri
Part of the Lecture Notes in Social Networks book series (LNSN)


With the emergence of Web 2.0, internet users can share their content with other users through social networks. In this chapter, microbloggers’ content is evaluated with respect to how well it reflects the microbloggers’ categories. Each microblogger’s category, which is one of economy, sport, entertainment, or technology, is taken from the application. A total of 3337 RSS news feeds, whose category labels are the same as those of the microbloggers’ contributions, are used as training data for classification. Unlike similar studies, if a feature of a microblog does not appear as a feature in the RSS news feeds, it is omitted, so that abbreviations and nonsense words in microblogs are eliminated. Two types of users’ contributions are taken as test data: those of normal microbloggers and those of bots. Classification results show that bots provide more categorical content than normal users.
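The feature-omission step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four toy feed texts are invented, the helper names (`tokenize`, `train`, `filter_features`, `classify`) are hypothetical, and a small multinomial Naive Bayes scorer stands in for the classifier, which the abstract does not specify. The point is only that a microblog token is kept when it also occurs in the RSS training vocabulary and dropped otherwise.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    """Collect per-category term counts and the overall training vocabulary."""
    term_counts = defaultdict(Counter)
    doc_counts = Counter()
    for category, text in labeled_docs:
        term_counts[category].update(tokenize(text))
        doc_counts[category] += 1
    vocabulary = set()
    for counts in term_counts.values():
        vocabulary.update(counts)
    return term_counts, doc_counts, vocabulary


def filter_features(text, vocabulary):
    """Keep only microblog tokens that also occur in the training vocabulary."""
    return [t for t in tokenize(text) if t in vocabulary]


def classify(text, model):
    """Score each category with add-one-smoothed Naive Bayes (a stand-in)."""
    term_counts, doc_counts, vocabulary = model
    tokens = filter_features(text, vocabulary)
    if not tokens:
        return None  # nothing left after omitting unseen features
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in sorted(term_counts):
        counts = term_counts[category]
        denom = sum(counts.values()) + len(vocabulary)
        score = math.log(doc_counts[category] / total_docs)
        score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
        if score > best_score:
            best, best_score = category, score
    return best


# Invented toy stand-ins for the labeled RSS news feeds.
rss_feeds = [
    ("economy", "central bank raises interest rates as inflation grows"),
    ("sport", "team wins the league final after late goal"),
    ("entertainment", "new film premiere draws famous actors"),
    ("technology", "new smartphone ships with faster processor"),
]
model = train(rss_feeds)
microblog = "omg gr8 goal in the final lol"
kept = filter_features(microblog, model[2])
label = classify(microblog, model)
```

In this toy run the tokens `omg`, `gr`, `in`, and `lol` never occur in the training feeds, so they are dropped before scoring, which mirrors how abbreviations and nonsense words are eliminated. A microblog whose tokens are all unseen yields `None` rather than a forced label; that behaviour is a design choice of this sketch, not something the abstract states.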


Support Vector Machine, Feature Selection Method, Decision Boundary, Normal User, Text Document
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Computer Engineering Department, Yıldız Teknik Üniversitesi, Istanbul, Turkey
