Tweet Expansion Method for Filtering Task in Twitter

  • Payam Karisani
  • Farhad Oroumchian
  • Maseud Rahgozar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9283)


In this article we propose a supervised method for expanding tweet contents to improve the recall of the tweet filtering task in online reputation management systems. Our method does not use any external resources. It builds a K-NN classifier in three steps, each applied to the training-set tweets labeled related and unrelated: extracting and adding the most discriminative terms, calculating and adding the most frequent terms, and re-weighting the original tweet terms. Our experiments on the RepLab 2013 data set show that our method improves the performance of the filtering task, in terms of the F measure, by up to 13% over state-of-the-art classifiers such as SVM. This data set consists of 61 entities from four domains: automotive, banking, universities, and music.
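The three expansion steps and the K-NN vote can be illustrated with a minimal sketch. The abstract does not specify the scoring functions, so everything concrete here is an assumption made for illustration: the smoothed log-ratio used to rank discriminative terms, the `original_weight` of 2.0, cosine similarity as the K-NN distance, and all function names. It is not the authors' implementation.

```python
import math
from collections import Counter

def tokens(text):
    return text.lower().split()

def top_discriminative(own, other, k):
    # Assumed scoring: smoothed log-ratio of per-class document frequencies.
    own_df = Counter(t for tw in own for t in set(tokens(tw)))
    other_df = Counter(t for tw in other for t in set(tokens(tw)))
    def score(t):
        p = (own_df[t] + 1) / (len(own) + 2)
        q = (other_df[t] + 1) / (len(other) + 2)
        return math.log(p / q)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(own_df, key=lambda t: (-score(t), t))[:k]

def top_frequent(own, k):
    tf = Counter(t for tw in own for t in tokens(tw))
    return [t for t, _ in sorted(tf.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]

def expand(tweet, extra_terms, original_weight=2.0):
    # Re-weight the original terms (assumed weight), then add expansion terms.
    vec = Counter()
    for t in tokens(tweet):
        vec[t] += original_weight
    for t in extra_terms:
        vec[t] += 1.0
    return vec

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train(related, unrelated, k_terms=2):
    # Each training tweet is expanded with the terms of its own class.
    model = []
    for own, other, label in ((related, unrelated, "related"),
                              (unrelated, related, "unrelated")):
        extra = top_discriminative(own, other, k_terms) + top_frequent(own, k_terms)
        model += [(expand(tw, extra), label) for tw in own]
    return model

def classify(model, tweet, k=3):
    # The query tweet is left unexpanded; its k nearest neighbours vote.
    query = Counter(tokens(tweet))
    nearest = sorted(model, key=lambda m: -cosine(query, m[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data for the ambiguous entity name "jaguar" (invented for this sketch).
related = ["new jaguar car launch today",
           "jaguar unveils electric car",
           "test drive the jaguar suv"]
unrelated = ["jaguar spotted in the jungle",
             "the jaguar is a big cat",
             "zoo welcomes jaguar cub"]
model = train(related, unrelated)
print(classify(model, "jaguar car"))
print(classify(model, "jaguar in the zoo"))
```

Because only training tweets are expanded, a short test tweet can still match a neighbour through terms that the neighbour did not originally contain, which is one plausible way such expansion could raise recall.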


Keywords: Twitter, Classification, Filtering, Content expansion





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Payam Karisani (1)
  • Farhad Oroumchian (2)
  • Maseud Rahgozar (1)
  1. Database Research Group, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
  2. University of Wollongong in Dubai, Dubai, United Arab Emirates
