New Generation Computing, Volume 35, Issue 4, pp 451–472

Predicting the Relevance of Social Media Posts Based on Linguistic Features and Journalistic Criteria

  • Alexandre Pinto
  • Hugo Gonçalo Oliveira
  • Álvaro Figueira
  • Ana Oliveira Alves
Special Feature


An overwhelming quantity of messages is posted in social networks every minute. To make the use of these platforms more productive, it is imperative to filter out information that is irrelevant to the general audience, such as private messages, personal opinions or well-known facts. This work focuses on the automatic classification of public social text according to its potential relevance from a journalistic point of view, with the aim of improving the overall experience of using a social network. Our experiments were based on a set of posts annotated by human judges according to several criteria, including journalistic relevance. To predict relevance, we rely exclusively on linguistic features extracted by Natural Language Processing tools, regardless of the author of the message and their profile information. In our first approach, different classifiers and feature engineering methods were used to predict relevance directly from the selected features. In a second approach, relevance was predicted indirectly, based on an ensemble of classifiers for the other key criteria in defining relevance that are also annotated in the dataset: controversy, interestingness, meaningfulness, novelty, reliability and scope. The first approach achieved an F1-score of 0.76 and an area under the ROC curve (AUC) of 0.63. However, the best results were achieved by the second approach, whose best learned model reached an F1-score of 0.84 and an AUC of 0.78. This confirmed that journalistic relevance can indeed be predicted by combining the selected criteria, and that linguistic features can be exploited to classify the latter.
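The indirect (second) approach can be pictured as a two-stage pipeline: one classifier per criterion is trained on linguistic features, and a relevance classifier is then trained on those classifiers' outputs. The abstract does not say which classifiers, features or combination rule were used, so everything below is an assumption: a hand-written logistic regression stands in for every model, the six criteria are used only as label names, and `synthetic_posts`, `fit_two_stage` and `predict_relevance` are hypothetical helpers operating on toy random data rather than the paper's dataset. F1 and AUC are computed with the standard definitions (AUC as the probability that a random positive outscores a random negative).

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Criteria named in the abstract; here they are only label names for the sketch.
CRITERIA = ["controversy", "interestingness", "meaningfulness",
            "novelty", "reliability", "scope"]


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class LogReg:
    """Minimal logistic regression fitted by batch gradient descent (assumed stand-in model)."""

    def __init__(self, lr: float = 0.5, epochs: int = 300) -> None:
        self.lr, self.epochs = lr, epochs
        self.w: List[float] = []
        self.b = 0.0

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "LogReg":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            gw, gb = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self.proba(xi) - yi  # gradient of log-loss w.r.t. the logit
                for j in range(d):
                    gw[j] += err * xi[j]
                gb += err
            self.w = [wj - self.lr * g / n for wj, g in zip(self.w, gw)]
            self.b -= self.lr * gb / n
        return self

    def proba(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)


def fit_two_stage(X, criteria: Dict[str, List[int]], relevance: List[int]):
    """Stage 1: one classifier per criterion; stage 2: relevance from their probabilities."""
    stage1 = {c: LogReg().fit(X, criteria[c]) for c in CRITERIA}
    meta = [[stage1[c].proba(x) for c in CRITERIA] for x in X]
    return stage1, LogReg().fit(meta, relevance)


def predict_relevance(stage1, stage2, X) -> List[float]:
    """Relevance probability computed from the predicted criterion probabilities."""
    return [stage2.proba([stage1[c].proba(x) for c in CRITERIA]) for x in X]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2PR / (P + R), written equivalently as 2TP / (2TP + FP + FN)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive outscores random negative), counting ties as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def synthetic_posts(n: int, seed: int = 0) -> Tuple[list, dict, list]:
    """Toy data: feature k drives criterion k; a post is 'relevant' if >= 4 criteria hold."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in CRITERIA] for _ in range(n)]
    crit = {c: [int(x[k] > 0) for x in X] for k, c in enumerate(CRITERIA)}
    rel = [int(sum(crit[c][i] for c in CRITERIA) >= 4) for i in range(n)]
    return X, crit, rel


if __name__ == "__main__":
    X, crit, rel = synthetic_posts(150)
    s1, s2 = fit_two_stage(X, crit, rel)
    probs = predict_relevance(s1, s2, X)
    preds = [int(p >= 0.5) for p in probs]
    print(f"F1={f1_score(rel, preds):.2f} AUC={roc_auc(rel, probs):.2f} (toy training data)")
```

For brevity, stage 2 is trained on in-sample stage-1 predictions; a more careful setup would use out-of-fold (cross-validated) criterion predictions so that stage 2 does not see overconfident outputs. The printed numbers come from toy data and say nothing about the results reported above.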


Keywords: Relevance assessment, Social mining, Information extraction, Natural language processing, Automatic text classification



This work was financed by the ERDF (European Regional Development Fund) through the COMPETE Programme (operational programme for competitiveness) and by national funds through the FCT (Fundação para a Ciência e a Tecnologia, Portuguese Foundation for Science and Technology) within project REMINDS–UTAP-ICDT/EEI-CTP/0022/2014.



Copyright information

© Ohmsha, Ltd. and Springer Japan 2017

Authors and Affiliations

  1. CISUC, Department of Informatics Engineering, University of Coimbra, Coimbra, Portugal
  2. CRACS / INESC TEC, University of Porto, Porto, Portugal
  3. IPC, Polytechnic Institute of Coimbra, Coimbra, Portugal
