A Probabilistic Approach for Events Identification from Social Media RSS Feeds

  • Chiraz Trabelsi
  • Sadok B. Yahia
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7827)

Abstract

Social media RSS feeds are the most up-to-date and inclusive releases of information on current events used by social media sites such as Twitter and Flickr. RSS feeds are thus regarded as a powerful real-time means of sharing real-world events within the social Web. By identifying these events and their associated social media resources, we can greatly improve event browsing and searching. However, event identification from such feeds remains challenging because it must be both efficient and timely. In this paper, we address event identification from heterogeneous social media RSS feeds and introduce a new approach for extracting these events. The main thrust of the approach is to achieve a better tradeoff between event identification accuracy and swiftness. Specifically, we adopt the probabilistic Naive Bayes model, combined with stemming and feature selection techniques. Experiments carried out on two real-world datasets emphasize the relevance of our proposal and open many issues.
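
The abstract names three ingredients (stemming, feature selection, and a Naive Bayes classifier) without giving their details. As a rough illustration only, the sketch below chains them on toy feed titles. Everything specific in it is an assumption rather than the authors' method: the suffix-stripping stemmer is a crude stand-in for a real stemmer, feature selection is a simple top-k cut by document frequency, the classifier is a multinomial Naive Bayes with add-one smoothing, and the labels "event" and "other" are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: a toy suffix stripper, not the stemmer used in the paper.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(token):
    """Strip at most one common suffix, keeping a stem of 3+ letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Lowercase, split on non-letters, and stem each token."""
    return [stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def select_features(docs, k):
    """Assumption: keep the k terms with the highest document frequency."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    ranked = sorted(df, key=lambda t: (-df[t], t))  # ties broken alphabetically
    return set(ranked[:k])


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels, k=50):
        docs = [tokenize(t) for t in texts]
        self.vocab = select_features(docs, k)
        self.class_docs = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(t for t in tokens if t in self.vocab)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical feed titles and labels, for illustration only.
texts = [
    "Earthquake shakes buildings downtown",
    "Concert tickets selling fast downtown",
    "Loving my new phone",
    "Coffee with friends, loving it",
]
labels = ["event", "event", "other", "other"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("Earthquake shaking downtown"))
print(model.predict("loving this coffee"))
```

Stemming and feature selection both shrink the vocabulary before training, which is one plausible way to trade a little accuracy for faster classification, the tradeoff the abstract refers to.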

Keywords

Event Identification; Social Media RSS feeds; Naive Bayes Model

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Chiraz Trabelsi (1)
  • Sadok B. Yahia (1, 2)
  1. Faculty of Sciences of Tunis, University Tunis El-Manar, Tunis, Tunisia
  2. Department of Computer Science, UMR 5157 CNRS Samovar, TELECOM SudParis, Evry Cedex, France