Towards an On-Line Analysis of Tweets Processing

  • Sandra Bringay
  • Nicolas Béchet
  • Flavien Bouillot
  • Pascal Poncelet
  • Mathieu Roche
  • Maguelonne Teisseire
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6861)

Abstract

Tweets exchanged over the Internet represent an important source of information, even if their characteristics make them difficult to analyze (a maximum of 140 characters, etc.). In this paper, we define a data warehouse model for analyzing large volumes of tweets and propose measures that are relevant in the context of knowledge discovery. Using data warehouses to store and analyze textual documents is not new, but current measures are not well suited to the specificities of the data being manipulated. We also propose a new way of extracting the context of a concept in a hierarchy. Experiments carried out on real data underline the relevance of our proposal.

Keywords

Information Retrieval; Data Warehouse; Textual Data; Probabilistic Latent Semantic Analysis; Word Dimension
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Sandra Bringay (1, 2)
  • Nicolas Béchet (3)
  • Flavien Bouillot (1)
  • Pascal Poncelet (1)
  • Mathieu Roche (1)
  • Maguelonne Teisseire (1, 4)

  1. LIRMM – CNRS, Univ. Montpellier 2, France
  2. Dept MIAp, Univ. Montpellier 3, France
  3. INRIA Rocquencourt - Domaine de Voluceau, France
  4. CEMAGREF – UMR TETIS, France
