Mining Temporal Evolution of Entities in a Stream of Textual Documents

  • Gianvito Pio
  • Pasqua Fabiana Lanotte
  • Michelangelo Ceci
  • Donato Malerba
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8502)


One recently addressed research direction concerns mining topic evolutions from textual documents. Following this stream of research, in this paper we address the different, but related, problem of mining the topic evolution of entities (persons, companies, etc.) mentioned in the documents. To this end, we incrementally analyze streams of time-stamped documents in order to identify clusters of similar entities and represent their evolution over time. The proposed solution is based on the concept of temporal profiles of entities extracted at periodic instants in time. Experiments performed on both synthetic and real-world datasets indicate that the proposed framework is a valuable tool for discovering underlying evolutions of entities, with significant improvements over the considered baseline methods.
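
To make the notion of a temporal profile more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that each document has already been linked to a single entity, that a profile is a plain bag-of-words count per fixed-width time window, and that entity similarity is measured with cosine similarity. The names `temporal_profiles`, `cosine`, the `window` parameter, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

def temporal_profiles(docs, window):
    """Group (timestamp, entity, text) triples into fixed-width windows
    and build one bag-of-words profile per (window index, entity)."""
    profiles = defaultdict(Counter)
    for timestamp, entity, text in docs:
        slot = timestamp // window  # index of the time window (assumed fixed width)
        profiles[(slot, entity)].update(text.lower().split())
    return profiles

def cosine(a, b):
    """Cosine similarity between two term-count profiles."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical stream: (timestamp, entity, text)
docs = [
    (0, "acme", "merger talks bank"),
    (1, "globex", "bank merger deal"),
    (10, "acme", "football sponsorship"),
    (11, "globex", "bank merger approved"),
]
profiles = temporal_profiles(docs, window=10)

# Similarity of the two entities in the first and in the second window
sim_w0 = cosine(profiles[(0, "acme")], profiles[(0, "globex")])
sim_w1 = cosine(profiles[(1, "acme")], profiles[(1, "globex")])
```

In this toy stream the two entities share vocabulary in the first window and none in the second, so their similarity drops between windows; tracking such changes per window is one simple way to picture the evolution of entities over time.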


Time Window, Feature Selection, Synthetic Dataset, Textual Document, Normalize Mutual Information
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Gianvito Pio (1)
  • Pasqua Fabiana Lanotte (1)
  • Michelangelo Ceci (1)
  • Donato Malerba (1)
  1. Dept. of Computer Science, University of Bari A. Moro, Bari, Italy
