Performing and Visualizing Temporal Analysis of Large Text Data Issued for Open Sources: Past and Future Methods

  • Conference paper
Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery (BDAS 2015, BDAS 2016)

Abstract

In this paper, we first review the state of the art in methods for the visualization and interpretation of textual data, and of scientific data in particular. We then briefly present our contributions to this field: original methods for the automatic classification of documents and for the easy interpretation of their content through characteristic keywords and classes created by our algorithms. In a second step, we focus on data that evolve over time. We detail our diachronic approach, which is especially suitable for detecting and visualizing topic changes. We conclude with Diachronic’Explorer, our upcoming tool for the visual exploration of evolving data.

Notes

  1. See URL: http://cordis.europa.eu/fp7/ict/programme/fet_en.html.

  2. The choice of the weighting scheme is not really constrained by the approach, apart from the requirement that it produce positive values. Such a scheme is supposed to capture the significance (i.e. the semantics and importance) of the feature for the data. Feature Recall is a scale-independent measure, but Feature Predominance is not. We have, however, shown experimentally in (Lamirel et al., 2014a) that the F-measure, which combines these two measures, is only weakly influenced by feature scaling. Nevertheless, to guarantee fully scale-independent behavior for this measure, the data must be standardized.

  3. The ISTEX project (Initiative d’Excellence pour l’Information Scientifique et Technique) is part of the “Investment for the future” program, initiated by the French Ministry of Higher Education and Research (MESR), whose ambition is to strengthen French research and higher education at the world level. The main objective of the ISTEX project is to offer the whole higher education and research community online access to retrospective collections of scientific literature in all disciplines, through a national policy of massive acquisition of documentation: journal archives, databases, text corpora.

    Reference: http://www.istex.fr.

  4. A demo version of the tool can be found at URL: http://github.com/nicolasdugue/istex-demonstrateur.

  5. The principle used to compute the strength of a topic link between periods is explained in more detail in [14].
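
The scale behavior described in note 2 can be illustrated with a small sketch. The notes above do not give the formulas, so the definitions below are an assumption taken from the usual formulation in the feature maximization literature: for a cluster c and a feature f, Feature Recall is the weight of f in c divided by the weight of f over all clusters, Feature Predominance is the weight of f in c divided by the total weight of all features in c, and the F-measure is their harmonic mean. The names `feature_scores`, `W` and `labels`, as well as the toy data, are hypothetical and do not come from the authors' tool.

```python
import numpy as np

def feature_scores(W, labels):
    """Assumed definitions (not given in the notes above).

    W      : (n_docs, n_features) array of non-negative feature weights
    labels : cluster id of each document
    Returns three (n_clusters, n_features) arrays:
    Feature Recall, Feature Predominance, and their harmonic mean.
    """
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # S[c, f] = total weight of feature f over the documents of cluster c
    S = np.vstack([W[labels == c].sum(axis=0) for c in clusters])
    col = S.sum(axis=0, keepdims=True)  # weight of each feature over all clusters
    row = S.sum(axis=1, keepdims=True)  # weight of all features inside each cluster
    recall = np.divide(S, col, out=np.zeros_like(S), where=col > 0)
    predominance = np.divide(S, row, out=np.zeros_like(S), where=row > 0)
    denom = recall + predominance
    f_measure = np.divide(2 * recall * predominance, denom,
                          out=np.zeros_like(S), where=denom > 0)
    return recall, predominance, f_measure

# Toy data: 4 documents, 3 features, 2 clusters (illustrative values only).
W = np.array([[1, 0, 2],
              [2, 1, 0],
              [0, 3, 1],
              [0, 2, 2]])
labels = np.array([0, 0, 1, 1])

# Rescale the first feature by a factor of 10, as a change of unit would.
W_scaled = W * np.array([10, 1, 1])

r1, p1, f1 = feature_scores(W, labels)
r2, p2, f2 = feature_scores(W_scaled, labels)

print("recall unchanged by rescaling:      ", np.allclose(r1, r2))
print("predominance unchanged by rescaling:", np.allclose(p1, p2))
```

Under these assumed definitions, rescaling one feature leaves its recall untouched, because the factor cancels between numerator and denominator, whereas predominance changes because the rescaled feature takes a different share of each cluster's total weight. This is consistent with the note's remark that the data must be standardized if full scale independence of the combined measure is required.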

References

  1. Armstrong, J., Green, K., Graefe, A.: Forecasting principles. In: Encyclopedia of Statistical Sciences (2011)

  2. Blei, D., Ng, A.Y., Jordan, M.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  3. Cardon, D., Fouetillou, G., Roth, C.: Two paths of glory-structural positions and trajectories of websites within their topical territory. In: Fifth International AAAI Conference on Weblogs and Social Media - ICWSM 2011 (2011)

  4. Daim, T., Rueda, G., Martin, H., Gerdsri, P.: Forecasting emerging technologies: Use of bibliometrics and patent analysis. Technol. Forecast. Soc. Chang. 73, 981–1012 (2006)

  5. Dermouche, M., Velcin, J., Loudcher, S., Khouas, L.: Une nouvelle mesure pour l’évaluation des méthodes d’extraction de thématiques : la vraisemblance généralisée. In: Actes des 13ièmes Journées Francophones d’Extraction et de Gestion des Connaissances (EGC 2013), Toulouse, France, pp. 317–328 (2013)

  6. Falk, I., Lamirel, J.C., Gardent, C.: Classifying french verbs using french and english lexical resources. In: International Conference on Computational Linguistic (ACL 2012), Jeju Island, Korea (2012)

  7. Francesiaz, T., Graille, R., Metahri, B.: Introduction aux modèles probabilistes utilisés en fouille de données (2015)

  8. Fritzke, B.: A growing neural gas network learns topologies. In: Advances in Neural Information Processing Systems, vol. 7, pp. 625–632 (1995)

  9. Ganesan, A., Brantley, K., Pan, S., Chen, J.: Ldaexplore: Visualizing topic models generated using latent dirichlet allocation. arXiv preprint arxiv:1507.06593 (2015)

  10. Goldstein, J., Mittal, V., Carbonell, J., Kantrowitz, M.: Multi-document summarization by sentence extraction. In: Workshop on Automatic summarization, NAACL-ANLP, pp. 40–48 (2000)

  11. Guille, A., Soriano-Morales, E.P.: Tom: A library for topic modeling and browsing. In: Actes des 16ièmes Journées Francophones d’Extraction et de Gestion des Connaissances (EGC 2016), Reims, France, pp. 451–456 (2016)

  12. Itoh, M., Yoshinaga, N., Toyoda, M., Kitsuregawa, M.: Analysis and visualization of temporal changes in bloggers’ activities and interests. In: 2012 IEEE Pacific Visualization Symposium (PacificVis), pp. 57–64 (2012)

  13. Kajikawa, Y., Yoshikawa, J., Takeda, Y., Matushima, K.: Tracking emerging technologies in energy research: Toward a roadmap for sustainable energy. Technol. Forecast. Soc. Chang. 75, 771–782 (2008)

  14. Lamirel, J.C.: A new diachronic methodology for automatizing the analysis of research topics dynamics: an example of application on optoelectronics research. Scientometrics 93(1), 151–166 (2012)

  15. Lamirel, J.C., Cuxac, P.: Une nouvelle méthode statistique pour la classification robuste des données textuelles : le cas mitterand-chirac. In: JADT, Paris, France (2014)

  16. Lamirel, J.C., Cuxac, P.: New quality indexes for optimal clustering model identification with high dimensional data. In: Proceedings of ICDM-HDM15 - International Workshop on High Dimensional Data Mining, Atlantic City, USA (2015)

  17. Lamirel, J.C., Cuxac, P., Chivukula, A., Hajlaoui, K.: Optimizing text classification through efficient feature selection based on quality metric. J. Intell. Inf. Syst. 2013, 1–18 (2014). Special issue on PAKDD-QIMIE 2013

  18. Lamirel, J.C., Ta, A., Attik, M.: Novel labeling strategies for hierarchical representation of multidimensional data analysis results. In: Proceedings of IASTED International Conference on Artificial Intelligence and Applications (AIA), Innsbruck, Austria (2008)

  19. Mogoutov, A., Kahane, N.: Data search strategy for science and technology emergence: A scalable and evolutionary query for nanotechnology tracking. Res. Policy 36, 893–903 (2007)

  20. Monti, R., Roubelat, F.: La boîte à outils de prospective stratégique et la prospective de défense: rétrospective et perspectives. In: Actes des Entretiens Science & Défense, Paris (1998)

  21. Noyons, E.: Science maps within a science policy context. In: Moed, H.F., Glänzel, W., Schmoch, U. (eds.) Handbook of Quantitative Science and Technology Research, pp. 237–255. Springer, Heidelberg (2004)

  22. Osborne, F., Motta, E.: Understanding research dynamics. In: Presutti, V., Stankovic, M., Cambria, E., Cantador, I., Di Iorio, A., Di Noia, T., Lange, C., Reforgiato Recupero, D., Tordai, A. (eds.) SemWebEval 2014. CCIS, vol. 475, pp. 101–107. Springer, Heidelberg (2014)

  23. Osborne, F., Scavo, G., Motta, E.: Identifying diachronic topic-based research communities by clustering shared research trajectories. In: Presutti, V., d’Amato, C., Gandon, F., d’Aquin, M., Staab, S., Tordai, A. (eds.) ESWC 2014. LNCS, vol. 8465, pp. 114–129. Springer, Heidelberg (2014)

  24. Perea, M.P.: Dynamic cartography with diachronic data: dialectal stratigraphy. Literary Linguist. Comput. 28(1), 147–156 (2013)

  25. Pons, P., Latapy, M.: Computing communities in large networks using random walks. In: Yolum, I., Güngör, T., Gürgen, F., Özturan, C. (eds.) ISCIS 2005. LNCS, vol. 3733, pp. 284–293. Springer, Heidelberg (2005)

  26. Ratinaud, P.: Visualisation chronologique des analyses alceste: application twitter avec l’exemple du hashtag #mariagepourtous (2014)

  27. Rosvall, M., Bergstrom, C.T.: Mapping change in large networks. PloS ONE 5(1), e8694 (2010)

  28. Wang, X., Cheng, Q., Lu, W.: Analyzing evolution of research topics with neviewer: a new method based on dynamic co-word networks. Scientometrics 101(2), 1253–1271 (2014)

Author information

Corresponding author

Correspondence to Jean-Charles Lamirel.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Lamirel, JC., Dugué, N., Cuxac, P. (2016). Performing and Visualizing Temporal Analysis of Large Text Data Issued for Open Sources: Past and Future Methods. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_4

  • DOI: https://doi.org/10.1007/978-3-319-34099-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-34098-2

  • Online ISBN: 978-3-319-34099-9

  • eBook Packages: Computer Science, Computer Science (R0)
