Abstract
In this paper we first present a state of the art of methods for the visualization and interpretation of textual data, and in particular of scientific data. We then briefly present our contributions to this field: original methods for automatically classifying documents and for easily interpreting their content through the characteristic keywords and classes created by our algorithms. Next, we focus our analysis on data evolving over time. We detail our diachronic approach, which is especially suited to detecting and visualizing topic changes. We conclude with Diachronic’Explorer, our upcoming tool for the visual exploration of evolving data.
Notes
- 1.
- 2.
The approach places no real constraint on the choice of the weighting scheme, other than requiring it to produce positive values. Such a scheme is supposed to capture the significance (i.e. semantics and importance) of the feature for the data. Feature Recall is a scale-independent measure, but Feature Predominance is not. We have, however, shown experimentally in (Lamirel et al., 2014a) that the F-measure, which combines these two measures, is only weakly influenced by feature scaling. Nevertheless, to guarantee fully scale-independent behavior for this measure, the data must be standardized.
- 3.
The ISTEX project (Initiative d’Excellence pour l’Information Scientifique et Technique) is part of the “Investment for the future” program initiated by the French Ministry of Higher Education and Research (MESR), whose ambition is to strengthen French research and higher education at the world level. Its main objective is to offer the whole higher education and research community online access to retrospective collections of scientific literature in all disciplines, through a national policy of massive acquisition of documentation: journal archives, databases, and text corpora.
Reference: http://www.istex.fr.
- 4.
A demo version of the tool is available at http://github.com/nicolasdugue/istex-demonstrateur.
- 5.
The principle used to compute the strength of a topic link between periods is explained in more detail in [14].
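Note 2 contrasts the scale behavior of Feature Recall and Feature Predominance. The following is a minimal sketch of that contrast on toy data. The formulas used here are assumptions, since this page does not state them: recall is taken as the share of a feature's total weight that falls inside a cluster, predominance as the share of the cluster's total weight carried by that feature, and the F-measure as their harmonic mean. The function names and the data are hypothetical.

```python
# Sketch under ASSUMED definitions (not given on this page):
#   recall(c, f)       = sum of w[d][f] for d in c / sum of w[d][f] for all d
#   predominance(c, f) = sum of w[d][f] for d in c / sum of w[d][g] for d in c, all g
#   f_measure(c, f)    = harmonic mean of recall and predominance

def feature_scores(weights, labels, cluster, feature):
    """Return (recall, predominance, f_measure) for one feature in one cluster."""
    in_cluster = [row for row, lab in zip(weights, labels) if lab == cluster]
    mass_in_cluster = sum(row[feature] for row in in_cluster)
    mass_everywhere = sum(row[feature] for row in weights)
    cluster_total = sum(sum(row) for row in in_cluster)
    recall = mass_in_cluster / mass_everywhere if mass_everywhere else 0.0
    predominance = mass_in_cluster / cluster_total if cluster_total else 0.0
    denom = recall + predominance
    f_measure = 2 * recall * predominance / denom if denom else 0.0
    return recall, predominance, f_measure


def rescale_feature(weights, feature, factor):
    """Return a copy of the data with one feature column multiplied by factor."""
    return [[v * factor if j == feature else v for j, v in enumerate(row)]
            for row in weights]


# Hypothetical toy data: 3 documents, 2 positive-valued features, 2 clusters.
weights = [
    [3.0, 1.0],
    [1.0, 1.0],
    [1.0, 4.0],
]
labels = [0, 0, 1]

base = feature_scores(weights, labels, cluster=0, feature=0)
scaled = feature_scores(rescale_feature(weights, 0, 10.0), labels,
                        cluster=0, feature=0)

print("recall       before/after rescaling:", base[0], scaled[0])
print("predominance before/after rescaling:", base[1], scaled[1])
print("F-measure    before/after rescaling:", base[2], scaled[2])
```

Under these assumed definitions, multiplying one feature by a constant cancels out in the recall ratio but not in the predominance ratio, whose denominator mixes all features. The toy example only illustrates that asymmetry; it is not meant to reproduce the experimental result on the F-measure reported in (Lamirel et al., 2014a).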
References
Armstrong, J., Green, K., Graefe, A.: Forecasting principles. In: Encyclopedia of Statistical Sciences (2011)
Blei, D., Ng, A.Y., Jordan, M.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Cardon, D., Fouetillou, G., Roth, C.: Two paths of glory-structural positions and trajectories of websites within their topical territory. In: Fifth International AAAI Conference on Weblogs and Social Media - ICWSM 2011 (2011)
Daim, T., Rueda, G., Martin, H., Gerdsri, P.: Forecasting emerging technologies: Use of bibliometrics and patent analysis. Technol. Forecast. Soc. Chang. 73, 981–1012 (2006)
Dermouche, M., Velcin, J., Loudcher, S., Khouas, L.: Une nouvelle mesure pour l’évaluation des méthodes d’extraction de thématiques : la vraisemblance généralisée. In: Actes des 13ièmes Journées Francophones d’Extraction et de Gestion des Connaissances (EGC 2013), Toulouse, France, pp. 317–328 (2013)
Falk, I., Lamirel, J.C., Gardent, C.: Classifying french verbs using french and english lexical resources. In: International Conference on Computational Linguistic (ACL 2012), Jeju Island, Korea (2012)
Francesiaz, T., Graille, R., Metahri, B.: Introduction aux modèles probabilistes utilisés en fouille de données (2015)
Fritzke, B.: A growing neural gas network learns topologies. In: Advances in Neural Information Processing Systems, vol. 7, pp. 625–632 (1995)
Ganesan, A., Brantley, K., Pan, S., Chen, J.: Ldaexplore: Visualizing topic models generated using latent dirichlet allocation. arXiv preprint arxiv:1507.06593 (2015)
Goldstein, J., Mittal, V., Carbonell, J., Kantrowitz, M.: Multi-document summarization by sentence extraction. In: Workshop on Automatic summarization, NAACL-ANLP, pp. 40–48 (2000)
Guille, A., Soriano-Morales, E.P.: Tom: A library for topic modeling and browsing. In: Actes des 16ièmes Journées Francophones d’Extraction et de Gestion des Connaissances (EGC 2016), Reims, France, pp. 451–456 (2016)
Itoh, M., Yoshinaga, N., Toyoda, M., Kitsuregawa, M.: Analysis and visualization of temporal changes in bloggers’ activities and interests. In: 2012 IEEE Pacific Visualization Symposium (PacificVis), pp. 57–64 (2012)
Kajikawa, Y., Yoshikawa, J., Takeda, Y., Matushima, K.: Tracking emerging technologies in energy research: Toward a roadmap for sustainable energy. Technol. Forecast. Soc. Chang. 75, 771–782 (2008)
Lamirel, J.C.: A new diachronic methodology for automatizing the analysis of research topics dynamics: an example of application on optoelectronics research. Scientometrics 93(1), 151–166 (2012)
Lamirel, J.C., Cuxac, P.: Une nouvelle méthode statistique pour la classification robuste des données textuelles : le cas mitterand-chirac. In: JADT, Paris, France (2014)
Lamirel, J.C., Cuxac, P.: New quality indexes for optimal clustering model identification with high dimensional data. In: Proceedings of ICDM-HDM15 - International Workshop on High Dimensional Data Mining, Atlantic City, USA (2015)
Lamirel, J.C., Cuxac, P., Chivukula, A., Hajlaoui, K.: Optimizing text classification through efficient feature selection based on quality metric. J. Intell. Inf. Syst. 2013, 1–18 (2014). Special issue on PAKDD-QIMIE 2013
Lamirel, J.C., Ta, A., Attik, M.: Novel labeling strategies for hierarchical representation of multidimensional data analysis results. In: Proceedings of IASTED International Conference on Artificial Intelligence and Applications (AIA), Innsbruck, Austria (2008)
Mogoutov, A., Kahane, N.: Data search strategy for science and technology emergence: A scalable and evolutionary query for nanotechnology tracking. Res. Policy 36, 893–903 (2007)
Monti, R., Roubelat, F.: La boîte à outils de prospective stratégique et la prospective de défense: rétrospective et perspectives. In: Actes des Entretiens Science & Défense, Paris (1998)
Noyons, E.: Science maps within a science policy context. In: Moed, H.F., Glänzel, W., Schmoch, U. (eds.) Handbook of Quantitative Science and Technology Research, pp. 237–255. Springer, Heidelberg (2004)
Osborne, F., Motta, E.: Understanding research dynamics. In: Presutti, V., Stankovic, M., Cambria, E., Cantador, I., Di Iorio, A., Di Noia, T., Lange, C., Reforgiato Recupero, D., Tordai, A. (eds.) SemWebEval 2014. CCIS, vol. 475, pp. 101–107. Springer, Heidelberg (2014)
Osborne, F., Scavo, G., Motta, E.: Identifying diachronic topic-based research communities by clustering shared research trajectories. In: Presutti, V., d’Amato, C., Gandon, F., d’Aquin, M., Staab, S., Tordai, A. (eds.) ESWC 2014. LNCS, vol. 8465, pp. 114–129. Springer, Heidelberg (2014)
Perea, M.P.: Dynamic cartography with diachronic data: dialectal stratigraphy. Literary Linguist. Comput. 28(1), 147–156 (2013)
Pons, P., Latapy, M.: Computing communities in large networks using random walks. In: Yolum, I., Güngör, T., Gürgen, F., Özturan, C. (eds.) ISCIS 2005. LNCS, vol. 3733, pp. 284–293. Springer, Heidelberg (2005)
Ratinaud, P.: Visualisation chronologique des analyses alceste: application twitter avec l’exemple du hashtag #mariagepourtous (2014)
Rosvall, M., Bergstrom, C.T.: Mapping change in large networks. PloS ONE 5(1), e8694 (2010)
Wang, X., Cheng, Q., Lu, W.: Analyzing evolution of research topics with neviewer: a new method based on dynamic co-word networks. Scientometrics 101(2), 1253–1271 (2014)
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Lamirel, JC., Dugué, N., Cuxac, P. (2016). Performing and Visualizing Temporal Analysis of Large Text Data Issued for Open Sources: Past and Future Methods. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_4
DOI: https://doi.org/10.1007/978-3-319-34099-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34098-2
Online ISBN: 978-3-319-34099-9
eBook Packages: Computer Science (R0)