Abstract
In text data, the semantic relationships among keywords, sentences, paragraphs, sections, chapters, and documents are usually implicit. A reader must go through the entire corpus to capture insights such as the relationships among characters in a novel, the causal links between events in news reports, or the evolution of topics in research articles. The difficulty arises from the unstructured nature of text data and from the low efficiency with which information can be acquired by reading it. How to convert unstructured text into a structured form that facilitates understanding and cognition has therefore become an important problem that has attracted considerable research interest. In this chapter, we introduce the data models that are frequently used in current text visualization techniques. We review low-level data structures such as the bag of words, syntactic-level structures such as the syntax tree, and network-oriented data structures at the semantic level. We introduce these data models (i.e., structures) together with detailed visualization examples that show how they are used to represent and summarize unstructured text data.
Notes
1. Bag of words model. http://en.wikipedia.org/wiki/Bag-of-words_model.
Copyright information
© 2016 Atlantis Press and the author(s)
About this chapter
Cite this chapter
Cao, N., Cui, W. (2016). Data Model. In: Introduction to Text Visualization. Atlantis Briefs in Artificial Intelligence, vol 1. Atlantis Press, Paris. https://doi.org/10.2991/978-94-6239-186-4_3
DOI: https://doi.org/10.2991/978-94-6239-186-4_3
Publisher Name: Atlantis Press, Paris
Print ISBN: 978-94-6239-185-7
Online ISBN: 978-94-6239-186-4
eBook Packages: Computer Science (R0)