Data Model

Cao, Nan; Cui, Weiwei

doi:10.2991/978-94-6239-186-4_3


Part of the book series: Atlantis Briefs in Artificial Intelligence (ABAI, volume 1)


Abstract

In text data, the semantic relationships among keywords, sentences, paragraphs, sections, chapters, and documents are usually implicit. A reader often has to go through an entire corpus to capture insights such as the relationships among characters in a novel, causal relationships among events in news reports, or the evolution of topics in research articles. These difficulties arise from the unstructured nature of text data and from the low efficiency of acquiring information by reading unstructured text. Converting unstructured text into a structured form that facilitates understanding and cognition has therefore become an important problem, and it has attracted considerable research interest. In this chapter, we introduce the data models frequently used in current text visualization techniques. We review low-level data structures such as the bag of words, syntactic-level structures such as the syntax tree, and network-oriented data structures at the semantic level. We introduce these data models (i.e., structures) together with detailed visualization examples that show how the structures are used to represent and summarize unstructured text data.
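To make the contrast between a low-level model and a network-oriented model concrete, the following is a minimal Python sketch, not the authors' implementation. The bag of words discards word order and keeps only term frequencies. The network is a simple keyword co-occurrence graph, used here as one hypothetical example of a semantic-level structure. Several details are assumptions chosen for illustration: tokenization by a lowercase letters-only regular expression, sentence splitting on `.`, `!`, and `?`, and an edge definition under which two keywords are linked whenever they appear in the same sentence. The function names `tokenize`, `bag_of_words`, and `cooccurrence_network` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Assumption: lowercase and keep runs of ASCII letters only; a real pipeline
    # would typically also remove stop words and apply stemming or lemmatization.
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    # Bag of words: word order is discarded and only term frequencies are kept.
    return Counter(tokenize(text))


def cooccurrence_network(text, vocabulary):
    # Hypothetical network model: nodes are the keywords in `vocabulary`, and the
    # weight of an undirected edge (a, b), stored with a < b, counts the
    # sentences in which both keywords occur.
    edges = Counter()
    for sentence in re.split(r"[.!?]+", text):
        keywords = sorted(set(tokenize(sentence)) & vocabulary)
        for pair in combinations(keywords, 2):
            edges[pair] += 1
    return edges


if __name__ == "__main__":
    doc = ("Alice met Bob in Paris. Bob wrote to Carol. "
           "Alice and Carol visited Paris.")
    print(bag_of_words(doc).most_common(3))
    network = cooccurrence_network(doc, {"alice", "bob", "carol", "paris"})
    for (a, b), weight in sorted(network.items()):
        print(f"{a} -- {b}: {weight}")
```

The bag of words loses the information that Alice and Paris appear together. The co-occurrence network keeps that relationship as an edge of weight 2, which is the kind of implicit, structured relationship the network-oriented models are meant to expose.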



Author information

Authors and Affiliations

IBM T. J. Watson Research Center, Yorktown Heights, New York, USA
Nan Cao
Microsoft Research Asia, Beijing, China
Weiwei Cui


Corresponding author

Correspondence to Nan Cao.


About this chapter

Cite this chapter

Cao, N., Cui, W. (2016). Data Model. In: Introduction to Text Visualization. Atlantis Briefs in Artificial Intelligence, vol 1. Atlantis Press, Paris. https://doi.org/10.2991/978-94-6239-186-4_3


DOI: https://doi.org/10.2991/978-94-6239-186-4_3
Published: 23 October 2016
Publisher Name: Atlantis Press, Paris
Print ISBN: 978-94-6239-185-7
Online ISBN: 978-94-6239-186-4
eBook Packages: Computer Science, Computer Science (R0)
