Data Model

Chapter in: Introduction to Text Visualization

Part of the book series: Atlantis Briefs in Artificial Intelligence (ABAI, volume 1)

Abstract

In text data, the semantic relationships among keywords, sentences, paragraphs, sections, chapters, and documents are usually implicit. A reader must go through the entire corpus to capture insights such as the relationships among characters in a novel, the causal links between events in news reports, and the evolution of topics in research articles. These difficulties arise from the unstructured nature of text data and from the low efficiency of acquiring information by reading such unstructured text. Converting unstructured text into a structured form that facilitates understanding and cognition has therefore become an important problem, and it has attracted considerable research interest. In this chapter, we introduce the data models most frequently used in current text visualization techniques. We review low-level data structures such as the bag of words, syntactic-level structures such as the syntax tree, and network-oriented data structures at the semantic level. We present these data models (i.e., structures) together with detailed visualization examples that show how they are used to represent and summarize unstructured text data.
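To make the contrast between a low-level and a network-oriented data model concrete, here is a minimal sketch under stated assumptions. It is not the chapter's own implementation. It assumes the following simplifications:

- A naive regex tokenizer.
- Sentence splitting on `.`, `!` and `?`.
- Sentence-level word co-occurrence as a simple stand-in for a semantic network.

The function names `tokenize`, `bag_of_words` and `cooccurrence_network` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text, stopwords=frozenset()):
    """Bag-of-words model: word order is discarded and only term counts are kept."""
    return Counter(t for t in tokenize(text) if t not in stopwords)


def cooccurrence_network(text, stopwords=frozenset()):
    """Build a weighted, undirected word network (hypothetical simplification).

    Two words are linked when they appear in the same sentence; the edge
    weight counts how many sentences they share. Each edge is stored once,
    keyed by the alphabetically ordered word pair.
    """
    edges = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = sorted({t for t in tokenize(sentence) if t not in stopwords})
        for a, b in combinations(words, 2):
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    doc = "Alice met Bob. Bob helped Alice. Carol met Bob."
    print(bag_of_words(doc).most_common(3))
    for (a, b), weight in cooccurrence_network(doc).most_common():
        print(f"{a} -- {b} (weight {weight})")
```

For the three-sentence example, the two models keep different information about the same text:

- The bag of words records only that "bob" occurs three times. It loses which words appear together.
- The network keeps those relationships. The pairs alice–bob and bob–met each receive weight 2 because they share two sentences.

The network preserves exactly the kind of relationship a visualization of characters or topics needs to expose.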

Author information

Correspondence to Nan Cao.

Copyright information

© 2016 Atlantis Press and the author(s)

About this chapter

Cite this chapter

Cao, N., Cui, W. (2016). Data Model. In: Introduction to Text Visualization. Atlantis Briefs in Artificial Intelligence, vol 1. Atlantis Press, Paris. https://doi.org/10.2991/978-94-6239-186-4_3

  • DOI: https://doi.org/10.2991/978-94-6239-186-4_3

  • Publisher Name: Atlantis Press, Paris

  • Print ISBN: 978-94-6239-185-7

  • Online ISBN: 978-94-6239-186-4

  • eBook Packages: Computer Science, Computer Science (R0)
