Volume 116, Issue 3, pp 1887–1944

LitStoryTeller+: an interactive system for multi-level scientific paper visual storytelling with a supportive text mining toolbox

  • Qing Ping
  • Chaomei Chen


The continuing growth of scientific publications poses a double challenge to researchers: they must not only grasp the overall research trends in a scientific domain, but also get down to the research details embedded in a collection of core papers. Existing work on science mapping provides multiple tools to visualize research trends in a domain at the macro level, and work from the digital humanities has proposed text visualizations of documents, topics, sentences, and words at the micro level. However, existing micro-level text visualizations are not tailored to scientific paper corpora and cannot support meso-level scientific reading, which aligns a set of core papers by their research progress before drilling down to individual papers. To bridge this gap, the present paper proposes LitStoryTeller+, an interactive system under a unified framework that supports both meso-level and micro-level visual storytelling for scientific papers. More specifically, we use entities (concepts and terminologies) as the basic visual elements, and visualize entity storylines across papers and within a paper, borrowing metaphors from screenplays. To identify entities and entity communities, we perform named entity recognition and community detection. We also employ a variety of text mining methods, such as extractive text summarization and comparative sentence classification, to provide rich textual information that supplements our visualizations. In addition, we propose a top-down story-reading strategy that takes full advantage of our system. Two comprehensive hypothetical walkthroughs, exploring documents from the computer science and history domains, demonstrate the effectiveness of the story-reading strategy and the usefulness of LitStoryTeller+.
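The idea of entity storylines and entity communities can be illustrated with a small sketch. It is not the system's implementation: the toy corpus, the function names, and the `min_weight` threshold are all hypothetical, the entities are assumed to come from an earlier named entity recognition step, and connected components of a thresholded co-occurrence graph are used here only as a crude stand-in for a real community detection algorithm.

```python
from collections import defaultdict
from itertools import combinations


def entity_storylines(papers):
    """Map each entity to the ordered list of paper ids that mention it."""
    lines = defaultdict(list)
    for paper_id, entities in papers:
        for entity in sorted(set(entities)):
            lines[entity].append(paper_id)
    return dict(lines)


def cooccurrence_weights(papers):
    """Count, for each unordered entity pair, the papers mentioning both."""
    weights = defaultdict(int)
    for _, entities in papers:
        for a, b in combinations(sorted(set(entities)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def entity_groups(papers, min_weight=2):
    """Group entities that share at least `min_weight` papers.

    Assumption: connected components of the thresholded graph stand in
    for a proper community detection method.
    """
    neighbours = defaultdict(set)
    for _, entities in papers:
        for entity in entities:
            neighbours[entity]  # register entities that end up isolated
    for (a, b), weight in cooccurrence_weights(papers).items():
        if weight >= min_weight:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical toy corpus: (paper id, entities assumed extracted by NER),
# listed in the order in which the papers should be read.
PAPERS = [
    ("p1", ["LDA", "topic model", "Gibbs sampling"]),
    ("p2", ["LDA", "topic model", "LexRank"]),
    ("p3", ["LexRank", "summarization"]),
    ("p4", ["LexRank", "summarization", "MMR"]),
]

if __name__ == "__main__":
    print(entity_storylines(PAPERS)["LexRank"])
    print(entity_groups(PAPERS))
```

In this sketch a storyline is simply the sequence of papers an entity passes through, and a group collects entities whose storylines repeatedly run together; a meso-level view would draw one line per entity across the aligned papers.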


Visual storytelling, Narrative storylines, Close-reading, Scientific paper visualization, Extractive summarization, Comparative sentence classification



This study is supported by the project “A Visual Analytic Observatory of Scientific Knowledge”, funded by the National Science Foundation (NSF 1633286).



Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2018

Authors and Affiliations

  1. College of Computing and Informatics, Drexel University, Philadelphia, USA
