Exploratory search of academic publication and citation data using interactive tag cloud visualizations

Scientometrics

Abstract

Acquiring an overview of an unfamiliar discipline and exploring relevant papers and journals is often a laborious task for researchers. In this paper we show how exploratory search can be supported on a large collection of academic papers, allowing users to answer complex scientometric questions that traditional retrieval approaches do not support optimally. We use our ConceptCloud browser, which combines concept lattices and tag clouds, to visually present academic publication data (specifically, the ACM Digital Library) in a browsable format that facilitates exploratory search. We augment this dataset with semantic categories, obtained through automatic keyphrase extraction from papers’ titles and abstracts, in order to provide the user with uniform keyphrases for the underlying data collection. We use the citations and references of papers to provide additional mechanisms for exploring relevant research: aggregated reference and citation data are presented not only for a single paper but also across topics, authors and journals, which is a novel aspect of our approach. We conducted a user study to evaluate our approach, in which we asked 34 participants, from different academic backgrounds and with varying degrees of research experience, to answer a variety of scientometric questions using the ConceptCloud browser. Participants answered these questions with a mean correctness of 73%, and their prior research experience had no statistically significant effect on the results.
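The combination of concept lattices and tag clouds mentioned in the abstract can be illustrated with a minimal sketch. This is not the ConceptCloud implementation: the toy papers, the tag names, and the functions `extent`, `intent` and `tag_cloud` are all hypothetical, and the code only shows the two standard derivation operators of formal concept analysis together with a tag count over the currently selected papers.

```python
from collections import Counter

# Hypothetical toy context (an assumption for illustration only):
# paper id -> set of tags such as authors, years and keyphrases.
PAPERS = {
    "p1": {"author:A", "year:2014", "topic:tag clouds"},
    "p2": {"author:A", "year:2015", "topic:tag clouds", "topic:version control"},
    "p3": {"author:B", "year:2000", "topic:component retrieval"},
    "p4": {"author:A", "author:B", "year:2015", "topic:version control"},
}


def extent(selected, papers=PAPERS):
    """Return the papers that carry every selected tag."""
    return {pid for pid, tags in papers.items() if selected <= tags}


def intent(paper_ids, papers=PAPERS):
    """Return the tags shared by all given papers."""
    if not paper_ids:
        # By convention, the empty set of objects shares every attribute.
        return set().union(*papers.values())
    return set.intersection(*(papers[pid] for pid in paper_ids))


def tag_cloud(selected, papers=PAPERS):
    """Count tags over the papers matching the selection.

    In a tag cloud display, a higher count would map to a larger font.
    """
    return Counter(tag for pid in extent(selected, papers) for tag in papers[pid])


if __name__ == "__main__":
    selection = {"year:2015"}
    print(sorted(extent(selection)))
    print(sorted(intent(extent(selection))))
    print(tag_cloud(selection).most_common())
```

Selecting a tag narrows the set of papers, and the recomputed counts show which other tags still co-occur with the selection. A pair of a paper set and a tag set that map onto each other under `extent` and `intent` is a formal concept, and these concepts are the nodes of a concept lattice.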

Notes

  1. The introductory video of the ConceptCloud Browser for academic papers is available at https://www.youtube.com/watch?v=8zJ618yOWBI.

Acknowledgements

This research is funded in part by a STIAS Doctoral Scholarship, CAIR, NRF Grant 93582 and the MIH Media Lab.

Author information

Corresponding author

Correspondence to Marcel Dunaiski.

About this article

Cite this article

Dunaiski, M., Greene, G.J. & Fischer, B. Exploratory search of academic publication and citation data using interactive tag cloud visualizations. Scientometrics 110, 1539–1571 (2017). https://doi.org/10.1007/s11192-016-2236-3

  • DOI: https://doi.org/10.1007/s11192-016-2236-3
