A theoretical model for the automatic generation of tag clouds

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

This paper presents a new approach to information retrieval from unstructured attributes in databases, based on the processing of text attributes. To make retrieval more effective, frequent text sequences are extracted and represented mathematically as intermediate forms, which permit a clearer and more precise definition of operations on texts. These intermediate forms are shown to users as tag clouds to facilitate content identification, exploration, and querying. In this sense, tag cloud visualization is a simple, user-friendly visual interface to data. We propose a theoretical model for the representation of frequent text sequences and their operations, as well as a general procedure for generating tag clouds from text attributes in databases. The tag clouds thus obtained were compared with conventional tag clouds composed of single terms. Our study showed that automatically generated multi-term tag clouds provide better results than mono-term tag clouds.
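The general idea of building a multi-term tag cloud from a text attribute can be illustrated with a minimal sketch. It is not the model proposed in the paper: it assumes that a "frequent text sequence" can be approximated by a contiguous run of terms that occurs in at least `min_support` rows, and that tag size grows linearly with support. The function names, parameters, and sample rows are all hypothetical.

```python
from collections import Counter


def frequent_sequences(texts, min_support=2, max_len=3):
    """Return contiguous term sequences found in at least min_support texts.

    Assumption: support is the number of texts containing the sequence,
    and terms are obtained by lowercasing and splitting on whitespace.
    """
    support = Counter()
    for text in texts:
        terms = text.lower().split()
        seen = set()  # count each sequence at most once per text
        for n in range(1, max_len + 1):
            for i in range(len(terms) - n + 1):
                seen.add(tuple(terms[i:i + n]))
        support.update(seen)
    return {seq: c for seq, c in support.items() if c >= min_support}


def tag_cloud(freq, min_size=10, max_size=32):
    """Map each frequent sequence to a font size scaled linearly by support."""
    if not freq:
        return {}
    lo, hi = min(freq.values()), max(freq.values())
    span = (hi - lo) or 1  # avoid division by zero when all supports are equal
    return {
        " ".join(seq): min_size + (c - lo) * (max_size - min_size) // span
        for seq, c in freq.items()
    }


# Hypothetical values of a text attribute, one string per database row.
rows = [
    "tag cloud generation",
    "automatic tag cloud generation",
    "tag cloud visualization",
    "text attributes in databases",
    "text attributes",
]
freq = frequent_sequences(rows)
cloud = tag_cloud(freq)
```

In this sketch, multi-term tags such as "tag cloud" appear alongside the single terms "tag" and "cloud", which is the kind of contrast between multi-term and mono-term clouds that the abstract describes.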

Acknowledgments

This work has been partially supported by the “Consejería de Economía, Innovación, y Ciencia de Andalucía” (Spain) under research projects P07-TIC-02786, P10-TIC-6109, and P11-TIC-7460.

Author information

Corresponding author

Correspondence to Ursula Torres-Parejo.

About this article

Cite this article

Torres-Parejo, U., Campaña, J.R., Vila, M.A. et al. A theoretical model for the automatic generation of tag clouds. Knowl Inf Syst 40, 315–347 (2014). https://doi.org/10.1007/s10115-013-0651-9

  • DOI: https://doi.org/10.1007/s10115-013-0651-9
