Abstract
This paper presents a new approach to information retrieval from non-structured attributes in databases, which involves the processing of text attributes. To make retrieval more effective, frequent text sequences are extracted and mathematically represented as intermediate forms which permit a clearer and more precise definition of operations on texts. These intermediate forms appear to users in the form of tag clouds to facilitate content identification, exploration, and querying. In this sense, tag cloud visualization is a simple, user-friendly visual interface to data. This paper proposes a theoretical model for the representation of frequent text sequences and their operations as well as a general procedure for generating tag clouds from text attributes in databases. The tag clouds thus obtained were compared with conventional tag clouds composed of single terms. Our study showed that automatically generated multi-term tag clouds provide better results than mono-term tag clouds.
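The abstract does not spell out the extraction procedure, so the following is only a minimal sketch of the general idea, under loud assumptions: frequent text sequences are approximated here by contiguous word n-grams, support is the number of attribute values containing a sequence, and font size grows linearly with support. The function names `frequent_sequences` and `tag_weights`, and all parameters, are hypothetical and are not taken from the paper.

```python
from collections import Counter


def frequent_sequences(texts, min_support=2, max_len=3):
    """Return contiguous word sequences whose support reaches min_support.

    Assumption: support = number of texts (attribute values) that
    contain the sequence at least once. This is an illustrative
    stand-in, not the paper's formal model.
    """
    support = Counter()
    for text in texts:
        words = text.lower().split()
        seen = set()  # count each sequence at most once per text
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                seen.add(tuple(words[i:i + n]))
        support.update(seen)
    return {seq: s for seq, s in support.items() if s >= min_support}


def tag_weights(freq, min_size=10, max_size=30):
    """Map each sequence's support linearly onto a font size (hypothetical scale)."""
    if not freq:
        return {}
    lo, hi = min(freq.values()), max(freq.values())
    span = (hi - lo) or 1  # avoid division by zero when all supports are equal
    return {
        " ".join(seq): min_size + (s - lo) * (max_size - min_size) / span
        for seq, s in freq.items()
    }


if __name__ == "__main__":
    rows = ["tag cloud generation", "tag cloud visualization", "text retrieval"]
    for tag, size in sorted(tag_weights(frequent_sequences(rows)).items()):
        print(tag, size)
```

A multi-term cloud in this sketch would display entries such as "tag cloud" alongside single terms, whereas a mono-term cloud corresponds to calling `frequent_sequences` with `max_len=1`.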
Acknowledgments
This work has been partially supported by the “Consejería de Economía, Innovación, y Ciencia de Andalucía” (Spain) under research projects P07-TIC-02786, P10-TIC-6109, and P11-TIC-7460.
Cite this article
Torres-Parejo, U., Campaña, J.R., Vila, M.A. et al. A theoretical model for the automatic generation of tag clouds. Knowl Inf Syst 40, 315–347 (2014). https://doi.org/10.1007/s10115-013-0651-9
DOI: https://doi.org/10.1007/s10115-013-0651-9