Abstract
We are overwhelmed and overloaded with the data deluge brought by the digital age. Hierarchies are pervasive cognitive patterns that allow us to reorganize data and reduce the dimensionality of the information space to manageable levels (e.g., filesystems and navigational menus). In spite of their widespread adoption, such hierarchies can be improved to cope with the present needs of data sharing and reuse. First, we seldom use mechanisms to evaluate how well they partition the information space. Second, we build static and content-driven hierarchies instead of dynamic and context-driven (i.e., task-driven) ones. Third, we use ad hoc and implicit hierarchization criteria, whereas they should be explicit and shareable. This paper discusses the problems related to the construction of hierarchies, and presents a conceptual framework to turn them into reconfigurable and shareable artifacts. Moreover, it explores how dynamically reconfigurable hierarchies can better cope with the multi-faceted nature of content, illustrating these principles through a tool that validates our proposal.
Similar content being viewed by others
References
Acm CCS (2010) Acm’s computing classification system (ccs). http://www.acm.org/about/class/1998
Baker L, McCallum A (1998) Distributional clustering of words for text classification. In: ACM SIGIR’98: Proceedings of the 21st annual international conference on research and development in information retrieval. ACM, pp 96–103
Berman F (2008) Got data?: a guide to data preservation in the information age. Commun ACM 51:50–56
Blei DM (2012) Probabilistic topic models. Commun ACM 55(4):77–84
Bloehdorn S, Cimiano P, Hotho A (2005) Learning ontologies to improve text clustering and classification. In: Proceeding of the 29th annual conference of the German classification society (GfKl), Magdeburg, Germany, pp 334–341
Crescenzi V, Mecca G (2004) Automatic information extraction from large websites. J ACM (JACM) 51(5):731–779
Dekel O, Keshet J, Singer Y (2004) Large margin hierarchical classification. J Am Stat Assoc 104(487):1213
Dumais S, Chen H (2000) Hierarchical classification of web content. In: ACM SIGIR’00: proceedings of the 23rd annual Iinternational conference on research and development in information retrieval. ACM, pp 256–263
Fernandes A, Moura AMDC, Porto F (2003) An ontology-based approach for organizing, sharing, and querying knowledge objects on the web. In: DEXA’03: proceedings of the 14th international workshop on database and expert systems applications. IEEE, pp 604–609
Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2(2):139–172
Gates S, Teiken W, Cheng K (2005) Taxonomies by the numbers: building high-performance taxonomies. In: proceedings of the 14th ACM international conference on information and knowledge management. ACM, pp 568–577
Hua Y, Jiang H, Zhu Y, Feng D, Tian L (2012) Semantic-aware metadata organization paradigm in next-generation file systems. IEEE Trans Parallel Distrib Syst 23(2):337–344
Irmak U, Kraft R (2010) A scalable machine-learning approach for semi-structured named entity recognition. In: Proceeings of the 19th international conference on World Wide Web. ACM, pp 461–470
Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice Hall, Englewood Cliffs, NJ, USA
Joachims T (1997) A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization. In: Machine learning international workshop, pp 143–151
Kiritchenko S, Matwin S, Nock R, Famili AF (2006) Learning and evaluation in the presence of class hierarchies : application to text categorization. In: Proceedings of the 19th Canadian conference on artificial intelligence
Kohonen T, Kaski S, Lagus K, Salojarvi J, Honkela J, Paatero V, Saarela A (2000) Self organization of a massive document collection. IEEE Trans Neural Netw 11(3):574–585
Koller D, Sahami M (1997) Hierarchically classifying documents using very few words. In: ICML’97: proceedings of the 14th international conference on machine learning. Morgan Kaufmann, pp 170–178
Köorner C, Benz D, Hotho A, Strohmaier M (2010) Stop thinking, start tagging: tag semantics emerge from collaborative verbosity. In: Proceedings of the 19th international conference on World Wide Web. ACM, pp 521–530
Laender AHF, Ribeiro-Neto BA, da Silva AS, Teixeira JS (2002) A brief survey of web data extraction tools. ACM Sigmod Rec 31(2):84–93
Liu J, Yu S, Le J (2005) Dynamic mining hierarchical topic from web news stream data using divisive-agglomerative clustering method. In: PAKDD’05: proceeding of the 9th Pacific-Asia conference on advances in knowledge discovery and data mining. Springer, Berlin, pp 826–831
McCallum A, Nigam K (1998) A comparison of event models for naive bayes text classification. In: AAAI’98: workshop on learning for text categorization, vol 752, pp 41–48
Michalski RS (1980) Knowledge acquisition through conceptual clustering: a theoretical framework and an algorithm for partitioning data into conjunctive concepts. J Policy Anal Info Syst 4(3):219–244
Miller GA (1956) The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychol Rev 63(2):81–97
Mishra N, Motwani R (2004) Introduction: special issue on theoretical advances in data clustering. Mach Learn 56(1–3):5–7
Pant G, Srinivasan P (2005) Learning to crawl : comparing classification schemes. ACM Trans Info Syst 23(4):430–462
Popitsch N, Schandl B (2010) Ad-hoc file sharing using linked data technologies. In: PSD’10: proceedings of the international workshop on personal semantic data
Qi X, Davison BD (2009) Web page classification. ACM Comput Surv 41(2):1–31
Řehůřek R., Sojka P (2010) Software framework for topic modelling with large corpora. In: LREC’10: proceedings of the workshop on new challenges for NLP frameworks. ELRA, pp 45– 50
Schütze H, Hull DA, Pedersen JO (1995) A comparison of classifiers and document representations for the routing problem. In: ACM SIGIR’95: proceedings of the 18th annual international conference on research and development in information retrieval. ACM, pp 229–237
Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput Surv (CSUR) 34(1):1–47
Senra RDA, Medeiros CB (2011) Organographs - multi-faceted hierarchical categorization of web documents. In: WEBIST’11: proceedings of the 7th international conference on web information systems and technologies, pp 583–588
Sneath P, Sokal R (1973) Numerical taxonomy. The principles and practice of numerical classification. W. H. Freeman and Company, San Francisco, pp xv + 573. ISBN 0-7167-0697-0
Turmo J, Ageno A, Català N (2006) Adaptive information extraction. ACM Comput Surv (CSUR) 38(2):4
Weigend A, Wiener E, Pedersen J (1999) Exploiting hierarchy in text categorization. Inf Retr 1(3):193–216
Xu J, Dichev C, Esterline A (2009) On the Effectiveness of collaborative tagging systems for describing resources. In: WRI’09: proceedings of the world congress on computer science and information engineering, vol 4. IEEE Computer Society, pp 467–471
Yang Y, Liu X (1999) A re-examination of text categorization methods. In: ACM SIGIR’99: proceedings of the 22nd annual international conference on research and development in, information retrieval, pp 42–49
Acknowledgments
This work was supported by the Microsoft Research FAPESP Virtual Institute (NavScales project), the Center for Computational Engineering and Sciences—Fapesp/Cepid 2013/08293-7, CNPq (MuZOO Project and PRONEX-FAPESP), INCT in Web Science(CNPq 557.128/2009-9) and CAPES. We also thank all LIS members from IC-Unicamp for their comments and suggestions. Last but not least, we thank the JODS reviewers for their valuable suggestions.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Senra, R.D.A., Medeiros, C.B. Evaluate, Reorganize and Share: An Approach to Dynamically Organize Digital Hierarchies. J Data Semant 3, 225–236 (2014). https://doi.org/10.1007/s13740-014-0035-7
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13740-014-0035-7