Evaluate, Reorganize and Share: An Approach to Dynamically Organize Digital Hierarchies

  • Original Article
Journal on Data Semantics

Abstract

We are overwhelmed and overloaded with the data deluge brought by the digital age. Hierarchies are pervasive cognitive patterns that allow us to reorganize data and reduce the dimensionality of the information space to manageable levels (e.g., filesystems and navigational menus). In spite of their widespread adoption, such hierarchies can be improved to cope with the present needs of data sharing and reuse. First, we seldom use mechanisms to evaluate how well they partition the information space. Second, we build static and content-driven hierarchies instead of dynamic and context-driven (i.e., task-driven) ones. Third, we use ad hoc and implicit hierarchization criteria, whereas they should be explicit and shareable. This paper discusses the problems related to the construction of hierarchies, and presents a conceptual framework to turn them into reconfigurable and shareable artifacts. Moreover, it explores how dynamically reconfigurable hierarchies can better cope with the multi-faceted nature of content, illustrating these principles through a tool that validates our proposal.

Acknowledgments

This work was supported by the Microsoft Research FAPESP Virtual Institute (NavScales project), the Center for Computational Engineering and Sciences (FAPESP/CEPID 2013/08293-7), CNPq (MuZOO Project and PRONEX-FAPESP), INCT in Web Science (CNPq 557.128/2009-9) and CAPES. We also thank all LIS members from IC-Unicamp for their comments and suggestions. Last but not least, we thank the JODS reviewers for their valuable suggestions.

Author information

Corresponding author

Correspondence to Rodrigo Dias Arruda Senra.

About this article

Cite this article

Senra, R.D.A., Medeiros, C.B. Evaluate, Reorganize and Share: An Approach to Dynamically Organize Digital Hierarchies. J Data Semant 3, 225–236 (2014). https://doi.org/10.1007/s13740-014-0035-7

  • DOI: https://doi.org/10.1007/s13740-014-0035-7
