Journal on Data Semantics, Volume 3, Issue 4, pp 225–236

Evaluate, Reorganize and Share: An Approach to Dynamically Organize Digital Hierarchies

  • Rodrigo Dias Arruda Senra
  • Claudia Bauzer Medeiros
Original Article

Abstract

We are overwhelmed and overloaded with the data deluge brought by the digital age. Hierarchies are pervasive cognitive patterns that allow us to reorganize data and reduce the dimensionality of the information space to manageable levels (e.g., filesystems and navigational menus). In spite of their widespread adoption, such hierarchies can be improved to cope with the present needs of data sharing and reuse. First, we seldom use mechanisms to evaluate how well they partition the information space. Second, we build static and content-driven hierarchies instead of dynamic and context-driven (i.e., task-driven) ones. Third, we use ad hoc and implicit hierarchization criteria, whereas they should be explicit and shareable. This paper discusses the problems related to the construction of hierarchies, and presents a conceptual framework to turn them into reconfigurable and shareable artifacts. Moreover, it explores how dynamically reconfigurable hierarchies can better cope with the multi-faceted nature of content, illustrating these principles through a tool that validates our proposal.

Keywords

Organograph, Data sharing, Data integration, Organization

Notes

Acknowledgments

This work was supported by the Microsoft Research FAPESP Virtual Institute (NavScales project), the Center for Computational Engineering and Sciences (Fapesp/Cepid 2013/08293-7), CNPq (MuZOO Project and PRONEX-FAPESP), INCT in Web Science (CNPq 557.128/2009-9) and CAPES. We also thank all LIS members from IC-Unicamp for their comments and suggestions. Last but not least, we thank the JODS reviewers for their valuable suggestions.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Rodrigo Dias Arruda Senra (1)
  • Claudia Bauzer Medeiros (1)
  1. Institute of Computing, University of Campinas (UNICAMP), Campinas, Brazil
