Asymmetric information distances for automated taxonomy construction

  • Regular Paper
Knowledge and Information Systems

Abstract

A novel method for automatically constructing taxonomies for specific research domains is presented. The proposed methodology uses term co-occurrence frequencies as an indicator of the semantic closeness between terms. To support the automated creation of taxonomies or subject classifications, we present a simple modification to the basic distance measure and describe a set of procedures by which these measures may be converted into estimates of the desired taxonomy. To demonstrate the viability of this approach, a pilot study on renewable energy technologies is conducted, in which the proposed method is used to construct a hierarchy of terms related to alternative energy. These techniques have many potential applications; one in which we are particularly interested is the mapping and subsequent prediction of future developments in technology and research.
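As a rough illustration of how co-occurrence frequencies can be turned into a distance between terms, the sketch below computes the Normalized Google Distance of Cilibrasi and Vitányi from document hit counts. The abstract does not say which "basic distance measure" is used, so treating it as NGD is an assumption, and the function name `normalized_distance` and all counts are hypothetical. The title suggests that the authors' modification makes the measure asymmetric; its exact form is not given in this excerpt and is not reproduced here.

```python
import math


def normalized_distance(f_x: int, f_y: int, f_xy: int, n_docs: int) -> float:
    """NGD-style distance between two terms, computed from hit counts.

    f_x, f_y : number of documents containing each term (assumed > 0)
    f_xy     : number of documents containing both terms
    n_docs   : total number of indexed documents (assumed > f_x and f_y)
    """
    if f_x <= 0 or f_y <= 0 or n_docs <= max(f_x, f_y):
        raise ValueError("counts must satisfy 0 < f_x, f_y < n_docs")
    if f_xy == 0:
        # Terms that never co-occur are treated as infinitely far apart.
        return math.inf
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(n_docs) - min(log_x, log_y))


# Hypothetical counts: a broad term, a narrower term, and their overlap.
d = normalized_distance(f_x=1000, f_y=100, f_xy=50, n_docs=1_000_000)
```

Swapping `f_x` and `f_y` leaves the result unchanged, so this symmetric form on its own cannot indicate which of two terms is the broader one. A hierarchy-building procedure needs some directional signal in addition to closeness.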

Author information

Corresponding author

Correspondence to Wei Lee Woon.

About this article

Cite this article

Woon, W.L., Madnick, S. Asymmetric information distances for automated taxonomy construction. Knowl Inf Syst 21, 91–111 (2009). https://doi.org/10.1007/s10115-009-0203-5

  • DOI: https://doi.org/10.1007/s10115-009-0203-5
