Abstract
A novel method for automatically constructing taxonomies for specific research domains is presented. The proposed methodology uses term co-occurrence frequencies as an indicator of the semantic closeness between terms. To support the automated creation of taxonomies or subject classifications, we present a simple modification to the basic distance measure and describe a set of procedures by which these measures may be converted into estimates of the desired taxonomy. To demonstrate the viability of this approach, we conduct a pilot study on renewable energy technologies, in which the proposed method is used to construct a hierarchy of terms related to alternative energy. These techniques have many potential applications; the one in which we are particularly interested is the mapping and subsequent prediction of future developments in technology and research.
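To make the idea of a co-occurrence-based distance concrete, the sketch below computes the Normalized Google Distance (NGD) of Cilibrasi and Vitányi from document hit counts. The abstract does not name its basic distance measure, so using NGD here is an assumption. The second function, `directed_distance`, is a hypothetical asymmetric score (the negative log of a conditional co-occurrence probability) included only to illustrate why asymmetry can hint at parent-child relations; it is not claimed to be the modification proposed in the article. All counts are made-up illustrative values.

```python
import math


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Google Distance from hit counts.

    fx, fy : number of documents containing each term
    fxy    : number of documents containing both terms
    n      : total number of documents indexed
    """
    if fxy == 0:
        return math.inf  # terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def directed_distance(fx: int, fxy: int) -> float:
    """Hypothetical asymmetric score: -log P(y | x), estimated as -log(fxy / fx).

    Illustrative assumption only, not the measure defined in the article.
    """
    if fxy == 0:
        return math.inf
    return math.log(fx) - math.log(fxy)


# Made-up counts: "energy" is a broad term, "solar" a narrower one.
f_energy, f_solar, f_both, total = 1000, 100, 80, 1_000_000

symmetric = ngd(f_energy, f_solar, f_both, total)
solar_to_energy = directed_distance(f_solar, f_both)    # small: "solar" usually implies "energy"
energy_to_solar = directed_distance(f_energy, f_both)   # large: "energy" rarely implies "solar"

print(f"NGD(energy, solar)   = {symmetric:.4f}")
print(f"d(solar -> energy)   = {solar_to_energy:.4f}")
print(f"d(energy -> solar)   = {energy_to_solar:.4f}")
```

With these illustrative counts the symmetric distance gives one number for the pair, while the directed score is much smaller from "solar" to "energy" than in the reverse direction. That imbalance is the kind of signal a taxonomy builder could use to place the broader term above the narrower one.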
About this article
Cite this article
Woon, W.L., Madnick, S. Asymmetric information distances for automated taxonomy construction. Knowl Inf Syst 21, 91–111 (2009). https://doi.org/10.1007/s10115-009-0203-5
DOI: https://doi.org/10.1007/s10115-009-0203-5