Asymmetric information distances for automated taxonomy construction

Woon, Wei Lee; Madnick, Stuart

doi:10.1007/s10115-009-0203-5

Regular Paper
Published: 02 April 2009

Knowledge and Information Systems, Volume 21, pages 91–111 (2009)

Wei Lee Woon¹ &
Stuart Madnick²

Abstract

A novel method for automatically constructing taxonomies for specific research domains is presented. The proposed methodology uses term co-occurrence frequencies as an indicator of the semantic closeness between terms. To support the automated creation of taxonomies or subject classifications, we present a simple modification to the basic distance measure and describe a set of procedures by which these measures may be converted into estimates of the desired taxonomy. To demonstrate the viability of this approach, a pilot study on renewable energy technologies is conducted, in which the proposed method is used to construct a hierarchy of terms related to alternative energy. These techniques have many potential applications, but one activity in which we are particularly interested is the mapping and subsequent prediction of future developments in technology and research.
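
The abstract does not state the distance formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the "basic distance measure" resembles the Normalized Google Distance of Cilibrasi and Vitányi, which is computed from document hit counts, and it uses a hypothetical asymmetric variant (the negative log of a conditional co-occurrence probability) to show how a directed distance can hint at parent/child relations. The function names and all counts are invented for illustration.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts (symmetric).

    fx, fy: number of documents containing each term
    fxy:    number of documents containing both terms
    n:      total number of documents indexed
    """
    if fxy == 0:
        return math.inf  # terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def directed_distance(fx, fxy):
    """Hypothetical asymmetric distance from term x to term y.

    Defined here as -log P(y | x) = log f(x) - log f(x, y).
    This is an assumption for illustration, not the paper's formula.
    """
    if fxy == 0:
        return math.inf
    return math.log(fx) - math.log(fxy)


# Invented counts: a narrow term and a broad term.
n_docs = 10**6
f_narrow = 1_000    # e.g. documents mentioning "solar cell"
f_broad = 10_000    # e.g. documents mentioning "renewable energy"
f_both = 800        # documents mentioning both

symmetric = ngd(f_narrow, f_broad, f_both, n_docs)
narrow_to_broad = directed_distance(f_narrow, f_both)
broad_to_narrow = directed_distance(f_broad, f_both)

print(f"symmetric distance:       {symmetric:.3f}")
print(f"narrow -> broad distance: {narrow_to_broad:.3f}")
print(f"broad -> narrow distance: {broad_to_narrow:.3f}")
```

With these counts the narrow term lies much closer to the broad term than the reverse, which is the kind of asymmetry that could be used to place the broad term above the narrow one in a hierarchy. Turning a full matrix of such distances into a tree requires a further step that this sketch does not cover.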

Author information

Authors and Affiliations

Masdar Institute of Science and Technology, MASDAR, P.O. Box 54224, Abu Dhabi, UAE
Wei Lee Woon
Sloan School of Management, M.I.T., E53-321, Cambridge, MA, 02139, USA
Stuart Madnick

Corresponding author

Correspondence to Wei Lee Woon.

About this article

Cite this article

Woon, W.L., Madnick, S. Asymmetric information distances for automated taxonomy construction. Knowl Inf Syst 21, 91–111 (2009). https://doi.org/10.1007/s10115-009-0203-5

Received: 16 September 2008
Revised: 26 February 2009
Accepted: 08 March 2009
Published: 02 April 2009
Issue Date: October 2009
DOI: https://doi.org/10.1007/s10115-009-0203-5
