Semantic distances for technology landscape visualization

Woon, Wei Lee; Madnick, Stuart

doi:10.1007/s10844-011-0182-3

Published: 26 October 2011

Volume 39, pages 29–58 (2012)

Journal of Intelligent Information Systems

Wei Lee Woon¹ & Stuart Madnick²

Abstract

This paper presents a novel approach to the visualization of research domains in science and technology. The proposed methodology is based on bibliometrics; i.e., the analysis uses information about trends and patterns of publication rather than the actual content of publications. In particular, we explore the use of term co-occurrence frequencies as an indicator of semantic closeness between pairs of terms. To demonstrate the utility of this approach, a number of visualizations are generated for a collection of renewable-energy-related keywords. As these keywords are regarded as manifestations of the associated research topics, we contend that the proposed visualizations can be interpreted as representations of the underlying technology landscape.
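Co-occurrence frequencies become a distance once they are normalized by how often each term occurs on its own. As a minimal sketch, assuming the hit counts f(x), f(y), f(x,y) and a total document count N are already available (how they were collected is not shown here), the code below computes two distances that Appendix B refers to: the standard symmetric Normalized Google Distance of Cilibrasi and Vitanyi (2007), which may differ in detail from the variant used in the body of the paper, and the set-based Jaccard distance. The function names and example counts are hypothetical.

```python
import math


def normalized_google_distance(f_x: float, f_y: float, f_xy: float, n: float) -> float:
    """Normalized Google Distance (Cilibrasi & Vitanyi, 2007) from hit counts.

    f_x, f_y : number of documents containing term x (resp. term y)
    f_xy     : number of documents containing both terms
    n        : total number of indexed documents (assumed known; must exceed f_x and f_y)
    """
    if f_xy == 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def jaccard_distance(f_x: float, f_y: float, f_xy: float) -> float:
    """1 - |X ∩ Y| / |X ∪ Y|, with X and Y the sets of documents matching each term."""
    union = f_x + f_y - f_xy
    return 1.0 - f_xy / union if union > 0 else 1.0


# Hypothetical counts, for illustration only.
print(normalized_google_distance(1000, 500, 100, 10**6))  # ≈ 0.303
print(jaccard_distance(1000, 500, 100))                   # ≈ 0.929
```

Both functions return 0 for terms that always appear together. The Google distance is log-scaled, so it is less dominated by very frequent terms than the raw Jaccard ratio, which is one plausible reason the two produce differently balanced clusterings.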

Notes

Year 2005. Source: Energy Information Administration, DOE, US Government.
http://scholar.google.com
http://www.scirus.org

References

Antolín, G., Tinaut, F. V., Briceño, Y., Castaño, V., Pérez, C., & Ramírez, A. I. (2002). Optimisation of biodiesel production by sunflower oil transesterification. Bioresource Technology, 83(2), 111–114.
Anuradha, K., & Urs, S. (2007). Bibliometric indicators of Indian research collaboration patterns: A correspondence analysis. Scientometrics, 71(2), 179–189.
Baek, N. C., Shin, U. C., & Yoon, J. H. (2005). A study on the design and analysis of a heat pump heating system using wastewater as a heat source. Solar Energy, 78(3), 427–440.
Bengisu, M., & Nekhili, R. (2006). Forecasting emerging technologies with the aid of science and technology databases. Technological Forecasting and Social Change, 73(7), 835–844.
Bishop, C. (1995). Neural networks for pattern recognition. London: Oxford University Press.
Bishop, C. (2006). Pattern recognition and machine learning. Information science and statistics. Singapore: Springer.
Börner, K., Dall’Asta, L., Ke, W., & Vespignani, A. (2005). Studying the emerging global brain: Analyzing and visualizing the impact of co-authorship teams. Complexity, 10(4), 57–67.
Braun, T., Schubert, A. P., & Kostoff, R. N. (2000). Growth and trends of fullerene research as reflected in its journal literature. Chemical Reviews, 100(1), 23–38.
Chiu, W.-T., & Ho, Y.-S. (2007). Bibliometric analysis of tsunami research. Scientometrics, 73(1), 3–17.
Cilibrasi, R., & Vitanyi, P. (2006). Automatic extraction of meaning from the web. In IEEE international symp. information theory.
Cilibrasi, R. L., & Vitanyi, P. M. B. (2007). The Google similarity distance. IEEE Transactions on Knowledge and Data Engineering, 19(3), 370–383.
Daim, T. U., Rueda, G., Martin, H., & Gerdsri, P. (2006). Forecasting emerging technologies: Use of bibliometrics and patent analysis. Technological Forecasting and Social Change, 73(8), 981–1012.
Daim, T. U., Rueda, G. R., & Martin, H. T. (2005). Technology forecasting using bibliometric analysis and system dynamics. In Technology management: A unifying discipline for melting the boundaries (pp. 112–122).
de Miranda, C., Dos, G. M., & Filho, L. F. (2006). Text mining as a valuable tool in foresight exercises: A study on nanotechnology. Technological Forecasting and Social Change, 73(8), 1013–1027.
Ding, Y., Chowdhury, G., & Foo, S. (2001). Bibliometric cartography of information retrieval research by using co-word analysis. Information Processing & Management, 37(6), 817–842.
Elnekave, M. (2008). Adsorption heat pumps for providing coupled heating and cooling effects in olive oil mills. International Journal of Energy Research, 32(6), 559–568.
Glänzel, W., & Schubert, A. (2005). Analysing scientific networks through co-authorship. In Handbook of quantitative science and technology research (pp. 257–276).
Hansel, A., & Lindblad, P. (1998). Towards optimization of cyanobacteria as biotechnologically relevant producers of molecular hydrogen, a clean and renewable energy source. Applied Microbiology and Biotechnology, 50(2), 153–160.
Igami, M. (2008). Exploration of the evolution of nanotechnology via mapping of patent applications. Scientometrics, 77(2), 289–308.
Janssens, F., Leta, J., Glänzel, W., & De Moor, B. (2006). Towards mapping library and information science. Information Processing & Management, 42(6), 1614–1642.
Kajikawa, Y., & Takeda, Y. (2008). Structure of research on biomass and bio-fuels: A citation-based approach. Technological Forecasting and Social Change, 75(9), 1349–1359.
Kajikawa, Y., Yoshikawa, J., Takeda, Y., & Matsushima, K. (2007). Tracking emerging technologies in energy research: Toward a roadmap for sustainable energy. Technological Forecasting and Social Change, 75(6), 771–782.
Kim, M.-J. (2007). A bibliometric analysis of the effectiveness of Korea’s biotechnology stimulation plans, with a comparison with four other Asian nations. Scientometrics, 72(3), 371–388.
King, D. A. (2004). The scientific impact of nations. Nature, 430(6997), 311–316.
Kostoff, R. N. (2001). Text mining using database tomography and bibliometrics: A review. Technological Forecasting and Social Change, 68, 223–253.
Losiewicz, P., Oard, D., & Kostoff, R. (2000). Textual data mining to support science and technology management. Journal of Intelligent Information Systems, 15(2), 99–119.
Manning, C., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval (Vol. 1). Cambridge: Cambridge University Press.
Martino, J. (1993). Technological forecasting for decision making. McGraw-Hill Engineering and Technology Management Series.
McDowall, W., & Eames, M. (2006). Forecasts, scenarios, visions, backcasts and roadmaps to the hydrogen economy: A review of the hydrogen futures literature. Energy Policy, 34(11), 1236–1250.
Morel, C., Serruya, S., Penna, G., & Guimarães, R. (2009). Co-authorship network analysis: A powerful tool for strategic planning of research, development and capacity building programs on neglected diseases. PLoS Neglected Tropical Diseases, 3(8), e501.
Porter, A. (2005). Tech mining. Competitive Intelligence Magazine, 8(1), 30–36.
Porter, A. (2007). How “tech mining” can enhance R&D management. Research Technology Management, 50(2), 15–20.
Porter, A., & Rafols, I. (2009). Is science becoming more interdisciplinary? Measuring and mapping six research fields over time. Scientometrics, 81(3), 719–745.
Porter, A., Roper, A., Mason, T., Rossini, F., & Banks, J. (1991). Forecasting and management of technology. New York: Wiley-Interscience.
Saitou, N., & Nei, M. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4(4), 406–425.
Saka, A., & Igami, M. (2007). Mapping modern science using co-citation analysis. In IV ’07: Proceedings of the 11th international conference information visualization, Washington, DC, U.S.A. (pp. 453–458). Los Alamitos: IEEE Computer Society.
Sammon, J. (1969). A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18(5), 401–409.
Smalheiser, N. R. (2001). Predicting emerging technologies with the aid of text-based data mining: The micro approach. Technovation, 21(10), 689–693.
Small, H. (2006). Tracking and predicting growth areas in science. Scientometrics, 68(3), 595–610.
Takeda, Y., & Kajikawa, Y. (2009). Optics: A bibliometric approach to detect emerging research domains and intellectual bases. Scientometrics, 78(3), 543–558.
Takeda, Y., Mae, S., Kajikawa, Y., & Matsushima, K. (2009). Nanobiotechnology as an emerging research domain from nanotechnology: A bibliometric approach. Scientometrics, 80(1), 23–38.
Upham, S., & Small, H. (2010). Emerging research fronts in science and technology: Patterns of new knowledge development. Scientometrics, 83(1), 15–38.
Van Der Heijden, K. (2000). Scenarios and forecasting—Two perspectives. Technological Forecasting and Social Change, 65, 31–36.
Woon, W., & Madnick, S. (2009). Asymmetric information distances for automated taxonomy construction. Knowledge and Information Systems, 21, 91–111. doi:10.1007/s10115-009-0203-5.
Woon, W. L., Zeineldin, H., & Madnick, S. (2011). Bibliometric analysis of distributed generation. Technological Forecasting and Social Change, 78(3), 408–420.
Zhu, D., & Porter, A. (2002). Automated extraction and visualization of information for technological intelligence and forecasting. Technological Forecasting and Social Change, 69(5), 495–506.

Acknowledgements

We would like to thank the Masdar Institute of Science and Technology (MIST) and the Masdar Initiative for their support of this work.

Author information

Authors and Affiliations

Computing and Information Science Program, Masdar Institute of Science and Technology, P.O. Box 54224, Abu Dhabi, United Arab Emirates
Wei Lee Woon
Sloan School of Management, Massachusetts Institute of Technology, E62-422, Cambridge, MA, 02139, USA
Stuart Madnick

Corresponding author

Correspondence to Wei Lee Woon.

Appendices

Appendix A: Renewable energy related keywords

A.1 Keywords from Kajikawa et al.

Combustion, coal, battery, petroleum, fuel cell, wastewater, heat pump, engine, solar cell, power system.

A.2 Author keywords

Biomass, CDS, CDTE, energy efficiency, gasification, global warming, least-cost energy policies, power generation, populus, qtl, renewable energy, review, sustainable farming and forestry, adsorption, alternative fuel, arabidopsis, ash deposits, bio-fuels, biodiesel, biomass, biomass-fired power boilers, carbon nanotubes, chemicals, co-firing, coal, corn stover, electricity, emissions, energy balance, energy conversion, energy economy and management, energy policy, energy sources, enzymatic digestion, fast pyrolysis, fuels, gas engines, gas storage, gasification, genome sequence, genomics, high efficiency, hydrolysis, inorganic material, investment, landfill, model plant, natural gas, poplar, pretreatment, pyrolysis, renewable energy, renewables, sugars, sunflower oil, thermal conversion, thermal processing, thin films, transesterification.

A.3 Keyword plus

16s ribosomal-rna, activation, active-site, adsorption, agrobacterium-mediated transformation, anabaena-variabilis, anacystis-nidulans, aqueous ammonia, bidirectional hydrogenase, biomass conversion processes, briquettes, canopy structure, catalysts, cds, cdte, cells, cellulases, cellulose, ch4, chalcopyrite, charge-transfer dynamics, chemical heat pipe, co2, cocombustion, combustion, composites, conversion, corn stover, coupled electron-transfer, devolatilization, diesel-power-plant, differentiation, dye, efficiency, electrocatalytic hydrogen evolution, electrochemical reduction, electrodes, electron-transfer, elemental sulfur, energy, enzymatic-hydrolysis, families, fermi-level equilibration, films, fimi, flash pyrolysis, fluidized-bed, fuel, fuel-cell vehicles, fuels, functionalized gold nanoparticles, gasification, gasoline, gel electrolyte, gene-transfer, genetic-linkage maps, glycosyl hydrolases, grain morphology, graphite nanofibers, herbaceous biomass, homogeneous catalysis, hybrid poplar, hydrogen, hydrogen-peroxide, hydrogen-production, hydrolysis, ignition, infrastructure, kinetics, light interception, lignin removal, lignocellulosic materials, lime pretreatment, liquefaction, liquid, liquids, mechanisms, metal-complexes, metals, molecular-genetics, monte-carlo simulations, mutagenesis, nanocrystalline semiconductor-films, nickel, nitrogen-fixation, open-top chambers, oxidative addition, partial oxidation, particles, photoelectrochemical cells, photoelectrochemical properties, photoinduced electron-transfer, photonic crystals, photoproduction, photosystem-ii, physisorption, place-exchange-reactions, pores, pressure cooking, products, proton reduction, pulverized coal, pyrolysis, rapd markers, recombination, recycled percolation process, ruthenium polypyridyl complex, seawater, sediment, sensitized nanocrystalline TiO₂, sensitizers, short-rotation, solar furnace, solar-cells, sp strain atcc-29133, sputtering deposition method, step gene replacement, surface-plasmon resonance, synergism, synthesis gas, system, TiO₂ films, TiO₂ thin-films, titanium-dioxide films, transgenic poplar, transport, trichoderma-reesei qm-9414, values, walled carbon nanotubes, waste paper, water-oxidation, wheat-straw mixtures, wood.

Appendix B: Additional visualizations

B.1 Author keywords

This appendix presents alternative visualizations and clusterings generated from the author keywords. As mentioned in Section 4.1, two alternative forms are presented. The first, shown in Fig. 8, uses Sammon mapping to generate a topographic representation of the inter-keyword distances. The second applies k-means clustering to the Jaccard distances described in Section 3.1; the resulting clusters are listed in Table 3. We note the following:
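k-means operates on coordinates rather than on a distance matrix, so some representation step is needed before it can be applied to Jaccard distances. As a minimal sketch, assuming each keyword is represented by its row of a precomputed keyword-by-keyword distance matrix (a common workaround, not necessarily the setup of Sections 3.1 and 4.1), Lloyd's algorithm looks like this. The function name, the deterministic seeding and the example matrix are hypothetical.

```python
import math
from typing import List


def kmeans_on_distance_rows(dist: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means in which keyword i is represented by row i of a distance matrix.

    Assumption: the row-vector representation and the deterministic initialisation
    (first k rows as seeds) are illustrative choices, not the paper's stated setup.
    """
    points = [list(row) for row in dist]
    centroids = [points[i][:] for i in range(k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each keyword joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical distances between four keywords forming two tight pairs.
D = [[0.0, 0.1, 0.9, 0.8],
     [0.1, 0.0, 0.85, 0.9],
     [0.9, 0.85, 0.0, 0.15],
     [0.8, 0.9, 0.15, 0.0]]
print(kmeans_on_distance_rows(D, k=2))  # → [1, 1, 0, 0]
```

Seeding with the first k rows only keeps the example reproducible; because k-means is sensitive to initialisation, random restarts or k-means++ seeding would normally be preferred.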

(1) As discussed in Section 4.1, and as the labeling of highly similar clusters in Fig. 8 shows, the alternative visualizations broadly agree with the earlier results on the overall structure of the research landscape.
(2) At the same time, the representations are not identical. This is unsurprising, since most of these methods are nonlinear and distort, or “stretch”, the visualization space.
(3) Although the results were consistent with the earlier ones, we found the visualizations obtained using the Google distance followed by hierarchical clustering or k-means to be clearer.
(4) For example, the Sammon map in Fig. 8 is difficult to read except under very high magnification. This follows from the Sammon mapping technique, which by design places similar nodes very close to each other in the visualization space; in many cases this caused terms to overlap, while large sections of the visualization space remained relatively sparse.
(5) Clusters obtained with the Jaccard distance also tended to be more unbalanced. In Table 3, for example, many of the keywords are lumped together in cluster JK1. This suggests that the mapping induced by the Jaccard distance did not achieve a good, uniform “spread” of the keywords.
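The crowding described in item (4) follows from the form of Sammon's (1969) stress, E = (1 / Σ D_ij) Σ_{i<j} (D_ij − d_ij)² / D_ij, where D_ij is the input distance and d_ij the distance on the map. Dividing each squared error by D_ij makes errors on small input distances expensive, so near-synonymous terms are placed almost on top of one another. The sketch below minimises this stress by plain gradient descent rather than Sammon's original pseudo-Newton update; the function names, step size and iteration count are illustrative assumptions, not the settings used for Figs. 8, 9 and 10.

```python
import math
import random
from typing import List, Optional

Matrix = List[List[float]]


def sammon_stress(D: Matrix, Y: Matrix) -> float:
    """Sammon's stress: sum over pairs of (D_ij - d_ij)^2 / D_ij, normalised by sum of D_ij."""
    n = len(D)
    num = total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(Y[i], Y[j])
            num += (D[i][j] - d) ** 2 / D[i][j]
            total += D[i][j]
    return num / total


def sammon_map(D: Matrix, dim: int = 2, iters: int = 500, lr: float = 0.1,
               init: Optional[Matrix] = None, seed: int = 0) -> Matrix:
    """Embed n items in `dim` dimensions by gradient descent on Sammon's stress.

    D must be symmetric with strictly positive off-diagonal entries.
    """
    n = len(D)
    rng = random.Random(seed)
    Y = [row[:] for row in init] if init else [
        [rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    dim = len(Y[0])
    total = sum(D[i][j] for i in range(n) for j in range(i + 1, n))
    for _ in range(iters):
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = max(math.dist(Y[i], Y[j]), 1e-12)  # guard against coincident points
                # Contribution of pair (i, j) to dE/dy_i:
                # -2 (D_ij - d_ij) / (D_ij * d_ij * total) * (y_i - y_j)
                coeff = -2.0 * (D[i][j] - d) / (D[i][j] * d * total)
                for k in range(dim):
                    grad[i][k] += coeff * (Y[i][k] - Y[j][k])
        for i in range(n):
            for k in range(dim):
                Y[i][k] -= lr * grad[i][k]
    return Y


# Hypothetical 4-keyword distance matrix (e.g. Google or Jaccard distances).
D = [[0.0, 0.2, 0.9, 0.8],
     [0.2, 0.0, 0.85, 0.9],
     [0.9, 0.85, 0.0, 0.3],
     [0.8, 0.9, 0.3, 0.0]]
Y = sammon_map(D)
print(round(sammon_stress(D, Y), 4))
```

The 1 / D_ij weighting is what preserves local neighbourhoods well; the same weighting means large distances are reproduced only loosely, which is consistent with the sparse regions noted in item (4).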

Table 3 Clusters generated automatically by applying the k-means algorithm to the author keywords data (Jaccard distances)

B.2 Keyword plus

The keyword plus data showed the same trends as the author keywords in the previous subsection. Again, the broad structure of the research landscape appears to be preserved. As before, Sammon mapping (shown in Figs. 9 and 10) produced a somewhat cluttered representation of the landscape, while applying the k-means algorithm to the Jaccard distances again distributed the keywords less uniformly among the clusters: a significant number of clusters contain only a single keyword or phrase (nine such clusters in Table 4, compared with only one in Table 2).
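The comparison between Table 4 and Table 2 rests on counting single-member clusters. A small helper makes that balance check concrete; the function name and the labels below are hypothetical, and the counts quoted above come from the paper's tables, not from this code.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def cluster_size_summary(labels: Sequence[Hashable]) -> Dict[str, object]:
    """Summarise how evenly items are spread over clusters."""
    sizes = Counter(labels)
    size_list: List[int] = sorted(sizes.values(), reverse=True)
    return {
        "n_clusters": len(sizes),
        "singletons": sum(1 for s in size_list if s == 1),  # clusters holding one keyword
        "largest": size_list[0] if size_list else 0,
        "sizes": size_list,
    }


# Hypothetical cluster labels, for illustration only.
print(cluster_size_summary(["JK1", "JK1", "JK1", "JK2", "JK3"]))
# → {'n_clusters': 3, 'singletons': 2, 'largest': 3, 'sizes': [3, 1, 1]}
```

A large "largest" value signals the lumping seen in cluster JK1, while a high "singletons" count signals the fragmentation seen in Table 4; both are symptoms of an uneven spread of keywords.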

Table 4 Clusters generated using k-means: keyword plus data (Jaccard distances)

About this article

Cite this article

Woon, W.L., Madnick, S. Semantic distances for technology landscape visualization. J Intell Inf Syst 39, 29–58 (2012). https://doi.org/10.1007/s10844-011-0182-3

Received: 24 March 2011
Revised: 29 September 2011
Accepted: 29 September 2011
Published: 26 October 2011
Issue Date: August 2012
DOI: https://doi.org/10.1007/s10844-011-0182-3
