Semantic distances for technology landscape visualization

Journal of Intelligent Information Systems

Abstract

This paper presents a novel approach to the visualization of research domains in science and technology. The proposed methodology is based on the use of bibliometrics; i.e., analysis is conducted using information regarding trends and patterns of publication rather than the actual content. In particular, we explore the use of term co-occurrence frequencies as an indicator of semantic closeness between pairs of terms. To demonstrate the utility of this approach, a number of visualizations are generated for a collection of renewable energy related keywords. As these keywords are regarded as manifestations of the associated research topics, we contend that the proposed visualizations can be interpreted as representations of the underlying technology landscape.

Notes

  1. Year 2005. Source: Energy Information Administration, DOE, US Government.

  2. http://scholar.google.com

  3. http://www.scirus.org

References

  • Antolín, G., Tinaut, F. V., Briceño, Y., Castaño, V., Pérez, C., & Ramírez, A. I. (2002). Optimisation of biodiesel production by sunflower oil transesterification. Bioresource Technology, 83(2), 111–114.

  • Anuradha, K., & Urs, S. (2007). Bibliometric indicators of Indian research collaboration patterns: A correspondence analysis. Scientometrics, 71(2), 179–189.

  • Baek, N. C., Shin, U. C., & Yoon, J. H. (2005). A study on the design and analysis of a heat pump heating system using wastewater as a heat source. Solar Energy, 78(3), 427–440.

  • Bengisu, M., & Nekhili, R. (2006). Forecasting emerging technologies with the aid of science and technology databases. Technological Forecasting and Social Change, 73(7), 835–844.

  • Bishop, C. (1995). Neural networks for pattern recognition. London: Oxford University Press.

  • Bishop, C. (2006). Pattern recognition and machine learning. Information science and statistics. Singapore: Springer.

  • Börner, K., Dall’Asta, L., Ke, W., & Vespignani, A. (2005). Studying the emerging global brain: Analyzing and visualizing the impact of co-authorship teams. Complexity, 10(4), 57–67.

  • Braun, T., Schubert, A. P., & Kostoff, R. N. (2000). Growth and trends of fullerene research as reflected in its journal literature. Chemical Reviews, 100(1), 23–38.

  • Chiu, W.-T., & Ho, Y.-S. (2007). Bibliometric analysis of tsunami research. Scientometrics, 73(1), 3–17.

  • Cilibrasi, R., & Vitanyi, P. (2006). Automatic extraction of meaning from the web. In IEEE international symp. information theory.

  • Cilibrasi, R. L., & Vitanyi, P. M. B. (2007). The Google similarity distance. IEEE Transactions on Knowledge and Data Engineering, 19(3), 370–383.

  • Daim, T. U., Rueda, G., Martin, H., & Gerdsri, P. (2006). Forecasting emerging technologies: Use of bibliometrics and patent analysis. Technological Forecasting and Social Change, 73(8), 981–1012.

  • Daim, T. U., Rueda, G. R., & Martin, H. T. (2005). Technology forecasting using bibliometric analysis and system dynamics. In Technology management: A unifying discipline for melting the boundaries (pp. 112–122).

  • de Miranda, C., Dos, G. M., & Filho, L. F. (2006). Text mining as a valuable tool in foresight exercises: A study on nanotechnology. Technological Forecasting and Social Change, 73(8), 1013–1027.

  • Ding, Y., Chowdhury, G., & Foo, S. (2001). Bibliometric cartography of information retrieval research by using co-word analysis. Information Processing & Management, 37(6), 817–842.

  • Elnekave, M. (2008). Adsorption heat pumps for providing coupled heating and cooling effects in olive oil mills. International Journal of Energy Research, 32(6), 559–568.

  • Glänzel, W., & Schubert, A. (2005). Analysing scientific networks through co-authorship. In Handbook of quantitative science and technology research (pp. 257–276).

  • Hansel, A., & Lindblad, P. (1998). Towards optimization of cyanobacteria as biotechnologically relevant producers of molecular hydrogen, a clean and renewable energy source. Applied Microbiology and Biotechnology, 50(2), 153–160.

  • Igami, M. (2008). Exploration of the evolution of nanotechnology via mapping of patent applications. Scientometrics, 77(2), 289–308.

  • Janssens, F., Leta, J., Glänzel, W., & De Moor, B. (2006). Towards mapping library and information science. Information Processing & Management, 42(6), 1614–1642.

  • Kajikawa, Y., & Takeda, Y. (2008). Structure of research on biomass and bio-fuels: A citation-based approach. Technological Forecasting and Social Change, 75(9), 1349–1359.

  • Kajikawa, Y., Yoshikawa, J., Takeda, Y., & Matsushima, K. (2007). Tracking emerging technologies in energy research: Toward a roadmap for sustainable energy. Technological Forecasting and Social Change, 75(6), 771–782.

  • Kim, M.-J. (2007). A bibliometric analysis of the effectiveness of Korea’s biotechnology stimulation plans, with a comparison with four other Asian nations. Scientometrics, 72(3), 371–388.

  • King, D. A. (2004). The scientific impact of nations. Nature, 430(6997), 311–316.

  • Kostoff, R. N. (2001). Text mining using database tomography and bibliometrics: A review. Technological Forecasting and Social Change, 68, 223–253.

  • Losiewicz, P., Oard, D., & Kostoff, R. (2000). Textual data mining to support science and technology management. Journal of Intelligent Information Systems, 15(2), 99–119.

  • Manning, C., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval (Vol. 1). Cambridge: Cambridge University Press.

  • Martino, J. (1993). Technological forecasting for decision making. McGraw-Hill Engineering and Technology Management Series.

  • McDowall, W., & Eames, M. (2006). Forecasts, scenarios, visions, backcasts and roadmaps to the hydrogen economy: A review of the hydrogen futures literature. Energy Policy, 34(11), 1236–1250.

  • Morel, C., Serruya, S., Penna, G., & Guimarães, R. (2009). Co-authorship network analysis: A powerful tool for strategic planning of research, development and capacity building programs on neglected diseases. PLoS Neglected Tropical Diseases, 3(8), e501.

  • Porter, A. (2005). Tech mining. Competitive Intelligence Magazine, 8(1), 30–36.

  • Porter, A. (2007). How “tech mining” can enhance R&D management. Research Technology Management, 50(2), 15–20.

  • Porter, A., & Rafols, I. (2009). Is science becoming more interdisciplinary? Measuring and mapping six research fields over time. Scientometrics, 81(3), 719–745.

  • Porter, A., Roper, A., Mason, T., Rossini, F., & Banks, J. (1991). Forecasting and management of technology. New York: Wiley-Interscience.

  • Saitou, N., & Nei, M. (1987). The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4(4), 406–425.

  • Saka, A., & Igami, M. (2007). Mapping modern science using co-citation analysis. In IV ’07: Proceedings of the 11th international conference information visualization, Washington, DC, U.S.A. (pp. 453–458). Los Alamitos: IEEE Computer Society.

  • Sammon, J. (1969). A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18(5), 401–409.

  • Smalheiser, N. R. (2001). Predicting emerging technologies with the aid of text-based data mining: The micro approach. Technovation, 21(10), 689–693.

  • Small, H. (2006). Tracking and predicting growth areas in science. Scientometrics, 68(3), 595–610.

  • Takeda, Y., & Kajikawa, Y. (2009). Optics: A bibliometric approach to detect emerging research domains and intellectual bases. Scientometrics, 78(3), 543–558.

  • Takeda, Y., Mae, S., Kajikawa, Y., & Matsushima, K. (2009). Nanobiotechnology as an emerging research domain from nanotechnology: A bibliometric approach. Scientometrics, 80(1), 23–38.

  • Upham, S., & Small, H. (2010). Emerging research fronts in science and technology: Patterns of new knowledge development. Scientometrics, 83(1), 15–38.

  • Van Der Heijden, K. (2000). Scenarios and forecasting—Two perspectives. Technological Forecasting and Social Change, 65, 31–36.

  • Woon, W., & Madnick, S. (2009). Asymmetric information distances for automated taxonomy construction. Knowledge and Information Systems, 21, 91–111. doi:10.1007/s10115-009-0203-5.

  • Woon, W. L., Zeineldin, H., & Madnick, S. (2011). Bibliometric analysis of distributed generation. Technological Forecasting and Social Change, 78(3), 408–420.

  • Zhu, D., & Porter, A. (2002). Automated extraction and visualization of information for technological intelligence and forecasting. Technological Forecasting and Social Change, 69(5), 495–506.

Acknowledgements

We would like to thank the Masdar Institute of Science and Technology (MIST) and the Masdar Initiative for their support of this work.

Author information

Corresponding author

Correspondence to Wei Lee Woon.

Appendices

Appendix A: Renewable energy related keywords

A.1 Keywords from Kajikawa et al.

Combustion, coal, battery, petroleum, fuel cell, wastewater, heat pump, engine, solar cell, power system.

A.2 Author keywords

Biomass, CDS, CDTE, energy efficiency, gasification, global warming, least-cost energy policies, power generation, populus, qtl, renewable energy, review, sustainable farming and forestry, adsorption, alternative fuel, arabidopsis, ash deposits, bio-fuels, biodiesel, biomass, biomass-fired power boilers, carbon nanotubes, chemicals, co-firing, coal, corn stover, electricity, emissions, energy balance, energy conversion, energy economy and management, energy policy, energy sources, enzymatic digestion, fast pyrolysis, fuels, gas engines, gas storage, gasification, genome sequence, genomics, high efficiency, hydrolysis, inorganic material, investment, landfill, model plant, natural gas, poplar, pretreatment, pyrolysis, renewable energy, renewables, sugars, sunflower oil, thermal conversion, thermal processing, thin films, transesterification.

A.3 Keyword plus

16s ribosomal-rna, activation, active-site, adsorption, agrobacterium-mediated transformation, anabaena-variabilis, anacystis-nidulans, aqueous ammonia, bidirectional hydrogenase, biomass conversion processes, briquettes, canopy structure, catalysts, cds, cdte, cells, cellulases, cellulose, ch4, chalcopyrite, charge-transfer dynamics, chemical heat pipe, co2, cocombustion, combustion, composites, conversion, corn stover, coupled electron-transfer, devolatilization, diesel-power-plant, differentiation, dye, efficiency, electrocatalytic hydrogen evolution, electrochemical reduction, electrodes, electron-transfer, elemental sulfur, energy, enzymatic-hydrolysis, families, fermi-level equilibration, films, fimi, flash pyrolysis, fluidized-bed, fuel, fuel-cell vehicles, fuels, functionalized gold nanoparticles, gasification, gasoline, gel electrolyte, gene-transfer, genetic-linkage maps, glycosyl hydrolases, grain morphology, graphite nanofibers, herbaceous biomass, homogeneous catalysis, hybrid poplar, hydrogen, hydrogen-peroxide, hydrogen-production, hydrolysis, ignition, infrastructure, kinetics, light interception, lignin removal, lignocellulosic materials, lime pretreatment, liquefaction, liquid, liquids, mechanisms, metal-complexes, metals, molecular-genetics, monte-carlo simulations, mutagenesis, nanocrystalline semiconductor-films, nickel, nitrogen-fixation, open-top chambers, oxidative addition, partial oxidation, particles, photoelectrochemical cells, photoelectrochemical properties, photoinduced electron-transfer, photonic crystals, photoproduction, photosystem-ii, physisorption, place-exchange-reactions, pores, pressure cooking, products, proton reduction, pulverized coal, pyrolysis, rapd markers, recombination, recycled percolation process, ruthenium polypyridyl complex, seawater, sediment, sensitized nanocrystalline TiO2, sensitizers, short-rotation, solar furnace, solar-cells, sp strain atcc-29133, sputtering deposition method, step gene replacement, 
surface-plasmon resonance, synergism, synthesis gas, system, TiO2 films, TiO2 thin-films, titanium-dioxide films, transgenic poplar, transport, trichoderma-reesei qm-9414, values, walled carbon nanotubes, waste paper, water-oxidation, wheat-straw mixtures, wood.

Appendix B: Additional visualizations

B.1 Author keywords

This appendix presents alternative visualizations and clusterings of the author-generated keywords. As mentioned in Section 4.1, two alternative forms are considered. The first, shown in Fig. 8, uses Sammon mapping to generate a topographic representation of the inter-keyword distances. The second, presented in Table 3, applies k-means clustering to the Jaccard distances described in Section 3.1. We note that:

  1. As discussed in Section 4.1, and as indicated by the labeling of highly similar clusters in Fig. 8, the different visualizations broadly agree with the earlier results on the overall structure of the research landscape.

  2. At the same time, the representations are not identical. This is not surprising, since most of these methods are non-linear and distort or “stretch” the visualization space.

  3. Although the results were consistent with the earlier ones, we felt that the visualizations obtained using the Google distance followed by hierarchical clustering or k-means tended to be clearer.

  4. For example, the Sammon map in Fig. 8 is difficult to read except under very high magnification. This follows from the Sammon mapping technique which, by definition, places similar nodes very close to each other in the visualization space; in many cases this produced overlapping terms, while large sections of the visualization space remained relatively sparse.

  5. In addition, the clusters obtained with the Jaccard distance tended to be more unbalanced. For example, in Table 3 many of the keywords have been lumped together in cluster JK1. This suggests that the mapping induced by the Jaccard distance did not achieve a uniform “spread” of the keywords.
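
To make the comparison between the two distances concrete, the following is a minimal sketch of how each could be computed from co-occurrence hit counts. The Google distance follows the published formula of Cilibrasi and Vitanyi (2007); the Jaccard form shown here is the standard one and is assumed to match the definition in Section 3.1, which is not reproduced in this appendix. The function names and the hit counts are hypothetical and are not data from the study.

```python
import math

def google_distance(fx, fy, fxy, n):
    """Normalized Google distance from hit counts.

    fx, fy : number of documents containing each term
    fxy    : number of documents containing both terms
    n      : total number of documents indexed (assumed larger than fx and fy)
    """
    if fxy == 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

def jaccard_distance(fx, fy, fxy):
    """One minus the Jaccard index of the two document sets."""
    union = fx + fy - fxy
    return 1.0 - fxy / union if union else 0.0

# Hypothetical hit counts, for illustration only.
hits = {"biomass": 9000, "gasification": 4000}
both = 1500
total_docs = 1_000_000

print(google_distance(hits["biomass"], hits["gasification"], both, total_docs))
print(jaccard_distance(hits["biomass"], hits["gasification"], both))
```

Because the Google distance works on logarithms of the counts, it compresses large differences in term frequency, which may be one reason the Jaccard distance spreads the keywords less evenly.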

Fig. 8 Sammon map of author keywords data. Thematic clusters have been highlighted and, where possible, linked to clusters found in the hierarchical maps
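
The crowding seen in the Sammon map can be related to the objective that the technique minimizes. The sketch below computes Sammon's stress, in which each squared distance error is divided by the original distance so that small distances are weighted most heavily, and reduces it by plain gradient descent. Sammon's original paper uses a Newton-like update instead; the step size, iteration count and function names here are illustrative assumptions, not the settings used to produce the figure.

```python
import math
import random

def pairwise(points):
    """Euclidean distance matrix for a list of points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def sammon_stress(target, points):
    """Sammon's stress between target distances and the distances of `points`.

    Off-diagonal entries of `target` must be strictly positive.
    """
    n = len(points)
    low = pairwise(points)
    scale = sum(target[i][j] for i in range(n) for j in range(i + 1, n))
    err = sum((target[i][j] - low[i][j]) ** 2 / target[i][j]
              for i in range(n) for j in range(i + 1, n))
    return err / scale

def sammon_map(target, steps=500, rate=0.1, seed=0):
    """Place items in 2-D by gradient descent on the stress (a simplification)."""
    rng = random.Random(seed)
    n = len(target)
    pts = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    scale = sum(target[i][j] for i in range(n) for j in range(i + 1, n))
    for _ in range(steps):
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(pts[i], pts[j]) or 1e-12  # guard against coincident points
                coef = -2.0 / scale * (target[i][j] - d) / (target[i][j] * d)
                grads[i][0] += coef * (pts[i][0] - pts[j][0])
                grads[i][1] += coef * (pts[i][1] - pts[j][1])
        for i in range(n):
            pts[i][0] -= rate * grads[i][0]
            pts[i][1] -= rate * grads[i][1]
    return pts

# Toy target distances taken from the corners of a unit square.
square = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
target = pairwise(square)
layout = sammon_map(target)
print(sammon_stress(target, layout))
```

Since errors on small distances dominate the stress, near-identical keywords are pulled tightly together, which is consistent with the overlapping labels noted above.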

Table 3 Clusters generated automatically by applying the k-means algorithm to the author keywords data (Jaccard distances)

B.2 Keyword plus

Similar trends were observed here as in the previous subsection. Again, the broad structure of the research landscape appears to have been preserved. As before, Sammon mapping (Figs. 9 and 10) produced a somewhat cluttered representation of the landscape, while applying the k-means algorithm to the Jaccard distances again gave a less uniform distribution of keywords among clusters: a significant number of clusters contain only one keyword or phrase (nine such clusters in Table 4, compared with only one in Table 2).
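
The balance of a clustering can be checked by counting singleton clusters, as done informally above. The sketch below is one possible way to do this. It assumes that each keyword is represented by its row of the distance matrix and clustered with Lloyd's algorithm; this appendix does not state how the distances were supplied to k-means, so that representation, the function names and the toy matrix are all assumptions.

```python
import math
import random
from collections import Counter

def kmeans(rows, k, iters=50, seed=0):
    """Lloyd's algorithm on a list of feature vectors; returns one label per row."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(r, centers[c])) for r in rows]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def singleton_clusters(labels):
    """Number of clusters that contain exactly one item."""
    return sum(1 for size in Counter(labels).values() if size == 1)

# Hypothetical distance matrix for five keywords (illustration only).
dist = [
    [0.0, 0.1, 0.9, 0.9, 0.8],
    [0.1, 0.0, 0.9, 0.8, 0.8],
    [0.9, 0.9, 0.0, 0.2, 0.7],
    [0.9, 0.8, 0.2, 0.0, 0.7],
    [0.8, 0.8, 0.7, 0.7, 0.0],
]
labels = kmeans(dist, k=3)
print(Counter(labels), singleton_clusters(labels))
```

The result of k-means depends on the initial centers, so in practice several seeds would be tried before comparing singleton counts between distance measures.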

Fig. 9 Sammon map of keyword plus terms, set 1. Thematic clusters have been highlighted and, where possible, linked to clusters found in the hierarchical maps

Fig. 10 Sammon map of keyword plus terms, set 2. Thematic clusters have been highlighted and, where possible, linked to clusters found in the hierarchical maps

Table 4 Clusters generated automatically by applying the k-means algorithm to the keyword plus data (Jaccard distances)

Cite this article

Woon, W.L., Madnick, S. Semantic distances for technology landscape visualization. J Intell Inf Syst 39, 29–58 (2012). https://doi.org/10.1007/s10844-011-0182-3

  • DOI: https://doi.org/10.1007/s10844-011-0182-3
