Introduction

Nanotechnology has revolutionized numerous application domains and is widely recognized as a critical indicator of a country’s technological competence. More than 60 countries have adopted national projects or programs, such as the United States’ National Nanotechnology Initiative (NNI, http://www.nano.gov) (Roco et al. 2000), to support nanotechnology research. The funding made available from various public and private resources and the growing interest in this domain have contributed to its rapid development and public recognition.

Different analysis methods have been proposed to assess nanotechnology’s development status. Patents have been used to represent commercialized research results in nanotechnology (Meyer 2001; Huang et al. 2003b, 2004), grant documents have been used to study the effect of public funding on nanotechnology (Huang et al. 2005; Roco 2005), and academic literature has been used to represent the research efforts in academia (Schummer 2004; Kostoff et al. 2006). During the past 30 years, a large number of scientific documents on nanotechnology development have been generated and stored in various databases around the world. However, previous studies have focused primarily on applying certain analytical techniques on specific data sets (in specific time periods and regions) to answer specific research questions. Few of them have had the intention of making the analytical tools and data sets available to the public.

A Web-based knowledge mapping system has the potential to support the assessment of nanotechnology development by making the massive volume of nanotechnology-related documents available and by providing a set of flexible and easy-to-use analysis tools. However, several technical challenges must be addressed for such a system to function effectively:

  • Distributed collection of data/documents: Patents are published by the patent offices of different countries. Academic literature is published in various journals and stored in different databases. Searching for and collecting nanotechnology-related documents from multiple databases (each with its own interface) from around the world requires several different procedures and processes.

  • Unstructured data/document formats: Although digitized documents have been widely used in the storage of patents, grants, and other types of documents, such documents usually contain different data fields. To make the unstructured data ready for analysis, significant efforts are needed for data parsing and preprocessing.

  • Implementation of the analysis tools: The analysis tools need to be tailored to different documents’ characteristics and data fields. Algorithms for analyzing large-volume data sets in real time may need to be re-designed.

Because of these challenges, few knowledge mapping systems for scientific document analysis are publicly available in the nanotechnology domain. We therefore propose a framework for building such knowledge mapping systems in order to analyze the status of nanotechnology development. In this paper, we discuss our prototype system, Nano Mapper, which provides integrated Web access to a variety of visualization and analytical tools for nanotechnology patents from the United States Patent and Trademark Office (USPTO), European Patent Office (EPO), and Japan Patent Office (JPO) and grants from the U.S. National Science Foundation (NSF). The current system does not include academic literature for copyright reasons.

In Section “Research background” of this paper, we briefly review the previous patent and grant analysis studies, and discuss existing nanotechnology Web portals/knowledge portals. In Section “Nano Mapper system design,” we present our methodology and Nano Mapper’s architecture and major functionalities. In Section “Nanotechnology Development in USPTO (2005–2006),” we analyze the nanotechnology patents published in the USPTO in 2005–2006 using Nano Mapper. Section “Conclusions” concludes the paper by summarizing our findings and discussing future work.

Research background

Patent analysis

Patents contain rich information about technology innovations. A large number of patents published in patent offices around the world are publicly available. As an important indicator of technological advancement, patents have been widely used to assess the research and development status of different domains (Narin 1994; Karki 1997; Oppenheim 2000), including nanotechnology (Huang et al. 2003b), gastroenterology (Lewison 1998), and high-technology fields (Huang et al. 2003a). In the nanotechnology domain, Meyer studied the interrelationships between academia and industry using patents from the USPTO and scientific literature from the Thomson Science Citation Index (Meyer 2001). Hullmann et al. used bibliometric measures on both patents and literature to assess nanotechnology’s status in the 1980s and 1990s (Hullmann and Meyer 2003). Huang et al. extended previous studies and developed a patent analysis framework that included bibliometric analysis, content analysis, and citation analysis to assess nanotechnology development at the country, institution, and technology field levels (Huang et al. 2003b, 2004).

Patents are managed by different patent offices throughout the world. Although many studies have used data from a single office, such a method may lead to biased analysis results. Previous research found that domestic applicants tend to file more patents with their home country patent office than foreign applicants do (“home advantage” effect) (European Commission 1997). This “home advantage” effect affects the composition of patents in patent databases (Ganguli 1998; Criscuolo 2006). In addition, patent offices worldwide have different examination procedures and policies, which may also affect patent publication and patent contents. To provide a more comprehensive understanding of global nanotechnology development, the patents from multiple patent offices have been analyzed (Li et al. 2007b; Chen et al. 2008).

Grant analysis

In recent years, a significant amount of public funding has been devoted to nanotechnology. In the United States, more than 5% of the National Science Foundation (NSF) budget was dedicated to supporting nanotechnology research in 2005 (Roco 2005). In Europe, funding from the European Commission and individual countries makes up the major portion of nanotechnology funding (Hullmann 2006).

Previous research has studied the impact of public funding on research and innovation in different domains by analyzing grant documents. Many of these studies used scientific publications as indicators of research output (Adams and Griliches 1998; Arora and Gambardella 1998; Narin 1998; Payne and Siow 2003) and found that the impact of public funding is dependent on the particular technology field. In the nanotechnology domain, Huang et al. (2005) studied the relationship between NSF funding and patent publications. They found that the patents published by NSF-funded researchers had a significantly higher impact on the nanotechnology domain as compared to other reference groups. They also found that the topics in grants change faster than those in patents.

Web portals for nanotechnology

In response to the rapid development of nanotechnology after 2000, several Web portals have been built to provide improved access to nanotechnology-related information (Table 1). The first type of portal focuses on providing nanotechnology-related news articles, interviews, and research reports, such as “Nanotechnology Now,” “Nano Tsunami,” and “Nano Science & Technology Institute.” The second type of portal aims to build a hub of URLs to nanotechnology Websites, forums, books, journals, databases, etc., such as “ENS Nanotechnology Portal” and “Nano Scout.” The third type of portal provides access to nanotechnology equipment, education materials and software; examples include “National Nanotechnology Infrastructure Network,” and “NanoHUB.” Lastly, there are Websites available that maintain the roadmap/history of nanotechnology and provide an introduction to the domain; one such example is the “Wikipedia Nanotechnology Portal.” These Websites can help researchers find nanotechnology-related information, but they do not systematically collect nanotechnology-related scientific documents or provide functionality for analyzing nanotechnology development. The well-established patent and grant analysis methods in previous studies have not been widely implemented in actual online applications/Websites. Building online systems with patent and grant analysis functionalities may better assist researchers and policy makers in nanotechnology to analyze the data and make decisions.

Table 1 Major nanotechnology knowledge portals

Nano Mapper system design

In this research project, we proposed a framework for building knowledge mapping systems for patent analysis and grant analysis for the nanotechnology domain. The framework contains three steps (see Fig. 1): data acquisition, parsing, and system building. We integrated multiple patent and grant data sets and selected data analysis and information visualization tools into one system. Our prototype system, Nano Mapper (http://nanomapper.eller.arizona.edu), is based on this framework.

Fig. 1

Framework for building nanotechnology knowledge mapping systems

Data acquisition

We used keyword searching to collect nanotechnology-related documents (i.e., patents and grants) in various databases. Table 2 shows a list of nanotechnology keywords provided by domain experts that was used to search and retrieve documents from the online interfaces of the existing databases.

Table 2 Nanotechnology keywords and the number of patents collected from USPTO, EPO, JPO and grants collected from NSF

In Nano Mapper, the patents were collected from USPTO, EPO, and JPO, which collectively cover three major regions in nanotechnology research (Huang et al. 2003b). USPTO provides online full-text access to patents issued since 1976, which can be searched using almost any of a patent’s data fields. EPO’s database, esp@cenet, provides access to European patents issued since 1978, which can be searched based on title, abstract, and some bibliographic information. esp@cenet also stores patent applications from more than 80 countries. The JPO patent database (Patent Abstracts of Japan, PAJ) contains patents issued since 1976, but its interface is difficult to use for searching and retrieving patents. We therefore retrieved JPO patent applications from esp@cenet and checked their publication status (application or registered patent) through PAJ. We kept only registered patents in our study.

Grants were retrieved from the NSF grant database. NSF provides online access to grant abstracts, which can be searched using almost any of a grant’s data fields.

Different databases provide different search interfaces to search patents, grants, or other documents. All four databases used to build the Nano Mapper prototype support keyword searching in document titles and abstracts (“title-abstract” search). Moreover, USPTO enables more complex search functions. Following the suggestions of domain experts, we also searched USPTO nanotechnology patents by matching the keywords on patent title, abstract, and claims (“title-claims” search) and on the entire patent document (“full-text” search) (Huang et al. 2003b). In general, “title-abstract” search provided more accurate results concerning the nanotechnology contents, while the other two search methods provided better coverage of nanotechnology-related patents. Table 2 shows the number of documents collected with each nanotechnology keyword from the four databases by different search methods.
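The difference between the three search scopes can be illustrated with a minimal Python sketch. It is not the procedure used to query the patent offices' online interfaces; it only shows, on a local record, which data fields each scope covers. The keyword tuple, the field names, and the sample record are hypothetical, and simple substring matching is an assumption (the keywords actually used are those listed in Table 2).

```python
# Data fields covered by each search scope described in the text.
# "full-text" is assumed here to mean title, abstract, claims, and description.
SEARCH_SCOPES = {
    "title-abstract": ("title", "abstract"),
    "title-claims": ("title", "abstract", "claims"),
    "full-text": ("title", "abstract", "claims", "description"),
}

# Hypothetical keywords for illustration only; see Table 2 for the real list.
KEYWORDS = ("nanotube", "quantum dot")


def matches(document, scope, keywords=KEYWORDS):
    """Return True if any keyword occurs in the fields covered by the scope."""
    fields = SEARCH_SCOPES[scope]
    text = " ".join(document.get(field, "") for field in fields).lower()
    return any(keyword.lower() in text for keyword in keywords)


# Hypothetical patent record: the keyword appears only in the claims.
patent = {
    "title": "Method of forming a thin film",
    "abstract": "A deposition method for semiconductor wafers.",
    "claims": "1. A film comprising quantum dots.",
    "description": "Detailed description of the deposition method.",
}

hits = {scope: matches(patent, scope) for scope in SEARCH_SCOPES}
```

In this example the record is missed by the “title-abstract” scope but found by the two broader scopes, which mirrors the trade-off between accuracy and coverage noted above.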

Parsing

The documents retrieved from online databases are usually free text in HTML format. These documents need to be parsed into structured data and stored in a relational database. In general, each data source needs a separate parser. However, since the search interfaces seldom change, the parsers can be reused to update the system’s data collections annually. In the Nano Mapper system, the patent parsers extract patent identification information (patent ID, patent application number, patent priority number), bibliographic information (publication date, inventor name, applicant name), classification information (International classification, United States classification, European classification), citation information, and content information (title, abstract, claims, and description) from patents. The grant parsers extract grant ID, bibliographic information (start and expiration date, grant amount, principal investigator), funding agent information (NSF organization, program, and directorate), and content information (title, abstract) from grants.
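A parser of this kind can be sketched with Python's standard html.parser module. The page layout below is entirely hypothetical: the sketch assumes each data field sits in an element whose id attribute names the field and that fields contain no nested markup. Real patent office pages are laid out differently, which is why each data source needs its own parser.

```python
from html.parser import HTMLParser


class FieldParser(HTMLParser):
    """Collect the text of elements whose id attribute names a wanted field."""

    def __init__(self, wanted_fields):
        super().__init__()
        self.wanted = set(wanted_fields)
        self.current = None  # field currently being read, if any
        self.record = {}

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if element_id in self.wanted:
            self.current = element_id
            self.record[element_id] = ""

    def handle_endtag(self, tag):
        # Assumption: fields hold plain text, so any end tag closes the field.
        self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.record[self.current] += data


def parse_patent_page(html):
    """Turn one (hypothetical) patent page into a flat record."""
    parser = FieldParser(["patent_id", "title", "publication_date", "abstract"])
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the HTML layout.
    return {name: " ".join(value.split()) for name, value in parser.record.items()}


# Hypothetical page; identifiers and values are placeholders.
SAMPLE_PAGE = """
<html><body>
  <span id="patent_id">0000000</span>
  <h1 id="title">Example   nanowire
     device</h1>
  <span id="publication_date">2006-01-03</span>
  <p id="abstract">A hypothetical abstract.</p>
</body></html>
"""

record = parse_patent_page(SAMPLE_PAGE)
```

The resulting dictionary maps directly onto one row of a relational table, which is the form the later analysis steps expect.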

System building

After parsing the collected documents into a database, a knowledge mapping system can be built based on the architecture shown in Fig. 2. It is a three-layer structure which contains a presentation layer, a logic control layer, and a database layer.

Fig. 2

System architecture of the Nano Mapper

The presentation layer implements the user interface and provides Web access to five types of functions: search function, basic statistics, trend analysis, citation network analysis, and content map analysis. The search and statistics functions are implemented as dynamic pages using JavaServer Pages (JSP). The visualizations are implemented as Java applets. To visualize patent and grant publication trends in charts, we customized an open source Java library, Chart 2D (http://chart2d.sourceforge.net). To visualize the citation networks, we customized an open source graph drawing package, Graphviz, provided by AT&T Labs (http://www.research.att.com/sw/tools/graphviz) (Gansner and North 2000). To visualize the content maps of nanotechnology-related patents and grants, we used the content map package developed by the Artificial Intelligence Lab, University of Arizona (http://ai.arizona.edu).

At the logic control layer, SQL queries are designed to perform search and analytical functions. To handle large data sets and provide online analysis of statistics, trends, and citation networks, we pre-compute the publication and citation statistics and summarize them at the year level in intermediate tables. Querying these intermediate tables instead of the raw data reduces query time for users. For content analysis, we identified major technology topics from the nanotechnology documents and generated content maps using the self-organizing map (SOM) algorithm (Chen et al. 1996; Ong et al. 2005). This is a time-consuming process, so content maps are made available only for selected time periods.
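The pre-computation idea can be sketched as follows. This is an illustration under stated assumptions, not the Nano Mapper schema: Python's built-in sqlite3 module stands in for the production database, the table and column names are hypothetical, the rows are placeholders, and citations are attributed to the publication year of the cited patent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patent (patent_id TEXT PRIMARY KEY, country TEXT, pub_year INTEGER);
    CREATE TABLE citation (citing_id TEXT, cited_id TEXT);
""")
# Placeholder rows for illustration only.
conn.executemany("INSERT INTO patent VALUES (?, ?, ?)", [
    ("P1", "US", 2005), ("P2", "US", 2005), ("P3", "JP", 2005), ("P4", "US", 2006),
])
conn.executemany("INSERT INTO citation VALUES (?, ?)", [
    ("P4", "P1"), ("P4", "P3"), ("P2", "P1"),
])

# Run once, offline: patents published and citations received,
# summarized per country and publication year.
conn.executescript("""
    CREATE TABLE country_year_stats AS
    SELECT p.country,
           p.pub_year,
           COUNT(DISTINCT p.patent_id) AS n_patents,
           COUNT(c.citing_id)          AS n_cites
    FROM patent AS p
    LEFT JOIN citation AS c ON c.cited_id = p.patent_id
    GROUP BY p.country, p.pub_year;
""")


def country_trend(country):
    """Answer an online query from the small summary table only."""
    rows = conn.execute(
        "SELECT pub_year, n_patents, n_cites FROM country_year_stats "
        "WHERE country = ? ORDER BY pub_year",
        (country,),
    )
    return rows.fetchall()


us_trend = country_trend("US")
```

The online query touches only the summary table, so its cost depends on the number of countries and years rather than on the number of patents and citations.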

At the database layer, we use Microsoft SQL Server 2000 to store parsed patent and grant data for Nano Mapper.

Nano Mapper system functionalities

Search functions

The Nano Mapper system provides three searching functions for patents and grants. Users may search using:

  • Patent/grant identifiers.

  • Keywords in title, abstract, or (patent) claims.

  • A combination of criteria on different patent/grant data fields (i.e., advanced search).

Nano Mapper also provides a combined search function, which searches for keywords in title/abstract on all four data sets simultaneously. The results from the four databases are shown together in one interface, which can then be browsed and compared.

Figure 3 illustrates the advanced search function using the USPTO data set as an example. In advanced search, the interface enables users to input criteria on most data fields. For USPTO patents, the data fields include patent title, examiner, inventor, assignee, assignee country, classification code, abstract, claims, etc. For some categorical data fields, e.g., assignee country, the interface provides lookup functions to help find the appropriate search criteria. The result set of a user query is sorted by publication date in reverse chronological order. The user can browse the results using the navigation bar at the bottom. The user can also access the details of any document, including all data fields in our system and the URL of the document on its original Website.
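One way to combine criteria on several data fields is to build a parameterized SQL query, as in the following sketch. The table layout, field names, page size, and sample rows are hypothetical, and sqlite3 is used only so that the example is self-contained.

```python
import sqlite3

# Hypothetical set of fields exposed by the advanced search form.
SEARCHABLE_FIELDS = {"title", "inventor", "assignee", "assignee_country", "abstract"}


def build_search_query(criteria, page=1, page_size=20):
    """Build a parameterized query from {field: substring} criteria."""
    clauses, params = [], []
    for field, value in criteria.items():
        if field not in SEARCHABLE_FIELDS:  # never interpolate unknown names
            raise ValueError(f"unsupported field: {field}")
        clauses.append(f"{field} LIKE ?")
        params.append(f"%{value}%")
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = (
        f"SELECT patent_id, title, pub_date FROM patent WHERE {where} "
        "ORDER BY pub_date DESC LIMIT ? OFFSET ?"  # newest first, one page
    )
    params += [page_size, (page - 1) * page_size]
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patent (patent_id TEXT, title TEXT, inventor TEXT, "
    "assignee TEXT, assignee_country TEXT, abstract TEXT, pub_date TEXT)"
)
# Placeholder rows for illustration only.
conn.executemany("INSERT INTO patent VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("P1", "Nanotube sensor", "A. Example", "Example Corp", "US", "carbon nanotube", "2005-03-01"),
    ("P2", "Nanowire transistor", "B. Example", "Example Corp", "US", "silicon nanowire", "2006-07-15"),
    ("P3", "Nanotube film", "C. Example", "Sample Ltd", "JP", "nanotube coating", "2006-01-10"),
])

sql, params = build_search_query({"assignee_country": "US", "title": "nano"})
results = conn.execute(sql, params).fetchall()
```

Field names are checked against a fixed whitelist and values are passed as parameters, so user input never becomes part of the SQL text.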

Fig. 3

Advanced search in Nano Mapper system (a) Search interface (b) Sample of result sets (c) Details of a patent (the contents of claims and description are omitted here)

Basic statistics

Nano Mapper can calculate and display statistics on patent/grant publication and citation status for selected time periods at different analytical levels. Figure 4 shows the statistics generation interface for USPTO patents. For patents, the user can set the analytical level to country, institution, inventor, or technology field. The results can be sorted by the number of patents, the number of cites, or the average number of cites per patent for each analytical unit. For USPTO patents, the user can restrict the statistics to the data collected using any one of the three search methods. The statistics can be downloaded in CSV format for further off-line study.
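The three statistics and the CSV export can be sketched in a few lines of Python. The record layout (one dictionary per patent with a "cites" count) and the sample values are hypothetical.

```python
import csv
import io
from collections import defaultdict


def unit_statistics(patents, level, sort_by="n_patents"):
    """Aggregate patent and citation counts per analytical unit.

    patents: iterable of dicts holding the grouping field and a "cites" count.
    level:   name of the grouping field, e.g. "country" or "assignee".
    sort_by: "n_patents", "n_cites", or "avg_cites".
    """
    totals = defaultdict(lambda: {"n_patents": 0, "n_cites": 0})
    for patent in patents:
        unit = totals[patent[level]]
        unit["n_patents"] += 1
        unit["n_cites"] += patent["cites"]
    rows = [
        {"unit": name, **t, "avg_cites": t["n_cites"] / t["n_patents"]}
        for name, t in totals.items()
    ]
    return sorted(rows, key=lambda row: row[sort_by], reverse=True)


def to_csv(rows):
    """Serialize the statistics for off-line study."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["unit", "n_patents", "n_cites", "avg_cites"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder records for illustration only.
patents = [
    {"country": "US", "cites": 4},
    {"country": "US", "cites": 2},
    {"country": "JP", "cites": 5},
]
by_count = unit_statistics(patents, "country")
by_impact = unit_statistics(patents, "country", sort_by="avg_cites")
csv_text = to_csv(by_count)
```

Sorting by the average rather than the total separates productivity from impact: in the sample data the unit with fewer patents ranks first by average cites.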

Fig. 4

Country level statistics for nanotechnology-related patents in USPTO

Publication trend analysis

Nano Mapper can visualize and compare the annual publication trends of patents and grants at different analytical levels. Figure 5 shows the country level analysis on USPTO patents. The analytical units (countries in Fig. 5) can be easily modified. To add analytical units of interest, the user can search for names in a pop-up window. The interface also provides shortcuts to add the top 10 or the top 11 to 20 most productive analytical units into the comparison. The analysis results include a line chart and a table of statistics showing the different units’ number of publications in each year. The statistics can be downloaded in a CSV file format for further off-line study.
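The table behind such a trend chart, together with the “top 10” and “top 11 to 20” shortcuts, can be sketched as follows. The record layout and the sample rows are hypothetical.

```python
from collections import Counter


def ranked_units(patents, level):
    """Analytical units ordered from most to least productive."""
    totals = Counter(patent[level] for patent in patents)
    return [unit for unit, _ in totals.most_common()]


def trend_table(patents, level, units, years):
    """Number of publications per selected unit and year (one row per unit)."""
    counts = Counter((patent[level], patent["year"]) for patent in patents)
    return {unit: [counts[(unit, year)] for year in years] for unit in units}


# Placeholder records for illustration only.
patents = [
    {"country": "US", "year": 2005},
    {"country": "US", "year": 2005},
    {"country": "US", "year": 2006},
    {"country": "JP", "year": 2005},
    {"country": "JP", "year": 2006},
    {"country": "KR", "year": 2006},
]

ranking = ranked_units(patents, "country")
top_10 = ranking[:10]          # shortcut: ten most productive units
top_11_to_20 = ranking[10:20]  # shortcut: the next ten units
table = trend_table(patents, "country", ranking[:2], [2005, 2006])
```

Each row of the table corresponds to one line in the chart, and the same rows can be written out as CSV.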

Fig. 5

Country level publication trend analysis of nanotechnology-related patents in USPTO

Citation network analysis

Nano Mapper enables users to visualize patent citation networks at different analytical levels for different time periods (Fig. 6), which can be used to assess knowledge diffusion patterns (Huang et al. 2003b; Kostoff et al. 2006; Li et al. 2007a). To emphasize the more important citation relationships, the top 100 relationships between analytical units with the largest number of citations are visualized. In citation networks, the direction of a link represents the direction of the citations between two nodes. For example, a link from the “United States” pointing to “Germany” means that the United States’ patents cited German patents. Each link is labeled with the total number of citations.
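The aggregation from patent-level citations to unit-level links can be sketched as follows, with the result written in the DOT language that Graphviz reads. The input layout and the sample data are hypothetical, and keeping citations within the same unit is an assumption of this sketch rather than a documented behavior of the system.

```python
from collections import Counter


def citation_links(citations, unit_of, top_n=100):
    """Aggregate patent-level citations into the top_n unit-level links.

    citations: iterable of (citing_patent_id, cited_patent_id) pairs.
    unit_of:   mapping from patent id to analytical unit (e.g. country).
    """
    weights = Counter(
        (unit_of[citing], unit_of[cited]) for citing, cited in citations
    )
    return weights.most_common(top_n)


def to_dot(links):
    """Render the links as a directed graph in Graphviz DOT syntax."""
    lines = ["digraph citations {"]
    for (source, target), weight in links:
        # The arrow points from the citing unit to the cited unit.
        lines.append(f'  "{source}" -> "{target}" [label="{weight}"];')
    lines.append("}")
    return "\n".join(lines)


# Placeholder data for illustration only.
unit_of = {"P1": "United States", "P2": "Germany", "P3": "United States"}
citations = [("P1", "P2"), ("P3", "P2"), ("P2", "P1")]

links = citation_links(citations, unit_of)
dot_text = to_dot(links)
```

Here the link from “United States” to “Germany” carries the label 2 because two United States patents cite the German patent.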

Fig. 6

USPTO country citation network (“title-claims” search, 1976–2006)

Content map analysis

Nano Mapper uses content map technology to identify and visualize major nanotechnology topics for different time periods in the document titles and abstracts. The research topics are represented by noun phrase keywords extracted from patent/grant documents using a Natural Language Processing tool, the Arizona Noun Phraser. The topics are organized by the multi-level self-organizing map algorithm (Chen et al. 1996; Ong et al. 2005) and visualized by the content map interface. As Fig. 7 shows, the content map interface contains two components: a folder tree (on the left side in Fig. 7) and a hierarchical content map. The folder tree displays the topics identified from nanotechnology-related patents or grants. The hierarchical content map displays the corresponding topic regions in the map. Each topic region is labeled with the topic keyword and the number of documents. The size of a topic region is proportional to the number of documents related to that topic. Conceptually, more similar topics (according to their co-occurrence patterns in documents) are positioned closer together on the map. If the user clicks a topic region, its sub-topics are expanded in the interface. If there are no sub-topics, the documents related to the selected topic are shown.
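The core of a self-organizing map can be sketched in plain Python. This is a generic single-level SOM, not the multi-level variant or the content map package used in Nano Mapper; the grid size, learning rate schedule, neighborhood function, and the keyword vectors are all assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Grid position of the node whose weight vector is closest to the input."""
    return min(weights, key=lambda node: math.dist(weights[node], vector))


def train_som(vectors, rows, cols, epochs=50, seed=0):
    """Train a small SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        fraction = 1 - epoch / epochs
        rate = 0.5 * fraction                          # decaying learning rate
        radius = max(rows, cols) / 2 * fraction + 0.5  # shrinking neighborhood
        for vector in vectors:
            winner = best_matching_unit(weights, vector)
            for node, weight in weights.items():
                grid_distance = math.dist(node, winner)
                if grid_distance <= radius:
                    # Nodes near the winner on the grid move toward the input.
                    influence = math.exp(-grid_distance ** 2 / (2 * radius ** 2))
                    for i in range(dim):
                        weight[i] += rate * influence * (vector[i] - weight[i])
    return weights


# Hypothetical documents as binary vectors over three noun-phrase keywords.
documents = [[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
som = train_som(documents, rows=2, cols=2)
positions = [best_matching_unit(som, document) for document in documents]
```

Because neighboring nodes are updated together, documents with similar keyword patterns tend to be assigned to the same or adjacent grid positions, which is the property the content map relies on when it places similar topics close to each other.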

Fig. 7

The content map for topics in USPTO nanotechnology-related patents from 2000 to 2004

Since generating a content map is time-consuming, we pre-generated a set of content maps for a sequence of time periods for each data set. For the content maps of two consecutive time periods, we computed the growth rate of each topic region between the two maps. A baseline growth rate is computed at the level of the entire content map. A topic region whose growth rate is similar to the baseline is assigned a green color. A topic region with a higher or lower growth rate is assigned a warmer or colder color, respectively (Fig. 7). If the topic is brand new, a red color is assigned to the region.
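The coloring rule can be sketched as a comparison of each topic's growth rate with the map-level baseline. The growth rate formula, the tolerance that defines “similar,” the coarse "warm"/"cold" labels, and the document counts are assumptions of this sketch; the text does not specify the exact color scale.

```python
def growth_rate(previous, current):
    """Relative growth in the number of documents between two periods."""
    return (current - previous) / previous


def topic_color(previous, current, baseline, tolerance=0.1):
    """Classify a topic region relative to the baseline growth rate."""
    if previous == 0:
        return "red"  # brand-new topic
    rate = growth_rate(previous, current)
    if abs(rate - baseline) <= tolerance:
        return "green"  # similar to the baseline
    return "warm" if rate > baseline else "cold"


# Hypothetical document counts per topic in two consecutive periods.
earlier = {"carbon nanotube": 100, "thin film": 200, "nanoparticle": 100, "quantum dot": 0}
later = {"carbon nanotube": 180, "thin film": 230, "nanoparticle": 150, "quantum dot": 40}

# Baseline: growth of the entire content map.
baseline = growth_rate(sum(earlier.values()), sum(later.values()))
colors = {
    topic: topic_color(earlier[topic], later[topic], baseline) for topic in later
}
```

In this example the baseline is 0.5, so a topic growing by 80% is shown in a warmer color, one growing by 15% in a colder color, and one growing by 50% in green.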

Nanotechnology development in USPTO (2005–2006)

We use the system outputs from Nano Mapper to assess the nanotechnology development status reflected in USPTO patents between 2005 and 2006. This is a continuation of our previous longitudinal studies (Huang et al. 2003b, 2004, 2006).

In the Nano Mapper database, we collected nanotechnology-related patents issued by the USPTO from 1976 to 2006. For 2005–2006 we collected (see summary in Table 3):

  • 2,042 nanotechnology-related patents invented by 4,774 inventors from 874 assignee institutions in 31 countries by using “title-abstract” search.

  • 4,081 nanotechnology-related patents invented by 9,491 inventors from 1,585 assignee institutions in 34 countries by using “title-claims” search.

  • 18,953 nanotechnology-related patents invented by 40,216 inventors from 5,328 assignee institutions in 49 countries by using “full-text” search.

Table 3 Nanotechnology-related patents issued by the USPTO 2005–2006 collected through title-abstract, title-claims, and full-text searches

Figure 8 shows the annual publications of nanotechnology-related patents in USPTO from 1976 to 2006. Although the three search methods have different coverage, they show a similar growth pattern of nanotechnology development. In 2005–2006, the rapid growth of nanotechnology patent publication continued with some minor fluctuation. Across the three search methods, the growth rate between 2005 and 2006 was 20–30%.

Fig. 8

Number of nanotechnology patents in USPTO using three types of search methods (1976–2006) (a) Normal scale (b) Log scale

Country analysis

Tables 4–6 present the 10 most productive nanotechnology assignee countries in the USPTO for 1976–2004 and 2005–2006 using different search methods. In general, the three search methods provide similar results. The United States and Japan continued to be the top two countries in 2005–2006. China (Taiwan), the Republic of Korea, and the Netherlands saw rapid growth, and their ranks rose significantly among all countries. Australia and China entered the top 10 assignee countries lists in 2005–2006, which indicates rapid growth in their nanotechnology innovation.

Table 4 Most productive assignee countries by “title-abstract” patent search (1976–2004 and 2005–2006)
Table 5 Most productive assignee countries by “title-claims” patent search (1976–2004 and 2005–2006)
Table 6 Most productive assignee countries by “full-text” patent search (1976–2004 and 2005–2006)

Tables 7–9 show the countries with a stronger impact on the nanotechnology domain according to the average number of cites per patent they received by December 2006. We include only the countries with a reasonable number of patents for comparison. Although the patents published in 2005–2006 have not received many citations, they still hint at the changes in each country’s impact. In general, the United States continued to have a very high impact relative to other productive countries. In 2005–2006, Australia, China (Taiwan), France, and the Netherlands showed an increase in their impact compared to other countries. Other productive countries, including Japan, Federal Republic of Germany, and Republic of Korea, showed a slight decrease in their impact rankings.

Table 7 High impact assignee countries with citations through December 2006 by “title-abstract” search (with >30 patents in 1976–2004 and >10 patents in 2005–2006)
Table 8 High impact assignee countries with citations through December 2006 by “title-claims” search (with >100 patents in 1976–2004 and >30 patents in 2005–2006)
Table 9 High impact assignee countries with citations through December 2006 with “full-text” search (with >300 patents in 1976–2004 and >100 patents in 2005–2006)

Institution analysis

Tables 10–12 show the top 10 assignee institutions that have published the largest number of nanotechnology patents in the USPTO. The three search methods provide slightly different results. However, International Business Machines Corporation, The Regents of the University of California, Eastman Kodak Company, Minnesota Mining and Manufacturing (3M), and Micron Technology, Inc. continued to be the most productive institutions in 2005–2006 as in 1976–2004. Some institutions, including Hewlett-Packard Development Company, Samsung Electronics, and Intel Corporation, had a significant increase in nanotechnology patent publication and became some of the most productive institutions in the domain.

Table 10 Most productive assignees by “title-abstract” patent search (1976–2004 and 2005–2006)
Table 11 Most productive assignees by “title-claims” patent search (1976–2004 and 2005–2006)
Table 12 Most productive assignees by “full-text” patent search (1976–2004 and 2005–2006)

Tables 13–15 show the top 10 assignees that have high impact on the nanotechnology domain using different types of search in USPTO. We keep only the assignees with a reasonable number of patents. The three search methods provide slightly different results. In general, patents from some well-known universities, including the Board of Trustees of the Leland Stanford Junior University, the Regents of the University of California, and Massachusetts Institute of Technology, were cited more than others. In 2005–2006, some institutions showed a more significant increase in their impact, including Nanosys, Micron Technology, Tsinghua University, Hitachi, and Canon. The Regents of the University of California and Micron Technology produced a large number of high impact patents, which indicates their significant role in nanotechnology.

Table 13 High impact assignees with citations through December 2006 by “title-abstract” search (with >10 patents in both 1976–2004 and 2005–2006)
Table 14 High impact assignees with citations through December 2006 by “title-claims” search (with >20 patents in both 1976–2004 and 2005–2006)
Table 15 High impact assignees with citations through December 2006 by “full-text” search (with >50 patents in both 1976–2004 and 2005–2006)

Technology field analysis

Following our previous research, we used the first-level United States Patent Classification categories (http://www.uspto.gov/go/classification/selectnumwithtitle.htm) as representations of USPTO patents’ technology fields.

Tables 16–18 report the top technology fields to which the most nanotechnology-related patents were assigned. In general, the three search methods provide similar results. The top technology fields were similar in the two time periods, but their ranks changed. In 2005–2006, technology fields “257: Active solid-state devices,” “438: Semiconductor device manufacturing,” and “423: Chemistry of inorganic compounds” experienced faster growth compared with other technology fields.

Table 16 Most productive technology fields by “title-abstract” patent search (1976–2004 and 2005–2006)
Table 17 Most productive technology fields by “title-claims” patent search (1976–2004 and 2005–2006)
Table 18 Most productive technology fields by “full-text” patent search (1976–2004 and 2005–2006)

Tables 19–21 show the high impact nanotechnology fields in the USPTO. For comparison purposes, we use only the technology fields with a reasonable number of patents. The three search methods show slightly different results in the high impact technology fields. However, we notice that the relative impact of technology fields “257: Active solid-state devices (e.g., transistors, solid-state diodes),” “428: Stock material or miscellaneous articles,” and “438: Semiconductor device manufacturing: process” increased in both “title-abstract” search and “title-claims” search. In addition, technology field “423: Chemistry of inorganic compounds” continued to have a high impact on the domain.

Table 19 High impact technology fields with citations through December 2006 by “title-abstract” patent search (with >50 patents in 1976–2004 and >30 patents in 2005–2006)
Table 20 High impact technology fields with citations through December 2006 by “title-claims” patent search (with >100 patents in 1976–2004 and >50 patents in 2005–2006)
Table 21 High impact technology fields with citations through December 2006 by “full-text” search (with >200 patents in 1976–2004 and >100 patents in 2005–2006)

Comparing both analyses, we noticed that technology fields 257, 438, and 423 had an increase in both the number of patents and the number of citations per patent. These three technology fields have attracted many researchers’ interest in recent years.

Conclusions

This paper presents our efforts to create an Internet knowledge mapping system to assess nanotechnology development status based on patent and grant analysis. A research framework and a prototype system, Nano Mapper, are presented for nanotechnology-related patents from USPTO, EPO, and JPO and grants from NSF in the interval 1976–2006. The Nano Mapper provides search functions, statistics, trend analysis, citation network analysis, and content map analysis to assist users’ online analysis.

Using Nano Mapper, we evaluated nanotechnology patents published in 2005–2006 by the USPTO and found that:

  • Nanotechnology patent publication continues the growth trend seen in previous years with a growth rate of 20–30% between 2005 and 2006.

  • United States patents continued to have a high impact on the nanotechnology domain. China (Taiwan), the Republic of Korea, and the Netherlands experienced rapid growth in patent publication in USPTO in 2005–2006. The citation impact of the patents from Australia, China (Taiwan), France, and the Netherlands increased significantly.

  • In the nanotechnology domain, International Business Machines Corporation, The Regents of the University of California, Eastman Kodak Company, Minnesota Mining and Manufacturing (3M), and Micron Technology, Inc. continued to be among the most productive institutions. Hewlett-Packard Development Company, Samsung Electronics, and Intel Corporation each saw a significant increase in nanotechnology patent publication. Several institutions, including Nanosys, Micron Technology, Tsinghua University, Hitachi, and Canon, increased their citation impact in 2005–2006.

  • In 2005–2006, there was rapid growth in patent publication in technology fields “257: Active solid-state devices,” “438: Semiconductor device manufacturing,” and “423: Chemistry of inorganic compounds” as compared with other technology fields. The impact of the patents in technology fields “257: Active solid-state devices (e.g., transistors, solid-state diodes),” “428: Stock material or miscellaneous articles,” and “438: Semiconductor device manufacturing: process” also increased during the same time period. Technology field “423: Chemistry of inorganic compounds” continued to have a strong impact on the nanotechnology domain.

The Nano Mapper system provides a search and analysis infrastructure for researchers and policy makers. In our future research, we plan to annually update the data sets in the system. We will incorporate other types of scientific documents into our framework and introduce additional analytical and visualization methods.