Semantic fingerprints-based author name disambiguation in Chinese documents


Author name disambiguation is an important problem that must be resolved in bibliometric analysis and tech mining. Many techniques have been proposed; however, most of them require long run times or additional information. We present a new method based on semantic fingerprints that disambiguates author names without external data. A manually annotated dataset was built to evaluate the performance of the presented method. Experiments were conducted separately using co-author features, institution features, and text fingerprints. The first two methods achieved higher precision but low recall, whereas the text fingerprint method achieved higher recall with satisfactory precision. Based on these results, we integrated co-author features, institution features, and text fingerprints into semantic fingerprints for disambiguating author names, which achieved better performance on the F-measure.
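The abstract does not spell out how text fingerprints are computed, but the keywords point to Simhash. As a rough illustration only, the sketch below shows a generic Simhash fingerprint and Hamming-distance comparison, plus a simple rule that merges two records when they share a co-author, share an institution, or have close text fingerprints. This is not the authors' implementation. Several things are assumptions: tokens are taken as already segmented (Chinese text would first need a word segmenter), tokens are weighted by term frequency, MD5 is used as the token hash, and `same_author`, its record fields, and the `max_distance` threshold are hypothetical names and values chosen for illustration.

```python
import hashlib
from collections import Counter


def simhash(tokens, bits=64):
    """Generic Simhash over pre-segmented tokens (assumed TF weighting, MD5 hash)."""
    weights = Counter(tokens)
    vector = [0] * bits
    for token, weight in weights.items():
        # Hash the token and keep the low `bits` bits as its signature.
        h = int.from_bytes(hashlib.md5(token.encode("utf-8")).digest(), "big")
        h &= (1 << bits) - 1
        for i in range(bits):
            # Add the weight where the signature bit is 1, subtract it where it is 0.
            vector[i] += weight if (h >> i) & 1 else -weight
    # The fingerprint has a 1 bit wherever the accumulated sum is positive.
    fingerprint = 0
    for i in range(bits):
        if vector[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def same_author(rec_a, rec_b, max_distance=10):
    """Hypothetical merge rule combining the three feature types.

    Records are dicts with 'coauthors', 'institution', and 'tokens' keys
    (an assumed layout); the distance threshold is illustrative only.
    """
    # Co-author feature: any shared co-author name.
    if set(rec_a["coauthors"]) & set(rec_b["coauthors"]):
        return True
    # Institution feature: identical, non-empty institution strings.
    if rec_a["institution"] and rec_a["institution"] == rec_b["institution"]:
        return True
    # Text fingerprint feature: Simhash fingerprints within max_distance bits.
    distance = hamming(simhash(rec_a["tokens"]), simhash(rec_b["tokens"]))
    return distance <= max_distance
```

The ordering in `same_author` mirrors the abstract's observation that co-author and institution matches are high-precision signals, so they are checked first, while the fingerprint comparison supplies recall for records that share neither. How the paper actually weights or orders these features is not stated here.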

This work is mainly supported by the National Natural Science Foundation of China (Project 71473237), and partially supported by the National Key Technology R&D Program of China's 12th Five-Year Plan (2011–2015) (2015BAH25F01) and the Program of the China Knowledge Centre for Engineering Science and Technology (CKCEST-2016-2-10). The authors are grateful to the National Natural Science Foundation of China, the Ministry of Science and Technology of China, and the Chinese Academy of Engineering for their financial support of this work.

Han, H., Yao, C., Fu, Y. et al. Semantic fingerprints-based author name disambiguation in Chinese documents. Scientometrics 111, 1879–1896 (2017).

  Name disambiguation
  Simhash
  Semantic fingerprint