Automatic patent document summarization for collaborative knowledge systems and services

Journal of Systems Science and Systems Engineering


Engineering and research teams often develop new products and technologies by referring to inventions described in patent databases. Efficient patent analysis builds R&D knowledge, shortens new product development time, increases market success, and reduces the risk of patent infringement. It is therefore beneficial to extract information from patent documents automatically and systematically, improving knowledge sharing and collaboration among R&D team members. In this research, patents are summarized using an approach that combines a domain ontology with TF-IDF-based concept clustering. The ontology captures the general knowledge and core meaning of patents in a given domain. The proposed methodology then extracts, clusters, and integrates the content of a patent to derive a summary and a cluster tree diagram of key terms. Patents from International Patent Classification (IPC) codes B25C, B25D, and B25F (power hand tools) and B24B, C09G, and H01L (chemical mechanical polishing) serve as case studies. The summarization results are evaluated on three measures: the compression ratio of the generated summaries, the retention ratio of the ontology-based keyword extraction, and the classification accuracy of the summaries. The results show that the ontology-based approach achieves about the same compression ratio as previous non-ontology-based research, while yielding average improvements of 11% in retention ratio and 14% in classification accuracy.
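The abstract describes the pipeline only at a high level, so the sketch below illustrates the general idea under explicit assumptions; it is not the authors' method. It assumes TF-IDF term weights computed against a small patent corpus (using the smoothed IDF variant familiar from scikit-learn), a hypothetical flat set of ontology concept terms (`ONTOLOGY_TERMS`) whose weights are multiplied by a hypothetical `ONTOLOGY_BOOST`, extractive summarization by average sentence weight, and assumed metric definitions: compression ratio as summary tokens over original tokens, and retention ratio as the share of ontology terms in the original that survive in the summary. The concept clustering, the cluster tree diagram, and the classification experiment are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical domain ontology, flattened to a set of concept terms for power hand
# tools. A real domain ontology would be a concept hierarchy with relations.
ONTOLOGY_TERMS = {"motor", "housing", "trigger", "battery", "spindle"}
ONTOLOGY_BOOST = 2.0  # hypothetical multiplier applied to ontology concept weights

# Minimal illustrative stopword list; patent work normally uses a tailored list.
STOPWORDS = {"a", "an", "the", "and", "of", "to", "is", "in", "on", "by", "with", "for"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def term_weights(doc_tokens, corpus_sets):
    """TF-IDF weight per term, boosted when the term is an ontology concept."""
    if not doc_tokens:
        return {}
    n_docs = len(corpus_sets)
    weights = {}
    for term, count in Counter(doc_tokens).items():
        df = sum(1 for toks in corpus_sets if term in toks)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF, always positive
        weight = (count / len(doc_tokens)) * idf
        if term in ONTOLOGY_TERMS:
            weight *= ONTOLOGY_BOOST
        weights[term] = weight
    return weights


def summarize(doc, corpus, n_sentences=2):
    """Keep the n highest-scoring sentences, restored to their original order."""
    corpus_sets = [set(tokenize(d)) for d in corpus]
    weights = term_weights(tokenize(doc), corpus_sets)
    sentences = split_sentences(doc)

    def score(sentence):
        # Average term weight, so long sentences are not favored just for length.
        toks = tokenize(sentence)
        return sum(weights.get(t, 0.0) for t in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))


def compression_ratio(summary, original):
    """Assumed definition: summary tokens / original tokens (lower = shorter summary)."""
    n_original = len(tokenize(original))
    return len(tokenize(summary)) / n_original if n_original else 0.0


def retention_ratio(summary, original, key_terms=ONTOLOGY_TERMS):
    """Assumed definition: share of key terms in the original kept in the summary."""
    original_keys = key_terms & set(tokenize(original))
    if not original_keys:
        return 0.0
    return len(original_keys & set(tokenize(summary))) / len(original_keys)


# Toy example text (invented for illustration, not from any real patent).
patent = ("A power tool has a housing and a motor. The motor drives a spindle. "
          "The tool is sold in many colors. "
          "A trigger on the housing switches the battery power to the motor.")
corpus = [patent,
          "A stapling device has a magazine and a driver.",
          "A polishing pad is pressed on a wafer by a carrier."]

summary = summarize(patent, corpus)
print(summary)  # A power tool has a housing and a motor. The motor drives a spindle.
print(round(compression_ratio(summary, patent), 2))  # 0.44
print(round(retention_ratio(summary, patent), 2))    # 0.6
```

In this toy run the off-topic sentence about colors scores lowest because none of its terms are ontology concepts. The boost is the only point where domain knowledge enters here; a real ontology and a concept-clustering step would carry much more of that weight.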

Author information

Corresponding author

Correspondence to Charles V. Trappey.

About this article

Cite this article

Trappey, A.J., Trappey, C.V. & Wu, CY. Automatic patent document summarization for collaborative knowledge systems and services. J. Syst. Sci. Syst. Eng. 18, 71–94 (2009).