Text Simplification of Patent Documents

  • Jeongwoo Kang
  • Achille Souili
  • Denis Cavallucci
Conference paper
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 541)


This paper presents an automatic text simplification system for patent documents. The system is embedded in a broader information retrieval system that extracts knowledge related to the Inventive Design Method (IDM) from patent documents. Extracting elements of the IDM ontology from patents involves training a machine-learning model. However, the accuracy of the model is compromised when the input text is too long, hence the need to simplify the texts before learning. Previous studies on automatic text simplification have relied on hand-written rules or statistical approaches, but few have addressed patent documents. Patent documents are characterized by lengthy sentences and multiword terminology, which often hinder accurate parsing. In this research, we therefore present a method to automatically simplify the text of patent documents and scientific papers by analyzing their syntactic and lexical patterns.
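As a rough illustration of what rule-based simplification of long patent sentences can look like, the following minimal sketch splits a sentence at a few clause boundaries once it exceeds a word limit. It is not the system described in this paper: the function name `split_long_sentence`, the boundary markers (semicolon, ", wherein", ", whereby") and the 20-word threshold are all assumptions chosen for the example.

```python
import re
from typing import List

# Hypothetical clause-boundary markers, chosen for illustration only;
# they are NOT the rule set used by the paper's system.
CLAUSE_BOUNDARY = re.compile(r"\s*(?:;|,\s+(?:wherein|whereby))\s+", re.IGNORECASE)


def split_long_sentence(sentence: str, max_words: int = 20) -> List[str]:
    """Split a sentence at clause boundaries if it has more than max_words words."""
    text = sentence.strip()
    # Short sentences are returned unchanged.
    if len(text.split()) <= max_words:
        return [text]
    # Split at the markers, then trim stray spaces and trailing periods.
    parts = [p.strip(" .") for p in CLAUSE_BOUNDARY.split(text)]
    # Re-capitalize each fragment and close it with a period.
    return [p[0].upper() + p[1:] + "." for p in parts if p]


if __name__ == "__main__":
    claim = (
        "The device comprises a housing having a first end and a second end, "
        "wherein the first end is coupled to a rotating shaft; the shaft drives "
        "a pump that circulates the coolant through the housing."
    )
    for piece in split_long_sentence(claim):
        print(piece)
```

A surface pattern like this ignores syntax entirely, so a realistic system would more plausibly decide where to split from a parse of the sentence and from the multiword terms it contains.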


Inventive Design Method; TRIZ; Information extraction; Text simplification; Syntactic analysis; Text mining


Copyright information

© IFIP International Federation for Information Processing 2018

Authors and Affiliations

  • Jeongwoo Kang (1)
  • Achille Souili (1)
  • Denis Cavallucci (1)
  1. CSIP/INSA Strasbourg, Strasbourg Cedex, France
