Collecting Bilingual Technical Terms from Japanese-Chinese Patent Families by SVM

  • Lijuan Dong
  • Zi Long
  • Takehito Utsuro
  • Tomoharu Mitsuhashi
  • Mikio Yamamoto
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 593)


This paper proposes a method for collecting bilingual technical terms from Japanese-Chinese patent families. In the proposed method, the phrase translation table of a statistical machine translation model is used to estimate Chinese translations of Japanese technical terms. First, we extract Japanese technical terms from the Japanese side of parallel patent sentences. Then, for each extracted Japanese term, we collect all the parallel sentences that contain it. Next, we generate Chinese translation candidates for the Japanese term by referring to the phrase translation table of a statistical machine translation model. Finally, we apply Support Vector Machines (SVMs) to identify bilingual technical terms. Overall, we achieve over 90% precision at a recall of 60% or higher.
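The abstract outlines the procedure but does not specify the SVM features, kernel, or toolkit. The following minimal Python sketch illustrates the candidate-generation and SVM steps under stated assumptions; it is not the authors' implementation. It assumes a phrase table in Moses text format (`source ||| target ||| scores`) with the usual Moses score order, so the third score is read as the direct phrase translation probability. Candidates come only from whole-phrase lookup. The two features (that probability and a "coverage" ratio over the collected Chinese sentences), all example data, and every helper name are hypothetical. A linear SVM trained with the Pegasos subgradient method stands in for whatever SVM package the paper used. A helper for the reported evaluation style (best precision subject to a minimum recall) is included.

```python
import random

# Hypothetical Moses-style phrase-table lines ("source ||| target ||| scores").
# Assumption: usual Moses score order, so the third score is the direct
# phrase translation probability phi(target | source).
PHRASE_TABLE = [
    "光 ファイバ ||| 光纤 ||| 0.6 0.5 0.7 0.4",
    "光 ファイバ ||| 光 纤维 ||| 0.3 0.2 0.2 0.1",
    "光 ファイバ ||| 光 ||| 0.1 0.1 0.05 0.1",
]


def load_phrase_table(lines):
    """Map each source phrase to a list of (target phrase, scores)."""
    table = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        scores = [float(s) for s in fields[2].split()]
        table.setdefault(fields[0], []).append((fields[1], scores))
    return table


def candidate_features(ja_term, table, zh_sentences):
    """Return (Chinese candidate, [phi(zh|ja), coverage]) pairs for one term.

    coverage is a hypothetical feature: the fraction of the collected Chinese
    sentences that contain the candidate once spaces are removed.
    """
    result = []
    for zh, scores in table.get(ja_term, []):
        surface = zh.replace(" ", "")
        hits = sum(surface in s for s in zh_sentences)
        result.append((zh, [scores[2], hits / max(len(zh_sentences), 1)]))
    return result


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained with Pegasos (stochastic subgradient on the hinge loss).

    Labels must be +1/-1; a constant 1.0 is appended to each vector as a bias.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularizer shrink
            if margin < 1.0:  # hinge-loss subgradient step
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def svm_score(w, x):
    """Signed decision value; positive means 'accept as a bilingual term'."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def best_precision_at_recall(scores, labels, min_recall):
    """Best precision over all score thresholds whose recall is >= min_recall."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(1 for t in labels if t == 1)
    if total_pos == 0:
        return 0.0
    best, tp = 0.0, 0
    for k, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            tp += 1
        if tp / total_pos >= min_recall:
            best = max(best, tp / k)
    return best


if __name__ == "__main__":
    # Hypothetical labeled training pairs: [phi(zh|ja), coverage] -> +1 / -1.
    X_train = [[0.8, 0.9], [0.7, 0.8], [0.9, 0.6],
               [0.1, 0.9], [0.2, 0.1], [0.05, 0.3]]
    y_train = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X_train, y_train)

    # Hypothetical Chinese sides of parallel sentences containing the term.
    zh_sents = ["该光纤的直径为125微米。", "将光纤连接到光源。"]
    table = load_phrase_table(PHRASE_TABLE)
    for zh, feats in candidate_features("光 ファイバ", table, zh_sents):
        print(zh, feats, "accept" if svm_score(w, feats) > 0 else "reject")
```

In this toy data, 光 occurs in every collected sentence but has a low phrase probability, and 光 纤维 never occurs as a contiguous string, so neither feature alone singles out 光纤; combining them in a classifier is the point of the SVM step. A real system would likely use richer features and a tuned SVM library.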


Keywords: Translation acquisition · Statistical machine translation · Phrase translation table · SVM


Copyright information

© Springer Science+Business Media Singapore 2016

Authors and Affiliations

  • Lijuan Dong (1)
  • Zi Long (1)
  • Takehito Utsuro (1)
  • Tomoharu Mitsuhashi (2)
  • Mikio Yamamoto (1)

  1. Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan
  2. Japan Patent Information Organization, Koto-ku, Japan
