Automatic Collection of Useful Phrases for English Academic Writing

  • Shunsuke Kozawa
  • Yuta Sakai
  • Kenji Sugiki
  • Shigeki Matsubara
Part of the Studies in Computational Intelligence book series (SCI, volume 376)


Writing academic English is indispensable for researchers who want to present their research achievements, yet it is hard for non-native researchers to write research papers in English. They often consult phrase dictionaries for academic writing to find useful expressions. However, commercially available lexica do not contain enough expressions and example sentences to serve this purpose, because they are compiled by hand. To meet the demand for better lexica, this paper proposes a method for automatically extracting useful expressions from English research papers. Expressions are extracted based on four characteristics and then classified into five classes: “introduction”, “related work”, “proposed method”, “experiment”, and “conclusion”. In an experiment using 1,232 research papers, the proposed method achieved a precision of 57.5% and a recall of 51.9%. Its F-measure was higher than those of the baselines, which confirms the validity of the method. We also developed a phrase search system that uses the extracted phrasal expressions to support English academic writing.
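The abstract reports precision and recall but not the F-measure value itself. As a rough illustration only, the sketch below computes the balanced F-measure (F1, the harmonic mean of precision and recall) from the two reported figures. That the chapter used the balanced variant is an assumption, and the function name `f_measure` and its `beta` parameter are hypothetical, so the result may differ from the figure given in the full text.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1.0 gives the balanced F1.

    Assumption: the chapter's "f-measure" is the balanced variant.
    """
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures taken from the abstract (1,232 research papers).
reported_precision = 0.575
reported_recall = 0.519

f1 = f_measure(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.546"
```

Under this assumption the two reported scores correspond to an F1 of roughly 54.6%.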


Noun Phrase, Academic Writing, Section Class, Syntactic Constraint, Automatic Collection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Shunsuke Kozawa (1)
  • Yuta Sakai (1)
  • Kenji Sugiki (1)
  • Shigeki Matsubara (1)
  1. Graduate School of Information Science, Nagoya University, Japan
