Predicting Author’s Native Language Using Abstracts of Scholarly Papers

Baba, Takahiro; Baba, Kensuke; Ikeda, Daisuke

doi:10.1007/978-3-030-01851-1_43

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11177)

Included in the following conference series:

International Symposium on Methodologies for Intelligent Systems

Abstract

Predicting an author’s attributes is useful for understanding the implicit meanings of documents. The target problem of this paper is predicting the author’s native language for each document. The authors used surface-level features of documents for this problem and tried to clarify practical tendencies in writing style, expressed as word occurrences. They classified the English abstracts of approximately 85,000 scholarly papers written in English or in Japanese. In the experiment, the binary classification achieved an accuracy of 0.97, and the authors found that a number of the distinctive phrases used in the classification were related to typical Japanese writing styles.
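
The abstract does not say which classifier the authors used. As a rough, hypothetical illustration of binary classification over surface-level word-occurrence features, the Python sketch below trains a multinomial naive Bayes model (a substitute choice for illustration, not necessarily the authors' method) on word unigrams and bigrams, then ranks "distinctive phrases" by the log-ratio of their smoothed class-conditional probabilities. The helper names (ngrams, NaiveBayes, distinctive), the four training abstracts, and the labels "ja"/"en" are all invented toy material, not data from the paper's corpus.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Surface-level features: lower-cased word n-grams for n = 1..n_max."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing on n-gram counts."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(ngrams(doc))
        # Log prior P(class) estimated from label frequencies.
        self.priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_prob(self, feat, c):
        """Smoothed log P(feat | c); missing Counter keys count as 0."""
        return math.log((self.counts[c][feat] + 1) / (self.totals[c] + len(self.vocab)))

    def predict(self, doc):
        feats = [f for f in ngrams(doc) if f in self.vocab]  # skip unseen n-grams
        scores = {
            c: self.priors[c] + sum(self.log_prob(f, c) for f in feats)
            for c in self.classes
        }
        return max(scores, key=scores.get)

    def distinctive(self, c, other, k=3):
        """Top-k n-grams by log-probability ratio of class c over other (ties alphabetical)."""
        ratio = {f: self.log_prob(f, c) - self.log_prob(f, other) for f in self.vocab}
        return sorted(ratio, key=lambda f: (-ratio[f], f))[:k]


# Hypothetical toy data; "ja" / "en" stand in for the two author groups.
train_docs = [
    "In this paper, we propose a novel method.",
    "In this paper, we propose a new system for retrieval.",
    "We present an approach that outperforms baselines.",
    "Here we present evidence that the model generalizes.",
]
train_labels = ["ja", "ja", "en", "en"]

nb = NaiveBayes().fit(train_docs, train_labels)
print(nb.predict("In this paper we propose a method"))  # ja
print(nb.predict("We present a model"))                 # en
print(nb.distinctive("ja", "en"))                       # ['a', 'in', 'in this']
```

Naive Bayes is used here only because it is short and exposes per-feature weights directly, which makes the "distinctive phrase" ranking easy to read off; on a corpus the size of the one described, a linear model such as logistic regression over the same n-gram features would be a common alternative.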

Acknowledgement

This work was supported by JSPS KAKENHI Grant Number 15H02787.

Author information

Authors and Affiliations

Kyushu University, Fukuoka, 819-0395, Japan
Takahiro Baba & Daisuke Ikeda
Fujitsu Laboratories, Kawasaki, 211-8588, Japan
Kensuke Baba

Corresponding author

Correspondence to Takahiro Baba.

Editor information

Editors and Affiliations

Università degli Studi di Bari Aldo Moro, Bari, Italy
Michelangelo Ceci
American University, Washington, DC, USA
Nathalie Japkowicz
Hong Kong Baptist University, Kowloon, Hong Kong
Jiming Liu
University of Cyprus, Nicosia, Cyprus
George A. Papadopoulos
University of North Carolina, Charlotte, NC, USA
Zbigniew W. Raś

About this paper

Cite this paper

Baba, T., Baba, K., Ikeda, D. (2018). Predicting Author’s Native Language Using Abstracts of Scholarly Papers. In: Ceci, M., Japkowicz, N., Liu, J., Papadopoulos, G., Raś, Z. (eds) Foundations of Intelligent Systems. ISMIS 2018. Lecture Notes in Computer Science, vol 11177. Springer, Cham. https://doi.org/10.1007/978-3-030-01851-1_43

Published: 07 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01850-4
Online ISBN: 978-3-030-01851-1
eBook Packages: Computer Science, Computer Science (R0)