Abstract
We present an automated approach to classifying sentences in scholarly work by their rhetorical function. Whereas previous work on this task, known as argumentative zoning, requires richly annotated input, our approach is robust to noise and can process raw text. Even when the input is noisy (because it is obtained through optical character recognition or text extraction from PDF files), our classifier remains largely accurate. We perform an in-depth study of our system on both clean and noisy inputs. We also give preliminary results from in situ acceptability testing with the classifier embedded within a digital library reading environment.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Teufel, S., Kan, MY. (2011). Robust Argumentative Zoning for Sensemaking in Scholarly Documents. In: Bernardi, R., Chambers, S., Gottfried, B., Segond, F., Zaihrayeu, I. (eds) Advanced Language Technologies for Digital Libraries. NLP4DL 2009, AT4DL 2009. Lecture Notes in Computer Science, vol 6699. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23160-5_10
DOI: https://doi.org/10.1007/978-3-642-23160-5_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23159-9
Online ISBN: 978-3-642-23160-5
eBook Packages: Computer Science, Computer Science (R0)