Abstract
We formulate text integrity assessment as the problem of learning the discourse structure of a text from a dataset of texts with high and low integrity. We formalize discourse structure in two ways, as a sentiment profile and as a rhetoric structure, relying on a sentence-level sentiment classifier and a rhetoric structure parser respectively. To learn discourse structures, we use a graph-based nearest neighbor approach, which allows explicit feature engineering, and SVM tree kernel-based learning. Both learning approaches operate on graphs called parse thickets: sets of parse trees whose nodes carry additional sentiment labels, or which are connected by additional arcs for rhetoric relations between sentences. Evaluation in the domain of valid vs. invalid customer complaints (invalid ones being those with argumentation flaws, those that are non-cohesive, or those that indicate a bad mood of the complainant) shows that rhetoric structure information contributes more than sentiment profile information. Both learning approaches demonstrate that the discourse structure obtained by an RST parser is sufficient for text integrity assessment. The sentiment profile-based approach, by contrast, shows much weaker results and adds little when combined with the rhetoric structure-based one.
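The sentiment profile idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy word lists stand in for a trained sentence-level sentiment classifier, and the names `sentiment_profile`, `resample`, and `classify`, the fixed profile length, and the Euclidean distance are all assumptions made for illustration. The paper's nearest neighbor learner operates on parse thickets, whereas this sketch compares plain score sequences only.

```python
import re

# Hypothetical toy lexicon; a stand-in for a trained sentence-level
# sentiment classifier, which the abstract does not specify.
POSITIVE = {"good", "great", "helpful", "thanks", "resolved"}
NEGATIVE = {"bad", "terrible", "rude", "never", "refused", "worst"}


def sentence_score(sentence):
    """Polarity of one sentence: positive word count minus negative word count."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def sentiment_profile(text):
    """Sequence of per-sentence polarity scores, in document order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [sentence_score(s) for s in sentences]


def resample(profile, n=5):
    """Map a profile to n points by nearest-index sampling,
    so that texts with different sentence counts are comparable."""
    if not profile:
        return [0.0] * n
    last = len(profile) - 1
    return [float(profile[round(i * last / (n - 1))]) for i in range(n)]


def distance(p, q):
    """Euclidean distance between two equal-length profiles (an assumed metric)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def classify(text, labeled_texts, n=5):
    """1-nearest-neighbor label for text, given (text, label) training pairs."""
    query = resample(sentiment_profile(text), n)
    best_label, _ = min(
        ((label, distance(query, resample(sentiment_profile(t), n)))
         for t, label in labeled_texts),
        key=lambda pair: pair[1],
    )
    return best_label


if __name__ == "__main__":
    training = [
        ("The agent was helpful. The issue was resolved. Thanks.", "valid"),
        ("This is terrible. The staff was rude. They never called.", "invalid"),
    ]
    print(classify("Bad product. Rude staff. They refused to help.", training))
```

A profile of this kind records only how polarity changes from sentence to sentence. It carries no information about how the sentences relate to one another, which is the part the rhetoric structure representation adds.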
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Galitsky, B., Ilvovsky, D., Kuznetsov, S.O. (2015). Text Integrity Assessment: Sentiment Profile vs Rhetoric Structure. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_10
DOI: https://doi.org/10.1007/978-3-319-18117-2_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18116-5
Online ISBN: 978-3-319-18117-2
eBook Packages: Computer Science (R0)