Abstract
People tend to avoid reading contracts because they are long and difficult to understand. This research addresses that problem by proposing a system that integrates automated contract review with text simplification. The system uses a language model fine-tuned on legal data to extract salient clauses from a contract. Complex words in those clauses are then replaced with simpler alternatives generated by language models trained on legal documents, and the output is further refined by splitting the text into shorter sentences according to their semantic hierarchy. Initial results show that readability improved: the simplified contracts are understandable at a 10th-grade reading level, whereas the originals require a postgraduate level of education. Human evaluations were generally positive, although the observed improvements were relatively minor. The study concludes that integrating text simplification with automated contract review has the potential to enhance contract readability, but that further research is needed to improve the quality of the simplification.
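To make the grade-level comparison concrete, the sketch below scores a clause before and after simplification. The abstract does not say which readability metric the study used, so the Flesch-Kincaid grade level is assumed here purely for illustration. The syllable counter is a rough vowel-group heuristic, and both example clauses are hypothetical and not taken from the paper's data.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (heuristic, not exact)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent, except in '-le' endings such as 'table'.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Hypothetical clause and a hand-written simplification (assumptions, not paper data).
original = (
    "The Licensee shall indemnify the Licensor against any liabilities "
    "arising from unauthorized utilization of the software."
)
simplified = (
    "You must pay us back for any losses. "
    "This applies if you use the software without permission."
)

print(f"original grade:   {fk_grade(original):.1f}")
print(f"simplified grade: {fk_grade(simplified):.1f}")
```

A lower score means the text should be readable at a lower school grade. Shorter sentences and shorter words both reduce the score, which matches the two simplification steps the system applies: word replacement and sentence splitting.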
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Justo, J.M., Recario, R.N.C. (2024). Text Simplification System for Legal Contract Review. In: Arai, K. (eds) Advances in Information and Communication. FICC 2024. Lecture Notes in Networks and Systems, vol 919. Springer, Cham. https://doi.org/10.1007/978-3-031-53960-2_8
DOI: https://doi.org/10.1007/978-3-031-53960-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53959-6
Online ISBN: 978-3-031-53960-2
eBook Packages: Intelligent Technologies and Robotics (R0)