Abstract
We report explorations into prompt engineering with large pre-trained language models that were not fine-tuned to solve the legal entailment task (Task 4) of the 2023 COLIEE competition. Our most successful strategy used simple text similarity measures to retrieve articles and queries from the training set. We report on our efforts to optimize performance with both OpenAI’s GPT-4 and Flan-T5. We also used an ensemble approach to find the best combination of models and prompts. Finally, we analyze our results and suggest ideas for future improvements.
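The retrieval strategy described above can be sketched with a simple token-overlap (Jaccard) similarity: given a new query, find the most similar training query and include it (with its label) as a few-shot example in the prompt. This is an illustrative stand-in, not the authors' exact implementation; the example queries below are invented for demonstration.

```python
import re

# Toy training pool of yes/no legal queries (illustrative, not from the COLIEE data).
train_queries = [
    "A contract concluded by an adult ward may be rescinded.",
    "A minor may not rescind a contract ratified by a guardian.",
    "An offer lapses when the offeror dies before acceptance.",
]

def tokens(text: str) -> set[str]:
    """Lowercase word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

def most_similar(query: str, corpus: list[str]) -> int:
    """Index of the corpus entry with the highest similarity to `query`."""
    return max(range(len(corpus)), key=lambda i: jaccard(query, corpus[i]))

idx = most_similar("Can an adult ward rescind a concluded contract?", train_queries)
print(train_queries[idx])  # the retrieved example to include in the few-shot prompt
```

In practice, a weighted measure such as TF-IDF cosine similarity, or embedding similarity, would replace the raw token overlap; the retrieval-then-prompt structure stays the same.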
Notes
Because sampling was disabled, the temperature does not affect these models’ predictions.
As we suspected our GPT-4 submission would likely be disqualified, we chose not to use this model in the ensemble.
We refer the reader to the respective papers for details on how each model was trained. Note that, at the time of publication, OpenAI has released no details on what data GPT-4 was trained on.
Although the exact number of parameters in GPT-4 is unknown, it is likely to be on the order of hundreds of billions, given the known size of GPT-3.
‘gpt-3.5-turbo’.
https://github.com/Advancing-Machine-Human-Reasoning-Lab/COLIEE-2023-Task4.
References
Hart, H. (1961). The concept of law. Clarendon Press.
Franklin, J. (2012). How much of commonsense and legal reasoning is formalizable? A review of conceptual obstacles. Law, Probability and Risk, 11(2–3), 225.
Prakken, H. (2017). On the problem of making autonomous vehicles conform to traffic law. Artificial Intelligence and Law, 25(3), 341.
Lawless, W. F., Mittu, R., & Sofge, D. A. (Eds.). (2020). Human-machine shared contexts. NY: Academic Press.
Licato, J., Marji, Z., & Abraham, S. (2019). Proceedings of the AAAI 2019 Fall Symposium on Human-Centered AI, Arlington, VA.
Licato, J., & Marji, Z. (2018). Proceedings of the 2018 International Conference on Robot Ethics and Standards, ICRES.
Waismann, F. (1965). The principles of linguistic philosophy. St. Martin’s Press.
Licato, J. (2021). How should AI interpret rules? A defense of minimally defeasible interpretive argumentation. arXiv e-prints.
Vecht, J. J. (2020). Open texture clarified. Inquiry. https://doi.org/10.1080/0020174X.2020.1787222
Licato, J., Fields, L., & Hollis, B. (2023). Proceedings of the 36th International Florida Artificial Intelligence Research Society Conference (FLAIRS-36), AAAI Press.
Fields, L., & Licato, J. (2023). Proceedings of the 36th International Florida Artificial Intelligence Research Society Conference (FLAIRS-36), AAAI.
Licato, J. (2022). Proceedings of the AAAI 2022 Spring Workshop on “Ethical Computing: Metrics for Measuring AI’s Proficiency and Competency for Ethical Reasoning".
Licato, J. (2022). Proceedings of the 2022 Advances on Societal Digital Transformation (DIGITAL) Special Track on Explainable AI in Societal Games (XAISG).
Sartor, G., Walton, D., Macagno, F., & Rotolo, A. (2014). Legal Knowledge and Information Systems. In: Proceedings of JURIX 14, pp. 21–28.
Bongiovanni, G., Postema, G., Rotolo, A., Sartor, G., Valentini, C., & Walton, D. (Eds.). (2018). Handbook of legal reasoning and argumentation (pp. 519–560). Netherlands, Dordrecht: Springer. https://doi.org/10.1007/978-90-481-9452-0_18
Walton, D., Macagno, F., & Sartor, G. (2021). Statutory interpretation: Pragmatics and argumentation. Cambridge University Press.
Araszkiewicz, M. (2021). Critical questions to argumentation schemes in statutory interpretation. Journal of Applied Logics - IfCoLog Journal of Logics and Their Applications, 8(1), 291–320.
Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., & Brunskill, E., et al. (2021). On the opportunities and risks of foundation models, arXiv preprint arXiv:2108.07258
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877.
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2021). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv:2107.13586
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E. H., Le, Q., & Zhou, D. (2022). Chain of thought prompting elicits reasoning in large language models. CoRR abs/2201.11903. https://arxiv.org/abs/2201.11903
Ye, X., & Durrett, G. (2023). Explanation selection using unlabeled data for in-context learning, arXiv preprint arXiv:2302.04813
Rubin, O., Herzig, J., & Berant, J. (2022). Learning to retrieve prompts for in-context learning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 2655–2671.
Song, C., Cai, F., Wang, M., Zheng, J., & Shao, T. (2023). TaxonPrompt: Taxonomy-aware curriculum prompt learning for few-shot event classification. Knowledge-Based Systems, 264, 110290. https://doi.org/10.1016/j.knosys.2023.110290
Qu, Y., Ding, Y., Liu, J., Liu, K., Ren, R., Zhao, W. X., Dong, D., Wu, H., & Wang, H. (2021). RocketQA: An optimized training approach to dense passage retrieval for open-domain question answering. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 5835–5847). Association for Computational Linguistics, Online. https://doi.org/10.18653/v1/2021.naacl-main.466
Wang, S., Xu, Y., Fang, Y., Liu, Y., Sun, S., Xu, R., Zhu, C., & Zeng, M. (2022). Training data is more valuable than you think: A simple and effective method by retrieving from training data. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 3170–3179). Association for Computational Linguistics, Dublin, Ireland. https://doi.org/10.18653/v1/2022.acl-long.226
Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (pp. 3982–3992). Hong Kong: Association for Computational Linguistics.
Lyu, Q., Havaldar, S., Stein, A., Zhang, L., Rao, D., Wong, E., Apidianaki, M., & Callison-Burch, C. (2023). Faithful chain-of-thought reasoning, arXiv preprint arXiv:2301.13379
Zelikman, E., Wu, Y., Mu, J., & Goodman, N. (2022). STaR: Bootstrapping reasoning with reasoning. Advances in Neural Information Processing Systems, 35, 15476.
Jung, J., Qin, L., Welleck, S., Brahman, F., Bhagavatula, C., Bras, R. L., & Choi, Y. (2022). Maieutic prompting: Logically consistent reasoning with recursive explanations, arXiv preprint arXiv:2205.11822
Srivastava, A., Rastogi, A., Rao, A., Shoeb, A. A. M., Abid, A., Fisch, A., Brown, A. R., Santoro, A., Gupta, A., Garriga-Alonso, A., et al. (2022). Beyond the imitation game: Quantifying and extrapolating the capabilities of language models, arXiv preprint arXiv:2206.04615
Yu, F., Quartey, L., & Schilder, F. (2023). Findings of the Association for Computational Linguistics: ACL 2023, pp. 13582–13596.
Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., & Ba, J. J. (2022). Large language models are human-level prompt engineers, arXiv preprint arXiv:2211.01910
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805
Nguyen, H. T., Vuong, H. Y. T., Nguyen, P. M., Dang, B. T., Bui, Q. M., Vu, S. T., Nguyen, C. M., Tran, V., Satoh, K., & Nguyen, M. L. (2020). JNLP team: Deep learning for legal processing in COLIEE, arXiv preprint arXiv:2011.08071
He, P., Liu, X., Gao, J., & Chen, W. (2020). DeBERTa: Decoding-enhanced BERT with disentangled attention, arXiv preprint arXiv:2006.03654
Lin, J., Nogueira, R., & Yates, A. (2022). Pretrained transformers for text ranking: BERT and beyond. Springer Nature.
Rosa, G. M., Rodrigues, R. C., de Alencar Lotufo, R., & Nogueira, R. (2021). Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law, pp. 295–300.
Shao, Y., Mao, J., Liu, Y., Ma, W., Satoh, K., Zhang, M., & Ma, S. (2020). BERT-PLI: Modeling paragraph-level interactions for legal case retrieval. In IJCAI, pp. 3501–3507.
Shao, Y., Liu, B., Mao, J., Liu, Y., Zhang, M., & Ma, S. (2020). THUIR@COLIEE-2020: Leveraging semantic understanding and exact matching for legal case retrieval and entailment. CoRR arXiv:2012.13102
Rosa, G. M., Rodrigues, R. C., Lotufo, R., & Nogueira, R. (2021). Yes, BM25 is a strong baseline for legal case retrieval, arXiv preprint arXiv:2105.05686
Althammer, S., Askari, A. , Verberne, S., & Hanbury, A. (2021). Proceedings of the eighth international competition on legal information extraction/entailment (COLIEE 2021), pp. 8–14.
Askari, A., Peikos, G., Pasi, G., & Verberne, S. (2022). LeiBi@COLIEE 2022: Aggregating tuned lexical models with a cluster-driven BERT-based model for case law retrieval, arXiv preprint arXiv:2205.13351
Savelka, J., Ashley, K. D., Gray, M. A., Westermann, H., & Xu, H. (2023). Can GPT-4 support analysis of textual data in tasks requiring highly specialized domain expertise? arXiv preprint arXiv:2306.13906
Savelka, J., Ashley, K. D., Gray, M. A., Westermann, H., & Xu, H. (2023). Explaining legal concepts with augmented large language models, arXiv preprint arXiv:2306.09525
Goebel, R., Kano, Y., Kim, M. Y., Rabelo, J., Satoh, K., & Yoshioka, M. (2023). Summary of the Competition on Legal Information Extraction/Entailment (COLIEE) 2023. In Proceedings of the Nineteenth International Conference on Artificial Intelligence and Law, pp. 472–480.
Berryessa, C. M., Dror, I. E., & McCormack, C. J. B. (2023). Prosecuting from the bench? Examining sources of pro-prosecution bias in judges. Legal and Criminal Psychology, 28(1), 1.
Liu, J. Z., & Li, X. (2019). Legal techniques for rationalizing biased judicial decisions: Evidence from experiments with real judges. Journal of Empirical Legal Studies, 16(3), 630.
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2022). A survey on bias and fairness in machine learning. ACM Computing Surveys, 54(6), 1. https://doi.org/10.1145/3457607
Wachter, S., Mittelstadt, B., & Russell, C. (2020). Bias preservation in machine learning: The legality of fairness metrics under EU non-discrimination laws. West Virginia Law Review, 123, 735.
Yeung, D., Khan, I., Kalra, N., Osoba, O. A. (2021). Identifying systemic bias in the acquisition of machine learning decision aids for law enforcement applications. RAND Corporation, Santa Monica, CA. https://doi.org/10.7249/PEA862-1
Costantini, S., & Lanzarone, G. A. (1995). Explanation-based interpretation of open-textured concepts in logical models of legislation. Artificial Intelligence and Law, 3, 191. https://doi.org/10.1007/BF00872530
Ashley, K. D., & Walker, V. R. (2013). ICAIL ’13: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Law. Association for Computing Machinery, pp. 176–180. https://doi.org/10.1145/2514601.2514622
Bayamlıoğlu, E., & Leenes, R. E. (2018). Data-driven decision-making and the ‘rule of law’. Tilburg Law School Research Paper.
BigScience Workshop: Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., Castagné, R., Luccioni, A. S., Yvon, F., Gallé, M., et al. (2023). BLOOM: A 176B-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P. F., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback.
Christiano, P. F., Leike, J., Brown, T., Martic, M., Legg, S., & Amodei, D. (2017). Deep reinforcement learning from human preferences. Advances in Neural Information Processing Systems, 30.
Wei, J., Bosma, M., Zhao, V. Y., Guu, K., Yu, A. W., Lester, B., Du, N., Dai, A. M., & Le, Q. V. (2021). Finetuned language models are zero-shot learners, arXiv preprint arXiv:2109.01652
Chung, H. W., Hou, L., Longpre, S., Zoph, B., Tay, Y., Fedus, W., Li, E., Wang, X., Dehghani, M., Brahma, S., et al. (2022). Scaling instruction-finetuned language models, arXiv preprint arXiv:2210.11416
Sanh, V., Webson, A., Raffel, C., Bach, S. H., Sutawika, L., Alyafeai, Z., Chaffin, A., Stiegler, A., Scao, T. L., Raja, A., Dey, M., Bari, M. S., Xu, C., Thakker, U., Sharma, S. S., Szczechla, E., Kim, T., Chhablani, G., Nayak, N., Datta, D., Chang, J., Jiang, M. T. J., Wang, H., Manica, M., Shen, S., Yong, Z. X., Pandey, H., Bawden, R., Wang, T., Neeraj, T., Rozen, J., Sharma, A., Santilli, A., Fevry, T., Fries, J. A., Teehan, R., Biderman, S., Gao, L., Bers, T., Wolf, T., & Rush, A. M. (2021). Multitask prompted training enables zero-shot task generalization.
Chia, Y. K., Hong, P., Bing, L., Poria, S. (2023). Instructeval: Towards holistic evaluation of instruction-tuned large language models, arXiv preprint arXiv:2306.04757
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Davison, J., Shleifer, S., von Platen, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Scao, T. L., Gugger, S., Drame, M., Lhoest, Q., & Rush, A. M. (2020). Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 38–45). Association for Computational Linguistics, Online. https://www.aclweb.org/anthology/2020.emnlp-demos.6
Dietterich, T. G. (2000). Ensemble methods in machine learning. In Multiple Classifier Systems: First International Workshop, MCS 2000, Cagliari, Italy, June 21–23, Proceedings 1 (pp. 1–15). Springer.
Abbas, A., & Deny, S. (2022). Progress and limitations of deep networks to recognize objects in unusual poses.
Zhou, K., Yang, J., Loy, C. C., & Liu, Z. (2022). Learning to prompt for vision-language models.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273.
Ho, T. K. (1995). Random decision forests. In Proceedings of 3rd International Conference on Document Analysis and Recognition (Vol. 1, pp. 278–282). IEEE.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825.
Breiman, L., Friedman, J., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Chapman and Hall/CRC.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. CoRR abs/1907.11692. http://arxiv.org/abs/1907.11692
Chalkidis, I., Fergadiotis, M., Malakasiotis, P., Aletras, N., & Androutsopoulos, I. (2020). LEGAL-BERT: The muppets straight out of law school. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 2898–2904). Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.261
Zhang, S., Roller, S., Goyal, N., Artetxe, M., Chen, M., Chen, S., Dewan, C., Diab, M., Li, X., Lin, X. V., Mihaylov, T., Ott, M., Shleifer, S., Shuster, K., Simig, D., Koura, P. S., Sridhar, A., Wang, T., & Zettlemoyer, L. (2022). OPT: Open pre-trained transformer language models.
Wang, B., Komatsuzaki, A. (2021). GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. https://github.com/kingoflolz/mesh-transformer-jax
OpenAI (2022). Introducing ChatGPT. https://openai.com/blog/chatgpt
OpenAI (2023). GPT-4 technical report. arXiv. https://arxiv.org/pdf/2303.08774.pdf
Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (Eds.) (2022). Advances in Neural Information Processing Systems, vol. 35. Curran Associates. pp. 22199–22213. https://proceedings.neurips.cc/paper_files/paper/2022/file/8bb0d291acd4acf06ef112099c16f326-Paper-Conference.pdf
Lu, J., Shen, J., Xiong, B., Ma, W., Staab, S., & Yang, C. (2023). HiPrompt: Few-shot biomedical knowledge fusion via hierarchy-oriented prompting, arXiv preprint arXiv:2304.05973
Wang, L., Xu, W., Lan, Y., Hu, Z., Lan, Y., Lee, R. K. W., & Lim, E. P. (2023). Plan-and-solve prompting: Improving zero-shot chain-of-thought reasoning by large language models, arXiv preprint arXiv:2305.04091
Takama, Y., Yada, K., Satoh, K., & Arai, S. (Eds.). (2023). New frontiers in artificial intelligence (pp. 51–67). Cham: Springer Nature Switzerland.
Floridi, L., & Chiriatti, M. (2020). GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30, 681.
Chen, Y., Zhao, C., Yu, Z., McKeown, K., He, H. (2023). On the relation between sensitivity and accuracy in in-context learning.
Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J. W. (Eds.). (2021). Advances in Neural Information Processing Systems.
Zhao, Z., Wallace, E., Feng, S., Klein, D., & Singh, S. (2021). Calibrate before use: Improving few-shot performance of language models. In M. Meila & T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning, Proceedings of Machine Learning Research (Vol. 139, pp. 12697–12706). PMLR. https://proceedings.mlr.press/v139/zhao21c.html
Leskovec, J., Rajaraman, A., & Ullman, J. (2014). Mining of massive datasets (3rd ed.). Stanford University.
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2020). BERTScore: Evaluating text generation with BERT. In International Conference on Learning Representations.
Jurafsky, D., Chai, J., Schluter, N., Tetreault, J. (Eds.) (2020). Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. pp. 7881–7892. https://doi.org/10.18653/v1/2020.acl-main.704
Liévin, V., Hother, C. E., & Winther, O. (2023). Can large language models reason about medical questions?
Press, O., Zhang, M., Min, S., Schmidt, L., Smith, N. A., Lewis, M. (2022). Measuring and narrowing the compositionality gap in language models, arXiv preprint arXiv:2210.03350
Chen, S. F., Beeferman, D., Rosenfeld, R. (1998). Evaluation metrics for language models.
Ethics declarations
Conflict of Interest
The authors declare that they have no competing interests.
Cite this article
Bilgin, O., Fields, L., Laverghetta, A. et al. Exploring Prompting Approaches in Legal Textual Entailment. Rev Socionetwork Strat 18, 75–100 (2024). https://doi.org/10.1007/s12626-023-00154-y