
Automatic deception detection in Italian court cases

Artificial Intelligence and Law

Abstract

Effective methods for evaluating the reliability of statements made by witnesses and defendants in hearings would be an extremely valuable support to decision-making in court and other legal settings. In recent years, methods relying on stylometric techniques have proven most successful for this task, but few such methods have been tested on language collected in real-life situations of high-stakes deception, so their usefulness outside lab conditions has yet to be properly assessed. In this study we report the results obtained by using stylometric techniques to identify deceptive statements in a corpus of hearings collected in Italian courts. The defendants at these hearings were convicted of calumny or false testimony, so the falsity of (some of) their statements is fairly certain. In our experiments we replicated methods that had been used in previous studies but never applied to high-stakes data, and we also tested new methods. We also considered the effect of a number of variables, in particular the homogeneity of the dataset. Our results suggest that deception-detection accuracy clearly above chance level can be obtained with real-life data as well.



Notes

  1. To be precise, Art. 372 reads:

    Chiunque, deponendo come testimone innanzi all’Autorità Giudiziaria, afferma il falso o nega il vero, ovvero tace, in tutto o in parte, ciò che sa intorno ai fatti sui quali è interrogato, è punito con la reclusione da due a sei anni.

    I.e., this article punishes, with two to six years' imprisonment, anyone who, testifying as a witness before the Judicial Authority, states a falsehood or denies the truth, or withholds, in whole or in part, what he knows about the facts on which he is questioned.

  2. Specifically, Art. 368 reads:

    Chiunque, con denunzia, querela, richiesta o istanza, anche se anonima o sotto falso nome, diretta all’Autorità Giudiziaria o ad altra Autorità che a quella abbia obbligo di riferirne, incolpa di un reato taluno che egli sa innocente, ovvero simula a carico di lui le tracce di un reato, è punito con la reclusione da due a sei anni.

    I.e., this article is violated whenever an individual, through a report or complaint addressed to the Judicial Authority, accuses of a crime someone he knows to be innocent, or fabricates evidence of a crime against that person.

  3. When in doubt, side with the accused.

  4. In particular, until 2005 the hearings were mainly recorded on tapes, which were reused several times once the transcription had been carried out. The audio tracks of the earliest hearings are therefore permanently lost. Since 2006, by contrast, the audio tracks have been recorded on CD-ROM, and an attempt to obtain them is in progress.

  5. http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html.

  6. Because our utterances are transcriptions of spoken language, the punctuation marks were inserted by the transcriber. They seemed nevertheless essential to understand the meaning of many utterances, hence their inclusion.

  7. The LIWC for several languages can be obtained from http://www.liwc.net.

  8. “They”, “Passive” and “Formal”, respectively.

  9. Here and in the rest of the paper we indicate the highest accuracy achieved in bold.

  10. “xxxxx” replaces an anonymized token, such as a first name or surname, a place name, and so on.

  11. http://paleo.di.unipi.it/it/parse.


Acknowledgments

Creating DeCour was a complex undertaking, and it would not have been possible without the kind collaboration of many people. Many thanks to Dr. Francesco Scutellari, President of the Court of Bologna, to Dr. Heinrich Zanon, President of the Court of Bolzano, to Dr. Francesco Antonio Genovese, President of the Court of Prato, and to Dr. Sabino Giarrusso, President of the Court of Trento.

Author information

Corresponding author

Correspondence to Tommaso Fornaciari.

About this article

Cite this article

Fornaciari, T., Poesio, M. Automatic deception detection in Italian court cases. Artif Intell Law 21, 303–340 (2013). https://doi.org/10.1007/s10506-013-9140-4


  • DOI: https://doi.org/10.1007/s10506-013-9140-4
