
Introduction

The growing number of reported crimes suspected of involving online sexual abuse [1], along with the proliferation of online social networking communities whose members are mainly young people between 15 and 24 years old [2], has made the phenomenon of child sexual abuse (CSA), and especially online grooming, even more prominent. In particular, the increase in risky online behaviours such as sexting, grooming, and child prostitution has raised concerns among parents, educators, and mental health professionals [3]. Among these activities, grooming poses a particularly significant risk to the safety and well-being of children. Grooming is defined as the process of preparing a child, significant individuals, and the environment for the purpose of sexually abusing the child [4], with specific goals such as gaining access to the child, ensuring compliance, and maintaining secrecy to prevent disclosure. Overall, grooming not only reinforces the abusers’ patterns but can also be used to justify or deny their actions.

In the context of child grooming, Information and Communication Technologies (ICTs) are commonly utilised to recruit and exploit young individuals for sexual purposes within relationships based on trust between minors and adults [5]. The grooming process often begins with the perpetrator engaging in inappropriate online sexual activities or sending explicit content. To create a safer online space for children, machine learning methods have been developed to enable the automatic detection of grooming activities on online platforms. In this chapter, we analyse the current methods for dealing with online grooming and explore their challenges. Based on our findings, we propose future directions, as part of the CESAGRAM project’s response to online child sexual exploitation and abuse, for improving online grooming detection and extending it to many languages; thus far, the focus has been on English content only. CESAGRAMFootnote 1 is a two-year European-funded project (GA No. 101084974) that aims to tackle online child sexual exploitation and abuse by enhancing the understanding of the grooming process, and more particularly the way it is facilitated by technology, as well as its link to CSA and missing-children cases, an area that is currently under-researched. The main pillars of the project’s activities are research, training and awareness raising, the development of a set of artificial intelligence (AI) tools to facilitate the detection and prevention of grooming content online, and advocacy.

Background

Child grooming commonly starts in an online setting, initiated by adults who send inappropriate content to children or engage them in sexual activities. These actions aim to desensitise the child and increase the likelihood of future sexual abuse [6]. Although grooming methods may vary, certain constants can be observed throughout the process. The perpetrator intentionally desensitises the child both physically and psychologically, making them more susceptible to engaging in sexual activities. Techniques such as active involvement, power dynamics, and control are employed to manipulate the child and reduce their inhibitions [7]. A comprehensive understanding of the nature and characteristics of grooming is crucial for addressing the risks associated with online activities and ensuring the protection of young children. By recognising the complex aspects of grooming and its manifestation in the digital realm, strategies to effectively prevent and respond to this form of abuse can be developed.

To prevent and combat child grooming, both offline and online, and to protect children’s rights, several legislative efforts have been proposed and adopted at the national, European, and international levels. The United Nations Convention on the Rights of the Child (UNCRC),Footnote 2 the Universal Declaration of Human Rights,Footnote 3 along with the Charter of Fundamental Rights of the European Union and the European Convention on Human Rights (ECHR),Footnote 4 have been crucial treaties that ensure, among others, the proper respect and protection of children’s rights and well-being. In parallel, the Council of Europe Convention on the Protection of Children against Sexual Exploitation and Sexual Abuse (Lanzarote Convention)Footnote 5 has adopted specific measures, complemented by Directive 2011/93/EU of the European Parliament,Footnote 6 while a new RegulationFootnote 7 has been proposed to strengthen the prevention, detection, reporting, and removal of child sexual abuse material and grooming online, and to further support victims. Furthermore, the European Union has proposed a five-year strategy (2020–2025)Footnote 8 focusing on the need for better coordination among responsible stakeholders through multi-stakeholder cooperation, with the goal of putting a strong legal framework in place and establishing a strengthened law enforcement response that helps Member States address the new challenges stemming from emerging technological advancements.

Landscape of Available Grooming Data

Machine learning has been extensively leveraged to develop solutions that could enable the effective detection of potential online child grooming activities. Building robust machine and deep learning models requires labelled (annotated) datasets of sufficient quality and size. However, publicly available datasets containing grooming examples are rather limited, possibly due to the sensitivity of the subject under study. Nevertheless, there have been some initial attempts to create datasets that machine/deep learning models can exploit to tackle the problem at hand to the extent possible. One of the largest sources of predatory conversations is Perverted-Justice (PJ),Footnote 9 which contains chat logs of individuals convicted of grooming, conversing with decoy operators rather than actual victims. However, the majority of logs are over a decade old, with 2016 being the most recent and the largest part dating from before 2010. Such dated material can negatively affect the detection of potential grooming activities, as models developed on it may struggle, or even fail, to capture recent changes in dialogue and predatory tactics.

ChatCoder2 [8], another source of data, consists solely of predatory conversations extracted from Perverted-Justice. Overall, it contains 497 conversations (chats) and was mainly built for studying the semantic segmentation of grooming chats, characterising each segment as predatory or not. Moreover, the PAN12 [9] dataset combines grooming conversations with non-grooming ones obtained from the logs of IRC (Internet Relay Chat) channels and the chatting site Omegle.Footnote 10 The non-grooming conversations include, among other non-predatory exchanges, cybersex between consenting adults; as with the PJ data, the grooming chats come from decoy operations, whereas the non-grooming chats involve real people. Conversations are split into segments, with the dataset containing 222k segments, of which only 2.58% are grooming; this distribution aims to mimic the real distribution of grooming chats on the Web. Finally, since the dataset was introduced in 2012, its conversations date from no later than that year.

In addition, a dataset combining the aforementioned two is known as PANC [10] and consists of non-predatory segments from PAN12 and full-length predator chats from ChatCoder2 divided into segments. Finally, PJZC [11] contains data originating from PAN12, stored in JSON format and re-organised by the authors to fit their task of early grooming detection. Specifically, the authors combined predatory segments belonging to the same conversation and labelled the entire conversation as predatory, rather than the individual segments, with the aim of detecting early signs of grooming attempts in entire conversations.

Based on the above, several limitations can be observed in the available grooming-related datasets. First, the data come from earlier years, hindering the effective identification of more recent manifestations of grooming activities. Additionally, they all involve decoys rather than real victims, which can also hinder the effectiveness of machine/deep learning models when trying to detect grooming in conversations between real victims and perpetrators; it is particularly difficult to imitate the real behaviour of other persons, as each person has a unique way of reacting to a situation and consequently expressing their feelings. Finally, another less apparent limitation is the lack of multilingual grooming datasets, as all existing ones contain only conversations written in English, thus restricting their use for grooming detection in other languages.

Machine and Deep Learning Methods for Grooming Detection

As mentioned, machine and deep learning have been leveraged thus far to enable the detection of online grooming activities. In particular, grooming detection is tackled as a text-based binary classification problem over a set of chat messages, typically split into segments, with the goal of identifying whether a segment contains grooming or not. Each text consists of a sequence of words that must be converted into machine-readable representations before being fed to a machine/deep learning model. One of the most commonly used methods to this end is Term Frequency-Inverse Document Frequency (TF-IDF) [12, 13]. More recently, however, pre-trained word embeddings obtained from Word2Vec or GloVe [14] have been exploited to enable a better representation of textual data, as they also capture relationships between the words in a sequence.
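As a minimal sketch of this representation step (assuming scikit-learn is available; the chat excerpts below are invented for illustration only):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy chat segments (invented for illustration only)
segments = [
    "hey how old are you",
    "are you alone at home right now",
    "what games do you play after school",
]

# TF-IDF weighs each word by its frequency within a segment,
# discounted by how common the word is across all segments
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(segments)

print(X.shape)  # (number of segments, number of distinct words)
```

Each row of the resulting sparse matrix is the machine-readable representation of one segment, ready to be fed to a classifier.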

Focusing on the models themselves, most works apply traditional machine learning-based solutions, including Support Vector Machines (SVM) [13, 15], k-Nearest Neighbors (kNN) [15], and Logistic Regression (LR) [16]. Additionally, feature extraction and the detection of grooming characteristics within the examined chats have been shown to aid the classification process. For instance, it was found that providing the model with a binary vector denoting the presence of seventeen distinct grooming characteristics extracted from the text at hand (e.g. asking questions to gauge the risk of the conversation, or asking whether the child is alone or under adult or friend supervision), instead of the TF-IDF representation of the actual text, leads to increased detection performance [15].
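A simplified sketch of this characteristic-based representation follows; the indicator phrases, chat samples, and classifier set-up are our own invention and merely stand in for the seventeen expert-defined characteristics used in [15]:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical indicator phrases standing in for the grooming
# characteristics described in the literature; the real feature
# set comprises seventeen expert-defined characteristics.
CHARACTERISTICS = [
    ["are you alone", "anyone home"],   # isolation check
    ["our secret", "don't tell"],       # secrecy
    ["how old are you"],                # age probing
]

def characteristic_vector(text):
    """Binary vector: 1 if any phrase of a characteristic appears."""
    text = text.lower()
    return [int(any(p in text for p in phrases)) for phrases in CHARACTERISTICS]

# Toy training data (invented): 1 = grooming, 0 = benign
chats = [
    "hey, how old are you? is anyone home?",
    "this is our secret, don't tell your mum",
    "did you finish the maths homework?",
    "see you at football practice tomorrow",
]
labels = [1, 1, 0, 0]

X = np.array([characteristic_vector(c) for c in chats])
clf = LinearSVC().fit(X, labels)
query = "are you alone? it's our secret. how old are you"
print(clf.predict([characteristic_vector(query)]))
```

The classifier thus operates on a compact, interpretable feature space rather than on a high-dimensional TF-IDF matrix.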

Deep learning-based models have also been employed for grooming detection, including the multi-layer perceptron (MLP) [12] and convolutional neural networks (CNN) [14]. The former followed an author-based approach, where all messages in a conversation originating from the same author were grouped together to deduce whether there is any grooming activity. The latter work first explored recurrent neural networks (RNN), concluding that their performance is inadequate for the large conversation segments typical in grooming detection. The authors therefore proposed a CNN-based model whose performance does not degrade with segment length. Finally, through experimentation, they found that feeding the CNN the input data directly, so that the model learns the embeddings itself, can further increase performance. In such a case, the model learns task-specific word representations, especially for words commonly used in grooming chats that are absent from the pre-trained embedding models (such as Word2Vec or GloVe) often used in classification tasks.
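The mechanism that makes such CNNs insensitive to segment length is max-over-time pooling. The NumPy sketch below, with random values standing in for learned embeddings and filters, illustrates how segments of very different lengths are mapped to feature vectors of the same fixed size:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_max_pool(embeddings, filters):
    """1-D convolution over token positions, then max-over-time pooling.

    embeddings: (seq_len, emb_dim)   filters: (n_filters, window, emb_dim)
    Returns one activation per filter, regardless of seq_len.
    """
    seq_len, _ = embeddings.shape
    n_filters, window, _ = filters.shape
    # Feature map: one activation per filter per window position
    maps = np.array([
        [np.sum(embeddings[t:t + window] * f) for t in range(seq_len - window + 1)]
        for f in filters
    ])
    return maps.max(axis=1)  # max over time -> shape (n_filters,)

emb_dim, n_filters, window = 8, 4, 3
filters = rng.normal(size=(n_filters, window, emb_dim))

short_seg = rng.normal(size=(10, emb_dim))    # 10-token segment
long_seg = rng.normal(size=(500, emb_dim))    # 500-token segment

# Both segments yield vectors of identical dimensionality
print(conv_max_pool(short_seg, filters).shape,
      conv_max_pool(long_seg, filters).shape)
```

Because the pooled vector has a fixed size, a downstream fully connected layer can classify arbitrarily long segments without the degradation reported for RNNs.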

Open Challenges

Despite the ever-increasing efforts to develop effective methods against online grooming, the field still faces significant challenges that impede progress. First, as pointed out earlier, publicly available datasets are scarce and mostly come from a single source (namely, Perverted-Justice). Even for the existing ones, their suitability is somewhat questionable, as they include data from more than a decade ago; consequently, there is an increased possibility that they cannot effectively capture today’s modes of expression (e.g. the higher prominence of transliterations in recent years and different slang terms) and the way grooming manifests overall. Additionally, as mentioned, the currently available datasets do not contain conversations with real victims but only with decoy operators, which raises questions as to whether the models developed on them will be effective in real-life scenarios. Finally, the absence of multilingual grooming datasets makes it difficult to apply grooming detection to non-English conversations, giving rise to another open challenge. As a countermeasure, language models (LMs) can be used to translate the existing datasets into the desired language; however, this approach could introduce bias into the dataset or fail to capture the unique idioms of each language, thereby hindering detection effectiveness overall.

Focusing on the models themselves, and in particular on the important step of text representation, existing approaches mostly make use of simple solutions, such as TF-IDF or non-contextual embeddings like Word2Vec and GloVe. With such approaches, the structure of a text (sentence) is not taken into consideration; instead, a representation is extracted for each word regardless of the context in which it is used. These representations, while adequate in certain scenarios, may fall short in grooming detection, where the context in which each word appears can be vital in determining whether a conversation contains grooming attempts. Thus, to facilitate the detection process, a possible solution could be to train and use contextual embeddings, such as those of BERT [17], which, unlike non-contextualised ones, consider the context of a word in a sentence, while also enabling the representation of slang words commonly used in chats that may not be present in pre-trained embedding models such as GloVe [18]. However, training models to provide contextual embeddings for grooming detection is a challenging task, as the amount of grooming-related data is limited, and such models require a large number of instances, as well as substantial resources, to provide high-quality embeddings.
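One reason BERT-style models can represent chat slang is their subword (WordPiece) tokenisation: an out-of-vocabulary word is decomposed into known subword units rather than collapsed into a single unknown token. The toy tokeniser below, with a tiny invented vocabulary, sketches the greedy longest-match-first procedure:

```python
# Toy subword vocabulary (invented); real BERT vocabularies contain
# roughly 30,000 entries learned from large corpora.
VOCAB = {"kid", "##do", "##d", "u", "r", "[UNK]"}

def wordpiece(word, vocab=VOCAB):
    """Greedy longest-match-first subword tokenisation (WordPiece-style)."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            cand = word[start:end]
            if start > 0:
                cand = "##" + cand  # continuation pieces carry a '##' prefix
            if cand in vocab:
                piece = cand
                break
            end -= 1
        if piece is None:  # no subword matches at all -> unknown token
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces

print(wordpiece("kiddo"))  # slang word split into known subword units
```

A slang term absent from the vocabulary as a whole word can thus still receive a meaningful (composed) representation, whereas a static word-level embedding table would simply have no entry for it.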

Future Directions

In the online world, individuals can maintain multiple identities across different platforms, or even within the same one, with the goal of either deceiving a wider range of individuals or better concealing and maintaining their online identity; e.g. even if an account is detected for infringing behaviour, the perpetrator’s activity can seamlessly continue [19]. As mentioned, perpetrators often resort to a similar course of action with the aim of deceiving their victims [20], e.g. through victim isolation and trust development, making it difficult to identify accounts that are managed by the same person in a timely manner. However, each individual’s personality is made up of a unique set of behaviours, experiences, and feelings, which is also reflected in their way of writing. To that end, this writing blueprint could be leveraged by automatic mechanisms known as identity resolution, which allow for the uncovering of potential links among the unprecedentedly high number of online user accounts [21]. So far, identity resolution has been employed by law enforcement as a way to uncover previously unknown connections between actors that share common characteristics (e.g. a similar address) [22], thus paving the way for its use in the fight against grooming activities as well. Stylometric attributes (e.g. vocabulary diversity or writing idiosyncrasies), as well as contextualised distributional semantic features (e.g. captured by BERT), can be leveraged in an attempt to identify multiple accounts likely to be operated by the same perpetrator [23]. Ultimately, unknown, well-hidden relationships can be revealed, thus allowing the identification of further potential victims at early stages.
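As a toy illustration of comparing two accounts stylometrically (the features and message samples below are invented; real identity-resolution systems rely on far richer feature sets, such as character n-grams and function-word frequencies):

```python
import math
import string

def stylometric_features(texts):
    """Simple stylometric profile of an account's messages: type-token
    ratio (vocabulary diversity), mean word length, and punctuation rate."""
    words = " ".join(texts).split()
    chars = "".join(texts)
    ttr = len({w.lower() for w in words}) / len(words)
    mean_len = sum(len(w) for w in words) / len(words)
    punct = sum(c in string.punctuation for c in chars) / len(chars)
    return [ttr, mean_len, punct]

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Invented message samples from two accounts under comparison
account_a = ["hey!!! wanna chat??", "u r so cool!!!"]
account_b = ["hey!! wanna talk??", "u seem cool!!"]

similarity = cosine(stylometric_features(account_a),
                    stylometric_features(account_b))
print(round(similarity, 3))
```

A high similarity between the profiles of two accounts would flag them as candidates for being operated by the same individual, to be confirmed through further investigation.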

Similarly, adapting the way language is perceived by LMs through fine-tuning [24], so as to better reflect current trends in written online language, such as changes in word meanings over time, slang terms, and transliteration, will be an invaluable asset in the development of more effective grooming detection systems. While such approaches require only unlabelled, generic data, they do not circumvent the lack of labelled training data for grooming detection. As such, in-depth experimental investigation is required into annotating new gold-standard data, creating synthetic data that simulate real behaviours to the extent possible [25], or considering transfer learning approaches such as few-shot and meta-learning [26, 27].
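As one example of such a transfer learning approach, a prototypical-network-style few-shot classifier builds a class "prototype" from the mean embedding of a handful of labelled examples and assigns new messages to the nearest prototype. The sketch below simulates sentence embeddings with random vectors drawn around two class centres:

```python
import numpy as np

rng = np.random.default_rng(1)

def prototypes(support, labels):
    """Class prototype = mean embedding of that class's few labelled examples."""
    return {c: support[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify(query, protos):
    """Assign the query embedding to the nearest prototype (Euclidean distance)."""
    return min(protos, key=lambda c: np.linalg.norm(query - protos[c]))

# Simulated sentence embeddings: five labelled examples per class,
# drawn around two different random centres (1 = grooming, 0 = benign)
dim = 16
centre_pos, centre_neg = rng.normal(size=dim), rng.normal(size=dim)
support = np.vstack([centre_pos + 0.1 * rng.normal(size=(5, dim)),
                     centre_neg + 0.1 * rng.normal(size=(5, dim))])
labels = np.array([1] * 5 + [0] * 5)

protos = prototypes(support, labels)
query = centre_pos + 0.1 * rng.normal(size=dim)
print(classify(query, protos))
```

In a real setting, the embeddings would come from a fine-tuned LM, allowing a detector to be adapted to a new language or platform from only a handful of labelled conversations.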