There is no greater inhumanity in the world than hurting or belittling a child. Child abuse is the purposeful physical, psychological, or sexual harming of a minor (Lippard & Nemeroff, 2020). Any action or inaction by a parent or carer that causes actual or potential harm to a child is considered child abuse. It can happen in the child’s home, as well as in the businesses, institutions, and communities with which the child engages. Child sexual abuse (CSA) is a form of abuse that involves sexual interaction with a child or juvenile (Lippard & Nemeroff, 2020). According to the Children’s Bureau report, 9.2% of children who experienced abuse were sexually abused, and children between 7 and 13 years of age are the most vulnerable (National Center for Victims of Crime). In self-report surveys, 5–10% of adult males and 20% of adult females recall experiencing sexual abuse as children (National Center for Victims of Crime). Child sexual abuse does not always involve physical contact between the offender and the child. Notable examples of child sexual abuse include fondling, obscene conversations (text messages, phone calls), sex trafficking, intercourse, and creating or possessing pornographic photos or films of children (Russell et al., 2020). Most offenders are known to both the child and the family: they could be a family member, caretaker, sibling, teacher, playmate, or anybody else who has a relationship with the minor (Tener et al., 2021). Perpetrators may employ different methods to keep victims silent about the abuse. Abusers often use their position of authority over the child to blackmail them. They may also tell children that the behavior is routine, or threaten them if they refuse to cooperate or decide to tell someone else (Assink et al., 2019).

Recognizing child sexual abuse can be difficult, and some children may not exhibit visible signs (McTavish et al., 2019). It may be even more difficult to identify the assailant if they are well known to the child’s parents (for example, friends or relatives). However, a few physical and behavioral signs can be observed. Physical signs include genital bleeding or bruising, stained or torn underclothes, difficulty sitting or walking, urinary infections, and a burning or itching sensation in the genital area (Scoglio et al., 2021). Behavioral signs include changes in hygiene, development of phobias, depression, suicidal behavior (in teenagers), poor academic performance, bedwetting, regressive behavior (thumb sucking, nail biting, eyebrow plucking), and feeling threatened by physical contact (Christ et al., 2019).

Child sexual abuse awareness is crucial in the modern digital world. Recognizing warning signs early can help prevent the catastrophic effects of abuse; some behavioral signs, verbal cues, and physical signs have been discussed above. Children of any race, social status, religion, or culture may experience sexual abuse (Gushwa et al., 2019). Measures can be taken to lessen the risk of sexual abuse, but there is no surefire method of prevention. Being proactively engaged in a child’s life can make signs of abuse more visible and encourage the child to speak up if something is wrong (Knack et al., 2019). Parents must show genuine interest in their children’s daily activities and know the people who spend time with them. Caretakers and babysitters must be hired only after rigorous screening. Because child sexual abuse can also happen online, children must be monitored carefully when they use electronic devices such as phones and computers (Joleby et al., 2021). Children must also be encouraged to speak up and be taught about boundaries: they must be reminded that no one has the right to touch them in a way that makes them uncomfortable. Children must also know the correct names of body parts, so that they can tell their parents if they are touched inappropriately. Many perpetrators use concealment or intimidation to persuade minors not to report abuse, so it is critical to reassure the child regularly that speaking openly will not lead to punishment, whatever they say. Figure 1 describes some of the ways to prevent sexual abuse of children.

Fig. 1 Essential steps required to prevent child sexual abuse

When discussing a sensitive topic such as CSA, ethical implications must also be considered (McTavish et al., 2019). Prior consent must be obtained from the child’s parents or the relevant regulatory bodies. Privacy is also of utmost concern: sensitive data must not be leaked or shared, because a breach of data security can have a devastating psychological impact on the child.

Modern technologies can be used to combat CSA. Fields such as artificial intelligence (AI), blockchain, virtual reality, and cybersecurity can play a crucial role. AI is the imitation of human intelligence by machines that are expected to act and reason like people. A key strength of AI is its ability to reason and make decisions that have the best likelihood of reaching a specific objective (Glikson & Woolley, 2020). Machine learning (ML) is a subset of AI: a developing technology that allows computers to learn autonomously from historical data (Waring et al., 2020). It uses various techniques to build mathematical models that make predictions based on previous data. Supervised, unsupervised, and reinforcement learning are the three main approaches in ML. ML has been applied in fields such as engineering, medicine, finance, pharmacy, bioinformatics, social science, and psychometric analysis (Arena et al., 2022; Önden et al., 2023).

A few studies have used AI to mitigate child sexual abuse. Kissos et al. (2020) used a convolutional neural network to detect CSA from self-figure drawings and obtained a maximum accuracy of 88%. Amrit et al. (2017) used machine learning and text mining to detect child abuse in data collected from various medical facilities in the Netherlands; their ensemble algorithm obtained a maximum accuracy of 83%. Ucuz et al. (2022) developed a decision support system to identify post-traumatic stress disorder (PTSD) and depression in children after sexual abuse. The study included 21 boys and 121 girls, and the resulting framework was shown to help distinguish these psychiatric illnesses from other conditions in the immediate aftermath of the abuse. Fauzi and Bours (2020) used an ensemble method to identify sexual offenders in online chats, addressing the issue of child grooming. Multiple algorithms were tested, and the naïve Bayes algorithm obtained an F0.5 score of 0.9348.

Explainable artificial intelligence (XAI) is a collection of frameworks and tools designed to help users understand and interpret the predictions made by ML models (Loh et al., 2022). It helps establish model correctness, integrity, and transparency, supports informed decision-making, and provides extensive visualizations. Hence, ML and XAI methods have been used to predict child sexual abuse awareness in this research. Child sexual abuse is a significant public health issue, yet few people are aware of it, and many parents do not make the effort to learn about this sensitive topic. ML algorithms can therefore be used to predict a person’s level of knowledge about sexual abuse, so that people who lack sufficient information can be made aware of it. Adults are accountable for ensuring that their children have secure, stable, and healthy relationships. The contributions of our research are as follows:

  • Various machine learning and deep learning algorithms have been tested to predict child sexual abuse awareness. The machine learning algorithms are further stacked in multiple levels to increase accuracy. At the time of writing, no other articles had been published on this topic.

  • The mutual information technique has been used before model training to rank the features.

  • XAI techniques such as SHapley Additive exPlanations (SHAP), local interpretable model-agnostic explanations (LIME), QLattice, and Eli5 have been used to understand and interpret model predictions.

  • Further discussion about child sexual abuse prevention has also been provided.

The rest of the article is organized as follows: materials and methods are described in the second section, results and discussion in the third section, and the conclusion and future directions in the last section.

Materials and Methods

Dataset Description

The child sexual abuse awareness prediction dataset was obtained from Kaggle, a public dataset repository (Child Sexual Abuse Awareness Prediction). This questionnaire dataset was used to predict the knowledge level of adults regarding child sexual abuse. A total of 3002 people answered the questions, each response forming one tuple. The dataset has nine attributes, including the label. The questions and the choices offered in the survey are described in Table 1. The label records the knowledge level as a dichotomous value (beginner or intermediate): 1711 people (57%) were labeled as beginners and 1291 people (43%) as intermediates.

Table 1 Description of the questions (attributes) asked in the survey

Dataset Preprocessing

Data preprocessing is a crucial step in machine learning, during which the data is made ready for model training (Wang et al., 2021). The first step is usually the removal of null values; however, this dataset contained no null values. The frequency of the choices selected for each question is shown using bar graphs in Fig. 2. Figure 2(a) shows that most people with beginner-level knowledge think that children are safe with their relatives. This is not always true, since most perpetrators are well known to the children and their parents (Tener et al., 2021). Figure 2(b) shows that most people with intermediate-level awareness disagree that children are mainly abused by strangers. Children are often exploited by people known to them, such as relatives, siblings, caretakers, and teachers. Figure 2(c) shows that both groups (beginner and intermediate knowledge levels) agree that male children also need information about sexual abuse. Abusers are known to exploit both male and female children (Sivagurunathan et al., 2019), although female children are generally more vulnerable to sexual abuse (Tozdan et al., 2019). Figure 2(d) indicates that most people with beginner-level awareness think that sexual abuse prevention must not be taught in school because it can make children curious about sex. Parents and teachers must teach children about sexual abuse prevention: most children who suffer sexual abuse do not know what sexual abuse is, and therefore cannot communicate about it with their loved ones. Figure 2(e) indicates that most people with beginner-level knowledge do not know about “child grooming.” Child grooming is the practice of developing a close bond with a child in order to sexually abuse him or her (Ringenberg et al., 2022). Grooming is also used to entice children into illegal activities such as child trafficking, child pornography, and prostitution.
From Fig. 2(f), it can be inferred that most people with beginner-level awareness do not know how to identify if their child has been abused. From Fig. 2(g), it can be inferred that most people (both beginner and intermediate-level knowledge) think that children require counseling if they have been sexually exploited. From Fig. 2(h), it can also be inferred that people (both beginner and intermediate-level knowledge) agree that they should take legal action against the offender.
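
The null-value check and the per-class answer counts plotted in Fig. 2 can be sketched with pandas. This is a minimal illustration on a tiny hypothetical data frame; the column names `grooming` and `knowledge_level` are placeholders, not the actual headers of the Kaggle file.

```python
import pandas as pd

# Hypothetical toy responses; the real survey has 3002 rows and nine columns.
df = pd.DataFrame({
    "grooming": ["Yes", "No", "No", "Yes", "No", "Yes"],
    "knowledge_level": ["Intermediate", "Beginner", "Beginner",
                        "Intermediate", "Beginner", "Beginner"],
})

# Step 1: count missing values across all columns (the source dataset had none).
null_counts = df.isnull().sum()

# Step 2: how often each answer was chosen within each knowledge level,
# i.e. the counts that a grouped bar graph such as Fig. 2(e) displays.
freq = pd.crosstab(df["grooming"], df["knowledge_level"])

print(null_counts.sum())
print(freq)
```

With matplotlib installed, `freq.plot(kind="bar")` draws the grouped bar graph directly from this table.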

Fig. 2 The frequencies of choices chosen by the people regarding child sexual abuse awareness

Fig. 3 Pearson’s correlation heatmap indicating the relationship between two variables

Most ML models cannot handle string or textual data. Hence, all the string values were numerically encoded as “0”s and “1”s; the encoding performed on the categorical attributes is described in Table 2. Categorical values must be encoded further before they are used for model training. This is a crucial step, since ML models can give preference to larger numbers when categories are represented by integer codes. Effect encoding, binary encoding, one-hot encoding, hash encoding, base-N encoding, and target encoding are some of the available encoding methods (Al-Shehari & Alsowail, 2021). The categorical dataset was encoded using one-hot encoding in this study. In one-hot encoding, a new binary attribute is generated for each category of a categorical variable. One-hot encoding has various advantages, such as preserving information, avoiding a misleading ordering among categories, improving model performance, and reducing bias (Okada et al., 2019). Data scaling is also an essential step in ML (Schulz et al., 2020): if scaling is not performed, models favor attributes with larger values irrespective of the units used. Encoded categorical variables, however, are not affected by this bias. Since all the attributes were categorical, scaling was not performed in this research. Many real-world datasets are imbalanced, and when there is an imbalance, classifiers favor the majority class. However, there is no major imbalance between the classes in this dataset: 1711 people (57%) were labeled as beginners and 1291 (43%) as intermediates, so the intermediate class was not significantly smaller than the beginner class. Hence, data balancing was not performed, to protect the integrity of the data. Outliers were not handled in this study, since doing so might bias the results during real-time testing. The input was then split into training and testing sets in the ratio of 80:20, and the preprocessed data was used for model training.
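
The encoding and splitting steps described above can be sketched with pandas and scikit-learn. The data frame below is a hypothetical stand-in with made-up column names and answer options; stratifying the 80:20 split on the label is our assumption, since the paper does not say whether the split was stratified.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 100
# Hypothetical stand-in for the survey: two categorical answers plus the label.
df = pd.DataFrame({
    "q_family_safe": rng.choice(["Agree", "Disagree", "Not sure"], size=n),
    "q_grooming": rng.choice(["Yes", "No"], size=n),
    "knowledge_level": rng.choice(["Beginner", "Intermediate"], size=n, p=[0.57, 0.43]),
})

# Binary label: the intermediate class is treated as positive, as in the paper.
y = (df["knowledge_level"] == "Intermediate").astype(int)

# One-hot encoding: one 0/1 column per category of each categorical question.
X = pd.get_dummies(df.drop(columns="knowledge_level"), dtype=int)

# Class proportions (about 57:43 in the real data) used to judge imbalance.
print(y.value_counts(normalize=True))

# 80:20 train/test split; stratify=y keeps the class ratio in both parts (assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(X.columns.tolist())
```

Each row of `X` contains exactly one 1 per original question, which is why no further scaling is needed for these columns.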

Table 2 Categorical values encoding from text to numeric scale

Pearson’s correlation can be used to understand the relationship between two variables. The Pearson correlation coefficient r quantifies this relationship and lies between −1 and 1. A value close to −1 indicates a strong negative (inverse) linear relationship, a value close to 1 indicates a strong positive linear relationship, and a value near 0 indicates no meaningful linear relationship between the two variables. The Pearson’s correlation heatmap is shown in Fig. 3. According to the heatmap, the variables most strongly related to knowledge level are “Do you know what signs to look for to identify if your child has been abused?” and “Do you know what child grooming is?”.
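
As a worked illustration, Pearson’s r is the covariance of two variables divided by the product of their standard deviations. The sketch below computes it by hand and with pandas on hypothetical 0/1-encoded answers (column names are placeholders); the pandas correlation matrix is the kind of table a heatmap such as Fig. 3 visualizes.

```python
import numpy as np
import pandas as pd

def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Hypothetical 0/1-encoded answers and labels (1 = intermediate).
df = pd.DataFrame({
    "knows_grooming": [1, 1, 0, 0, 1, 0, 1, 0],
    "knows_signs":    [1, 0, 0, 0, 1, 0, 1, 1],
    "label":          [1, 1, 0, 0, 1, 0, 1, 0],
})

r = pearson_r(df["knows_grooming"], df["label"])   # identical columns give r = 1
corr_matrix = df.corr(method="pearson")             # matrix behind a correlation heatmap
print(r)
print(corr_matrix.round(2))
```

For two 0/1 variables, Pearson’s r equals the phi coefficient, so the heatmap values remain interpretable after encoding.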

For inferential statistics, the chi-square test of independence was used. The chi-square test determines whether each categorical variable is significantly associated with the target variable. The results of the chi-square tests are given in Table 3. The p value obtained for every variable is < 0.001, indicating that all the variables are significantly associated with the knowledge level and are therefore relevant for predicting the model output.
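
A chi-square test of independence for a single question can be sketched with SciPy. The contingency counts below are hypothetical and are not taken from Table 3; disabling Yates’ continuity correction is our choice, since the paper does not state whether it was applied.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table for one question:
# rows = answer (Yes, No), columns = knowledge level (Beginner, Intermediate).
observed = np.array([
    [120, 380],
    [600, 150],
])

# Pearson's chi-square test of independence between answer and knowledge level.
# correction=False turns off Yates' continuity correction, which SciPy applies
# to 2 x 2 tables by default.
chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2={chi2:.1f}, dof={dof}, p={p_value:.3g}")
print(expected)  # counts expected if answer and knowledge level were independent
```

The expected counts are the row total times the column total divided by the grand total; a large gap between observed and expected counts produces a large statistic and a small p value.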

Table 3 Inferential statistics using chi square tests

Mutual Information to Rank the Features

Feature selection is an important step in machine learning, since accuracy tends to improve when redundant features are minimized. Mutual information is a well-known and practical method for measuring the dependence between explanatory variables (Song et al., 2021). It is based on entropy, a metric that measures the uncertainty of a random variable; entropy decreases when new information is obtained. If A is a random variable and \(A=\{{a}_{1}, {a}_{2}, {a}_{3}\dots {a}_{m}\}\), the entropy of A, denoted H(A), is calculated using the equation below.

$$H\left(A\right)= - \mathop{\sum}\nolimits_{j=1}^{m}p\left({a}_{j}\right)\mathrm{log}\;p\;({a}_{j})$$

where \(p({a}_{j})\) is the probability of \({a}_{j}\). If B is another random variable and \(B=\{{b}_{1},{b}_{2},{b}_{3}\dots {b}_{n}\}\), the joint entropy of (A, B) can be expressed as

$$H\left(A,B\right)=H\left(A\right)+H\left(B|A\right)=H\left(B\right)+H\left(A|B\right)$$

where \(H(A|B)\) denotes the conditional entropy of A given B. Let A and B be two random variables. Mutual information \(M(A;B)\) signifies the decrease in A’s uncertainty when B is known. Mutual information is expressed using entropy and conditional entropy as given below.

$$M\left(A;B\right)=H\left(A\right)-H\left(A|B\right)=H\left(B\right)-H\left(B|A\right)$$

where \(H\left(A\right)\) and \(H\left(B\right)\) are the entropies of A and B. \(H\left(A|B\right)\) and \(H(B|A)\) are the conditional entropies of A and B.
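
The definitions of entropy, conditional entropy, and mutual information can be checked numerically on a small joint distribution. The sketch below uses a hypothetical 2 × 2 distribution and natural logarithms (the log base only changes the units). It computes the conditional entropies directly from the conditional probabilities and compares both forms of the mutual information with the direct definition.

```python
import numpy as np

def entropy(p):
    """H = -sum p log p (natural log), ignoring zero-probability outcomes."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Hypothetical joint distribution p(a_j, b_k): rows index A, columns index B.
joint = np.array([
    [0.30, 0.10],
    [0.15, 0.45],
])
p_a = joint.sum(axis=1)  # marginal distribution of A
p_b = joint.sum(axis=0)  # marginal distribution of B

H_A, H_B, H_AB = entropy(p_a), entropy(p_b), entropy(joint)

# Conditional entropies from the conditional probabilities p(a|b) and p(b|a).
H_A_given_B = float(-(joint * np.log(joint / p_b[np.newaxis, :])).sum())
H_B_given_A = float(-(joint * np.log(joint / p_a[:, np.newaxis])).sum())

mi_1 = H_A - H_A_given_B  # M(A;B) = H(A) - H(A|B)
mi_2 = H_B - H_B_given_A  # M(A;B) = H(B) - H(B|A)
# Direct definition: sum of p(a,b) log[p(a,b) / (p(a) p(b))].
mi_direct = float((joint * np.log(joint / np.outer(p_a, p_b))).sum())
print(round(mi_1, 4), round(mi_2, 4), round(mi_direct, 4))
```

All three values agree, and the joint entropy satisfies the chain rule \(H(A,B)=H(A)+H(B|A)\).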

Mutual information was computed using the mutual_info_classif function of the scikit-learn library in this study. The features were ranked in descending order of significance, as shown in Fig. 4. The most important attributes were “Do you know what signs to look for to identify if your child has been abused?” and “Do you know what child grooming is?”.
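
A feature ranking like Fig. 4 can be sketched with scikit-learn’s `mutual_info_classif`. The three 0/1 features below are synthetic and hypothetical: one agrees with the label 90% of the time, one 70% of the time, and one is pure noise. Setting `discrete_features=True` is our choice for one-hot-encoded data. It makes scikit-learn compute the mutual information from contingency counts instead of using its nearest-neighbour estimator for continuous features.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(1)
n = 500
y = rng.integers(0, 2, size=n)  # hypothetical labels, 1 = intermediate

# Hypothetical 0/1 features with decreasing dependence on the label.
X = pd.DataFrame({
    "knows_grooming": np.where(rng.random(n) < 0.9, y, 1 - y),
    "knows_signs":    np.where(rng.random(n) < 0.7, y, 1 - y),
    "noise":          rng.integers(0, 2, size=n),
})

mi = mutual_info_classif(X, y, discrete_features=True, random_state=0)
ranking = pd.Series(mi, index=X.columns).sort_values(ascending=False)
print(ranking)
```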

Fig. 4 Feature ranking using mutual information

Results and Discussion

Performance Metrics

The metrics considered for the ML classifiers are confusion matrix, accuracy, precision, recall, f1-score, ROC curve, AUC, precision-recall curve, and average precision (Luque et al., 2019).

  • Confusion matrix: For binary classification, this is a 2 × 2 matrix that categorizes predictions into four types: true positives, true negatives, false positives, and false negatives. True positives (TP) and true negatives (TN) are correctly predicted instances, whereas false positives (FP) and false negatives (FN) are wrongly predicted instances. Metrics such as accuracy, precision, recall, and f1-score can be calculated from the confusion matrix.

  • Accuracy: The proportion of all predictions that the classifier makes correctly. The accuracy increases when the numbers of false positive and false negative instances decrease. It is given by the equation below.

    $$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$

  • Precision: The proportion of instances predicted as positive that are actually positive. For the child sexual abuse awareness dataset, the intermediate class is considered positive. The precision is high when false positive cases are minimal. It is given by the equation below.

    $$Precision=\frac{TP}{TP+FP}$$

  • Recall: The proportion of actual positive instances that are correctly identified. Recall emphasizes false negative cases: it increases when the number of false negatives decreases. It is given by the equation below.

    $$Recall=\frac{TP}{TP+FN}$$

  • F1-score: The F1-score is the harmonic mean of precision and recall, so it is high only when both are high. It is given by the equation below.

    $$F1=2\times \frac{Precision\times Recall}{Precision+Recall}$$

  • AUC-ROC curve: The ROC curve plots the true positive rate against the false positive rate across classification thresholds. The area under this curve is called the AUC. The model performs well when the AUC is close to 1; an AUC of 0.5 means that the model cannot distinguish between the two classes any better than random guessing.

  • Precision-recall curve: Precision values are plotted against recall values across classification thresholds. The area under this curve is summarized by the average precision (AP).
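
The formulas above can be checked against scikit-learn on a small example. The labels and scores below are hypothetical; the positive class (1) plays the role of the intermediate class, and predictions use a 0.5 threshold.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, average_precision_score,
                             confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Hypothetical test labels (1 = intermediate, the positive class) and model scores.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1, 0.95, 0.35])
y_pred = (y_score >= 0.5).astype(int)

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# Threshold-free metrics are computed from the scores, not the 0/1 predictions.
auc = roc_auc_score(y_true, y_score)
ap = average_precision_score(y_true, y_score)

print(f"accuracy  {accuracy:.2f} vs {accuracy_score(y_true, y_pred):.2f}")
print(f"precision {precision:.2f} vs {precision_score(y_true, y_pred):.2f}")
print(f"recall    {recall:.2f} vs {recall_score(y_true, y_pred):.2f}")
print(f"f1        {f1:.2f} vs {f1_score(y_true, y_pred):.2f}")
print(f"AUC {auc:.2f}, AP {ap:.3f}")
```

AUC and average precision need the continuous scores, because they summarize performance over all thresholds rather than at a single cut-off.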

Machine Learning Models to Predict Child Sexual Abuse Awareness

Several heterogeneous ML classifiers have been tested to predict child sexual abuse awareness. Further, the algorithms have been stacked at various levels to improve the efficiency and trustworthiness of the models; the stacked architecture is depicted in Fig. 5. Initially, algorithms such as logistic regression, decision tree, KNN, naïve Bayes, and support vector machine were tested (Hallajian et al., 2022). They were then ensembled to create the first stacked classifier (Chadaga et al., 2022). Four bagging and boosting algorithms, namely random forest, AdaBoost, XGBoost, and CatBoost, were also used for prediction (González et al., 2020), and these models were ensembled to create the second stacked classifier. To obtain the best results, the first and second stacked models were ensembled to form the final stacking classifier, which combines the predictions of all the underlying classifiers.
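
The three-level architecture can be sketched with scikit-learn’s `StackingClassifier`, nesting two stacks inside a third. This is a minimal sketch under stated assumptions, not the authors’ code. The data are synthetic, the logistic-regression meta-learner is our assumption (the paper does not name one), Gaussian naïve Bayes is used because the synthetic features are continuous, and `GradientBoostingClassifier` stands in for XGBoost and CatBoost so that only scikit-learn is required.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded survey data (not the Kaggle dataset).
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

def meta():
    # Meta-learner choice is an assumption; the paper does not name one.
    return LogisticRegression(max_iter=1000)

# First stack: classical base learners.
stack1 = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier(random_state=0)),
                ("knn", KNeighborsClassifier()),
                ("nb", GaussianNB()),
                ("svm", SVC(random_state=0))],
    final_estimator=meta())

# Second stack: bagging and boosting learners. GradientBoostingClassifier is a
# stand-in for both XGBoost and CatBoost.
stack2 = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("ada", AdaBoostClassifier(random_state=0)),
                ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0))],
    final_estimator=meta())

# Final stack: the two stacked models become base learners of a third meta-learner.
final_stack = StackingClassifier(
    estimators=[("stack1", stack1), ("stack2", stack2)],
    final_estimator=meta())
final_stack.fit(X_tr, y_tr)
print(f"final stack test accuracy: {final_stack.score(X_te, y_te):.2f}")
```

`StackingClassifier` trains each meta-learner on cross-validated predictions of its base learners, which limits the leakage that would occur if the meta-learner saw in-sample predictions.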

Fig. 5 Custom stacked architecture

The models were coded in the Python programming language. The Anaconda Python distribution was installed, and the Jupyter integrated development environment was used to run the models. Important libraries such as NumPy, pandas, scikit-learn, Keras, seaborn, and Matplotlib were installed to facilitate ML programming. The processor was an Intel(R) Core(TM) i5-10210U CPU @ 1.60 GHz 2.11 GHz with 8 GB RAM, and the operating system was Windows 11. No graphics processing unit (GPU) was used in this study. The training and testing data ratio was 80:20.

Logistic regression uses the sigmoid function to classify instances and performs well when the two classes are approximately linearly separable. Logistic regression obtained an accuracy of 88%. Decision trees use a tree-branching structure to perform classification: attributes are tested at internal nodes, and outcomes are represented by leaf nodes. They can handle nonlinear and high-dimensional data well. The decision tree obtained an accuracy of 93%. The support vector machine (SVM) classifies instances using a hyperplane that separates the two classes with the maximum margin; the points closest to this hyperplane on each side are known as support vectors. The SVM obtained an accuracy of 84%. The naïve Bayes classifier applies Bayes’ conditional probability theorem and treats each attribute as an independent contributor to the outcome (the attributes are assumed to be conditionally independent given the class). The naïve Bayes classifier obtained an accuracy of 77%. The stacking methodology was then used to combine the above classifiers. Stacking is a popular ensemble technique that trains a new model on the outputs of existing classifiers; it is generally reliable and efficient since it draws on many models for prediction. The stacked model obtained an accuracy of 94%.
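
The naïve Bayes assumption can be made concrete with a small worked example. The sketch below uses scikit-learn’s `BernoulliNB`, which suits 0/1-encoded answers; this is our choice, since the paper does not say which naïve Bayes variant was used. It recomputes the posterior by hand as the prior times a product of per-feature likelihoods. The data are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical 0/1-encoded answers (rows = people) and labels (1 = intermediate).
X = np.array([[1, 1, 0], [1, 0, 1], [0, 0, 1], [0, 1, 0],
              [1, 1, 1], [0, 0, 0], [1, 0, 0], [0, 1, 1]])
y = np.array([1, 1, 0, 0, 1, 0, 1, 0])

nb = BernoulliNB(alpha=1.0).fit(X, y)  # alpha=1.0 applies Laplace smoothing

x_new = np.array([1, 0, 1])
prior = np.exp(nb.class_log_prior_)    # P(class)
p_feat = np.exp(nb.feature_log_prob_)  # P(x_i = 1 | class), shape (classes, features)

# Conditional independence: P(x | class) is a product of per-feature terms.
likelihood = np.prod(np.where(x_new == 1, p_feat, 1 - p_feat), axis=1)
posterior = prior * likelihood / np.sum(prior * likelihood)

print(posterior.round(3))
print(nb.predict_proba(x_new.reshape(1, -1))[0].round(3))
```

The hand-computed posterior matches `predict_proba`, which shows that the classifier’s only modeling assumption is the factorization of the likelihood.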

Random forest is an ensemble of decision trees. It uses a technique called “bagging” (training multiple decision trees on bootstrap samples and combining their outputs) to obtain better performance. Increasing the number of trees generally improves accuracy up to a point and helps reduce overfitting. The random forest obtained an accuracy of 93%. Boosting is another ensemble technique, which turns weak learners into a strong learner (Mosavi et al., 2021). In adaptive boosting (AdaBoost), wrongly classified instances are assigned higher weights, so the next weak learner in the sequence gives them more attention. AdaBoost obtained an accuracy of 81%. Extreme gradient boosting (XGBoost) was developed to increase efficiency; it performs well on larger datasets, and its other advantages include parallel processing and scaling. XGBoost obtained an accuracy of 93%. Categorical boosting (CatBoost) is a boosting algorithm that performs well when there are many categorical variables, and it also works well for smaller datasets. CatBoost obtained an accuracy of 93%. The bagging and boosting models were ensembled to create the second stacked model, which obtained an accuracy, precision, recall, f1-score, AUC, and average precision of 94%, 94%, 94%, 94%, 98%, and 97%, respectively. The first and second stacks were then ensembled to form the final stacked model, which obtained an accuracy, precision, recall, f1-score, AUC, and average precision of 94%, 94%, 94%, 94%, 98%, and 97%, respectively. Figure 6 depicts the confusion matrices obtained by all the stacked models; the numbers of true positive and true negative outcomes are high relative to the misclassifications, indicating good predictive performance. The ROC curves and precision-recall curves of the three stacked models are shown in Figs. 7 and 8.
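
The AdaBoost reweighting described above can be illustrated with one round of textbook discrete AdaBoost (labels in {−1, +1}). This is a sketch of the general technique: library implementations such as scikit-learn’s SAMME variant use slightly different constants, and the weak learner’s predictions here are hypothetical.

```python
import numpy as np

def adaboost_reweight(weights, y_true, y_pred):
    """One round of discrete AdaBoost reweighting (labels in {-1, +1})."""
    weights = np.asarray(weights, dtype=float)
    miss = y_true != y_pred
    err = weights[miss].sum() / weights.sum()  # weighted error of the weak learner
    alpha = 0.5 * np.log((1 - err) / err)      # the weak learner's vote weight
    # Misclassified instances are scaled up by e^alpha, correct ones down by e^-alpha.
    new_w = weights * np.exp(np.where(miss, alpha, -alpha))
    return new_w / new_w.sum(), alpha

y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, -1, 1])  # hypothetical weak learner with one mistake
w0 = np.full(5, 1 / 5)                 # start with uniform weights
w1, alpha = adaboost_reweight(w0, y_true, y_pred)
print(w1.round(3), round(alpha, 3))
```

After one round, the single misclassified instance carries half of the total weight, so the next weak learner is pushed to classify it correctly.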

Fig. 6 Confusion matrices obtained by the stacked models for the test dataset. a First stack, b second stack, and c final stack

Fig. 7 ROC curves obtained by the stacked models. a First stack, b second stack, and c final stack

Fig. 8 Precision-recall curves obtained by the stacked models for the test dataset. a First stack, b second stack, and c final stack

Further, we used a deep neural network (DNN) to predict child sexual abuse awareness (Samek et al., 2021). The training, validation, and test data were split in the ratio 80:15:5, and the batch size was set to 32. The two hidden layers consisted of four and two nodes, respectively. The rectified linear unit (ReLU) activation function was used for all layers except the output layer, which used the sigmoid activation function. Binary cross-entropy was used as the loss function, Adam as the optimizer, and the number of epochs was set to 50. The DNN model obtained an accuracy of 89%. The accuracy curve, loss curve, confusion matrix, and ROC curve obtained by the DNN model are shown in Fig. 9. The results obtained by all the ML classifiers are summarized in Table 4.
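
The network configuration above can be approximated with scikit-learn’s `MLPClassifier`. We use it here in place of the Keras model the authors describe, so that the sketch needs no deep learning framework. The data are synthetic with an assumed eight input features. For a binary target, `MLPClassifier` uses a single logistic (sigmoid) output unit trained on log-loss (binary cross-entropy), and with the Adam solver `max_iter` counts epochs.

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the encoded survey data (8 input features assumed).
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

# 80:15:5 split: hold out 20%, then divide that 20% into 15% validation and 5% test.
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.20, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Two ReLU hidden layers with 4 and 2 units, Adam optimizer, batch size 32, 50 epochs.
dnn = MLPClassifier(hidden_layer_sizes=(4, 2), activation="relu", solver="adam",
                    batch_size=32, max_iter=50, random_state=0)

with warnings.catch_warnings():
    # 50 epochs may stop before convergence; silence the resulting warning.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    dnn.fit(X_tr, y_tr)

print(len(X_tr), len(X_val), len(X_te))
print(f"validation: {dnn.score(X_val, y_val):.2f}, test: {dnn.score(X_te, y_te):.2f}")
```

Unlike a Keras model fitted with validation data, this sketch scores the validation set only once after training instead of after every epoch.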

Fig. 9 DNN model: a accuracy curve, b loss curve, c confusion matrix for test data, and d ROC curve

Table 4 Summary of the results obtained by the classifiers for the test dataset in classifying child sexual abuse awareness

Explainable Artificial Intelligence (XAI) to Interpret Results

Explainable artificial intelligence (XAI) is artificial intelligence whose decisions people can comprehend. It provides justifications for its choices and behavior patterns. This enables individuals to understand and trust what is going on, rather than feeling that their data is being misused (Angelov et al., 2021). Model explainability is essential in many disciplines, such as healthcare and engineering (Saraswat et al., 2022): models must be trustworthy if they are to be relied upon. XAI enhances the reliability, dependability, and integrity of real-time AI models, and it has gained significance as more people have begun to challenge the predictions made by ML algorithms. In this study, four XAI techniques were used: SHAP, LIME, Eli5, and QLattice.

SHAP is a method for explaining the results of ML models based on Shapley values from cooperative game theory (Gramegna & Giudici, 2021). It offers a way to calculate and visualize how each attribute contributes to a prediction. SHAP values are computed for every attribute in every row (tuple) and represent that feature’s influence on the predicted result. Figures 10 and 11 show the SHAP explanations for the random forest classifier, which was used because it obtained good results and SHAP support for the stacked architecture is not yet available. The SHAP beeswarm plot is shown in Fig. 10(a). The attributes are arranged in descending order of importance. Blue indicates a lower feature value and red indicates a higher feature value. The vertical line separates the model’s predictions into the two classes: predictions toward beginner-level knowledge lie on the left and predictions toward intermediate-level knowledge lie on the right. The figure shows that the most important attribute is “Do you know what child grooming is?”; most people with intermediate-level child sexual abuse awareness knew what child grooming was. The next important attribute is “Do you think children need post-abuse counselling for recovering?.” Most people with intermediate-level awareness knew what signs to look for if the child had been abused. The mean absolute impact on the model output is shown in Fig. 10(b). The essential attributes are “Do you know what child grooming is?,” “Do you know what signs to look for to identify if your child has been abused?,” and “Do you think children need post-abuse counselling for recovering?.” The SHAP force plot explains the prediction for a specific person. Figure 11 shows a SHAP force plot for a person whose knowledge level was predicted as beginner by the random forest model.
Even though a few features push the prediction toward “intermediate,” important features such as “Do you know what child grooming is?,” “Do you know what signs to look for to identify if your child has been abused?,” and “Do you think children need post-abuse counselling for recovering?” shift the prediction toward “beginner.” Similarly, SHAP force plots can be used to understand the reasoning behind the prediction for every person.
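
The game-theoretic idea behind SHAP can be shown by computing exact Shapley values from scratch for a tiny model. This is not the `shap` library, which provides efficient estimators such as its tree explainer. The scoring function below is hypothetical. “Absent” features are replaced by a single baseline value, which is one common, simplified convention.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take their baseline values."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their values from x; the rest come from the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for s in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (value(set(s) | {i}) - value(set(s)))
    return phi

# Hypothetical scoring function over three 0/1 answers (not the paper's model).
def model(z):
    grooming, signs, counselling = z
    return 0.2 + 0.4 * grooming + 0.2 * signs + 0.1 * grooming * counselling

x, base = [1, 1, 1], [0, 0, 0]
phi = shapley_values(model, x, base)
print([round(v, 3) for v in phi], round(model(x) - model(base), 3))
```

The interaction term is split equally between the two features involved, and the values add up exactly to the difference between the prediction and the baseline prediction. This additivity is what a force plot visualizes.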

Fig. 10 SHAP interpretation. a Beeswarm plot and b mean bar plot

Fig. 11 SHAP force plot for a particular instance

Marco Tulio Ribeiro and colleagues introduced LIME in 2016. LIME seeks to identify the most crucial attributes for a single prediction in a particular region of the feature space (Nagaraj et al., 2022). It generates a surrogate dataset by sampling around the instance, performs feature selection on this data, and then fits a ridge regression model to the selected features. The resulting surrogate model approximates the black-box model’s output in the neighborhood of the instance. Figure 12 shows the LIME interpretations for beginner- and intermediate-level knowledge predictions. Red indicates contributions toward beginner and green indicates contributions toward intermediate. Figure 12(a) explains why the classifier predicted a person’s knowledge level as intermediate: attributes such as “Do you know what child grooming is?” and “Do you think children need post-abuse counselling for recovering?” point toward this class. Figure 12(b) shows the LIME interpretation for a person predicted as beginner, driven by features such as “Do you know what child grooming is?” and “Do you know what signs to look for to identify if your child has been abused?”.
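
The core LIME idea (perturb, query the black box, weight by proximity, fit a ridge surrogate) can be sketched in a few lines. This is a simplified illustration, not the `lime` package: the flip-based perturbation of 0/1 answers, the exponential kernel, and the kernel width are our assumptions, and the black box is a random forest trained on hypothetical data in which feature 0 largely determines the label.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical black box trained on 0/1 answers; feature 0 dominates the label.
X = rng.integers(0, 2, size=(300, 4))
y = X[:, 0] | (X[:, 1] & X[:, 2])
black_box = RandomForestClassifier(random_state=0).fit(X, y)

def lime_sketch(predict_proba, x, n_samples=1000, kernel_width=0.75):
    """Fit a proximity-weighted ridge surrogate around a single instance x."""
    # 1. Perturb: mask[i, j] = 1 keeps x's answer j, 0 flips it.
    mask = rng.integers(0, 2, size=(n_samples, len(x)))
    Z = np.where(mask == 1, x, 1 - x)
    # 2. Query the black box on the perturbed samples.
    target = predict_proba(Z)[:, 1]
    # 3. Weight samples by closeness to x (exponential kernel on distance).
    dist = np.sqrt((mask == 0).sum(axis=1) / len(x))
    weights = np.exp(-dist ** 2 / kernel_width ** 2)
    # 4. Weighted ridge regression on the "kept" indicators gives local effects.
    surrogate = Ridge(alpha=1.0).fit(mask, target, sample_weight=weights)
    return surrogate.coef_

x = np.array([1, 0, 1, 0])
coefs = lime_sketch(black_box.predict_proba, x)
print(coefs.round(3))
```

A positive coefficient means that keeping the instance’s answer for that feature pushes the prediction toward the positive class; here feature 0 dominates and the irrelevant feature 3 stays close to zero.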

Fig. 12 Model explainability using LIME. a Intermediate child sexual abuse knowledge; b beginner child sexual abuse knowledge

Another XAI method for evaluating and analyzing model output is Eli5 (Islam et al., 2022), which enables researchers to comprehend black-box models easily. Eli5 uses the concepts of decision trees and the Gini index. Figure 13 depicts the model interpretation made by Eli5, including each feature’s contribution value. According to Eli5, the most important attribute is “Do you know what signs to look for to identify if your child has been abused?.” Eli5 also takes a bias term into consideration.
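
As we understand Eli5’s documentation, it explains a tree-based prediction by following the decision path. Each feature is credited with the change in node score from parent to child, and the root score acts as the bias term. The sketch below reimplements that idea from scratch for a single scikit-learn decision tree on hypothetical data; it is not the Eli5 API.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 3))               # hypothetical 0/1 answers
y = (X[:, 0] & X[:, 1]) | (rng.random(200) < 0.1)   # label with 10% noise
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

def node_proba(tree, node, class_index):
    counts = tree.value[node][0]
    return counts[class_index] / counts.sum()

def explain_prediction(clf, x, class_index=1):
    """Split a tree's predicted probability into a bias plus per-feature contributions."""
    tree = clf.tree_
    node = 0
    bias = node_proba(tree, 0, class_index)  # root score = training-set class frequency
    contrib = np.zeros(len(x))
    while tree.children_left[node] != -1:    # -1 marks a leaf in scikit-learn trees
        feat = tree.feature[node]
        child = (tree.children_left[node] if x[feat] <= tree.threshold[node]
                 else tree.children_right[node])
        # Credit the change in score from parent to child to the splitting feature.
        contrib[feat] += node_proba(tree, child, class_index) - node_proba(tree, node, class_index)
        node = child
    return bias, contrib

x = X[0]
bias, contrib = explain_prediction(clf, x)
print(round(bias, 3), contrib.round(3), clf.predict_proba(x.reshape(1, -1))[0, 1].round(3))
```

The bias plus the feature contributions always reproduces the predicted probability, which is why the bias appears as a separate row in Eli5’s output.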

Fig. 13 Model explainability using Eli5 for a particular instance

QLattice is an interpretable framework for ML models (Wenninger et al., 2022). QLattice explores many models before settling on the one that best fits the problem. Users must first set up a few parameters, including the input properties, variables, and labels. In QLattice, the variables are called registers, and models are constructed after the registers are specified. The generated model, known as a “QGraph,” consists of nodes and edges, and every node has an activation function and a weight assigned to it. QLattice is implemented using the “Feyn” library in Python (Riyantoko & Diyasa, 2021). Figure 14 depicts a QGraph generated by the QLattice model. Green nodes represent inputs and outputs, and white nodes with pink borders depict interactions. Interactions process input values, construct functions with various transformations, and predict the final output. According to QLattice, the most important attributes are “Do you know what child grooming is?,” “Children are safe among family members such as grandparents, uncles, aunts, cousins,” and “Do you know what signs to look for to identify if your child has been abused?.” The transfer function of the QLattice model is given below.

Fig. 14. Model interpretation using QGraphs

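$$\begin{aligned} &\operatorname{logreg}\bigl(2.1\,(0.59\,x_{\text{grooming}} - 0.85)\,(1.7\,x_{\text{family}}\\ &\qquad - 2.9\,x_{\text{signs}} + 3.2) + 1.9\bigr) \end{aligned}$$

where $x_{\text{grooming}}$, $x_{\text{family}}$, and $x_{\text{signs}}$ denote the answers to “Do you know what child grooming is?”, “Children are safe among family members such as grandparents, uncles, aunts, cousins”, and “Do you know what signs to look for to identify if your child has been abused?”, respectively.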
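To make the reported QLattice transfer function concrete, the expression can be evaluated directly. This sketch assumes that `logreg` denotes the logistic (sigmoid) function and that each answer is encoded as 0 or 1; the study does not state its encoding, nor which class the output probability refers to, and the coefficients are the rounded values reported in the expression.

```python
import math

def qlattice_logit(grooming, family_safe, signs):
    """Argument of logreg(...) in the reported QLattice expression."""
    return 2.1 * (0.59 * grooming - 0.85) * (1.7 * family_safe - 2.9 * signs + 3.2) + 1.9

def qlattice_probability(grooming, family_safe, signs):
    """Apply the logistic function, assumed here to be what `logreg` denotes."""
    return 1.0 / (1.0 + math.exp(-qlattice_logit(grooming, family_safe, signs)))

# Assumed 0/1 encoding of the three answers; the study does not state it.
for answers in [(0, 0, 0), (1, 0, 1), (1, 1, 1)]:
    print(answers, round(qlattice_probability(*answers), 3))
```

Because the two bracketed factors are multiplied, the expression contains an interaction: the effect of one answer on the output depends on the values of the other two.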

According to the XAI techniques, the most important features were “Do you know what child grooming is?” and “Do you know what signs to look for to identify if your child has been abused?” These two questions were the most useful for the classifiers in distinguishing between the two classes.


Child sexual abuse is a major public health issue. Many children are afraid to report it or never report it (Russell et al., 2020). It can cause long-term behavioral, mental, and physical problems in children. Many people, including parents, are not fully aware of this topic (Assink et al., 2019).

Hence, ML has been used to predict people's awareness of child sexual abuse. A questionnaire dataset containing nine attributes was considered for this study. Respondents' knowledge of child sexual abuse was categorized into two classes (beginner and intermediate). People with beginner-level knowledge need to be made more aware of this issue so that they can identify early warning signs. After initial data preprocessing, mutual information was used to determine the most important attributes before model training. Various heterogeneous algorithms, including deep neural networks, were used for prediction. Most models performed well; the naïve Bayes classifier had the lowest accuracy. To boost performance, the algorithms were stacked in several layers, and the final STACK obtained a maximum accuracy of 94%. The accuracies obtained were 88% for logistic regression, 93% for the decision tree, 93% for KNN, 84% for the support vector machine, 77% for naïve Bayes, 94% for the first stack, 93% for random forest, 81% for AdaBoost, 93% for XGBoost, 93% for CatBoost, and 93% for the second stack. The deep neural network obtained an accuracy of 89%. The classifiers also obtained high precision and recall values, and the numbers of false negative and false positive predictions were comparatively low. XAI techniques such as SHAP, LIME, Eli5, and QLattice were used to understand and interpret the predictions. SHAP uses Shapley values to explain the predictions. According to SHAP, many people with beginner-level knowledge did not know what child grooming is. LIME provided similar explanations based on local interpretation (individual predictions). According to Eli5, many people with beginner-level knowledge did not know how to identify whether their child had been abused. QLattice used three questions to interpret the predictions.
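The mutual information step used to rank attributes before training can be sketched for discrete questionnaire answers in plain Python. How the study computed mutual information is not stated (scikit-learn's `mutual_info_classif` with `discrete_features=True` is one common option); the toy answers, labels, and feature names below are hypothetical.

```python
import math
from collections import Counter

def mutual_information(feature, label):
    """Mutual information I(X; Y), in nats, between two discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, label))
    px, py = Counter(feature), Counter(label)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), summed over observed pairs
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical toy answers (1 = yes, 0 = no) and knowledge labels.
labels = ["intermediate", "intermediate", "beginner", "beginner", "intermediate", "beginner"]
answers = {
    "knows_grooming": [1, 1, 0, 0, 1, 0],
    "family_safe": [0, 1, 1, 0, 1, 0],
}
ranking = sorted(answers, key=lambda q: mutual_information(answers[q], labels), reverse=True)
print(ranking)
```

An answer that perfectly separates two balanced classes reaches the maximum of ln 2 nats, while an answer independent of the label scores zero, so ranking by this value favors the most informative questions.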

The perpetrator is already known to the child's family on most occasions (Tener et al., 2021). Parents often think that their children are safe with such familiar people. However, this is not always true, and children must be monitored carefully. Because it occurs early in life, sexual abuse by a family member is traumatizing and can cause children to lose trust in people. Our research also shows that many people think male children are safe from child sexual abuse (Sivagurunathan et al., 2019). However, this is not true: children of both genders are vulnerable, and the number of male child sexual abuse cases is high in countries such as India (Aarambh India). Many people did not know about “child grooming” (Ringenberg et al., 2022). Many offenders try to befriend a child in order to exploit them sexually, and grooming can also happen online. A child is groomed when an individual establishes a friendship, trust, and emotional connection with them in order to control, exploit, and abuse them. Groomed children and teenagers are at risk of sexual abuse, exploitation, and trafficking. Talking with children is therefore extremely important for knowing whom they spend their time with. According to our research, many people do not know what signs to look for to identify whether their child has been abused (Xiang et al., 2023). Many warning signs can be identified, such as bleeding, bed wetting, depression, and phobias (Xiang et al., 2023). According to our study, most people agree that children require counseling if sexually abused. Counseling and therapy are vital because child sexual abuse can have long-term behavioral, psychiatric, and mental effects (Tichelaar et al., 2020). Conditions such as PTSD and depression are common in child sexual abuse victims. Further, most people agree that legal action must be taken against offenders (Tener et al., 2021). Parents and caretakers must report the incident as soon as possible.

Awareness of child sexual abuse should be increased. Parents and caretakers must learn about this sensitive topic in advance. They should also know the child's everyday routine and all the people who are involved with the child, and they should be able to recognize the warning signs that children show. Schools and other institutions can also educate people about child sexual abuse.

There are a few limitations to this research. Since the dataset was obtained from a public repository, very little is known about it. The dataset description does not provide any information regarding the methodology or process of data collection. It is important to describe how the questionnaire was administered, whether it was conducted online or in person, and any sampling techniques employed to ensure that the participants were representative. The dataset description also does not mention any demographic information about the participants, such as age, gender, or geographical location. These demographic factors could affect adults' knowledge of child sexual abuse. The methodology used to classify people's knowledge as beginner or intermediate is also not known. Only supervised learning algorithms were used in this study; clustering and reinforcement learning algorithms can be explored in the future. Deep learning models run faster on graphics processing units (GPUs), but GPUs were not used in this study. Future work can include a cloud-based system to store data and ML models; companies such as Amazon, Microsoft, and Oracle provide cloud infrastructure. Other types of data, such as images and videos, can be collected so that deep learning algorithms such as neural networks can identify the behavior patterns of offenders. More questions, including subjective questions, can be added to the questionnaire. Other deep learning approaches, such as transfer learning models, can also be considered, as they may perform better than the deep learning classifiers used here. In this study, only bar graphs and Pearson's correlation matrix were used for visualization; other related plots could be added in the future.
Bias could arise from various sources, such as the survey design, respondent demographics, or the data collection process. Appropriate steps should be taken to reduce bias introduced during the machine learning process. In this study, we split the data into training and testing sets; a separate validation set was not used. The models can also be deployed in real time in facilities such as schools and workplaces to educate children, parents, and others. A prospective survey can be conducted to validate the results in a new population. Awareness campaigns can also be conducted to make people more aware of child sexual abuse.
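A three-way split that adds the missing validation set, while preserving the beginner/intermediate proportions in each part, can be sketched in plain Python. The 70/15/15 fractions and the class counts below are hypothetical; calling scikit-learn's `train_test_split` twice with its `stratify` argument is an equivalent off-the-shelf option.

```python
import random
from collections import defaultdict

def stratified_three_way_split(labels, val_frac=0.15, test_frac=0.15, seed=42):
    """Return (train_idx, val_idx, test_idx), preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, val, test = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # shuffle within each class before slicing
        n_test = round(len(idx) * test_frac)
        n_val = round(len(idx) * val_frac)
        test.extend(idx[:n_test])
        val.extend(idx[n_test:n_test + n_val])
        train.extend(idx[n_test + n_val:])
    return sorted(train), sorted(val), sorted(test)

labels = ["beginner"] * 60 + ["intermediate"] * 40  # hypothetical class balance
train_idx, val_idx, test_idx = stratified_three_way_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Hyperparameters and stacking choices would then be tuned on the validation indices only, keeping the test indices untouched until the final evaluation.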


Child sexual abuse awareness is required to protect children from sexual offenders. However, many people, including parents, do not have enough knowledge about child sexual abuse. Hence, ML models were used to predict people's knowledge of sexual abuse in children. A questionnaire dataset consisting of eight questions was considered in this study. A custom multi-level stacked classifier was used to predict child sexual abuse awareness. The ensemble stacked model obtained an accuracy, precision, recall, F1-score, AUC, and average precision of 94%, 94%, 94%, 94%, 98%, and 97%, respectively. Four XAI methods (SHAP, Eli5, LIME, and QLattice) were used to interpret the results. The models can be used in real time to increase awareness of sexual abuse in children. The classifiers can be used in schools, daycares, workplaces, playgrounds, hospitals, and other relevant settings. The ML and XAI techniques can support prevention, early intervention, and the well-being of children and communities.
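The evaluation metrics reported above can be computed from true labels and predicted probabilities as in the following plain-Python sketch. It computes binary (positive-class) versions; the study may have averaged precision, recall, and F1-score over both classes, and the labels and scores below are hypothetical. scikit-learn's `roc_auc_score` and `average_precision_score` give the same AUC and average precision when there are no tied scores.

```python
def binary_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall, F1, ROC AUC and average precision for 0/1 labels."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # ROC AUC: chance that a random positive is scored above a random negative
    # (ties count as one half).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))

    # Average precision: mean precision at the rank of each positive,
    # scanning from the highest score down (assumes no tied scores).
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    hits, ap = 0, 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            ap += hits / rank
    ap /= len(pos)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": auc,
        "average_precision": ap,
    }

# Hypothetical labels (1 = intermediate) and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
m = binary_metrics(y_true, y_score)
print(m)
```

Accuracy, precision, recall, and F1-score depend on the chosen threshold, whereas AUC and average precision summarize the ranking of the scores across all thresholds, which is why the latter two can be higher than the former.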