1 Introduction

Crime has indirect losses its victims face, including pain, suffering, decreased quality of life, psychological distress, and several physical and spiritual impacts (Reyns et al. 2024). According to the Global Burden of Disease study, more than 400,000 people die from homicide yearly, and most of the victims are younger than 50 years of age, (Roser and Ritchie 2019) around three times the number killed in armed conflict and terrorism combined. CSIS (Center for Strategic and International Studies) and McAfee explore the growth of cybercrime from an economic impact perspective. It is estimated that cybercrime costs the world over $1 trillion, or 1% of the global GDP (McAfee 2020). In India, according to the National Crime Records Bureau (NCRB), under violent crimes, 28,522 cases of murder, 107,588 cases of kidnapping and abduction were registered in 2022, and the rate of crime per lac women and children population is 66.4 and 36.6, respectively (Ministry of home affairs 2022).

The dramatic rise in urban crime rates has become one of the most pressing public security challenges, affecting many facets of social sustainability, such as jobs, livelihood, health care, and education (Zhu et al. 2022). As a result, it is necessary to construct a predictive model capable of identifying high-crime locations and detecting trends in crime occurrence in such areas to allocate precious resources and engage in crime prevention and reduction initiatives. Observing the enormity of the adverse effects of crime, the researchers are focusing on crime analysis (Adeyemi et al. 2021), crime patterns (van Dijk et al. 2022), crime detection or prediction, crime hotspots (Mondal et al. 2022), CCTV surveillance (Piza et al. 2019). Besides, research is ongoing to study the impact of fear of crime on citizens' lifestyles (Ceccato et al. 2023) and criminology to prevent erroneous activities (Wikstro die m and Kroneberg 2021). The scale of crime represents a considerable challenge for law enforcement agencies, which need a coordinated approach supported by Artificial Intelligence (AI) to predict and prevent crime. It will help them design optimal police patrol strategies (Basilio et al. 2022). Basilio et al. proposed a method with the help of POMETHEE software for effectively selecting a policing strategy to lower instances of crime in certain regions (Basilio et al. 2021). The introduction of advanced domains like AI, Deep Learning, Machine Learning, Internet of Things (IoT) devices, Geographical Information Systems (GIS), Fog, and Cloud computing has significantly contributed to developing intelligent monitoring systems (Zantalis et al. 2019). To increase public safety and reduce economic loss, crime prevention is a crucial task that can be achieved by predicting a crime event (Shah et al. 2021). Crime events in a particular region depend upon several factors (Wheeler and Steenbeek 2020). With the help of AI and other incorporating technologies, we can predict and prevent crimes.

Social media platforms such as Twitter, Facebook, and Foursquare are becoming primary data sources in the digital city era (Kaur and Saini 2022). Social media data provides excellent insights into human behavior patterns, including not just individual daily actions but also a person’s emotional shifts and opinions about events. In a study performed by Wang and Li in the London region, their research was undertaken on using social media data (i.e., Twitter and Foursquare) to evaluate human mobility incurred violent crimes and on utilizing solely geolocational information (Wang and Li 2022). Furthermore, Zhu et al. created a prediction model based on K-means clustering, signal decomposition, and neural networks to detect crime distribution in metropolitan regions and effectively estimate the variation tendency of the number of crimes in each location. A high-precision anticrime information support system is built by integrating temporal and geographical effects (Zhu et al. 2022). Identifying criminal needs informs law enforcement decision-makers on the most effective police measures to reduce crime rates. Basilio et al. proposed a policing strategy choice model using Latent Dirichlet Allocation (LDA) integrated with ELECTRE I (Basilio et al. 2020). This approach helps identify criminal activities and their features using police reports from emergency response databases. In addition, to predict crime locations and trends, suspicious human activity can also be monitored by human action recognition through video surveillance (Pareek and Thakkar 2021).

The scientometric analysis has been performed in different areas to analyze the emerging trends of AI or Information and Communication Technology (ICT). The most popular scientometric research works and domains are as follows: ICT-assisted disaster management (Sood and Rawat 2021), AI and industry 4.0 in healthcare (Sood et al. 2022), the role of AI in cyberbullying or cyberhate detection (Kaur and Saini 2023), bibliometric analysis on policing strategies (Basilio et al. 2022), 3-D printing technology (Sood et al. 2024), and emerging trends of ICT in airborne disease prevention (Sood et al. 2023). Scientometric analysis studies have been published on spatial crime analysis, cybersecurity, and other contexts, however, there is a lack of research on crime prediction and pattern analysis (CPPA). Since crime-related research is of multidisciplinary interest other than criminology, subjects like social science, arts, humanities, psychology, engineering, computer science, economics, demographics, and others, (Ward et al. 2019) should be taken into account. The safety of a community is one of the top concerns, and the usage of crime prediction techniques and pattern analysis has become indispensable all around the world. Motivated by the significance of crime-related research, a scientometric analysis was conducted to analyze the bibliographic data. This in-depth analysis assisted in revealing the advancement of all the branches related to crime studies and identifying important publications and factors that must be considered before carrying out any research. Besides scientometric analysis, this study also makes an effort to categorize the research themes based on clustering using the keywords and other feasible correlation methods to draw out a complete picture recognizing the various emerging areas of research that are performed in the field of CPPA with the inculcation of AI.

1.1 Objectives

This review primarily presents a qualitative and quantitative evaluation of research activities on various facets of published research studies related to crime(s) over the last decade. This study examines (i) annual growth in publications volume and prominent subject categories, (ii) publications and journals citation analysis, (iii) the distributions of research power in terms of countries, organizations, and authorship, as well as (iv) the core research topics and trends.

1.2 Organization of paper

This paper is organized as follows. Section 2 is about the primary research work conducted related to CPPA using AI. Section 3 discusses the database used for bibliographic analysis and visualization tools. In Sect. 4, there are seven subsections where Sect. 4.1 elaborates on different types of documents, their yearly progression, and the contribution of various subject categories. Section 4.2 presents the top publications, journal citation networks, and Sect. 4.3 enlightens the significant contributions and collaborations by countries in research related to CPPA. While Sect. 4.4 highlights the influential organizations based on average normalized citations and their connections with other institutions. Section 4.5 represent major contributing authors and the collaboration among various researchers. Section 4.6 elaborate on trending topics and their evolution over different time phases, and Section 4.7 describes the connectivity of country-journal-keywords and journals-authors-keywords. Section 5 discusses the themes of AI in CPPA research, keyword evolution, influential journals, and publications. The last three sections are about Sects. 67, and 8 of this study.

2 Related work

Different AI-based models are employed for crime prediction depending on the crime category, datasets, and prediction criteria, and they are improving by the day. Shah et al. proposed a computer vision and machine learning strategy for monitoring crime hotspots using CCTV as well as recognizing persons based on voice notes. The technology can help police officers prevent crimes from occurring (Shah et al. 2021). Solomon et al. proposed a novel framework DeePrison for predicting burglaries using numerous contextual cues. It can capture the relationships between burglaries and their many contextual elements in terms of time and geography, allowing it to deliver reliable burglary prediction (Solomon et al. 2022). Gandapur developed a surveillance system that employs video records from surveillance cameras to track and identify suspicious/criminal behaviors in public places, and the identified activities are notified to a security system. The suggested architecture comprises many phases before detection, including video preprocessing, feature extraction, identification, and detection. The system receives video input from several security cameras, and frames are retrieved from the video to train a deep learning model composed of Convolution Neural Network (CNN) and BiGRUs (Gandapur 2022). Numerous algorithms, models, and techniques are deployed for CPPA research.

On the other hand, few researchers have surveyed the published work by conducting either bibliometrics or systematic reviews (refer to Table 1). Gian Maria Campedelli has conducted a study to review the intersection of research on Crime and AI using the Scopus database. It investigated the keyword co-occurrence graphs, co-authorship, and country-level collaboration. This study concluded that Machine learning is trending in crime-related studies, and other emerging technologies are AI, deep learning, data mining, classification, and big data. This research has a few limitations, such as the query string being selected from limited keywords and the evolution of emerging research themes not being presented in the study. Moreover, a detailed analysis is missing related to the publications (Campedelli 2021). In a bibliometric analysis by Roy and Chowdhury, they explored the application of Geographic Information Systems (GIS) in Spatial Crime Analysis using Scopus metadata of the last three decades. Using Biblioshiny, the top cited documents, publishing journals, influential authors, countries, institutes, and their collaboration have been analyzed. In keyword analysis, it has been explored that Crime prediction and analysis using machine learning is the most advanced topic of research (Roy and Chowdhury 2023).

Table 1 Summary of the related studies

Furthermore, Mandalpu et al. did a systematic review of crime prediction methods, and various machine learning and deep learning algorithms have been explored in this survey. Machine learning algorithms such as XGBoost, Logistic Regression, Random Forest, Decision tree, and others are discussed based on their efficacy. Deep learning-based algorithms like Convolution Neural Network (CNN), K-nearest Neighbor (KNN), Naïve Bayes (NB), Support Vector Machine (SVM), Gated Recurrent Unit networks (GRU), Binary Encoder Representations from Transformers (BERT), and Long Short Term Memory (LSTM), used in various studies, their performance, and corresponding datasets used have been explored (Mandalapu et al. 2023). In a study by Dakalbab et al., various categories of crime and their analysis type, depending on factors like human behavior, spatial, crime, and neighborhood, are discussed. Supervised and unsupervised techniques and tools used in crime prediction studies and utilized datasets have been discussed (Dakalbab et al. 2022).

We have conducted a scientometric analysis of the studies of the last decade, and the evolution of keywords predicts the prominent algorithms, techniques, and research interests and their shift over the decade. The top cited manuscripts, active researchers, Institutes, countries, and their collaboration in the CPPA domain are determined.

3 Analytical methodology

The methodology section discusses the source of the dataset adopted for scientometric analysis. It elaborates on the tools utilized for performing the study. The data collection and cleaning explain the search string employed for fetching the dataset and it lists the computational tool preferred for each type of data analysis.

3.1 Data source and tool selection

The most extensively used bibliographic databases are Thomson Reuters’ Web of Science, Elsevier Scopus, and Google Scholar. There are several comparative analyses on journal coverage by three data sources in different subject areas and it found that Scopus is best suited for this analysis because it has a sufficient number of publications that make analytic results more accurate and comprehensive (Kaur and Sood 2019). Moreover, Scopus is a multi-disciplinary repository with a clear framework for categorizing subject areas (Sood and Rawat 2021). The Web of Science has high quality, but the number of publications is limited (Sood et al. 2021). However, Google Scholar also has several hindrances, such as an inaccurate number of citations (Martín-Martín et al. 2021).

The most commonly used software tools for constructing and visualizing the bibliometric network are Bibexcel, CiteSpace, CitNetExplorer, SciMAT, Sci2 Tool, VOSviewer, Biblioshiny, etc. In this study, VOSviewer (Eck and Waltman 2010) is used to construct, visualize, and explore a bibliometric network from an extensive database. VOSviewer is a freely available software package offering great visualization and advanced functionality like text mining and supporting efficient front-end graphics with prominent features (Moral-Muñoz et al. 2020). Bibliometrix (version 1.7), an R-Tool of R-Studio (Version 3.6.1), is employed in this study for comprehensive research mapping analysis, where Biblioshiny (bibliometrix’s shiny interface) provides a web interface (Aria and Cuccurullo 2017). Finally, any other graphical visuals in the article were created using Microsoft Excel and Word.

3.2 Data collection and cleaning

All peer-reviewed scientific publications relating to CPPA research were retrieved from the Scopus database. A search string was formed by incorporating alternative terms and synonyms, and then combined using Boolean expressions ‘OR’ and ‘AND.’ The following general search string was created for the identification of primary studies:

figure a

The results were filtered by publication from 2012 through 2022 to retrieve results for the last 10 years. The complete metadata tallying a count of 10,834 was compiled and manually exported on March 13, 2022. Afterward, the publication content was reviewed by reading the author’s keywords, title, and abstracts for all publications available on Scopus. The data obtained is filtered by eliminating erroneous data entries by using the NOT operator for other research domains like Crimea*, and others, which have “crime” as a substring in their keywords. The International Organization for Standardization (ISO) defined the Digital Object Identifier (DOI) to uniquely identify items in virtual worlds, which is critical in filtering out unique articles. Data covering legitimate publications suitable for research is collected after cleaning the entries with duplicate or blank DOIs (Sood et al. 2021). As an outcome of the process, 9,861 genuine and appropriate papers were taken for investigation.

For the AI based research, keywords have been manually selected. We first downloaded a few of the most cited papers using the AND operator between the “artificial intelligence” and the above-mentioned search string. After carefully analyzing the documents and their closely related documents, the final set of keywords is finalized. Moreover, various studies suggest that research related to AI uses keywords like AI, machine learning, deep learning, robotics, automation, big data, neural network, Bayes, text mining, natural language processing, data mining, soft computing, fuzzy logic, biometrics, geotagging, wearable, IoT, Internet of things (Mustak et al. 2021). The huge dataset with a count of 2490 is captured, and after cleaning and removing duplicates, only 2396 documents met the selection criteria.

figure b

3.3 Data analysis

In scientometric analysis, we mapped the scientific area using two platforms: VOSviewer and Biblioshiny. The bibliometric records were imported and extracted in Comma-separated values (CSV) format, and further algorithms were applied to analyze the critical research areas, landmark publications, authors’ networks, and other institutional collaborations. The scientometric analysis performed and algorithmic computational tools applied are listed in Table 2.

Table 2 List of scientometric analyses performed in this study

4 CPPA scientometric analysis

A detailed scientometric analysis of CPPA is discussed in this section:

4.1 Publications and subject dispersion

The progress of any research field is determined by the number of publications and their annual increment. The literature data used in this study were downloaded from Scopus in the time frame 2012–2022. In 10 years of literature of 9861 publications, 12 document types were found.

The most frequent type was an article (7107), comprising 72% of total publications, and conference papers (2069) accounting for 21%. Other document types include book chapters (309), books (31), reviews (261), editorials (24), erratum (14), notes (23), short surveys (13), letters (4), and data papers (3). Table 3 lists the frequency and proportion of each data type.

Table 3 Types of documents retrieved

4.1.1 Yearly progression of publications

Figure 1 plots the annual trend of crime-related publications, showing a percentage increase of 148% (from 582 yearly outcomes to 1444) over the past 10 years. From 2012 to 2021, research shows slow growth, but from 2016 onwards, more and more researchers demonstrated their interest, which led to a jump in the number of publications in 2017. The year 2013 was an apex year with 17,187 citations from 666 publications. In 2018, there were 1143 publications with 12,914 citations, and in 2021, 2268 citations were received for 1444 publications.

Fig. 1
figure 1

Annual trend of crime-related publications and citations

4.1.2 Contribution of various subject categories

For subject categorization, the All Science Journal Classification (ASJC) schemeFootnote 1 is used to classify serial titles by in-house specialists when the serial title is first set up for Scopus coverage, depending on the publication’s objectives and scope, as well as the content it publishes. Moreover, a publication may lie in more than one subject domain. Figure 2 represents the number of publications according to the major subject categories in which the research related to crime was carried out from 2012 to 2022. More than 75% of publications were contributed from Social Science, Medicine, Computer Science, and Psychology subjects, and the number of publications in Computer Science and Social Science quadrupled (from 136 to 530) and doubled (from 232 to 422) over a decade, respectively. In 2012, the publication proportion in Medicine and Social Science was a maximum of 232 publications each, but its percentage contribution slowly declined in 10 years. On the other hand, the number of publications by Decision Sciences is multiplied by eight, resulting in a gradual rise in its total contribution. The number of publications in Psychology and Arts and Humanities also doubled from 187 to 360 and 39 to 65, respectively. The Engineering field accounts for an average of 4.9% of the total. Another 7% of the contribution is by summing up publications in subjects of Decision Sciences, Environmental Science, Economics, Econometrics & Finance, Earth & Planetary Science, and Agricultural & Biological Sciences. Physics & Astronomy, Biochemistry, Genetics & Molecular Biology, Pharmacology, Toxicology & Pharmaceutics, Nursing, Multidisciplinary, Material Science, and Neuroscience jointly contribute 3%.

Fig. 2
figure 2

Number of publications according to the major subject categories

4.2 Citation analysis

4.2.1 Publications and citations

For qualitative analysis and evaluation of publication productivity, parameters like normalized citations or citations based on the number of years of publication are considered. Normalized Citations (NC)Footnote 2 of a document is calculated by dividing the total citations of a document by the average number of citations of all documents published in the same year and included in the database provided to the VOSviewer (Basson et al. 2021).

$$\text{NC}=\frac{\text{N}^o\text{. } \text{Citations}}{\frac{\sum_{i=1}^{\text{m}}{\text{N}^o\text {. } \text{Citations}}_{\text{i}}}{\text{m}}}$$
(1)

No. Citations  is the  number of citations of a document; m  is the total number of documents published in the same year of the selected document.

Another parameter considered is the Citation Indicator (CI), as the number of citations depends upon the publication’s longevity (number of years of existence) (Kaur and Sood 2019). CI is the ratio of the number of citations of a publication to the citation window, where the citation window is the number of years between the publication and this scientometric analysis.

$$\text{CI }= \frac{\text{number of citations per publication}}{\text{citation window}}$$
(2)

where citation window  is the year of evaluation − year of publication.

To identify the most influential publications, 3291 documents meet the requirement of a minimum of 10 citations; the top 30 have been selected, and their productivity is measured based on NC and CI. The most cited publication is “The epidemiology of traumatic event exposure worldwide: results from the world mental health survey consortium” by Banjet C. This article was published in Psychological Medicine in 2016 and got 412 citations. Even when the normalized citation is considered, it has the highest value of 22.57. Next, the paper “Initial evidence on the relationship between the coronavirus pandemic and crime in the United States” by Ashby M.P.J., published in Crime Science in 2020, has 21.67 normalized citations. But when we consider CI, "Electricity theft detection in Ami using customers’ consumption patterns”, published in IEEE Transactions on Smart Grid by Jokar P., has the second-largest value. A few publications ranked according to Normalized Citations related to this field are listed in Table 4. In addition, the CI value is also taken into account to analyze the significance of the publication. Much deviation is not seen in the ranking when comparing NC and CI values.

Table 4 Publications ranked according to normalized citations

4.2.2 Journals according to citations

This section will concentrate on prestigious journals that have been heavily referenced, making them the pinnacle of CPPA studies. This study shortlists the journals as having a higher level of credibility because their articles are peer-reviewed and only publish high-quality papers. Using VOSviewer, a journal citation network is constructed. Only the journals satisfying the minimum number of publications (p) are chosen out of all the total journals (n). The journal citation network is constructed using a selected number of journals (j). The nodes represent the journal, its size depends upon the number of documents, and the link shows citations to two journals. The link strength represents the total number of references to all other journals (included in the database) it has made. Out of 3223 sources (n), 74 sources meet the requirement of having a minimum of 20 publications (p). Among them, the 30 top publications (j) have been selected. Figure 3 shows that the Journal of Interpersonal Violence has a maximum number of documents with a value of 359, with maximum links of 27, and a citation value of 4267. The total link strength of the Journal of Interpersonal Violence is 205. Child Abuse and Neglect have the second largest number of documents and citations, with values of 103 and 2736, respectively. International Journal of Offender Therapy and Comparative Criminology also has 103 publications and 1095 citations, followed by PLoS ONE (92 documents and 1933 citations).

Fig. 3
figure 3

Journal citation network (n = 3223, p = 20, j = 30)

Based on average citation per publication, the top 15 journals with their impact factors are shown in Table 5. Average citation of a journal is the total number of citations of a journal divided by the total number of publications in that journal included in the database for analysis (Manu and Akotia 2021), i.e.

Table 5 Journals with a maximum average citation
$$\text{Avg. citation} = \frac{\text{total citations of a journal}}{\text{total number of publications of a journal}}$$
(3)

On an average citations basis, the Journal of Adolescent Health, with average citations of 42, is the leading journal, followed by Criminology (36.8), Computers in Human Behavior (31.8), Journal of Abnormal Child Psychology (26.9), Child Abuse and Neglect (26.6), Journal of Youth and Adolescence (25), Justice Quarterly (22.4), Psychological Assessment (21.7), PLOS One (21) and Journal of Quantitative Criminology (20.7), respectively.

4.3 Countries according to publications and collaboration

Figure 4 shows a Geographical heat map where the color intensity of a country is directly proportional to the number of publications. In terms of publications, the United States has the highest number of publications, i.e., 4290, the United Kingdom (1855), China (826), and India (655).

Fig. 4
figure 4

Geographical heat map based on publications and the country collaboration network

It also represents the country’s collaboration network constructed using Biblioshiny, showing the countries’ more than ten co-authored documents and their association with other countries. It reveals that the United States has made a maximum number of collaborations with 75 links and 1273 total link strength, followed by the United Kingdom, making 66 links and 807 total link strength. Australia has collaborated with 54 links and 406 total link strength, whereas Spain has 53 links with 243 total link strength. The Netherlands publishes the same number of documents as Spain, with 50 links and 336 link strength. The highest collaboration is made between the United States—China with a link strength value of 157, then the United Kingdom—United States (152), the United States—Canada (131), the United States—Australia (84), the United Kingdom—Australia (61), the United Kingdom—Canada (59), the United States—Netherland (52), respectively.

4.4 Prime affiliations and collaboration network

According to their average normalized citation (ANC) rank, the most productive and influential organizations are listed in Table 6. A small number of publications with a large number of citations might exaggerate a journal’s total number of citations. So, ANC is computed for quantitative and qualitative analysis of research activity (Manu and Akotia 2021). Older documents have had more time to receive citations than recent ones corrected by Normalization.

$$\text{ANC }= \frac{\sum_{\text{i=1}}^{\text{p}}{{\text{NC}}}_{\text{i}}}{\text{p}}$$
(4)

where p is the total publications by an organization included in the data.

Table 6 Most influential organizations

Out of 19,440 organizations, 29 have been selected with a minimum of 10 documents. The Sam Houston State University, the United States provides the highest number of published documents, with 29 publications, followed by the School of Criminal Justice, Michigan State University, East Lansing, MI, United States, and the School of Criminal Justice, University of Cincinnati, Cincinnati, OH, the United States with 20 publications each, and Arizona state university, the United States with 19 publications. Maximum ANC value is reported by Public Safety Canada, Ottawa, ON, Canada, and the Department Of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden having a value of 3.8 and 2.9, respectively. Then Carleton University, Ottawa, ON, Canada, and George Mason University, United States have approximately the same values, 2.0 and 1.9, respectively.

4.4.1 Collaboration among various organizations

For the co-authorship network among various organizations, the fractional counting method is used to avoid the effect caused by a small number of publications responsible for a large number of co-authorship links and to obtain clear information. Figure 5 depicts the overlay visualization indicating the average publication year of the document in which the keyword occurs is published by the respective organizations (where total number of organizations = 19,454, minimum documents per organization = 5, minimum citations of organization = 5, organizations meeting threshold = 163, organizations collaborated = 98). Florida State University, Tallahassee, United States has made a maximum number of collaborations with nine links, succeeding King Abdulaziz University, Jeddah, Saudi Arabia, and the Sam Houston State University, Huntsville, TX, United States with eight links each. University of Cincinnati, Cincinnati, OH, United States; the University of Texas at Dallas, Richardson, TX, United States; Arizona State University, Phoenix, AZ, United States; Iowa State University, Ames, IA, United States; School of Criminal Justice, University of Cincinnati, Cincinnati, Oh, United States; Wayne State University, Detroit, MI, United States; Arizona State University, United States; Vu University Amsterdam, Netherlands; Netherlands Institute for the Study of Crime and Law Enforcement, Netherlands; Georgia State University, the United States; School of Criminology and Criminal Justice; Northeastern University, the United States, and Florida State University, United States are few more prime institutes that collaborated with other institutes in this research domain.

Fig. 5
figure 5

Overlay visualization of organization co-authorship network

4.5 Influential authors and co-authorship network

Further, the study focuses on the most influential authors and their co-authorship patterns. The collaboration of various authors having a minimum of 5 publications with ten citations is considered for analyzes, in which 606 meet the threshold, and 100 were selected, as shown in Fig. 6. The size of the circle represents the total number of publications by an author, and the link is to depict co-authorship between them. The strength of the link shows the number of papers they co-authored. The total link strength of an author is the total of all his co-authorship link strengths. Li X. has the maximum number of publications with a value of 39. The other active authors are: Wang Y. (39), Liu Y. (35), Wang X. (34), Zhang Y. (34), Wang Z. (33), Zhang H. (33), Piquero A.R. (31), Liu X. (30), Liu L. (29), Zhang X. (29), and Adresen M.A. (29). In terms of average citation per publication, Belsky D.W. has the highest value, 64.2, while Cao Y. (59.87), Fazel S. (52.81), Capsi A. (49.81), Moffitt T.E. (49), Fisher H.L. (44.45), Babchishin K.M. (44.26), Arseneault L. (43.86), Tremblay R.E. (33.27), Brendgen M. (33.25), Larrson H. (32.25), and Lichtenstein P. (31.78).

Fig. 6
figure 6

Authors’ co-authorship network

Analyzing the collaboration of authors, the highest total link strength is of Arseneault L. (59) followed by Kessler R.C. (53), Sampson N.A. (51), Stein M.B. (49), Ursano R.J. (49), Moffitt T.E. (48), Caspi A. (47), Wang Z. (47), Rosellini A.J. (39), and Fisher H.L. (38). From the co-authorship network, it is seen that intercountry collaborations are more likely to occur than international collaborations.

4.6 Key thematic topics and evolution

Keywords represent the contents of publications. Analytic methods are used to locate and track hot topics, build keyword similarities, and integrate clustering (Hailin and Chunpei 2022). The evolution of the research field can be elaborated by analyzing trending topics. For this purpose, the whole timespan is classified yearly, and the most prominent keywords and their evolution are shown in Fig. 7. Deep learning, machine learning, predictive analysis, fraud detection, and feature extraction keywords in CPPA have been trending topics in the last 2 years.

Fig. 7
figure 7

Trend topics and their evolution over a decade

4.7 Three field plot

Another critical feature of Bibliometrix is the ability to plot a Sankey diagram to visualize numerous characteristics simultaneously. Figure 8 shows a three-field plot (Sankey Diagram) that lists the countries of the corresponding authors (left), the journals in which their articles are published (center), and the author keywords (right side) and their association with each other. As shown in the figure, the majority of the manuscripts from the USA are in the Journal of Interpersonal Violence, with most of the work done based on the keyword “victimization”.

Fig. 8
figure 8

Country-journal-keyword association chart

Publications from China are in the journal Lecture Notes in Computer Science, and this journal has publications related to the keywords “deep learning”, “crime”, and “risk assessment”, etc., and the majority of publications from India are made in Advances in Intelligent Systems and Computing. Countries, their major publishing journals, and research domains can be visualized in Fig. 8. It also provides an idea that on which research domain the influential journals are publishing their documents.

Figure 9 demonstrates the association between prominent authors, their crucial research topics, and the most relevant journals. Most publications related to CPPA by Olver M.E. and Hanson R.K. were on “recidivism” and “risk assessment” and were published in Sexual Abuse: Journal of Research and Treatment. Moving further, the research topics of Zhang J. were “machine learning”, “crime prediction”, “deep learning”, “fraud detection”, and “bullying”. Also, other authors working in a similar domain, like Wang X., Wang Z., and Zhang X., published their major manuscripts in Lecture Notes in Computer Science. In addition, the publications in the Journal of Criminal Justice were from Beauregard E., Walter G.D., and Piquero A.R. Beaver K.M. performed research on “crime”, published majorly in IEEE Access. It provides a clear insight into authors and their respective research domains.

Fig. 9
figure 9

Journal-author-keyword association chart

5 The amalgamation of CPPA and artificial intelligence

Even though this field of study is still in its early stages, research at the intersection of AI and crime is progressing, not only in terms of practical applications, use cases, and targeted experiments, but also in terms of mobilizing the scientific community toward a fair and non-discriminatory use of AI systems in the real world. Nevertheless, while the quantitative change in the study of crime is undeniable, academics have not yet scanned the scientific output that applies AI to analyze crime-related problems. Because of the combined influence of various events and phenomena connected to the study of AI technology and statistical learning during the last 10 years, the success of algorithms built to learn existing patterns in data, without being specifically programmed, to do so has been remarkable. AI is becoming more crucial to corporate operations since traditional black-box technologies, such as neural networks, provide solutions that are difficult to comprehend and manage. Association rules provide tremendous interpretability of the findings and the model, making them extremely simple and extensible, and they must deal with information from social networks. In addition to interpretability, the promise of association rules in Big Data-framed social media mining solutions rests in the ability to apply pattern mining algorithms without pre-labeled datasets (Diaz-Garcia et al. 2022). This section will explore the study related to CPPA incorporated with AI.

5.1 Major contribution by countries, institutions, and journals

2396 documents published over the last decade were analyzed to find the AI contribution to CPPA. After evaluating the data statistically, it is depicted that the United States of America makes the maximum contribution in publications related to CPPA with AI (1590), followed by India (1388) and China (1312). Among various organizations, Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, India; Florida Atlantic University, Boca Raton, FL, United States; Carnegie Mellon University, the United States; and Azienda Sanitaria Locale Asl Napoli 3 Sud, Torre Del Greco, Italy are prime institutions working in research related to CPPA with AI. Depending upon the number of publications, Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) has the maximum contribution in a decade (104), followed by Advances in Intelligent System Computing (72), and ACM International Conference Proceeding Series and IEEE Access having 64 publications each. The major 11 journals are listed in Table 7 with yearly cumulative frequency.

Table 7 Major contributing journals

5.2 Prominent authors and co-authorship network

Figure 10 depicts the most productive authors, with the size of the bullet denoting the number of articles and the color intensity indicating the total citations per year. It can be seen that Liu Y. received the most citations per year for publications in 2016 and 2018, while Liu X. and Liu H. received the most citations in 2020. From 2019 onwards, there has been an upsurge in overall interest and publications linked to CPPA with AI. Figure 11 represents the co-authorship network of CCPA with AI, of 6550 authors in the dataset, with a minimum of 5 publications, 84 authors meet the threshold. Liu Y., with 20 publications, has topped the list in terms of the total number of publications, and Zhang C. has maximum total link strength (18), while Chen J. has the highest average citations per publication with a value of 31.57.

Fig. 10
figure 10

Yearly publications of prominent authors related to CPPA with AI

Fig. 11
figure 11

Authors co-authorship network in publications related to CPPA with AI

5.3 Keyword co-occurrence

The trending topics in CPPA are analyzed by creating a co-occurrence network based on authors’ keywords manually selected for AI domain coverage, and this association depicts the correlation among various research topics. As shown in Fig. 12, the themes of research publications during the decade are classified into four themes: the basic theme of research is machine learning, cybersecurity, and AI. Internet of things and supervised learning are niche themes, while neural networks, fraud detection systems, and random forest intersect motor and niche themes. Data mining, crime prediction, and big data applications analytics are basic and motor themes.

Fig. 12
figure 12

Thematic map

For the clustering of keywords, the Louvain algorithm, an unsupervised technique, is used as described in Algorithm 1 (Blondel et al. 2008). It begins with a weighted network of N nodes. First, a unique community is allocated to each network node. So, in this first split, there are as many communities as nodes. Then, for each node i, assess its neighboring node j and calculate the increase in modularity that should result from removing i from its community and putting it in the community of j. The node i is then assigned to the community with the highest gain, but only if the gain is positive. If no positive gain is achievable, i return to its original community. This method is repeated and successively for all nodes until no more improvement is possible. The first part of the algorithm is completed when the local maximum of modularity is reached, i.e., when no more iteration can enhance the modularity.

The second part of the algorithm is to build a graph from the communities found in part 1. The weights of the links between the new nodes are equivalent to the sum of the weights of the links between nodes in the corresponding two communities. Once part 2 is accomplished, part 1 is processed again to the resulting weighted network of nodes and repeats the iteration. The algorithm is terminated when maximum modularity is obtained. The algorithm is evocative of the self-similar nature of large networks and automatically integrates a concept of hierarchy as communities of communities are established during the process. The number of passes determines the height of the created hierarchy, which is usually a minimal number. Finally, a keyword co-occurrence map is formed using this approach.

Algorithm 1
figure c

Pseudo-code of Louvain method

Using the Louvain algorithm, 250 keywords with a minimum of 10 occurrences are selected and organized in five clusters. Each cluster of keywords represents the theme based on co-word analysis through keyword co-occurrences. Figure 13 depicts the formation of five clusters: (i) Machine Learning, (ii) Neural Networks, (iii) Deep Learning, (iv) Internet of Things, and (v) Data Mining. The keywords in each cluster concisely represent the contents of the research documents, and the size of the node depicts the number of occurrences of a keyword in their respective cluster.

Fig. 13
figure 13

Clusters with trending keywords

The evolution of a research field can be described by analyzing topics in different phases of the timespan. This whole timespan is divided into three phases: Phase I (2012–2016), Phase II (2017–2019), and Phase III (2020–2022). Based on occurrences, essential research topics are classified into three phases:

  • Phase I (from 2012–2016): AI was in an introductory phase in CPPA research domains. Still, a few emerging topics were decision support systems (Li et al. 2010; Zhouet al. 2012), game theory, geographical information systems, prediction tasks, neural networks, and computer simulation (Malleson et al. 2010; Birks et al. 2012), mathematical models, unsupervised approaches, network forensics (Adeyemi et al. 2015), data mining, spatial data mining, Bayes theorem, hotspot mapping (Hart and Zandbergen 2014), honeypots, hybrid models, conformal mapping, kernel density estimation (Gerber 2014), botnets (Wagen and Pieters 2015), support vector machines, cyberbullying (Aboujaoude et al. 2015), malware analysis, intrusion, anti-phishing, android, social science computing, biometrics, apriori algorithm, and others were popular one.

  • Phase II (from 2017–2019): the major emphasis was on android malware, video surveillance systems, statistics and numerical data, data mining applications, malware classification, mobile security, predictive analysis, fuzzy systems, fuzzy logic, android, KNN (K-nearest neighbor) algorithm, social networking sites, crime mapping, machine learning (Lin et al. 2018), smart city (Chiodi 2016; Ghosh et al. 2016), big data (Sanders and Sheptycki 2017) (Williams et al. 2016), deep neural networks (Chun et al. 2019), risk perception, spatiotemporal pattern, GIS (geographic information system), particle swarm optimization, pattern matching, artificial neural networks, genetic algorithms, botnet, text mining techniques, decision support system, text mining, data mining, multilayer perceptron, web crawler, back propagation neural networks, correlation analysis, juvenile delinquency, threat detection, crime patterns, cryptography, cloud computing, network intrusion detection, digital forensics, fuzzy neural networks, crime forecasting, fraud detection system, and cybercrime.

  • Phase III (from 2020–2022): prominent research areas are feature selection, attack prediction, time series data, supervised learning approaches, intrusion detection systems (Thakkar and Lohiya 2022), cyberbullying (Kaur and Saini 2022), social media, and cyber threats (Humayun et al. 2020). Moreover, prediction modeling, network security, data mining techniques, support vector machine, BERT (Bidirectional Encoder Representations from Transformers), deep learning, machine learning algorithm, graph neural network, XGboost (extreme gradient boosting), object detection, generative adversarial networks, gradient boosting, edge computing, CNN (convolution neural networks), LSTM (long short-term memory), Bi-LSTM (bidirectional long short-term memory), IoT, KNN (K-nearest neighbor), spatiotemporal, vehicles, credit cards, cyber security, optimization algorithms, turing machines are the trending topics.

5.4 Landmark publications

Among the AI-based CPPA publications, “DeepXplore: Automated Whitebox Testing of Deep Learning Systems” by Pei et al. has a maximum citation of 457 when the database is downloaded and has a maximum value of total citations per year, as well as having the highest normalized citation. “DroidMat: Android Malware Detection through Manifest and API Calls Tracing” by Wu et al. ranks second in terms of total citations. “Predicting crime using Twitter and kernel density estimation” by Gerber is also the most widely cited publication for crime prediction.

6 Discussion

Defining, predicting, and investigating the crime is a challenging task. Many research organizations, governments, and international agencies are working vigorously to predict and control the crime rate. They are proactively adopting advanced technology, intelligent systems, and efficient data-driven methods to prevent crime, as well as provide a sustainable environment for their citizens (Shah et al. 2021; Walczak 2021). In this study, we presented an inclusive and detailed evaluation of 10 years of bibliographic data related to research activities on crime prediction, prevention, and analysis using VOSviewer. This tool is used for better data visualization and understanding. Various parameters like yearly publications, contributions by different subject categories, major publishers, citation analysis of various publications, collaboration analysis among authors, organizations, major bibliographically coupled documents, evolution in the research topics, and research advancement of CPPA with AI were examined in this study to get a substantial outlook on the research in this last decade. The number of publications shows a tremendous increase over a decade, and a peak in publications was noted in 2021. In crime-related research fields, not only articles but many books and book chapters made a remarkable contribution. Computer Science and Social Science are the dominant subject categories in research related to CPPA, making more than half of the total contribution in all the manuscripts from the statistical data of Fig. 2 (see Sect. 4.1). Psychology, Medicine, and Engineering also made an exceptional contribution. After Computer Science, Social Science is at the forefront, with the second-largest number of research articles over the last 2 years. Citation analysis mentioned in Table 4 (see Sect. 4.2) depicts that “The epidemiology of traumatic event exposure worldwide: results from the world mental health survey consortium”, “Initial evidence on the relationship between the coronavirus pandemic and crime in the United States”, “Electricity theft detection in Ami using customers’ consumption patterns”, “A survey on malware detection using data mining techniques”, and “Predicting crime using Twitter and kernel density estimation” has the highest normalized citations value respectively. As observed from the journal citation network in Fig. 3, the Journal of Interpersonal Violence is the representative journal with the highest scientific output. Child Abuse and Neglect, International Journal of Offender Therapy and Comparative Criminology, and PLOS One are also prevalent in publications. The United States, the United Kingdom, China, and India have made a remarkable contribution and collaborated with other nations to create effective systems for crime prevention (Fig. 4).

As discussed in Table 6 (Sect. 4.4) describing the most influential organizations, Sam Houston State University has the highest number of documents, but based on ANCs, the most dynamic organizations are: Public Safety Canada, Ottawa, ON, Canada; Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden; Carleton University, Ottawa, ON, Canada; and George Mason University, United States.

Graphical visualization in Fig. 6 reveals that based on average citation per publication, Belsky D.W., Cao Y., Fazel S., Capsi A., Moffitt T.E., Fisher H.L., Babchishin K.M., and Arseneault L. are the most competent authors. Moreover, the co-authorship network depicts Arseneault L., Kessler R.C., Sampson N.A., Stein M.B., Ursano R.J., Moffitt T.E., Caspi A., and Wang Z. have published influential work in association with other authors.

Furthermore, key research topics over different phases have been examined in Fig. 7 and found that feature extraction, predictive analysis, deep learning, data analytics, internet of things, convolution neural networks, machine learning, predictive modeling, big data, crime pattern, and cybercrime are the new emerging topics.

This study is intended to help researchers and policy-makers seeking to capture a broad overview of the directions of research associated with CPPA and AI-assisted CPPA. Overall, this research area continually expands its scope to address emerging issues comprehensively. The complexity of the field will only increase along with the amount of research. Moreover, Table 7 provides clear insights into journals publishing research related to CPPA with AI. The most eminent journals are Lecture Notes in Computer Science (including subseries notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Advances in Intelligent Systems and Computing, ACM, and IEEE Access. In CPPA, feature extraction, youth violence, predictive analysis, machine learning, fraud detection, deep learning, cybersecurity, and others are trending topics. Figure 13 identifies how the crime prediction and crime pattern analysis study is progressing with AI and where challenges and opportunities exist, providing knowledge about semantic similarities and connections between CPPA and AI trending technologies through cluster formation. Spotlight themes discovered are deep learning, data mining, machine learning, neural networks, and the internet of things. In machine learning, the top interconnected keywords are cybersecurity, AI, intrusion detection, feature selection anomaly, malware, security, and artificial neural networks. For data mining themes, crime prediction, big data application analytics, classification, crime analysis, social media, predictive policing, Twitter, text mining, and crime forecasting are popular. Deep learning covered convolution neural networks, natural language processing systems, feature extraction, LSTM, deep neural networks, ensemble learning, predictive models, and sentiment analysis. The neural network involved a fraud detection system, random forest, decision tree, phishing, credit card, SVM, logistic regression, fraud, naïve Bayes, decision trees, KNN, and others. Internet of Things encompasses supervised learning, crime detection, genetic algorithms, and pattern recognition. This study provides a road map for professionals in this discipline, identifying key publications, researchers, educational institutions, and nations. Future researchers will benefit from the insights presented in this study in discovering forthcoming patterns and technological innovations.

7 Threats to validity

Identifying threats to validity is crucial for doing reliable technical research. To ensure construct validity, we used an automated search rather than manually scanning headings in online publications. This may have resulted in losing pertinent publications. Query writing is an essential component of construct validity, and every digital repository provides unique alternatives for processing queries. To mitigate the risk of missing relevant CPPA domain manuscripts, the search string is finalized after consultation with authors and AI domain experts. However, any changes to the keywords used in the search string may result in some variation in the results. We may have overlooked relevant data features throughout our evaluation procedure, which could be a challenge to construct validity. Regarding internal validity, the threat is the data source selection. The dataset is solely obtained from Scopus; other data sources can be explored in the future. Secondly, the instruments utilized for scientometric research are VOSviewer and Biblioshiny only. More tools can be deployed in the future. To ensure conclusion validity, there is no threat to the discussion and conclusion because they are based on the findings of the acquired dataset and analysis. It does not include any personal opinions. Reliability validity is related to the research replication. The tools used, Bibliometrix and VOSviewer, are available online. All the necessary details regarding the study conducted are mentioned in the paper.

8 Conclusion

This research uses scientometric analysis to perform a novel, systematic, and comprehensive review of CPPA research and its amalgamation with AI. We constructed knowledge maps of landmark publications, journals, institutions, their interconnectedness, scientific collaborations, leading researchers, and spotlight research topics. Detailed keyword analysis has been performed to understand the evolution of research domains related to CPPA for over a decade. In addition to this, AI’s connectedness with other theoretical structures, disciplines, and technological developments has been explored. Clusters also provide a clear insight into linked keywords of the research domain. Machine learning and data mining techniques are gaining popularity and are widely used for CPPA research. On the other hand, deep learning and neural networks have been emerging fields of research for the last 5 years. Our findings have the restriction of representing available research but not unpublished information or continuing study. Despite these flaws, it clarifies the organization and progression of research in CPPA as well as CPPA with AI. It suggests directions for further research in this critical field. Additional sources of data, such as the Web of Science or Google Scholar, can be investigated in future investigations, and other analyzing tools could potentially be considered in the future.