1 Introduction

Crime is a pervasive global social issue. It impacts a nation's quality of life, including its economic prosperity and reputation, and crime rates have increased dramatically over the past few years [1, 2]. Law enforcement must take preventive steps to minimize the crime rate, and advanced systems and innovative techniques are needed to enhance crime analytics and protect communities.

Augmented Reality (AR) and Virtual Reality (VR) are advanced technologies that have been used for crime scene investigation and analysis. AR is related to VR in that both display holographic content [3]. Unlike VR, which completely replaces the user's visual surroundings, AR blends the real world with the digital environment [4]. AR is defined in [5] as a technology that enables real-time viewing of, and interaction with, virtual visuals superimposed on the real world. AR systems bring virtual objects to the user, who remains physically present in the real environment; this is because an AR device requires a positioning solution relative to some real-world reference [6]. AR technology enables novel collaborative experiences in which co-located people can view and interact with 3D virtual objects, and annotating a live video enables a remote user to work with a distant user to enhance the face-to-face collaborative experience [7].

According to Locard's exchange principle, it is hard to interact with a scene without exchanging some material substance, and forensic investigations at crime sites are based on this principle. Among the common material traces, bloodstain pattern analysis examines the lasting record left by transient blood loss. To assist in reconstructing a murder scene, bloodstain patterns may be examined to establish whether witness and victim testimony is believable. The International Association of Bloodstain Pattern Analysts (IABPA) defines a bloodstain pattern as "a grouping or distribution of bloodstains that reveals the method in which the pattern was deposited by regular or repeating form, order, or arrangement." If the victim suffered blunt force trauma, a variety of common bloodstain patterns can be detected at the crime scene.

Bloodstain pattern recognition is relevant to crime scene investigation. Violent crime is almost always accompanied by a large loss of blood. When large amounts of blood gather and then dry on a surface, a saturation stain may form on a rug or linen. Blunt-ended objects such as golf clubs, candelabra, and similar instruments can cause severe blood loss from head injuries. The kinds of material traces that may be uncovered depend on who the victim, the offender, and any witnesses are. An "impact pattern" is produced by the impact mechanism, or more accurately the impact force, of something striking the blood [2]. The IABPA uses the term "transfer stains" to describe the bloodstain patterns that develop when a bloodied object touches another surface. Bloody fingers, weapons, blood-soaked shoes, and other bloody materials at a crime scene make transfer stains difficult to avoid. A crime scene can be partially or even largely reconstructed by using transfer stains, such as voids, saturation stains, and cast-off patterns.

Object segmentation aims to partition an image into meaningful regions corresponding to different objects. Wang et al. [8] provide a comprehensive review of modern object segmentation approaches, covering a wide range of techniques including region-based, contour-based, and deep learning-based methods. They discuss the strengths and limitations of each approach, emphasizing the importance of context-aware segmentation for accurate object delineation.

Csurka et al. [9] present a survey spanning two decades of research in semantic image segmentation. They trace the evolution of segmentation methods, from early handcrafted features to the recent surge in deep neural networks. The authors highlight key milestones, such as the introduction of fully convolutional networks (FCNs) and the development of large-scale annotated datasets. Understanding this historical context provides valuable insights into the challenges faced by researchers and the progress made over time.

Referring image segmentation localizes objects based on natural language descriptions. Although not as extensively studied as other segmentation tasks, it has gained attention due to its practical applications. The authors of [10] explore referring image segmentation methods that leverage multimodal information, combining visual cues with textual descriptions. Investigating this area can lead to innovative solutions that bridge the gap between language and vision.

Accurate evaluation of segmentation results is crucial for assessing algorithm performance. Huang et al. [11] propose deep neural networks for target segmentation evaluation. They discuss various evaluation metrics, including intersection over union (IoU), Dice coefficient, and pixel-wise accuracy. Researchers must choose appropriate metrics based on their specific segmentation task and dataset. Understanding these evaluation techniques ensures rigorous assessment of segmentation models.

Financing is an issue that may confront HoloLens, a Microsoft headset used as a real-life augmented reality device [12]. Equipping every investigator with a set is quite burdensome, so relying on HoloLens for all investigations might be difficult for crime scene units. This may be overcome by creating a phone-based AR application, or by enabling other officers to use remote-assistance software such as Dynamics 365 on their own devices to collaborate instantly without HoloLens. However, all related personnel must undergo training in using both the headset and the remote-assistance software to ensure the viability of collected evidence [13]. The study in [14] addressed this by using remote spatial interaction with the physical scene offered by mediated reality.

Recent years have seen a surge in the development of deep learning architectures for hyperspectral image classification [15, 16]. These works give an up-to-date description of current hyperspectral classification architectures and discuss outcomes on common datasets. Neural network designs for well-known Hyperspectral Image (HSI) datasets are examined in [17]; another intriguing aspect of that research is its focus on the effects of data augmentation, transfer learning, and residual learning on classification accuracy. Learning from a restricted training set is critical in HSI classification, since training labels are typically few [18]. It is encouraging that new designs, such as [19], are emerging that can reduce the need for many labeled samples. Hybrid dilated residual networks, described in [20], are another innovative technique.

Considering this discussion, it can be observed that automatic solutions based on image classification and segmentation need to be investigated to support crime investigators in their work. Therefore, this paper aims to provide an efficient integrated system for bloodstain pattern recognition in crime scenes based on machine learning and deep learning algorithms. Additionally, we propose an image segmentation approach based on the Fuzzy Active Contour Model (FACM) to complete the integrated system.

The rest of the paper is organized as follows: Sect. 2 critically reviews the related works. Section 3 presents the proposed AR crime investigation system. Then, the experimental analysis, including hardware and software specifications, is given in Sect. 4. Section 5 is devoted to the discussion of this study and its findings. The conclusion and future perspectives are drawn in Sect. 6.

2 Related work

2.1 Technical review

As this study focuses on the application of AR in crime scene investigation, it is prudent to evaluate some relevant literature. Mixed Reality (MR) is a practical method that enables remote collaboration through nonverbal communication, according to [20]. Their research centred on integrating multiple types of MR remote collaboration approaches, expanding MR's capabilities and user experience with a new variety of remote collaboration. In [20], the authors showed an MR system that incorporated 360-degree panorama photos into 3D reconstructed scenes. Using a novel technique for interacting with several 360-degree panoramic fields within these reconstructed scenes, a remote user can transition between multiple 360-degree scenes (live, past, present, and so on), fostering a better understanding of the space and greater interactivity.

The authors in [21] provided an innovative MR analysis system that gives 3D representations of several users in a collaborative setting. This strategy emphasized the importance of data on people's movements, behavior, and interactions with digital objects. The authors recognized the inadequacy of other analysis methods for this objective and therefore created a novel device that represents individuals wearing head-mounted gadgets. According to this work, the analysis capabilities of such an MR device cannot be found in other equipment.

In contrast, [22] characterized the fundamental purpose of human factors research as the development and deployment of Virtual Environments (VE) to improve human lives in various settings. In [7], the authors demonstrated the availability and presence of the AR devices explored in [22]. Typically, professionals wear 2D peripherals to access video feeds from 3D head-mounted devices and enhance them with spoken or digital data. A relevant concern is whether these devices can also be utilized for remote consultations; the authors in [7] therefore re-evaluated such a device and found that participants had preferences for particular settings despite comparable usability scores.

2.2 Crime scene investigation review

Numerous studies, such as [23, 24], have emphasized using AR technology to investigate crime scenes. Recent research, such as [25], demonstrated that AR technology can enable dispersed teams to conduct crime scene investigations. The authors in [24] described the crime scene as a strategy for comprehending and enhancing locations.

It is essential to record photographs and videos of the crime scene in order to thoroughly examine the digital evidence for possible clues. The approach developed in [26] uses the collected footage to create a 3D representation of the crime scene; the results indicated that a realistic reconstruction can be obtained with advanced computer vision algorithms. This objective was reflected in [27], which investigated an AR annotation tool motivated by the need for forensic specialists to collect crime evidence promptly and without contamination. The application enables forensic specialists to tag and share evidence at crime scenes. Using a qualitative methodology, [27] discovered that annotation could result in improved crime scene orientation, a streamlined collection procedure, and reduced administrative pressure. Existing annotation prototypes are technically limited due to time-consuming feature tracking, but AR annotation is promising, usable, and valuable for analyzing crime scenes. This is supported by [23], which emphasized the enhanced utility and effectiveness of forensic simulations and crime scene investigation in virtual environments utilizing augmented reality techniques. With AR technology, useful tools, and quick access to key databases, law enforcement and investigation personnel can mark and highlight evidence and conduct real-time examinations.

Alternatively, [28] illustrated how 3D documentation and data integration resolved reconstructive issues regarding the progression of pattern injuries. Moreover, [21] exhibited an MR system employing 360-degree panorama photographs in 3D reconstructed landscapes, allowing a remote user to switch between various 360-degree sceneries. That study centered on integrating several forms of MR remote collaboration technologies, enabling the growth of MR's capabilities and user experience with a new sort of remote collaboration, and described MR as a plausible method of facilitating distant collaboration through nonverbal communication. AR and MR as new tools for combating crime and terrorism were discussed in [29].

2.3 Bloodstain pattern age recognition review

Detecting the age of a bloodstain is an important issue in crime investigation. The detection can be obtained by several non-automatic experimental techniques, such as white blood cell and blood plasma tests [30]. Another method, proposed in [31] for Bloodstain Pattern Analysis (BPA), is based on Raman spectroscopy. This method takes a long time, since the blood sample must be extracted from the crime scene and transported to the experimentation platform before a result is obtained. In addition, it is not cost-effective due to the high cost of the required instruments. Therefore, automatic and cost-effective methods for bloodstain recognition and detection need to be investigated.

There are a few works in the literature that investigated automatic BPA [32, 33]. One notable system, presented by Gee et al. in [34], utilizes AR to enable in-situ 3D annotation of physical objects and environments. This system integrates GPS and UWB positioning technology with real-time computer vision to create a virtual incident map. Investigators can collaboratively create a scene map with the help of a centralized control component.

Another system, known as IC-CRIME [35], supports investigators in creating a detailed and interactive re-creation of a crime scene's physical space. This is achieved through a combination of laser telemetry scans, digital photographs, and user-generated annotations [35].

While systems for presenting crime scene data enhance understanding of crime events, they do not actively assist in the investigative process. To address this gap, researchers have focused on developing software tools to support the processing and analysis of crime scenes. Two widely used tools in this regard are HemoSpat and BackTrack, both of which offer a graphical user interface and automate certain calculations related to BPA. However, these existing tools still have drawbacks. On-scene actions, such as manually measuring stain coordinates, drawing reference lines, and capturing images without perspective distortion, remain tedious. Additionally, substantial user input is required for tasks like indicating reference lines, delineating scales, entering stain coordinates, and guiding the semi-automatic ellipse fitting process.

In recent years, attempts have been made to address these drawbacks and make BPA fully automatic. One approach proposed in [36] employs computer vision techniques to analyze individual stains in a crime scene and calibrate multiple spatter images into an overhead picture with a unified coordinate frame. Although the authors claim to obtain the region of origin, no error evaluation is provided for the fully reconstructed result.

The study in [37] aims to simplify BPA calculations, particularly for inexperienced users, using a simple image processing algorithm. This approach involves a four-step process analysis: blood color identification, marker identification, major axis angle calculation, and impact angle calculation. The proposed approach achieves approximately 10% error compared to the 2% error obtained by the manual process.

Fiducial markers and digital images in an automated and virtual framework are utilized in [38]. Fiducial markers are placed within a crime scene to establish a global coordinate frame, and individual stains are analyzed using an active bloodstain shape model (ABSM). The estimated impact and glancing angles are used to approximate the stains' flight paths linearly. Experiments involving synthetic crime scenes demonstrate the potential of this approach in analyzing bloodstain spatter patterns. However, no quantitative evaluation of error is performed against ground truth data. Like the previous approaches, this work automates only certain steps of BPA and does not address the capturing of digital images.

An examination of existing literature reveals that a limited number of prior studies have explored the subject of bloodstain segmentation in color images. The methods proposed in these studies primarily relied on skin color detection, specifically face segmentation [39]. Notably, these methods focused on determining the appropriate color space and thresholds for detecting pixels similar to red [40,41,42]. Additionally, the fast 8-connected component labeling method was employed to identify the suspected bloodstain region.

Deep learning has been applied in several research fields, such as medical applications [43,44,45], human biometric recognition [46,47,48], behavior detection [49, 50], and cybersecurity [51, 52]. Due to its efficiency and accuracy, this paper deploys several deep learning methods for BPA to detect the age of a bloodstain and recognize it among the objects in video frames.

After a thorough analysis of studies employing augmented reality technology with 3D scanning techniques, only a few provide evidence of achieving highly efficient investigation by assisting investigators with communication, interaction, and collaboration among local and remote officers. This study introduces a new type of communicative and collaborative investigation and explores new ways of enhancing investigative efficiency and speed in reconstructing the actual crime scene. Specifically, we propose an enhanced AR system for effective communication and interaction in visual-spatial crime scenes. The contributions of this work are as follows:

  1. To design a classification model with low processing time based on lightweight networks.

  2. To compare the designed model with traditional deep learning models such as CNNs and LSTMs.

  3. To design an image segmentation method based on machine and deep learning methods.

  4. To propose a combined bloodstain pattern detection system suitable for embedded systems.

3 The proposed crime investigation system

This paper proposes a system for automatic bloodstain pattern detection based on artificial intelligence. The system comprises two main approaches, image classification and segmentation, as shown in Fig. 1. The classification task identifies the age of the bloodstain on a given surface among seven categories, while the segmentation task draws a contour around the bloodstain.

Fig. 1

The proposed framework for bloodstain pattern recognition

The classification approach consists of three stages. The first stage fragments the input videos into frames. The second stage is data pre-processing, which shuffles the fragmented frames and splits them into train, validation, and test subsets; the strategy is to split the dataset into 80% for training and 20% for testing. The validation process comprises k-fold cross-validation and hold-out validation techniques: the k-fold validation is performed with k equal to 10, while the hold-out validation is performed on the train and test subsets, taking the average performance across the executed simulation runs. The third stage employs machine learning models to classify the video frames. In this task, we deployed machine learning models such as SVM, KNN, RF, DT, MLP, QDA, and LR. In addition, we designed deep learning models comprising lightweight spatial models such as shallow networks, ConvLSTM, CNN, and CNN-ConvLSTM, along with some depthwise CNNs.
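To make the splitting and validation strategy concrete, the following minimal scikit-learn sketch reproduces the 80/20 hold-out split and the 10-fold cross-validation; the array names and the placeholder SVM classifier are our assumptions, not the paper's implementation:

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# X: flattened video frames, y: one of the seven bloodstain labels
# (placeholder data; in practice these come from the fragmented frames)
X = np.random.rand(200, 224 * 224)
y = np.random.randint(0, 7, size=200)

# hold-out validation: 80% training, 20% testing, with shuffling
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42)

# 10-fold cross-validation on the training portion
kf = KFold(n_splits=10, shuffle=True, random_state=42)
scores = []
for train_idx, val_idx in kf.split(X_train):
    model = SVC().fit(X_train[train_idx], y_train[train_idx])
    preds = model.predict(X_train[val_idx])
    scores.append(accuracy_score(y_train[val_idx], preds))

print(f"mean 10-fold accuracy: {np.mean(scores):.3f}")
```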

The second approach is image segmentation based on the FACM, and it consists of five stages. The first stage preprocesses the input images, while the second stage computes the fuzzy membership function. The third and fourth stages initialize and evaluate the generated contour, respectively. The fifth stage postprocesses the segmented image.

3.1 Proposed DLMs

3.1.1 Lightweight CNNs

This section discusses shallow networks based on lightweight architectures. The main objective of this method is to design a lightweight model with low processing time. To accomplish this objective, we designed a depthwise network based on CNNs.

Depthwise convolutional neural networks (DCNNs) are a variant of CNNs that have gained popularity in recent years due to their improved efficiency and reduced computational cost. Unlike traditional CNNs, which perform convolution operations on the entire input feature map with a fixed number of filters, DCNNs perform convolutions separately for each input channel with a much smaller number of filters. This results in a significant reduction in the number of parameters and computations required, making DCNNs ideal for mobile and embedded devices.

The depthwise convolution operation can be represented mathematically as follows:

$$y_{i,j,k}=\sum_{r=0}^{R-1}\sum_{s=0}^{S-1} x_{i+r,j+s,k}\, h_{r,s,k},$$
(1)

where \({x}_{i,j,k}\) is the value of the input feature map at spatial location \(\left(i,j\right)\) and channel \(k\), \({h}_{r,s,k}\) is the value of the depthwise convolution filter at spatial offset \(\left(r,s\right)\) and channel \(k\), and \({y}_{i,j,k}\) is the output value at the spatial location \(\left(i,j\right)\) and channel \(k\). \(R\) and \(S\) are the spatial dimensions of the filter, which are typically much smaller than the spatial dimensions of the input feature map.

In a depthwise convolutional layer, the input feature map has \({C}_{in}\) channels, and the depthwise convolution is applied separately to each channel using its own filter. The resulting feature map has the same spatial dimensions and the same number of channels as the input, since no information is mixed across channels. The output feature map is then fed to a pointwise convolutional layer, which applies a \(1\times 1\) convolution with \({C}_{out}\) filters to combine the channels into a new feature map.

The pointwise convolution operation can be represented mathematically as follows:

$$z_{i,j,l}=\sum_{k=0}^{C_{in}-1} y_{i,j,k}\, w_{k,l}$$
(2)

where \({w}_{k,l}\) is the value of the pointwise convolution filter at input channel \(k\) and output channel \(l\), and \({z}_{i,j,l}\) is the output value at spatial location \(\left(i,j\right)\) and output channel \(l\). The pointwise convolution effectively performs a linear combination of the reduced channels to generate the final output feature map.
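To make Eqs. (1) and (2) concrete, a minimal NumPy sketch of the two operations follows (a toy "valid" depthwise convolution with illustrative shapes, not the paper's implementation):

```python
import numpy as np

def depthwise_conv(x, h):
    # x: (H, W, C_in) input; h: (R, S, C_in), one filter per channel -> Eq. (1)
    H, W, C = x.shape
    R, S, _ = h.shape
    y = np.zeros((H - R + 1, W - S + 1, C))
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            for k in range(C):
                y[i, j, k] = np.sum(x[i:i+R, j:j+S, k] * h[:, :, k])
    return y

def pointwise_conv(y, w):
    # y: (H, W, C_in); w: (C_in, C_out) 1x1 convolution -> Eq. (2)
    return np.tensordot(y, w, axes=([2], [0]))

x = np.random.rand(8, 8, 3)   # toy input feature map
h = np.random.rand(3, 3, 3)   # depthwise filters (R = S = 3)
w = np.random.rand(3, 16)     # pointwise filters (C_out = 16)
z = pointwise_conv(depthwise_conv(x, h), w)
print(z.shape)                # (6, 6, 16)
```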

In this research, a DCNN architecture is proposed, comprising several layers, including a depthwise 2D convolutional layer. The depth of this network is set to 192 channels, and the input size is 224 × 224. Additionally, a max-pooling layer with a kernel size of 2, a ReLU activation layer, a flatten layer, and a dense layer are included in the model. The proposed DCNN has a total of 16,860,871 parameters, all of which are trainable, with no untrainable parameters (Fig. 2).

Fig. 2

Architecture of the proposed DCNN
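A rough Keras sketch of a model in this spirit is given below. The 3 × 3 kernel, the depth multiplier used to reach 192 channels, and the seven-class softmax output are our assumptions, so the parameter count only approximates the 16,860,871 reported above:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    # depthwise 2D convolution; depth_multiplier=64 yields 3*64 = 192 channels
    layers.DepthwiseConv2D(kernel_size=3, depth_multiplier=64, padding="same"),
    layers.MaxPooling2D(pool_size=2),
    layers.ReLU(),
    layers.Flatten(),
    layers.Dense(7, activation="softmax"),  # seven bloodstain categories
])
model.summary()  # roughly 16.9M parameters, all trainable
```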

3.1.2 Deep learning models

This paper provides a deep learning-based technique for augmented reality. CNN and ConvLSTM networks form the core of the proposed deep learning strategies. The current state of the art is to design a deep learning model that extracts feature maps from input pictures and feeds these feature maps into a classification network to distinguish between normal and abnormal states. The performance of such a design is determined by its ability to differentiate between normal and abnormal states with minimal false-positive rates. The key contribution is consequently an efficient deep learning framework whose architecture consists of hierarchically organized convolutional, pooling, and ConvLSTM layers. In addition, a classification network processes the architecture's feature maps and assesses whether the input images are normal.

  • CNN Models

CNNs have become a powerful tool for image processing and recognition tasks. A 2D CNN is a type of CNN that is specifically designed to process two-dimensional data, such as images. In this section, we provide an overview of the mathematical concepts behind 2D CNNs. A 2D CNN consists of several layers, including convolutional layers, pooling layers, and fully connected layers. The convolutional layers are responsible for detecting features in the input image, while the pooling layers are used to reduce the dimensionality of the output of the convolutional layers. The fully connected layers are used to classify the image based on the features detected in the convolutional and pooling layers. The main operation in a 2D CNN is the convolution operation, which is defined as follows:

$$\left(f*g\right)\left(i,j\right)={\sum }_{m} {\sum }_{n} f\left(m,n\right)\times g\left(i-m,j-n\right)$$
(3)

where f and g are two matrices, and * represents the convolution operation. The convolution operation involves sliding the filter g over the input matrix f, and computing the dot product between the filter and the corresponding portion of the input matrix. The convolutional layer in a 2D CNN consists of a set of filters, each of which is responsible for detecting a particular feature in the input image. Let F be the set of filters, and let f be the input image. The output of the convolutional layer can be computed as follows:

$$H_{i,j,k}=\sigma \left(\sum_{m}\sum_{n} F_{m,n,k}\, f_{i-m,j-n}+b_{k}\right)$$
(4)

where \(H\) is the output feature map, \(F_{m,n,k}\) is the weight of the \(k\)-th filter at position \((m,n)\), \(f_{i-m,j-n}\) is the input pixel value at position \((i-m,j-n)\), \(b_{k}\) is the bias term for the \(k\)-th filter, and \(\sigma\) is the activation function. Common activation functions include ReLU, sigmoid, and tanh. After the convolutional layer, the output feature map is typically passed through a pooling layer, which reduces the spatial dimensionality of the feature map. The most common type of pooling is max pooling, which takes the maximum value in each sub-region of the feature map. Let \(H\) be the output of the convolutional layer, and let \(P\) be the output of the pooling layer. The output of the pooling layer can be computed as follows:

$${P}_{i,j,k}=\underset{m}{max}\underset{n}{max}{H}_{i\times s+m,j\times s+n,k}$$
(5)

where s is the stride of the pooling operation, which determines the amount of overlap between adjacent sub-regions. In conclusion, 2D CNNs are a powerful tool for image processing and recognition tasks, and are based on the mathematical concepts of convolution and pooling. These operations are used to detect features in the input image and reduce its dimensionality, leading to better accuracy and efficiency in image classification tasks.

CNN is the foundation of the first proposed deep learning model. In this model, five convolutional layers (CNV) are each followed by a pooling layer (PL). This hierarchy extracts features from the input photos and builds a feature map that is then fed to the classification network. Each CNV layer of the deep learning architecture generates a feature map with as many channels as the layer has digital filters. The pooling layer is used to reduce the number of features and can be implemented with two distinct methods, max pooling and mean pooling: the feature map is divided into rectangular windows of a specific size, and the max-pooling technique takes the largest value in each window, whereas the mean-pooling technique takes the window's mean value.

The classification network comprises two layers, a fully connected layer and a classification layer, and processes the feature map generated by the hierarchy of convolutional and pooling layers. The fully connected layer turns the 3D feature map into a feature vector, which is then passed to the classification layer; the classification layer classifies the feature vector and determines whether the input image belongs to the normal or abnormal category (Fig. 3).

Fig. 3

Architecture of the proposed CNN DLM
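A minimal Keras sketch of this five-convolution, five-pooling hierarchy follows; the filter counts, kernel sizes, and dense-layer width are illustrative assumptions rather than the paper's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([layers.Input(shape=(224, 224, 3))])
# five CNV layers, each followed by a max-pooling layer (PL)
for filters in (16, 32, 64, 128, 256):      # assumed filter counts
    model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
    model.add(layers.MaxPooling2D(pool_size=2))
# classification network: fully connected layer + classification layer
model.add(layers.Flatten())
model.add(layers.Dense(128, activation="relu"))
model.add(layers.Dense(7, activation="softmax"))
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```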

  • ConvLSTM Model

Convolutional LSTM is a type of recurrent neural network (RNN) that can process sequential data with both spatial and temporal dependencies. It is commonly used in various fields such as computer vision, natural language processing, and speech recognition. The convolutional LSTM architecture combines the concepts of CNNs and LSTMs, allowing for the efficient processing of spatially and temporally correlated data.

The convolutional LSTM can be expressed mathematically as:

$${i}_{t}=\sigma ({W}_{xi}*{x}_{t}+{W}_{hi}*{h}_{t-1}+{W}_{ci}\circ {c}_{t-1}+{b}_{i})$$
(6)
$${f}_{t}=\sigma ({W}_{xf}*{x}_{t}+{W}_{hf}*{h}_{t-1}+{W}_{cf}\circ {c}_{t-1}+{b}_{f})$$
(7)
$${c}_{t}={f}_{t}\circ {c}_{t-1}+{i}_{t}\circ tanh({W}_{xc}*{x}_{t}+{W}_{hc}*{h}_{t-1}+{b}_{c})$$
(8)
$${o}_{t}=\sigma ({W}_{xo}*{x}_{t}+{W}_{ho}*{h}_{t-1}+{W}_{co}\circ {c}_{t}+{b}_{o})$$
(9)
$${h}_{t}={o}_{t}\circ tanh({c}_{t})$$
(10)

where \({i}_{t}\), \({f}_{t}\), \({o}_{t}\), \({c}_{t}\), and \({h}_{t}\) are the input gate, forget gate, output gate, cell state, and hidden state at time step \(t\), respectively. \({x}_{t}\) is the input at time step \(t\), \({h}_{t-1}\) is the hidden state at the previous time step, \(W\) and \(b\) are the weights and biases of the network, and \(\sigma\) and \(tanh\) are the sigmoid and hyperbolic tangent activation functions, respectively. The symbol \(*\) denotes the convolution operation, and \(\circ\) denotes the element-wise multiplication.

The input gate \({i}_{t}\) controls the amount of information from the input and the previous hidden state that is used to update the cell state \({c}_{t}\). The forget gate \({f}_{t}\) controls the amount of information from the previous cell state that is retained or discarded. The output gate \({o}_{t}\) controls the amount of information from the current cell state that is used to compute the hidden state \({h}_{t}\).

The convolutional LSTM uses the convolution operation to capture the spatial dependencies of the input, and the LSTM structure to capture the temporal dependencies. This combination makes it particularly effective for tasks such as video analysis, where the data has both spatial and temporal correlations.

In this study, we also propose a hybrid deep learning model incorporating both ConvLSTM and CNN modalities. ConvLSTM is the 2D convolutional version of the LSTM algorithm. LSTM is designed to remember prior states when constructing the present state. This modality is a double-edged sword: because the current state depends entirely on prior states, any degradation in one state will negatively impact the subsequent states. Therefore, such deep learning methods must be treated with care and monitored during training to identify potential anomalies. This deep learning model consists of ten layers: a ConvLSTM layer followed by a pooling layer, then three CNV layers followed by three pooling layers. The classification network is identical to that of the first deep learning model. Unlike the original CNN-based deep learning model, this model consists of fewer layers and is intended to simplify the design (Fig. 4).

Fig. 4

Architecture of the proposed ConvLSTM DLM
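A hedged Keras sketch of this ConvLSTM-then-CNN layout is given below; the clip length, frame size, and filter counts are our assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# frames are grouped into short clips: (time, height, width, channels)
model = models.Sequential([
    layers.Input(shape=(8, 64, 64, 3)),      # assumed clip length and size
    layers.ConvLSTM2D(16, kernel_size=3, padding="same",
                      return_sequences=False),
    layers.MaxPooling2D(pool_size=2),
    # three CNV layers, each followed by a pooling layer
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(128, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    # classification network, as in the first deep learning model
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(7, activation="softmax"),
])
```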

3.2 Fuzzy active contour model (FACM)

The FACM is a segmentation method that uses fuzzy sets and active contours to delineate object boundaries in an image. The FACM combines the traditional active contour model with a fuzzy set representation of the image to provide more robust and accurate segmentation results.

The FACM energy function is defined as:

$$E\left(x\right)= E_{int}\left(x\right)+ \lambda \int \frac{1}{2\pi \sigma }\, exp\left(-\left(\frac{d\left(x\right)}{\sigma }\right)^{2}\right) dx$$
(11)

where \({E}_{int}\left(x\right)\) is the internal energy of the contour, \(\lambda\) is a weighting parameter that balances the influence of the internal energy and the fuzzy energy, \(\sigma\) is a parameter that controls the spatial extent of the fuzzy membership function, and \(d\left(x\right)\) is the distance between a point \(x\) and the contour \(C\). The fuzzy membership function is defined as:

$$\mu \left(x\right)=exp\left(-{\left(\frac{d\left(x\right)}{\sigma }\right)}^{2}\right)$$
(12)

where \(\mu \left(x\right)\) represents the degree of membership of a point \(x\) in the region inside the contour \(C\).

The internal energy \({E}_{int}\left(F(x)\right)\) of the contour is given by:

$${E}_{int}\left(F(x)\right)= \alpha {\int }{\left|\nabla F\left(x\right)\right|}^{2}dx + \beta {\int }{\left|{\nabla }^{2}F\left(x\right)\right|}^{2}dx$$
(13)

where \(F\left(x\right)\) is the level set function that represents the contour, \(\nabla F\left(x\right)\) and \({\nabla }^{2}F\left(x\right)\) are the gradient and Laplacian of \(F\left(x\right)\), and \(\alpha\) and \(\beta\) are weighting parameters that control the influence of the first and second-order derivatives of \(F\left(x\right)\), respectively.

The FACM approach involves minimizing the energy function \(E\left(F(x)\right)\) with respect to F(x). The optimization problem is solved using the level set method.

  • Level Set Optimization

Level set optimization is a numerical method for solving optimization problems that involve constraints. The method involves representing the feasible region of the optimization problem as the zero-level set of a function, known as the level set function. The level set function is defined to be positive inside the feasible region and negative outside the feasible region.

The optimization problem can be formulated as finding the minimum or maximum of an objective function subject to the constraint that the level set function is zero. Mathematically, this can be written as minimizing \(E\left(F\left(x\right)\right)\) subject to \(g\left(x\right)=0\),

where \(g\left(x\right)\) is the level set function. To solve this problem, the level set function is evolved using a partial differential equation known as the Hamilton–Jacobi equation (HJE), which takes the form:

$$\frac{\partial g\left(x\right)}{\partial t} + H\left(q, \nabla g\left(x\right)\right)= 0$$
(14)

where \(q\) represents the position of the system, \(\nabla g(x)\) is the gradient of the level set function with respect to position, and \(H\) is the Hamiltonian, a function of the position and momentum of the system. The evolution of the level set function is guided by the gradient of the objective function, which moves the level set function towards the minimum of \(E\left(F(x)\right)\).
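As a rough illustration of how Eqs. (11), (12), and (14) interact, the following NumPy sketch evolves a level set with a fuzzy membership weight. The Chan–Vese-style region force and all parameter values are simplifying assumptions of ours, not the paper's exact energy minimization:

```python
import numpy as np
from scipy import ndimage

def fuzzy_membership(phi, sigma=5.0):
    # Eq. (12): membership decays with the distance d(x) from the contour,
    # here approximated by the magnitude of the level set function phi
    return np.exp(-(phi / sigma) ** 2)

def facm_segment(image, iters=40, dt=0.2, alpha=0.5, lam=1.5, sigma=5.0):
    """Evolve a level set on a grayscale image with values in [0, 1]."""
    h, w = image.shape
    yy, xx = np.mgrid[:h, :w]
    # initial contour: a centered circle (phi negative inside, positive outside)
    phi = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2) - min(h, w) / 4.0

    for _ in range(iters):
        mu = fuzzy_membership(phi, sigma)      # largest near the contour
        inside_mean = image[phi < 0].mean()    # current interior intensity
        # region force (a Chan-Vese-style stand-in for the full FACM energy):
        # pixels brighter than the interior mean pull the contour outward
        region = lam * mu * (image - inside_mean)
        smooth = alpha * ndimage.laplace(phi)  # internal smoothness term
        phi += dt * (smooth - region)
    return phi < 0                             # mask of the segmented region

# toy usage: a bright blob on a dark background
img = np.zeros((64, 64)); img[20:44, 20:44] = 1.0
mask = facm_segment(img)
print(mask.sum(), "pixels inside the final contour")
```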

4 The experimental analysis

4.1 Samples of dataset

The proposed models are evaluated on the Bloodstain Pattern Data Set (BSPDS) [53], a dataset comprising seven categories of bloodstain patterns: "Plexiglas with fingers," "Plexiglas with a finger," "Plexiglas with a finger after 30 s," "Plexiglas with a finger after 90 s," "Plexiglas with a finger after 240 s," "Plexiglas with a paper towel," and "Plexiglas with a paper towel after 90 s," as shown in Fig. 5. In addition, a data augmentation technique based on Convolutional Generative Adversarial Networks (CGAN) is applied to the data to increase its amount. Table 1 shows the number of frames in the training, validation, and test phases.

Fig. 5

Sample of bloodstain dataset [53]

Table 1 Description of the Bloodstain Pattern Dataset
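As an illustration of the CGAN augmentation step, a compact generator/discriminator pair might look as follows (a sketch with assumed layer sizes and a 64 × 64 output; the adversarial training loop is omitted for brevity):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_generator(latent_dim=100):
    # maps a noise vector to a 64x64 RGB bloodstain-like image
    return models.Sequential([
        layers.Input(shape=(latent_dim,)),
        layers.Dense(8 * 8 * 128), layers.Reshape((8, 8, 128)),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same",
                               activation="relu"),   # 16x16
        layers.Conv2DTranspose(32, 4, strides=2, padding="same",
                               activation="relu"),   # 32x32
        layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                               activation="tanh"),   # 64x64
    ])

def build_discriminator():
    # classifies 64x64 images as real dataset frames or generated ones
    return models.Sequential([
        layers.Input(shape=(64, 64, 3)),
        layers.Conv2D(32, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Conv2D(64, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),
    ])
```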

4.2 Evaluation metrics

Various metrics are used to rank the quality of the proposed solutions: Accuracy, Recall, Precision, and the F1-score, which combines Recall and Precision. These measurements are defined in Eqs. (15) to (18).

where:

  1. The False Negative (FN) count is the number of samples of a given bloodstain category that were incorrectly labeled as another category.

  2. The True Positive (TP) count is the number of samples of a given category that were correctly identified.

  3. The True Negative (TN) count is the number of samples outside a given category that were correctly identified as not belonging to it.

  4. The False Positive (FP) count is the number of samples that were mistakenly assigned to a given category.

    $$Accuracy=\frac{No.\;of\;correctly\;detected\;images}{Total\;No.\;of\;images}\times100$$
    (15)
    $$Recall=TPR={T}_{P}/({T}_{P}+{F}_{N})=(1-FNR)$$
    (16)
    $$precision={T}_{P}/({T}_{P}+{F}_{P})$$
    (17)
    $$F1=2*((precision*recall)/(precision+recall))$$
    (18)
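A minimal scikit-learn sketch of these metrics follows, using macro averaging across the seven categories; the toy labels are illustrative:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# y_true / y_pred: ground-truth and predicted bloodstain categories (0..6)
y_true = [0, 2, 6, 3, 3, 1]
y_pred = [0, 2, 5, 3, 1, 1]

# macro averaging applies Eqs. (15)-(18) per class, then averages
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro",
                                    zero_division=0))
print("recall   :", recall_score(y_true, y_pred, average="macro",
                                 zero_division=0))
print("f1-score :", f1_score(y_true, y_pred, average="macro",
                             zero_division=0))
```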

4.3 Hyperparameters Selection

The hyperparameters of the proposed models were selected after several trials to reach their optimal values. The proposed deep learning models are designed to have as many trainable parameters as possible. Table 2 lists the total, trainable, and untrainable parameters of each deep learning model. It can be observed that the proposed DCNN and shallow networks are fully trainable, without any untrainable parameters. On the other hand, the ConvLSTM and CNN-ConvLSTM models have 486 and 902 untrainable parameters, respectively. Therefore, in terms of trainability, the DCNN and shallow networks outperform the other deep learning models.

Table 2 Hyperparameters of the proposed deep learning models

Moreover, the classification task is also carried out using several machine learning models, whose hyperparameters are optimized using a grid search technique. Table 3 lists the selected hyperparameters of these models.

Table 3 Hyperparameters of the proposed machine learning models
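As an illustration of the grid search, the following scikit-learn sketch tunes the SVM; the parameter grid is an assumption of ours (Table 3 holds the actually selected values), and `X_train`, `y_train` reuse the names from the splitting sketch in Sect. 3:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# illustrative grid, not the paper's actual search space
param_grid = {"C": [0.1, 1, 10],
              "kernel": ["rbf", "linear"],
              "gamma": ["scale", 0.01]}

search = GridSearchCV(SVC(), param_grid, cv=10, scoring="accuracy")
search.fit(X_train, y_train)   # the 80% training split from Sect. 3
print("best parameters:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```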

5 Results and discussions

5.1 Result of image classification approach

This paper proposes an automatic system for bloodstain pattern detection in crime scenes. The objective is to determine the age of the bloodstain, which helps establish the time of the crime, and to detect the bloodstain in a given captured frame. To achieve this objective, the proposed system comprises both image classification and image segmentation approaches. The classification is performed using several deep learning and machine learning algorithms. The simulations were run on a local machine with the following specifications: an Intel Core i9 CPU, 128 GB of RAM, and an NVIDIA GPU with 32 GB of memory.

The deep learning methodology includes pixel-wise and depth-wise analysis. The pixel-wise method consists of shallow and deep CNNs, while the depth-wise method relies on a depthwise convolutional structure. Figure 6a shows the learning curve of the proposed depth-wise CNN, which represents the detection accuracy during training. It can be observed that the accuracy is stable around 100% because of the model's analysis of the channels and depth of the input image. Figure 6b shows the performance of the proposed shallow network, a 1-layer network. Its performance fluctuates during training until epoch three; then it becomes stable until the end of the process. This behavior is due to the gradual learning from the input images. In contrast, the 2-layer shallow network learns smoothly throughout training; this smooth behavior is due to the additional convolutional layer, which improves learning and feature extraction from the input images.

Fig. 6

Performance curves of the proposed lightweight deep learning models

Furthermore, this paper proposes other deep learning methods based on ConvLSTM and CNN-ConvLSTM models. Figure 7 shows their learning curves. It can be observed that the proposed ConvLSTM model reached its optimal performance after seven epochs, while the proposed CNN-ConvLSTM model reached its optimal performance at epoch two. This difference in learning speed is due to the added CNN, which aids feature extraction from the input images.

Fig. 7

Performance curves of the proposed deep learning models

Moreover, this paper employs machine learning algorithms for bloodstain image classification, including DT, SVM, RF, LR, QDA, MLP, and KNN. Figure 8 shows the learning curves and performance of the proposed machine learning models, which achieve near-optimal performance on bloodstain image classification. The experimental results of the proposed methods, including recall, precision, F1-score, and accuracy, are listed in Table 4 and demonstrate their effectiveness in classifying bloodstains. However, the processing time of these models is a crucial factor to consider. Notably, the shallow CNN model with one layer outperforms the other deep learning models in terms of real-time applicability. Furthermore, the simulation results are validated by hold-out validation and 10-fold cross-validation, and the average accuracy is reported to prove the reliability of the system. Moreover, machine learning models such as DT, MLP, and LR process images within milliseconds, averaging 21 ms. These results suggest that the proposed machine learning models can be considered efficient real-time solutions for bloodstain pattern recognition.

Fig. 8

Performance curves of the proposed machine learning models

Table 4 Brief comparison of the proposed models

5.2 Results of image segmentation

This section discusses the image segmentation approach based on the FACM technique. The proposed method is applied to one sample image from each category. The segmented images are evaluated against ground-truth references using accuracy, precision, recall, F1-score, and MCC, as illustrated in Table 5. The segmented images are obtained after 40 iterations, as shown in Fig. 9. The simulation results reveal that the proposed method achieves high segmentation performance, with an average accuracy of 99%. Therefore, it can be considered an efficient solution for bloodstain pattern segmentation.

Table 5 Evaluation of the proposed image segmentation model
Fig. 9

An Example of the visual results of the proposed image segmentation model

This work proposes an AI-based solution for bloodstain pattern recognition. The proposed system comprises two approaches. The first approach classifies the input images into seven categories describing the captured bloodstain image; for this, we deployed several deep learning and machine learning models. For a real-time application, the objective is an accurate and fast method. Figure 10 compares the testing times of the proposed models. It can be observed that the proposed machine learning models have lower testing times than the deep learning models. Among the deep learning models, the optimal one is the shallow network with a 1-layer architecture; among the machine learning models, the optimal ones are the MLP and LR models. Therefore, the proposed solutions can be deployed in real-time applications.

Fig. 10

Brief comparison of testing time among the proposed models

6 Conclusion and future perspectives

The study of bloodstain pattern recognition is a significant focus in forensic analysis. In this paper, we propose an automatic bloodstain pattern recognition system based on deep learning and machine learning modalities for image classification and segmentation tasks. The classification method incorporates machine learning techniques such as SVM, KNN, RF, MLP, LR, and DT. Additionally, we have utilized deep learning models including CNN, ConvLSTM, DCNN, and CNN-ConvLSTM. Furthermore, we have proposed an image segmentation method based on the FACM. The proposed system is evaluated in terms of performance and processing time across the different models. It demonstrates high performance in image recognition and segmentation, surpassing previous works in the literature. Therefore, these methods can be considered efficient solutions for bloodstain pattern recognition and its application in crime scene investigation. In the next phase, we plan to deploy this model in real time using a set of sensors and microcontrollers to create a prototype. Our ultimate aim is to achieve the highest time and cost efficiency.