1 Introduction

The volume of generated data is continuously growing from various sources, including websites, social media, mobile networks, and others. Such information can be extremely valuable to businesses if analyzed and utilized correctly. Machine learning algorithms have proven to be highly effective in analyzing data in various domains, including business [1], medicine [2,3,4], communication [5], intrusion detection [6], and industry [7], making them useful for data collection and analysis. With the help of these algorithms, businesses can gain more valuable insights, identify patterns, and build a deeper understanding of the collected data.

Constructing machine learning systems requires specialized expertise and a thorough understanding of each algorithm to optimize hyperparameter tuning. Major corporations offer commercial machine learning services, such as Microsoft Azure Machine Learning Studio (ML Studio) [8], Amazon AWS Machine Learning and Amazon SageMaker [9], and IBM Watson Studio [10]. These companies keep the trained models, training algorithms, and code confidential, and provide access only through paid, simple Application Programming Interfaces (APIs). Small businesses, individuals, and organizations often face difficulties when it comes to processing data and building their own machine learning models. These challenges arise due to a lack of expertise, of the financial resources needed to use commercial services, or of the infrastructure necessary for building such systems.

Consequently, there is a need for cost-effective, user-friendly solutions that allow companies, organizations, and individuals to process data and apply machine learning algorithms. MLaaS offers solutions that enable the use of machine learning algorithms without the need for expertise in their usage, parameter optimization, or computing resource management. MLaaS simplifies the use of machine learning algorithms in business analysis by reducing the complexity of data processing, model training, and deployment, and by reducing the time required for these tasks.

Our contribution is an open-source unsupervised machine learning as a service (MLaaS) framework for visualizing, analyzing, processing, and extracting valuable insights from data. The framework simplifies the use of unsupervised machine learning algorithms for data analysis by offering a time-saving and cost-effective solution. It is user-friendly, dependable, adaptable, and scalable, and can be used by both experts and non-experts in machine learning.

Cost-effectiveness: The framework is built on open-source components and developed using open-source libraries and technologies such as scikit-learn, Python, Angular, and JWT. It does not require large devices or cloud infrastructure to operate and can run on local devices. A comprehensive cost comparison was conducted between the proposed framework and other similar frameworks available in the market, covering initial, maintenance, and upgrade costs. The results showed that our framework provides a cost saving of up to 85% in overall expenses, reducing both initial and ongoing costs and making it a cost-effective solution.

User-friendliness: The graphical user interface (GUI), built with Angular, presents components easily and simply, ensuring usability. A usability test was conducted with diverse users, from beginners to experts, and ease of use was evaluated through questionnaires and surveys after using the framework. The results indicated that 90% of the participants found the framework easy to use and efficient in completing the required tasks.

Dependability: Dependability is demonstrated through consistent performance metrics in real-world testing scenarios, such as customer segmentation and anomaly detection. Specifically, the framework was tested on various datasets where it consistently delivered accurate clustering and anomaly detection results. For instance, in customer segmentation tasks, the framework effectively identified distinct customer groups with high clustering evaluation metrics, aiding targeted marketing strategies. In anomaly detection, it reliably identified outliers in transaction data, which were later confirmed as fraudulent activities by domain experts. All details are described in Sect. 5.

Adaptability and scalability: These properties have been verified through metrics that show the framework's capability to handle varying data types from different sectors with flexibility and ease, making it suitable for a wide range of applications. The scalability of the framework was tested by simulating increased load and user numbers. The results indicated that the framework could easily scale to support thousands of users without noticeable performance degradation, proving its capability to grow and expand as needed.

The proposed framework comes with built-in algorithms for clustering and outlier detection. In the clustering category, it includes three algorithms: k-means, hierarchical clustering, and DBSCAN. For outlier detection, it features LOF, KNN, and GMM. The framework can operate autonomously or with user intervention and can be extended to include other algorithms. The proposed solution is tested on clustering and anomaly detection tasks over real-world datasets, such as customer segmentation and fraud detection. The following benchmark performance metrics are measured to substantiate the above claims: time taken; the Silhouette score, Calinski–Harabasz index, and Davies–Bouldin index as clustering metrics; and the area under the ROC curve as the anomaly detection rate. The test results demonstrate the reliability, efficiency, cost-effectiveness, and time-saving benefits of the proposed solution.

The suggested framework is available as open-source software, making it accessible for data exploration and manipulation and for the application of unsupervised ML algorithms for both clustering and anomaly detection. As an open-source project, the framework benefits from contributions by a global community of developers and researchers. This collaborative approach accelerates the development of new features, improvements, and bug fixes, ensuring the framework remains up to date with the latest advancements in machine learning. The proposed open-source framework fosters transparency through publicly available code, allowing users to trust its reliability and customize it for their specific needs. This flexibility and free availability make the framework accessible and adaptable for a wide range of applications.

The subsequent sections of the paper are structured as follows: Section 2 offers an overview of related studies and existing papers on MLaaS systems. In Sect. 3, a comprehensive explanation of the proposed MLaaS framework, along with its essential components, is provided. Section 4 showcases the implementation of the proposed framework and its application in various use cases, such as customer segmentation, anomaly detection, and fraud detection; this section aims to assess the effectiveness of the proposed solution and evaluate its performance in real-world business environments. Section 5 focuses on the analysis and evaluation of the results obtained from the proposed framework. Finally, Sect. 6 provides concluding remarks and discusses potential future directions for the work.

2 Related work

MLaaS: There is increasing interest in building systems that incorporate machine learning. Such interest has led to the development of several ML-based solutions. One such solution is PredictionIO, an open-source machine learning server that provides developers with a graphical user interface to evaluate, compare, and deploy scalable ML algorithms. It allows manual or automatic tuning of ML algorithm hyperparameters and monitoring of model training progress. In addition, the system provides an Application Programming Interface (API) for integration with other software programs [11].

Baldominos et al. [12] presented an architecture that leverages ML algorithms as a service for big data, capable of delivering real-time predictions and analysis. The architecture is built upon Hadoop, a framework used to store, process, and analyze big data, and it can process a large number of requests in real time. The architecture provides solutions through a Representational State Transfer (RESTful) application program interface (API), enabling other systems to integrate with it and benefit from it by storing, analyzing, and comprehending data.

Ribeiro et al. [13] proposed an open-source architecture for delivering scalable and flexible MLaaS. In their study, the authors demonstrated the implementation of their solution through a case of “power demand forecasting” that utilized real sensors and weather data. The authors showcased how different algorithms can be run concurrently within that context.

Mariani et al. [14] proposed a decision support architecture that offers MLaaS to aid healthcare professionals in identifying health risks for their patients. The architecture promises faster predictions and more efficient model deployment.

Paraskevoulakou et al. [15] proposed an extensible, generalized approach to deploying machine learning functions as a service (MLFaaS). Their approach goes beyond traditional isolated and atomic services by offering composite services such as workflows and pipelines of ML operations.

MLaaS Platform: When examining the industrial aspect, numerous platforms offer MLaaS, provided by big corporations including:

  • Microsoft Azure Machine Learning Studio (ML Studio) offers easily accessible machine learning algorithms and intuitive visual tools suitable for both novices and seasoned experts [8].

  • Amazon’s AWS Machine Learning and Amazon SageMaker [9] are MLaaS platforms that empower users to build machine learning models without the need for an in-depth understanding of intricate algorithms.

  • IBM’s Watson Machine Learning (Watson ML) is a visual service designed to aid users in rapidly recognizing patterns and making decisions [10].

  • Google’s Cloud Machine Learning Engine is an ML engine that facilitates the development of ML algorithms for large and small datasets [16].

  • BigML offers a wide variety of algorithms for clustering, data visualization, and anomaly detection [17].

The aforementioned platforms, including PredictionIO and Baldominos’ solution, are constrained by their reliance on specific analytical tools, leading to limited flexibility for incorporating new machine learning algorithms. Additionally, these platforms lack essential features such as a data preprocessing layer, anomaly detection algorithms, and hyperparameter tuning capabilities. Ribeiro et al. acknowledge that their study needs to be extended beyond predictive modeling to applications such as pattern recognition, outlier detection, ranking, clustering, and user interface design. The solutions offered by Microsoft, AWS, IBM, Google, and BigML are proprietary, making it difficult to access and understand the underlying algorithms. Furthermore, such exclusivity limits the ability to incorporate new algorithms. To address these limitations, the proposed framework overcomes these constraints by providing enhanced flexibility, a complete preprocessing layer, integrated anomaly detection, and advanced hyperparameter tuning, ensuring a more robust machine learning platform.

Consequently, there is a significant demand for a framework designed specifically to provide unsupervised ML algorithms as a service. The contributions of the paper can be summarized as follows:

  1. Comprehensive framework: The proposed framework provides a robust solution that streamlines data exploration, processing, and the application of unsupervised ML techniques, offering a holistic approach to data analytics.

  2. Easy integration: The proposed framework includes a RESTful API web service, facilitating seamless integration with other applications. This enables users to leverage the solution within their existing systems effortlessly.

  3. Data understanding and analysis: Emphasizing the importance of understanding and analyzing data, the proposed framework includes components for data comprehension, handling missing values, visualizing outliers, and representing column distributions using histograms.

  4. Data processing: The proposed framework transforms raw data into clear, actionable information through data cleaning, outlier management, categorical data handling, dimensionality reduction, and data scaling. This process ensures the identification and addressing of anomalous data.

  5. Hyperparameter tuning: The proposed framework offers automatic hyperparameter tuning for the algorithms used, with the option for user intervention, ensuring optimal model performance.

  6. Dataset flexibility: The proposed framework accommodates various datasets, allowing the creation of distinct models for each dataset and enabling comparison using different evaluation metrics.

  7. Efficiency: The proposed framework ensures the rapid execution of multiple algorithms and facilitates their comparison efficiently, saving valuable time.

  8. Empowering organizations: The proposed framework is designed to empower organizations to implement ML algorithms easily and cost-effectively, even without prior experience or internal infrastructure.

  9. Data-driven insights: The solution enables organizations and individuals to harness the power of unsupervised ML algorithms, driving data-driven insights and enhancing decision-making processes.

  10. Time and cost savings: By accelerating time to market, reducing development costs, and enhancing scalability, our framework delivers significant time and cost savings.

3 The proposed MLaaS framework and its components

This section explains the seven primary layers within the suggested framework employed to provide unsupervised MLaaS. Figure 1 shows the seven major layers and the components inside each layer. The seven major layers are: authentication, data exploration and visualization, data preprocessing, category and algorithm selection, model creation and evaluation, model extraction, and storage.

Fig. 1 Framework layers and its components

3.1 Authentication

Authentication is a crucial element within any framework. Without authentication, anyone can access the data, which puts it at risk. The proposed framework includes an embedded authentication layer designed to safeguard data and mitigate the risk of unauthorized data misuse. The authentication method involves creating a username with an encrypted password; the user can obtain a username and password from the administrator. Passwords are encrypted to ensure that the user’s login credentials are secured. The user’s data are protected between the front-end and back-end layers of the framework using JSON Web Tokens (JWT) [18]. JWTs can use various signing algorithms to ensure the integrity and confidentiality of data; the authentication layer uses JWTs signed with a public/private key pair using RSA (Rivest–Shamir–Adleman) [19]. The JWT is set to expire after a specified time or when the user logs out of the system. By using the username and password, the user can access the proposed framework and its screens while preserving data confidentiality. A minimal sketch of this token flow is shown below.
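
As an illustration of this layer's token flow, the following minimal sketch issues and verifies an RSA-signed JWT in Python. It assumes the PyJWT and cryptography libraries, and the key-file paths, lifetime, and claim names are hypothetical; it is a sketch of the general mechanism rather than the framework's actual implementation.

# Minimal sketch of issuing and verifying an RSA-signed JWT (assumes PyJWT + cryptography).
# Key paths, lifetime, and claim names are illustrative, not the framework's configuration.
import datetime
import jwt  # PyJWT

with open("private_key.pem", "rb") as f:      # hypothetical RSA private key (PEM)
    private_key = f.read()
with open("public_key.pem", "rb") as f:       # corresponding public key (PEM)
    public_key = f.read()

def issue_token(username: str, lifetime_minutes: int = 30) -> str:
    payload = {
        "sub": username,
        "exp": datetime.datetime.utcnow() + datetime.timedelta(minutes=lifetime_minutes),
    }
    return jwt.encode(payload, private_key, algorithm="RS256")

def verify_token(token: str) -> dict:
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError if the token is invalid.
    return jwt.decode(token, public_key, algorithms=["RS256"])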

3.2 Data exploration and visualization

Exploring and visualizing data is an essential process in data science and machine learning. This layer is composed of four basic components, and all results from these components are presented in the assessment report. The four components within this layer are as follows:

  • Data information: is responsible for displaying the data types and value counts of the features used. The results of the component are presented within the assessment report in tabular form. The table contains three columns: the first contains the feature name, the second contains the amount of data inside the feature, and the third displays the data type (string, numeric, or any other type).

  • Missing values: is responsible for displaying null values within the data so that the user can identify them. The output of the component is displayed in the assessment report as a table containing two columns: the feature name and the count of null values within each feature. From this output, the user can easily identify the number of null values and decide whether to process or delete them later.

  • Columns outliers visualization: is responsible for identifying the outliers within each feature. The output of this component is displayed by using a boxplot chart. A boxplot chart is shown for each feature individually. A boxplot diagram is a way to present data divided into five sections, which are the minimum, first quartile, median, third quartile, and maximum. Through this diagram, the user can see the outliers within each feature and the tightness of the data grouping. The diagram also gives the user insights into how the data are skewed.

  • Columns histogram visualization: displays the data within each feature in the form of a histogram chart. The histogram chart displays the frequency distribution of the data set within each feature. The diagram provides a view of the pattern of data distribution within each feature. The diagram also helps to identify the data shape and distribution to see whether it is distributed normally or not.

These components are used to explore and visualize data. By using the components provided in this layer, users can identify missing values, outliers, and relationships between existing features. After utilizing this layer, users gain the ability to identify features that can be converted into different types, require handling of missing values, or hold significant importance within the dataset. A minimal sketch of these exploration steps is given below.
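
The following minimal sketch reproduces the four exploration components with pandas and matplotlib. The input file name is a placeholder, and the sketch only illustrates the kind of output the assessment report summarizes.

# Illustrative sketch of the exploration components (pandas/matplotlib); "dataset.csv" is a placeholder.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("dataset.csv")

# Data information: feature names, value counts, and data types.
info = pd.DataFrame({"count": df.count(), "dtype": df.dtypes})
print(info)

# Missing values: number of nulls per feature.
print(df.isnull().sum())

# Columns outliers visualization: boxplots of the numeric features.
df.select_dtypes("number").boxplot(figsize=(10, 6))
plt.show()

# Columns histogram visualization: frequency distribution of each feature.
df.hist(figsize=(10, 6), bins=20)
plt.tight_layout()
plt.show()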

3.3 Data preprocessing

Data preprocessing plays a crucial role in data analysis, transforming raw, incompletely formatted data into clear and specific data. Preprocessing contributes to increasing data accuracy, which in turn increases the efficiency of the ML models used. This layer provides five basic components:

  1. Data cleaning component: provides seven ways to clean the data: fill null values with the mean, median, or mode; drop the listed null columns; drop incomplete rows; fill values using a regression model; and drop rows that have fewer than N non-NaN (not a number) values.

  2. Data outliers component: contains dedicated algorithms to recognize and handle outliers. The algorithms used to identify outliers are the interquartile range, Z-score, and standard deviation methods. The component provides four ways to handle outlier values: mean, median, mode, and quantile-based flooring and capping.

  3. Categorical data component: provides four main algorithms for converting categorical data into numerical data: One-Hot Encoding [20], Effect Encoding [21], Binary Encoding [22], and Label Encoding [21].

  4. Dimensionality reduction component: provides four algorithms for dimensionality reduction: Principal Component Analysis (PCA) [23], Factor Analysis [24], Isomap [25], and T-Distributed Stochastic Neighbor Embedding (t-SNE) [26].

  5. Data scaling component: provides three basic algorithms for data scaling: StandardScaler, Min-Max Scaler, and MaxAbsScaler.

Using these components, data can be cleaned of unwanted values, converted from categorical to numerical form, and reduced in dimensionality when it is high-dimensional. Moreover, the layer helps to detect and treat anomalous data. A minimal sketch combining these preprocessing steps is shown below.
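
A minimal sketch of how these components can be chained with pandas and scikit-learn is given below. The input file, the 1.5*IQR capping rule, and the two PCA components are assumptions made for illustration; they do not reproduce the framework's exact defaults.

# Illustrative preprocessing sketch: cleaning, outlier capping, encoding, scaling, reduction.
# The file name, 1.5*IQR fences, and two PCA components are assumptions for this sketch.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.decomposition import PCA

df = pd.read_csv("dataset.csv")

# Data cleaning: fill numeric nulls with the median, then drop rows that remain incomplete.
num_cols = df.select_dtypes("number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
df = df.dropna()

# Outlier handling: quantile-based flooring and capping at the 1.5*IQR fences.
q1, q3 = df[num_cols].quantile(0.25), df[num_cols].quantile(0.75)
iqr = q3 - q1
df[num_cols] = df[num_cols].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr, axis=1)

# Categorical data: one-hot encode the non-numeric columns, if any.
cat_cols = df.columns.difference(num_cols)
if len(cat_cols) > 0:
    encoded = OneHotEncoder(sparse_output=False, handle_unknown="ignore").fit_transform(df[cat_cols])

# Data scaling and dimensionality reduction on the numeric part.
scaled = StandardScaler().fit_transform(df[num_cols])
reduced = PCA(n_components=2).fit_transform(scaled)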

The results obtained from the Data Exploration and Visualization layer are instrumental in guiding the selection of the most suitable components within the Data Preprocessing layer. By analyzing the data types, missing values, outliers, and data distributions provided in the assessment report, users can make informed decisions about how to clean, preprocess, and transform their data effectively. For instance, the identification of missing values in the first layer can help determine which data cleaning method to apply, such as filling null values by mean or dropping incomplete rows. Similarly, understanding the presence of outliers can guide the choice of algorithms for outlier treatment. The insights gained from data visualization and exploration ensure that the subsequent preprocessing steps are tailored to the specific characteristics of the dataset, leading to more accurate and efficient machine learning models.

3.4 Category and algorithm selection

The central component of the suggested framework is the category and algorithm selection layer, which incorporates unsupervised ML algorithms. The layer is segregated into two distinct categories: clustering and outlier detection. The clustering category contains three algorithms: k-means [27], hierarchical [28], and DBSCAN [29] clustering. The outlier detection category contains three algorithms: local outlier factor (LOF) [30], K-nearest neighbors (KNN) [31], and Gaussian mixture model (GMM) [32]. Since the provided framework is open-source and developed using open-source machine learning libraries, developers can add new algorithms within the category and algorithm selection layer. This is in contrast to other proprietary solutions whose companies do not allow code access, modification, or addition. Users can select and apply one or more algorithms to the uploaded dataset simply by uploading the dataset and selecting the desired algorithm from the drop-down menu of available algorithms within the framework. Reviewing the outcomes produced by the chosen algorithms allows users to determine the most suitable algorithm for their dataset; the appropriate algorithm is selected by comparing the performance metrics of each of the algorithms used. The following metrics are used to evaluate the performance of clustering algorithms: silhouette coefficient or silhouette score, Calinski–Harabasz (CH) index, and Davies–Bouldin index. The performance of anomaly detection algorithms is evaluated by constructing a receiver operating characteristic (ROC) curve and calculating the area under the ROC curve (AUC). These metrics are explained in detail in the next section. These algorithms apply to tasks involving data segmentation or the detection of anomalies within the dataset. A minimal sketch of applying algorithms from both categories is shown below.
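
The sketch below illustrates how the two algorithm categories map onto scikit-learn estimators and how their outputs can be compared. The data array, cluster counts, eps/min_samples values, and the 2% contamination rate are placeholders rather than the framework's defaults.

# Sketch of the clustering and outlier detection categories with scikit-learn.
# X, the cluster counts, eps/min_samples, and the contamination rate are placeholders.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.neighbors import LocalOutlierFactor
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

X = np.random.default_rng(0).normal(size=(500, 4))   # stand-in for the uploaded dataset

clusterers = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "hierarchical": AgglomerativeClustering(n_clusters=3),
    "dbscan": DBSCAN(eps=0.5, min_samples=8),
}
for name, model in clusterers.items():
    labels = model.fit_predict(X)
    if len(set(labels)) > 1:                          # silhouette needs at least two clusters
        print(name, silhouette_score(X, labels))

# Outlier detection: LOF flags points directly, GMM log-likelihoods can be thresholded.
lof_flags = LocalOutlierFactor(n_neighbors=20, contamination=0.02).fit_predict(X)  # -1 = outlier
gmm_scores = GaussianMixture(n_components=3, random_state=0).fit(X).score_samples(X)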

3.5 Model creation and evaluation

The model creation and evaluation layer is included in the proposed framework to help users assign the algorithm parameter values, either manually or automatically. The model creation and evaluation layer includes three components for building, optimizing hyperparameters, and evaluating the ML model. The three components are:

3.5.1 The hyperparameter tuning component

Hyperparameters play a significant role in the accuracy, speed, and runtime of the model. The typical practice is to establish the values of these parameters before initiating the training process, and these values can either be user-defined or determined through approaches that identify their optimal values. The methods implemented for each algorithm are described in the following paragraphs.

The DBSCAN algorithm has three hyperparameters: epsilon, minimum samples, and the distance metric. To determine the optimal value of minimum samples, the proposed component uses the method suggested by Sander et al. [33], namely minimum samples = 2 \(\times \) dimension of data; for the nearest-neighbor computation, the minimum samples equal (2 \(\times \) dimension of data − 1). To determine the optimal value of epsilon, the component uses the K-nearest-neighbors-based approach explained by Sander et al. [33] and Schubert et al. [34], selecting the number of neighbors n equal to 2 \(\times \) (N − 1). The distances to the K-nearest neighbors of each point in the dataset are computed, sorted, and plotted, and the “elbow” of the resulting curve is taken as the value of epsilon; a sketch of this procedure is shown below.
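
A minimal sketch of this k-distance ("elbow") procedure using scikit-learn's NearestNeighbors is shown below; the data array is a placeholder and the elbow is read off the plot by the user.

# Sketch of the k-distance plot used to pick DBSCAN's epsilon (X is placeholder data).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

X = np.random.default_rng(0).normal(size=(500, 4))

min_samples = 2 * X.shape[1]                      # rule of thumb from Sander et al. [33]
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
distances, _ = nn.kneighbors(X)                   # first column is the point itself (distance 0)
k_distances = np.sort(distances[:, -1])           # distance to the farthest of the k neighbors

plt.plot(k_distances)
plt.xlabel("points sorted by k-distance")
plt.ylabel("k-distance")
plt.show()                                        # epsilon is read at the "elbow" of this curve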

Two critical hyperparameters determine the performance of the k-means algorithm: the number of clusters k and the initialization method for the central point of each cluster, represented by the initialization parameters [35]. Two methods are employed to choose k: the elbow method and the silhouette score [36, 37]. Additionally, the initialization method “k-means++” is used to select the initial cluster centroids [38]. A brief sketch of this selection procedure follows.
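
The sketch below illustrates the elbow- and silhouette-based selection of k; the candidate range of k and the data are placeholders chosen for illustration.

# Sketch of choosing k for k-means via inertia (elbow) and silhouette score.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.default_rng(0).normal(size=(500, 4))   # placeholder data

for k in range(2, 11):                                # candidate range is illustrative
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X)
    print(k, km.inertia_, silhouette_score(X, km.labels_))
# k is taken at the "elbow" of the inertia curve or at the maximum silhouette score.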

Hierarchical clustering depends on two hyperparameters. The first is the linkage criterion, which determines how the distance between clusters is computed. The second is the affinity criterion, which determines the metric used to measure distances. The proposed component provides a dendrogram plot for identifying the number of clusters k (a sketch is shown below). It also provides four linkage criteria for measuring the distances among clusters: ward, complete, average, and single; the default linkage criterion is ward, which minimizes the variance within each cluster. Additionally, the proposed component provides three distance metrics: Euclidean, Manhattan, and cosine [39].
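
A possible dendrogram sketch, using SciPy's hierarchical clustering utilities with the default ward linkage, is given below; the data are placeholders.

# Sketch of the dendrogram plot used to choose the number of clusters (placeholder data).
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

X = np.random.default_rng(0).normal(size=(200, 4))
Z = linkage(X, method="ward", metric="euclidean")    # ward linkage requires Euclidean distances
dendrogram(Z)
plt.xlabel("observations")
plt.ylabel("merge distance")
plt.show()                                           # cut the tree where merge distances jump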

To use LOF, two hyperparameters need to be determined: the number of neighbors k and the percentage of observations that will be classified as anomalous observations (c). An improved approach for determining k and c was proposed by Zekun et al. [40].

The k-nearest neighbors (KNN) algorithm relies on the number of nearest neighbors, k. The proposed component automatically determines the value of k based on the general rule that k is equal to the square root of the number of observations in the dataset, \(k=\sqrt{N}\) [41]. In addition, the proposed component determines the threshold value for anomaly detection.

The GMM algorithm depends on two hyperparameters: the number of clusters K and the threshold value for identifying anomalies. The proposed component creates multiple GMM models using different numbers of clusters and plots silhouette scores to help determine the optimal number of clusters. Silhouette scores range from −1 to 1 and measure how well each data observation fits its assigned cluster, with higher scores indicating better cluster quality [42]. In addition, the proposed component determines the threshold value for anomaly detection; a sketch of this threshold-based flagging is shown below.
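
The following sketch shows one way such a GMM-based threshold can be applied: observations whose log-likelihood falls below a chosen percentile are flagged as anomalies. The 2% percentile and the data are assumptions made for illustration only.

# Sketch of GMM anomaly flagging with a percentile-based threshold (placeholder data).
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(500, 4))

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
log_likelihood = gmm.score_samples(X)
threshold = np.percentile(log_likelihood, 2)      # flag the lowest-likelihood 2% as anomalies
anomalies = log_likelihood < threshold
print("flagged anomalies:", int(anomalies.sum()))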

3.5.2 The fitting model predict component

The fitting model predict component is responsible for optimizing the hyperparameters of the selected machine learning algorithm to reduce the model error and improve accuracy. The proposed component may use techniques such as cross-validation or grid search to fine-tune the hyperparameters and obtain the best possible model performance. After optimizing the hyperparameters, the component uses the trained model to predict the values of the training and new data.

3.5.3 The model evaluation component

The model evaluation component provides several metrics for evaluating clustering models and outlier detection models. To evaluate an ML model, the following metrics may be used (a brief usage sketch follows the list):

  1. Silhouette coefficient or silhouette score: is a metric used to measure the quality of clustering algorithms. Its value ranges from − 1 to 1. The closer the score is to 1, the more the clusters are far apart and distinct; a score of 0 indicates that the distance among the clusters is not significant, and a score of − 1 implies that observations are assigned to the wrong clusters. The formula for the silhouette score is

    $$\begin{aligned} \mathrm{Silhouette\,score} = \frac{b-a}{\textrm{max}(a,b)} \end{aligned}$$
    (1)

    where a is the mean distance between a point and all other points in the same cluster (the mean intra-cluster distance), and b is the mean distance between the point and all points in the nearest neighboring cluster [43].

  2. Calinski–Harabasz (CH) index (the variance ratio criterion): is defined as the ratio of the between-cluster dispersion to the within-cluster dispersion over all clusters, where dispersion is measured as a sum of squared distances [44]. The formula for the Calinski–Harabasz index is:

    $$\begin{aligned} \textrm{CH}=\frac{\frac{\textrm{BGSS}}{\textrm{WGSS}} \times (N-K)}{K-1} \end{aligned}$$
    (2)

    where BGSS is the between-group dispersion, WGSS is the within-group dispersion, N is the total number of observations, and K is the total number of clusters in the dataset. The higher the CH value, the better the clustering: observations within clusters are closer to each other, and the clusters are well separated.

  3. Davies–Bouldin index: is a clustering performance measure that quantifies the average similarity among clusters. It evaluates the similarity by comparing the size of each cluster to the distance among clusters. A lower value of the index indicates better clustering performance. The index is easier to calculate than the silhouette score. To calculate the Davies–Bouldin index, the average similarity between each cluster \(C_{i}\), for \(i=1, \ldots ,k\), and its closest cluster \(C_{j}\) is calculated. The similarity measure \(R_{ij}\) is defined as a trade-off between the dispersion of clusters i and j (\(S_{i}\) and \(S_{j}\), respectively) and the distance \(M_{ij}\) between their centroids [45]. The measure \(R_{ij}\) between clusters i and j is defined as follows:

    $$\begin{aligned} R_{ij}=\frac{S_{i}+S_{j}}{M_{ij} } \end{aligned}$$
    (3)

    The Davies–Bouldin index is then defined as follows:

    $$\begin{aligned} \textrm{DB}=\frac{1}{k} \sum \limits _{i=1}^{k} \textrm{max}_{i\ne j}\, R_{ij} \end{aligned}$$
    (4)

    The lower the index value, the better the clustering performance, and the optimal value is zero.

  4. Anomaly detection evaluation: To evaluate the performance of anomaly detection algorithms, labeled data are usually needed. If enough labeled data are available, a common evaluation strategy is to rank the results according to their anomaly scores and then apply an iterative threshold. This approach yields a set of true positive and false positive rates, which can be used to construct a receiver operating characteristic (ROC) curve. The area under the ROC curve (AUC) can then be used to measure performance and compare different algorithms [31]. However, if labeled data are insufficient or non-existent, other measures can be used. Goix proposed two criteria based on Excess-Mass (EM) and Mass–Volume (MV) curves, which do not require labeled data; since these criteria are not suitable for high-dimensional datasets, Goix extended them to work in high-dimensional spaces [46].
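
A brief usage sketch of these metrics, as exposed by scikit-learn, is given below; the feature matrix, cluster labels, and anomaly scores are synthetic placeholders.

# Usage sketch of the clustering and anomaly detection metrics (synthetic placeholder data).
import numpy as np
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score, roc_auc_score)

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                         # placeholder features
cluster_labels = rng.integers(0, 3, size=300)         # placeholder cluster assignments

print("Silhouette:", silhouette_score(X, cluster_labels))
print("Calinski-Harabasz:", calinski_harabasz_score(X, cluster_labels))
print("Davies-Bouldin:", davies_bouldin_score(X, cluster_labels))

# Anomaly detection: AUC from ground-truth labels (1 = anomaly) and anomaly scores.
y_true = np.array([0] * 290 + [1] * 10)               # anomalies are rare by construction
anomaly_scores = rng.random(300)                      # placeholder scores
print("ROC AUC:", roc_auc_score(y_true, anomaly_scores))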

If the results are not satisfactory, the user can retrain the model and may also adjust it manually if necessary.

3.6 Model extraction

The model extraction layer is the penultimate layer of the proposed framework. It enables users to export the model and final data with the output target. The exported model can be embedded in any other program or reused later. In the current layer, the user obtains the data and the final result of the prediction, allowing them to examine the final target of the used model.

3.7 Storage

The storage layer was designed for the proposed framework, and is tasked with managing the storage of data essential to the system. This includes handling user information, storing machine learning models generated from training algorithms for future utilization, and enabling users to access and manage these models at their convenience. Additionally, the layer facilitates the storage of training data along with its corresponding prediction results, allowing for further analysis or future reference. The implementation of the framework has been successfully executed and applied across different scenarios, such as customer segmentation, anomaly detection, and fraud prevention.

4 Implementation of proposed framework

The implemented solution has been put into action across diverse applications such as customer segmentation, anomaly detection, and fraud prevention. The main aim of the implementation is to gauge the efficacy of the proposed solution and examine how well the framework operates in efficiently carrying out tasks within real-world business settings.

4.1 Datasets for benchmarking

This section provides a detailed overview of the datasets employed to assess and compare the customer segmentation and outlier detection algorithms. Each task uses three distinct datasets for evaluation. The datasets have been divided into two parts: customer segmentation datasets and outlier detection datasets.

4.2 Customer segmentation datasets

Customer segmentation is the process of dividing customers into distinct groups based on their similar characteristics. This allows companies to target each group with customized products and services, making it easier to create effective and targeted marketing campaigns. The following metrics have been used to evaluate the performance of clustering algorithms used on customer segmentation datasets, as described in detail in section 3.5.3: Silhouette Score, CH Index, and Davies–Bouldin Index. The proposed framework has been applied and evaluated on three different types of datasets, which are presented with their evaluations below.

  1. The customer mall dataset: contains virtual customer data from a shopping mall. The purpose is to segment customers into different groups based on customer ID, age, gender, annual income, and spending score. The dataset consists of 5 variables and 200 observations [47]. The preprocessing steps ensure the data is clean, consistent, and suitable for analysis; a code sketch of these steps is given after the dataset list. The steps were carried out as follows:

    (a) Drop the customer_ID variable.

    (b) Apply LabelEncoder to the Genre column.

  2. The FLO dataset: contains shopping behavior data from customers who made purchases both online and offline from a shoe store called FLO between 2020 and 2021. The store wants to segment its customers to define shopping strategies for each segment. The dataset consists of 12 variables and 19,945 observations, including variables such as master_id, order_channel, last_order_date_online, and interested_in_categories_12 [48]. The preprocessing steps ensure the data is clean, consistent, and suitable for analysis. The steps were carried out as follows:

    (a) Dropped the master_id column.

    (b) Handled the date columns (first_order_date, last_order_date, last_order_date_online, last_order_date_offline).

    (c) Applied OneHotEncoding to the columns order_channel and last_order_channel.

    (d) Used StandardScaler to scale the data.

  3. The customer credit card dataset: contains information on approximately 8,950 active credit card holders and their behavior while using credit cards. The purpose is to segment customers into groups based on their credit card usage behavior to determine their shopping strategies. The dataset includes 18 variables such as balance, purchases, cash advance, credit limit, and tenure [49]. The preprocessing steps ensure the data is clean, consistent, and suitable for analysis. The steps were carried out as follows:

    (a) Drop the cust_ID variable.

    (b) Handle outliers in the cash_advance and installments_purchases columns.

    (c) Drop missing values in the credit_limit column.

    (d) Handle null values in the minimum_payments column by median imputation.

    (e) Scale data using StandardScaler.
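
As an example of these preprocessing steps, the sketch below reproduces the two operations applied to the mall customer dataset. The file name and column names are assumed from the public dataset description [47] and may need adjusting.

# Sketch of the mall customer dataset preprocessing (file/column names are assumptions).
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("Mall_Customers.csv")                   # hypothetical local copy of [47]
df = df.drop(columns=["CustomerID"])                     # (a) drop the customer ID variable
df["Genre"] = LabelEncoder().fit_transform(df["Genre"])  # (b) label-encode the Genre column
X = df.to_numpy()                                        # matrix passed to the clustering layer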

4.3 Outlier detection datasets

Outlier detection is a machine learning process that aims to identify data points or events that deviate from the expected behavior of most of the data. The proposed framework has been applied and evaluated on three different types of datasets to identify the existing outliers. Since most unsupervised ML datasets lack labels, the area under the ROC curve (AUC) score has been relied upon to evaluate the outlier detection algorithms. To simulate outliers, the classification datasets have been modified by randomly selecting a small sample from one or more classes other than the normal class, as described in [31], reflecting the fact that anomalies are rare and differ from the norm; a code sketch of this procedure is given after the dataset list. The three datasets are presented hereafter:

  1. The breast cancer Wisconsin (diagnostic) dataset: consists of features calculated from a digital image of a fine needle aspirate (FNA) of a breast mass, describing the traits of the visible cell nuclei in the image. The dataset’s goal is to distinguish between cancer patients and healthy patients. The dataset consists of 31 features and 569 observations [50]. A subset of 357 benign cases was kept, together with the first 10 malignant observations as anomaly cases, as in [31, 51]. The label distinguishing benign from malignant cases was removed, so the dataset consisted of 30 features and 367 observations, and the percentage of outlier cases was 2.72%. The preprocessing steps ensure the data is clean, consistent, and suitable for analysis. The steps were carried out as follows:

    (a) Drop the Unnamed:32, id, and Diagnosis columns.

    (b) Scale data using StandardScaler.

  2. The credit card fraud detection dataset: includes credit card transactions performed by European cardholders in September 2013. The dataset aims to identify fraudulent credit card transactions to avoid charging clients for goods they did not buy. The dataset consists of 30 features obtained from principal components computed with PCA and 284,807 observations [52]. A subset of 25,000 genuine transactions was kept, together with 492 fraud transactions as anomaly cases. The label distinguishing fraudulent from genuine transactions was removed, so the dataset consisted of 29 features and 25,492 observations, and the percentage of outlier cases was 1.93%. The preprocessing steps ensure the data is clean, consistent, and suitable for analysis. The steps were carried out as follows:

    (a) Drop the Time and Class columns.

    (b) Scale data using StandardScaler.

  3. The pen-based recognition of handwritten digits dataset: includes the handwritten digits \( 0-9 \) of 44 different writers, and the dataset’s goal is to identify digits from 0 to 9. The dataset consists of 17 features and 10,991 observations [53]. Only the digit eight was kept as the normal class, together with a small sample drawn from the other nine digit classes as anomaly cases, as in [31]. The label distinguishing the digits was removed, so the dataset consisted of 16 features and 1,064 observations, and the percentage of outlier cases was 0.84%. No preprocessing steps are necessary, as the data is clean, consistent, and suitable for analysis.
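
The sketch below outlines the outlier-simulation procedure described at the start of this subsection: one class is kept as the normal class and a small random sample from the remaining classes is added as anomalies, following [31]. The function name, arguments, and the commented usage line are illustrative.

# Sketch of building an outlier detection benchmark from a labeled classification dataset.
import pandas as pd

def make_outlier_benchmark(df, label_col, normal_class, n_outliers, seed=0):
    # Keep all observations of the normal class and sample a few from the other classes.
    normal = df[df[label_col] == normal_class]
    outliers = df[df[label_col] != normal_class].sample(n=n_outliers, random_state=seed)
    data = pd.concat([normal, outliers])
    y = (data[label_col] != normal_class).astype(int)    # 1 = anomaly, kept only to compute AUC
    X = data.drop(columns=[label_col])
    return X, y

# Hypothetical usage for the credit card fraud data (class 0 = genuine, 492 fraud cases kept):
# X, y = make_outlier_benchmark(transactions, "Class", normal_class=0, n_outliers=492)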

4.4 Datasets summary

Table 1 provides a summary of the characteristics of all datasets used in the study. The datasets were chosen from diverse fields, including customer segmentation in mall, credit card, and retail store domains, as well as outlier detection in medical, bank card fraud, and handwriting recognition applications. Moreover, the datasets exhibit variations in size, outlier percentages, and dimensions, providing a comprehensive evaluation of the proposed framework.

Table 1 Datasets used for comparative evaluation from different application domains

5 Results analysis and evaluation

The following section provides an overview of the results obtained from the proposed framework, including the hyperparameter values and evaluation metrics used for each algorithm. In the following subsections, the algorithms used for each dataset are compared, and the result analysis is also presented. The results are categorized into two parts: customer segmentation algorithms and outlier detection algorithms.

5.1 Customer segmentation results analysis

The customer segmentation results analysis involved applying the proposed framework to the respective datasets, generating distinct models for each. Table 2 summarizes the optimal hyperparameter values and corresponding clustering evaluation results for each model.

Table 2 Optimal hyperparameter values and clustering evaluation results for each model

Based on Table 2, the results obtained for each dataset have been analyzed as follows:

Mall customer dataset: The clustering evaluations in Table 2 show that the best algorithms for the dataset, ranked by Silhouette score in descending order, are K-means, Hierarchical, and DBSCAN. For the Calinski–Harabasz and Davies–Bouldin scores, the ranking is the same. Figure 2 displays the clustering evaluation summary for each model, where the K-means and hierarchical clustering models outperformed DBSCAN. Due to the low density among observations in the dataset, DBSCAN was not effective. Based on the evaluation results, the optimal number of clusters for the mall customer dataset has been found to be five.

Fig. 2 Summary result of clustering evaluation on the mall customer dataset

FLO customer dataset: For the FLO customer dataset, the clustering evaluations in Table 2 reveal that the best algorithms, ranked in descending order by Silhouette score, are Hierarchical, K-means, and DBSCAN. For the Calinski–Harabasz and Davies–Bouldin scores, the descending order is K-means, Hierarchical, and DBSCAN. The K-means and hierarchical clustering algorithms performed well on this dataset, separating all observations properly. However, due to the low density among observations in the dataset, DBSCAN did not perform effectively. Figure 3 presents a summary of the clustering evaluation results for each algorithm. Based on the evaluation results, the optimal number of clusters for the FLO customer dataset has been found to be four.

Fig. 3 Summary result of clustering evaluation on the FLO customer dataset

Credit card customer dataset: The credit card customer dataset’s clustering evaluations in Table 2 show that the best algorithms for the dataset are DBSCAN, K-means, and Hierarchical, based on the Silhouette score. The Calinski–Harabasz score ranking is K-means, Hierarchical, and DBSCAN, while the Davies–Bouldin score ranking is K-means, DBSCAN, and Hierarchical. Figure 4 presents the clustering evaluation results for each algorithm. K-means produced the best results, with all points accurately separated, while Hierarchical and DBSCAN had an average performance. Based on the evaluation, the optimal number of clusters for the credit card dataset has been found to be three.

Fig. 4 Summary result of clustering evaluation on the credit card customer dataset

5.2 Outlier detection results analysis

The proposed framework has been applied to Outlier Detection datasets, generating different models for each dataset. Table 3 summarizes the hyperparameters used for each model, including the number of neighbors, the number of components, and the threshold value, followed by the AUC score.

Table 3 Hyperparameters used and AUC score for each algorithm

Analyzing the results for each dataset based on the specified hyperparameters and AUC scores reveals the following insights.

The breast cancer dataset: For the breast cancer dataset, LOF achieved the highest AUC score, indicating that it is the most efficient algorithm for outlier detection. KNN is also efficient, detecting almost all outliers, while GMM is less efficient and failed to detect some outliers. Figure 5 provides a summary of the AUC score values for each model.

Fig. 5 AUC score values for the outlier detection algorithms applied to the breast cancer dataset

Credit card dataset: For the credit card dataset, LOF and KNN are the most efficient algorithms based on the AUC score values, followed by GMM, as shown in Table 3. Figure 6 presents a summary of the AUC scores for each algorithm. The LOF and KNN models performed well, achieving an AUC score of 90%, while GMM performed slightly lower with an AUC score of 88%.

Fig. 6 AUC values for the outlier detection algorithms on the credit card dataset

Pen-based recognition of handwritten digits dataset: By examining the AUC scores presented in Table 3, one can conclude that all algorithms performed well in terms of the AUC score values. Figure 7 provides an overview of the AUC score values for each algorithm. The LOF, KNN, and GMM algorithms all performed well, achieving AUC scores from 93% to 99% (Fig. 8).

Fig. 7 AUC values for the outlier detection algorithms on the pen-based dataset

5.3 Computation time of the proposed framework

The computation time of the proposed framework has been evaluated by calculating the running time for each algorithm, as presented in Tables 4 and 5 for the customer segmentation and outlier detection algorithms, respectively. The recorded times in seconds for all datasets are shown in these tables. Figure 8 summarizes the computation times of the ML models for each customer segmentation dataset, while Fig. 9 shows the computation times for each outlier detection dataset.

Table 4 Computation time of the different clustering algorithms
Table 5 Computation time of the different outlier detection algorithms
Fig. 8 Computation times of the ML models for each customer segmentation dataset

Fig. 9 Computation times of the ML models for each outlier detection dataset

The measurements were performed on a single device with 8 GB of RAM, a Core i5 processor, and a 450 SSD hard disk, using scikit-learn version 1.2.2 and a single thread for all algorithms. It is observed that the computation time of the clustering-based algorithms depends on the number of clusters. Overall, the proposed framework shows good performance in terms of computation time.

From the previous two tables, it can be observed that the computation time for small datasets is very fast, as it does not exceed 5 s in the worst case. For datasets with sizes up to 25,000, the computation time does not exceed 40 s. It is noteworthy that the proposed framework is highly efficient in terms of time and effort. It enables the execution of multiple algorithms and facilitates their comparison within a few minutes.

6 Conclusion and future work

In conclusion, as the volume of data continues to grow at an extraordinary rate, organizations face an urgent need to harness artificial intelligence-based solutions for efficient data analysis. To address this demand, the paper proposed a solution built upon a new open-source unsupervised MLaaS framework that is cost-effective, user-friendly, dependable, adaptable, and scalable. The proposed solution excels in data exploration and processing, empowering users to visualize, analyze, process, and extract valuable insights from diverse datasets. Furthermore, it can fine-tune algorithm hyperparameters with or without user intervention and seamlessly integrates with a variety of systems. It accommodates different datasets by creating a distinct AI model for each dataset, thus facilitating easy comparisons. To validate the effectiveness of this concept, the proposed solution has been applied to six distinct datasets, half of which relate to customer segmentation and the other half to anomaly detection. The results, subjected to careful analysis, demonstrate the framework’s effectiveness across datasets of varying sizes.

The proposed framework demonstrated strong performance in both customer segmentation and outlier detection across multiple datasets. For customer segmentation, the K-means, Hierarchical, and DBSCAN algorithms were evaluated. The mall customer dataset results showed K-means and Hierarchical outperforming DBSCAN due to the low density of observations, with an optimal number of five clusters. For the FLO customer dataset, Hierarchical and K-means performed well, with an optimal cluster count of four, while DBSCAN was less effective. In the credit card customer dataset, K-means produced the best results with three clusters, while DBSCAN and Hierarchical had moderate performance. Outlier detection results indicated that LOF was the most effective algorithm for the breast cancer dataset, achieving the highest AUC score; KNN was also efficient, while GMM lagged. In the credit card dataset, LOF and KNN outperformed GMM, achieving high AUC scores. For the pen-based recognition of handwritten digits dataset, all algorithms performed well, with AUC scores ranging from 93% to 99%.

The framework’s computation time was also evaluated, showing efficient performance even on larger datasets: for datasets of up to 25,000 samples, the computation time did not exceed 40 s. The framework proved to be highly efficient in terms of time and effort, enabling the execution and comparison of multiple algorithms within minutes. This efficiency translates into valuable time and effort savings for businesses and organizations. The versatile framework caters to the needs of businesses and organizations, addressing challenges related to clustering, anomaly detection, and data-driven insights. Regarding future work, the framework’s flexibility extends to the incorporation of other machine learning algorithm types, including supervised and reinforcement learning algorithms. Future work will also provide a detailed evaluation of the framework’s capacity to manage large datasets.