1 Introduction

Artificial Intelligence (AI) is a field centered on building systems that can perform tasks that normally require human intelligence. Machine learning (ML) is a subdivision of AI that focuses on discovering interesting patterns in datasets and making decisions without being explicitly programmed [1]. Many researchers consider AI and ML the new electricity and an important building block of the fourth industrial revolution [2, 3]. Saudi Arabia is among the first nations worldwide to recognize this importance and to introduce AI and ML at various levels of its public schools and universities. This interest motivates researchers to create Arabic datasets for research and educational purposes, particularly given the abundance of English sources and the scarcity of Arabic ones. Since no previous work has collected such a dataset, this study attempts to narrow this gap by creating a dataset for the sixth issue of the Saudi Arabian currency and analyzing it using Orange Data Mining, Google Teachable Machine, and Liner.ai, all of which require no coding.

To highlight the need for Arabic datasets suitable for image classification, a quick look at well-known dataset websites is instructive. At the time of writing, “huggingface” hosts 430 image classification datasets, only 8 of which are Arabic. The same trend appears on “paperswithcode”, which hosts 170 image classification datasets, 3 of which are Arabic. The author has taught the Tiny Machine Learning course several times at the Department of Computer and Network Engineering, College of Computing, Umm Al-Qura University. He found that using datasets from the students' own environment makes the course easier for them and motivates them to use machine learning to solve community problems in innovative ways, which has helped them in their academic work and, after graduation, in real-life applications and challenges.

Currency is among the things that many people worldwide use daily. Thus, this paper focuses on creating a dataset for all banknote denominations of the sixth issue of the Saudi Arabian currency, and it is the first work to do so. The sixth issue was announced during the reign of the Custodian of the Two Holy Mosques, King Salman bin Abdulaziz Al Saud, on 27/3/1438 AH (26/12/2016). The following five cotton-based banknote denominations were issued initially: 500 SR, 100 SR, 50 SR, 10 SR, and 5 SR. A new denomination, the 5 SR polymer-based banknote, was introduced on 11/2/1442 AH (28/9/2020). The last addition was the 200 SR banknote, which was introduced on 20/5/1442 AH (4/1/2021) in commemoration of the Kingdom's ambitious Vision 2030. These seven denominations are the major banknotes of the sixth issue of the Saudi Arabian currency to date [4].

The importance of the study is as follows:

  1. It is the first work to establish a balanced dataset for the banknote denominations of the sixth issue of the Saudi Arabian currency.

  2. The dataset contains both images of the banknote denominations (images dataset) and tabular data generated from them using deep learning (tabular dataset).

  3. It is the first scientific work that uses shallow machine learning and deep learning models to create well-performing models for classifying the sixth issue of the Saudi Arabian currency without coding. This enables researchers and interested practitioners in various fields to develop machine learning applications for classifying this currency, especially on mobile phones or microcontrollers, and to build Internet of Things (IoT) and Tiny machine learning (Tiny ML) applications such as currency identification, automatic counting and sorting of banknotes, and fake currency detection.

The main contribution of this paper is the creation of AlFloos dataset, which has never been collected before. As is common practice in the community for papers whose main contribution is a dataset, various models and evaluations must be carried out on AlFloos dataset [5,6,7,8,9,10]. Hence, this study attempts to answer the following research questions:

  1. What is the best shallow machine learning model for classifying the sixth issue of the Saudi Arabian currency?

  2. Which image embedder is better for extracting the features of the sixth issue of the Saudi Arabian currency: SqueezeNet or Inception v3?

  3. Which deep learning platform is better in classifying the sixth issue of the Saudi Arabian currency: Google Teachable Machine or Liner.ai?

The rest of the paper is organized as follows. The second section reviews related work and highlights all available datasets for any issue of the Saudi Arabian currency. The third section explains the collection process of AlFloos dataset. The fourth section discusses the preprocessing, processing, and postprocessing of AlFloos dataset. Finally, the fifth section concludes the paper and presents potential future work.

2 Related work

This paper focuses on datasets for the sixth issue of Saudi Arabian banknotes. Several authors have previously studied the Saudi Arabian currency and collected datasets for it. Hence, all previous work and datasets related to the Saudi Arabian currency are reviewed here to highlight the importance and uniqueness of AlFloos dataset and the gap it fills in the community.

The author in [11] created an app and a dataset to recognize three denominations of the old fifth issue of the Saudi Arabian currency. However, the number of images in the dataset was not disclosed. The authors in [12] have proposed a currency identification and detection system for counterfeit currency using deep learning. Among the currencies they have gathered from the internet was the Saudi Arabian currency. However, they do not mention whether they have used the sixth issue or if all the Saudi Arabian currency denominations have been used. The author in [13] scanned 110 banknotes representing 8 denominations of the fourth issue of the Saudi Arabian currency. The authors in [14] demonstrated a paper currency recognition and classification for the old fifth issue of the Saudi Arabian banknote by collecting 100 samples of the following denominations: 1, 5, 10, and 50 Saudi Arabian Riyals. The authors in [15] presented a banknote recognition software for various currencies, including the Saudi Arabian currency. However, they have only pictured or scanned the obsolete fourth Saudi Arabian issue for two of its denominations: the old 1 SR and 5 SR.

As far as the author knows, the authors in [16] have collected the largest publicly available Saudi Arabian Currency Dataset. They collected 2000 labeled photos using a smartphone camera, including 1, 5, 10, 20, 50, 100, 200, and 500 Saudi Arabian Riyal Banknotes. The dataset mixes the outdated fifth and the current sixth issues of the Saudi Arabian currency. Also, it does not contain the new polymer issue of the 5 SR denomination. Furthermore, many researchers have studied the Saudi Arabian currency. Unfortunately, they did not provide information about the currency issue and/or the studied denominations as presented in the following studies: [17,18,19,20,21,22].

3 Dataset collection

In the third semester of 1444 AH (2023), the author taught a TinyML elective course at the Department of Computer and Network Engineering, College of Computing, Umm Al-Qura University. For five bonus marks, the students were asked to collect a dataset for the banknotes of the sixth issue of the Saudi Arabian currency. The students were given about a week to complete the following instructions:

  1. Take photos (images) of the sixth issue of the Saudi Arabian banknotes covering all seven denominations, or "classes" (5 SR, 5 SR (Polymer), 10 SR, 50 SR, 100 SR, 200 SR, and 500 SR).

  2. The extension of the photos must be ".jpg" or ".jpeg".

  3. Each denomination must be pictured 10 times: 4 for the front and the back of the banknote with no noise or distractions behind or around the photo. The remaining six should capture the banknote's front and back while folded once and twice. The total number of photos must be 70.

  4. Make sure to have a folder for each denomination. Zip the folders and name the zipped folder using your full name and student ID; then, upload it to Blackboard.

  5. Look at similar datasets for more information and insights: Turkish Lira Dataset [23], Indian Currency Dataset [24], and Bangladeshi Banknote Dataset [25].

The class had 27 students, of whom 20 participated in the dataset collection. One submission was removed because required images were missing, leaving 19 submissions. The dataset therefore contains 19 participants × 10 images = 190 images per denomination, or 190 images × 7 denominations = 1330 images in total. As mentioned in the related work section, the author is unaware of any dataset for the sixth issue of the Saudi Arabian currency larger than the one proposed in this work. Table 1 shows the 19 participants and the mobile phones used to take the photos.

Table 1 Participants and their devices used for banknote collection
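The collection instructions above imply a simple integrity check for each submission: seven folders, each holding ten ".jpg" or ".jpeg" images. The following is a minimal sketch of such a check, not the procedure actually used by the author; the folder names and the function names (`count_images`, `is_complete`, `dataset_size`) are hypothetical.

```python
from pathlib import Path

# Hypothetical folder names, one per denomination (class); the names used
# in the published dataset may differ.
DENOMINATIONS = ["5", "5_polymer", "10", "50", "100", "200", "500"]
ALLOWED_EXTENSIONS = {".jpg", ".jpeg"}
IMAGES_PER_DENOMINATION = 10  # per participant, as stated in the instructions


def count_images(root):
    """Return {denomination: number of .jpg/.jpeg files} found under root."""
    counts = {}
    for name in DENOMINATIONS:
        folder = Path(root) / name
        if folder.is_dir():
            counts[name] = sum(
                1
                for f in folder.iterdir()
                if f.is_file() and f.suffix.lower() in ALLOWED_EXTENSIONS
            )
        else:
            counts[name] = 0  # a missing folder counts as zero images
    return counts


def is_complete(counts):
    """A submission is complete when every class has exactly ten images."""
    return all(counts.get(d, 0) == IMAGES_PER_DENOMINATION for d in DENOMINATIONS)


def dataset_size(participants):
    """Total number of images contributed by the accepted participants."""
    return participants * IMAGES_PER_DENOMINATION * len(DENOMINATIONS)
```

With the 19 accepted submissions, `dataset_size(19)` gives the 1330 images reported above.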

The dataset along with all workflows and code used in this paper is available at the following link: https://www.kaggle.com/datasets/gfbati/alfloos for educational purposes and research reproducibility.

4 Results and discussion

According to [26, 27], any data mining task could be viewed by the machine learning community as a three-task process: data preprocessing, data processing (predictive analysis or classification in this work), and data postprocessing as can be seen in Fig. 1.

Fig. 1 Data mining tasks as viewed by the ML community [26]

The models in the following subsections have been created using Orange Data Mining 3.36 software [28], Google Teachable Machine website [29], and Liner.ai software [30].

4.1 Preprocessing of AlFloos Dataset

An important preprocessing task here is extracting features from the banknote images. To perform the extraction, Orange Data Mining 3.36 software was used [28]. Orange allows the user to create various machine learning models using widgets and requires no coding for most of its functions. One of its useful widgets is the “image embedding” widget, which transforms images into tabular data by extracting image features using deep learning. Orange offers various embedders. In this work, two small and easy-to-use embedders are used: SqueezeNet and Inception v3. Both are trained on ImageNet; however, SqueezeNet runs on the user’s device, whereas Inception v3 runs on a remote server. Furthermore, SqueezeNet transforms any image into a vector of 1000 features (attributes) along with five meta-features that represent the image’s file name, folder, size, width, and height, while Inception v3 generates 2046 features along with the same five meta-features [31]. Since the dataset contains 190 images × 7 denominations = 1330 images, the CSV files generated by the embedders contain 1330 rows each. Figure 2 shows a snippet of the CSV file generated by the SqueezeNet embedder. The CSV file of Inception v3 is similar but larger because it has more features.

Fig. 2 A snippet of the SqueezeNet tabular data for AlFloos dataset
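To make the layout of the generated tabular data concrete, the sketch below builds a header row and computes the expected table shape for each embedder from the feature counts stated above. The column names (`n0`, `n1`, ... for features and the five meta column labels) are assumptions chosen for illustration and may not match the names Orange actually writes to the CSV file.

```python
import csv
import io

# Assumed labels for the five meta-features described in the text.
META_COLUMNS = ["file_name", "folder", "size", "width", "height"]

# Number of embedding features per image, as reported in this section.
N_FEATURES = {"SqueezeNet": 1000, "Inception v3": 2046}


def build_header(embedder):
    """Meta columns followed by one column per embedding feature."""
    features = [f"n{i}" for i in range(N_FEATURES[embedder])]
    return META_COLUMNS + features


def expected_shape(embedder, images_per_class=190, classes=7):
    """(rows, columns) of the table: one row per image."""
    rows = images_per_class * classes
    return rows, len(build_header(embedder))


def header_as_csv(embedder):
    """Serialize the header as a single CSV line."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(build_header(embedder))
    return buffer.getvalue()
```

Under these assumptions, the SqueezeNet table has 1330 rows and 1005 columns, and the Inception v3 table has 1330 rows and 2051 columns.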

4.2 Processing and postprocessing of AlFloos tabular dataset

The processing task here is to create machine learning models capable of classifying the various denominations of the sixth issue of the Saudi Arabian currency. Since the dataset contains tabular data created by the embedders and the actual images of the Saudi Arabian currency, two types of models will be created: the first ones are concerned with the tabular data and the second ones are related to the images.

Orange Data Mining 3.36 software [28] is used in this predictive task. To train and test the models, tenfold cross-validation was applied to each of the two CSV files (SqueezeNet and Inception v3). The models (algorithms) used are Logistic Regression “Log. Reg.”, Support Vector Machines “SVM”, K Nearest Neighbors “kNN”, Naïve Bayes, Random Forest, Adaptive Boost “AdaBoost”, and Constant (ZeroR or Majority). They were all used with their default settings in Orange. These algorithms were chosen because they are well known and available in various machine learning platforms, which facilitates educational use and reproducibility of the results and paves the way for future comparative studies.
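As an illustration of what tenfold cross-validation means for a balanced dataset of this size, the following sketch assigns the 1330 rows to ten folds so that every fold keeps the class balance. It is a simplified stand-in, not Orange's own sampling code; the function name `stratified_folds`, the class labels, and the fixed random seed are assumptions.

```python
import random
from collections import Counter, defaultdict

# Hypothetical class labels for the seven denominations.
DENOMINATIONS = ["5", "5_polymer", "10", "50", "100", "200", "500"]


def stratified_folds(labels, k=10, seed=0):
    """Split row indices into k folds, keeping class proportions in each fold."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)  # random order within each class
        for position, index in enumerate(indices):
            folds[position % k].append(index)  # deal out round-robin
    return folds


# One label per image: 190 images for each of the 7 denominations.
labels = [d for d in DENOMINATIONS for _ in range(190)]
folds = stratified_folds(labels)

# In each of the ten rounds, one fold is the test set and the rest is training data.
test_size = len(folds[0])
train_size = len(labels) - test_size
per_class_in_fold = Counter(labels[i] for i in folds[0])
```

With 190 images per class, each fold holds 19 images of every denomination (133 rows), so each round trains on 1197 rows and tests on 133.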

It is worth mentioning that the “Constant” model always classifies all instances (banknote denominations here) into the majority class and serves as a baseline to facilitate interpreting the results. Since the dataset is balanced, i.e. it has the same number of images for each of its 7 classes (denominations), any class can be the majority class. AUC stands for Area Under the Receiver Operating Characteristic Curve, CA means Classification Accuracy, and F1 is the harmonic mean of precision and recall [32]. For all three evaluation metrics used in this work (AUC, CA, and F1), results closer to 1 (100%) are desired; the higher the better.
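The behaviour of the “Constant” baseline on a balanced seven-class dataset can be worked out directly. The sketch below computes CA and F1 for a classifier that always predicts one class; it assumes that the multi-class F1 is the class-size-weighted average of the per-class F1 scores, which may differ from how Orange aggregates F1. The function names and class labels are hypothetical.

```python
def classification_accuracy(y_true, y_pred):
    """Fraction of instances whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """Harmonic mean of precision and recall for a single class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # the class is never predicted correctly
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class size (assumed)."""
    n = len(y_true)
    return sum(
        f1_for_class(y_true, y_pred, label) * y_true.count(label) / n
        for label in sorted(set(y_true))
    )


# Balanced dataset: 190 images for each of 7 hypothetical class labels.
classes = ["5", "5_polymer", "10", "50", "100", "200", "500"]
y_true = [c for c in classes for _ in range(190)]
y_pred = ["500"] * len(y_true)  # Constant: always predict the same class

baseline_ca = classification_accuracy(y_true, y_pred)
baseline_f1 = weighted_f1(y_true, y_pred)
```

Under these assumptions, the baseline CA is 1/7 (about 14.3%) and the baseline F1 is 1/28 (about 3.6%), which is why even moderate classifiers show very large relative gains over “Constant”.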

Table 2 shows the classification results (models’ evaluation or postprocessing task) for all 6 models when trained and tested using the SqueezeNet-generated tabular data. Table 3 shows the classification results for all 6 models when trained and tested using the Inception v3-generated tabular data.

Table 2 Classification results for AlFloos dataset (SqueezeNet)
Table 3 Classification results for AlFloos dataset (Inception v3)

Tables 2 and 3 clearly show that the best model is “Logistic Regression”, which scored the highest in all evaluation metrics: AUC, CA, and F1. With “SqueezeNet”, AUC = 97.6%, CA = 83.8%, and F1 = 83.9%. The improvement over the baseline model “Constant” is substantial: a 95.2% increase in AUC, 486% in CA, and 2230% in F1. With “Inception v3”, AUC = 97.4%, CA = 82.1%, and F1 = 82.1%. The improvement over “Constant” is again substantial: a 97.4% increase in AUC, 474% in CA, and 2180% in F1. Also, “Logistic Regression” and “SVM” outperform the rest of the models, and for both of them the scores obtained on the “SqueezeNet” tabular data exceed those obtained on the “Inception v3” data.
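The relative increases quoted for “SqueezeNet” can be reproduced with the usual formula, (score − baseline) / baseline. In the sketch below, the Logistic Regression scores are the ones stated in the text, while the “Constant” baseline values (0.5 for AUC, 1/7 for CA on seven balanced classes, and roughly 0.036 for F1) are assumptions and are not copied from Table 2.

```python
def relative_increase(score, baseline):
    """Percentage increase of score over baseline."""
    return (score - baseline) / baseline * 100.0


# Assumed scores of the "Constant" baseline (not taken from Table 2).
baseline = {"AUC": 0.5, "CA": 1 / 7, "F1": 0.036}

# Logistic Regression on the SqueezeNet tabular data, as stated in the text.
squeezenet_log_reg = {"AUC": 0.976, "CA": 0.838, "F1": 0.839}

increase = {
    metric: relative_increase(squeezenet_log_reg[metric], baseline[metric])
    for metric in baseline
}

for metric, value in increase.items():
    print(f"{metric}: {value:.1f}% increase over Constant")
```

Under these assumed baselines, the computed increases agree with the quoted figures for “SqueezeNet” up to rounding.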

The “SqueezeNet” dataset has 1000 features, whereas the “Inception v3” dataset has 2046. To determine which embedder gives better modeling of the Saudi Arabian currency, several experiments were conducted using the same number of features for both embedders in every experiment. For these experiments, the “rank” widget in Orange was used to select the best subset of features based on the “information gain” algorithm. The same sampling method, tenfold cross-validation, was used for training and testing. The number of features was 10, 100, 500, and 1000, respectively. Table 4 shows that the tabular data of “SqueezeNet” allow the models to achieve slightly better scores in AUC, CA, and F1, with very few exceptions bolded in the table. Also, “SVM” outperforms “Logistic Regression” in most cases; however, when the number of selected features exceeds 50% of the available features, “Logistic Regression” tends to outperform “SVM”.

Table 4 Classification results for AlFloos dataset (SqueezeNet) vs. (Inception v3)
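The idea of ranking features by information gain and keeping the top k can be sketched as follows. This is a simplified illustration, not the implementation of the Orange “rank” widget: each numeric feature is split into two bins at its median, which is an assumed discretization, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from splitting a numeric feature at its median."""
    threshold = sorted(values)[len(values) // 2]
    low = [l for v, l in zip(values, labels) if v < threshold]
    high = [l for v, l in zip(values, labels) if v >= threshold]
    if not low or not high:
        return 0.0  # the split does not separate anything
    n = len(labels)
    remainder = len(low) / n * entropy(low) + len(high) / n * entropy(high)
    return entropy(labels) - remainder


def top_k_features(columns, labels, k):
    """Names of the k features with the highest information gain.

    columns maps a feature name to its list of values (one per row).
    """
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]


# Toy example with two classes and two features.
toy_labels = ["a", "a", "b", "b"]
toy_columns = {
    "informative": [0.1, 0.2, 0.9, 0.8],  # separates the classes perfectly
    "noise": [0.5, 0.1, 0.4, 0.2],        # unrelated to the classes
}
best = top_k_features(toy_columns, toy_labels, k=1)
```

In the experiments above, the same selection would be applied with k = 10, 100, 500, and 1000 to the columns of each embedder before cross-validation.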

4.3 Processing and postprocessing of AlFloos images dataset

Instead of transforming the images into tabular data as done in the previous subsection, another common approach is to process them directly using tools and platforms such as Google Teachable Machine and Liner.ai, as presented in the following subsections.


(1) Google Teachable Machine


Google Teachable Machine is a free tool that runs in the browser and can create machine learning models with no code, using deep learning to classify images, sounds, and body poses [29]. Since the dataset has 7 classes with 190 images for each denomination, 7 classes must be created in Google Teachable Machine, and the images of each denomination must be uploaded to its designated class. By default, Teachable Machine splits any dataset of images as follows: 85% of the images for training (161 images per class here) and 15% of the images for testing (29 images per class). Figure 3 shows the modeling in Google Teachable Machine, the training settings, and the accuracy of the created model. The accuracy is perfect for all denominations except the 5 SR (cotton-based) and 5 SR (polymer-based) banknotes, which share an identical design and are therefore difficult to distinguish.

Fig. 3 A screenshot of modeling AlFloos images dataset using Google Teachable Machine
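The default 85%/15% split can be checked with a short calculation. The sketch below splits the 190 images of one class after shuffling; how Teachable Machine rounds the split and selects the images internally is not documented here, so rounding the training share down and the fixed seed are assumptions, and `split_class` is a hypothetical name.

```python
import random


def split_class(images, train_fraction=0.85, seed=0):
    """Shuffle one class's images and split them into train and test lists."""
    shuffled = list(images)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # rounds down: 190 -> 161
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names for the 190 images of one denomination.
images = [f"img_{i:03d}.jpg" for i in range(190)]
train, test = split_class(images)

# Totals over the 7 denominations.
total_train = 7 * len(train)
total_test = 7 * len(test)
```

This yields 161 training and 29 test images per class, or 1127 and 203 images over the seven classes.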

For future work and as homework for teaching purposes, the model can be exported to a mobile phone or a microcontroller by downloading this file: https://drive.google.com/file/d/1fW0eYadG5vCg7x2-0m2l0_tJPTIji0hq/view?usp=share_link. Also, as a class activity and for the model to be used directly from the browser, this link can be used: https://teachablemachine.withgoogle.com/models/biB75bXGA.


(2) Liner.ai


Liner.ai is also free software. It runs on the user’s device and builds and deploys machine learning models using deep learning with no code. It handles various forms of data such as images, text, audio, and video, and it gives more options than Google Teachable Machine in terms of data and models [30]. For image classification tasks, it offers the following models: “EfficientNet”, “MobileNet”, and “ResNet 50”. Figure 4 shows the results of modeling AlFloos images dataset using Liner.ai and “EfficientNet” with the software's default settings. It is apparent that the model here gives better accuracy than the model of Google Teachable Machine. This model might be improved further if its settings are tweaked or if another model such as “MobileNet” or “ResNet 50” is used. As future work, the model can be exported to a Python application, JavaScript web application, TensorFlow, TensorFlow Lite, TensorFlow.js, ONNX, or Keras, which allows researchers from various backgrounds and skill levels to enhance this work. The files of the model can be downloaded from here: https://www.kaggle.com/datasets/gfbati/alfloos.

Fig. 4 A screenshot of modeling AlFloos images dataset using Liner.ai

5 Conclusion and future work

Despite the spread of Artificial Intelligence and Machine Learning educational sources, there is a clear lack of Arabic ones. This work tries to bridge the gap by creating AlFloos dataset, a dataset for the banknotes of the sixth issue of the Saudi Arabian currency. This dataset is unique because no previous work has published a complete dataset for the banknotes of the sixth issue of the Saudi Arabian currency. Also, no previous work has modeled and analyzed the sixth issue of the Saudi Arabian currency. The main findings of this research are as follows: (1) “SVM” and “Logistic Regression” are the best shallow machine learning algorithms for classifying the sixth issue of the Saudi Arabian currency; (2) “SqueezeNet”, when used to embed the images of the sixth issue of the Saudi Arabian currency and transform them into a tabular dataset, gives slightly better scores than “Inception v3”; (3) the image classification models in “Liner.ai” outperform the ones in “Google Teachable Machine”.

Immediate future work should consider creating a larger dataset for the sixth issue of the Saudi Arabian currency by combining AlFloos dataset with the dataset in [16] after removing the old denominations from it. Also, Google Teachable Machine and Liner.ai can easily export the trained models in various forms that enable applications running on mobile phones and microcontrollers. This can motivate several IoT and TinyML applications such as currency identification for people with special needs, automatic counting and sorting of banknotes, and fake currency detection, to mention a few.