Introduction

Electronic commerce (e-commerce) has gained increasing popularity because the Internet has become an essential tool for companies to increase their competitive edge by collecting and analyzing customer data. The most popular form of e-commerce is Business-to-Consumer (B2C) sales, which are fulfilled by online stores. The e-commerce share of total retail sales has increased by about 2% each year since 2015 and was about 20% in 2021 [1]. Retail e-commerce sales increased by more than 20% worldwide in one year, from 2382 billion USD in 2017 to 2928 billion USD in 2018. Sales volume is expected to reach 6542 billion USD in 2023. In addition, the number of online customers rose by about 11% in 2 years, from 2019 (1.92 million users) to 2021 (2.14 million users) [2].

This rising market share gives B2C companies a strong incentive to facilitate customers’ purchase decisions. Moreover, the growth of the digital economy intensifies competition among companies in the same industry. At the same time, exposing customers to a large number of items makes it challenging to display appropriate products, and customers are often confused when deciding which product to buy. Consequently, e-commerce companies are compelled to display the right products to capture online customers’ attention and support their selection decisions. Current e-commerce research therefore focuses on analyzing customers’ online shopping behaviours to understand their purchasing decisions [3]. With these research outcomes, online stores can sharpen their e-commerce platforms and offer a better purchasing experience. One way for a well-designed website to provide a practical online user experience is to associate products [4].

Association Rule Mining (ARM) is a technique that associates products based on their sales frequencies. ARM utilizes customers’ transaction data to investigate their demands and product choices [5]. It is a powerful way to detect relationships between products that meet specified support and confidence levels. The discovered association rules are invaluable for B2C companies that want to offer products in which a customer is interested [6]. Association rules are created by counting how often products are purchased together and by measuring the reliability of the product relationships; these measures are called support and confidence, respectively. These two parameters are critical for generating association rules. A very low support threshold may generate rules that arise by chance and may also produce uninteresting rules. An association rule is regarded as interesting when the support value is greater than the minimum support threshold, and the confidence value is greater than the minimum confidence threshold [7].

If k represents the number of products to be analyzed, a dataset with k items can produce \(2^k-1\) possible item sets and \(R=3^k-2^{k+1}+1\) possible association rules. However, fewer than 20% of the generated rules are meaningful at a minimum support rate of 20% and a minimum confidence rate of 50%, so most of the computation is wasted [8]. Because real-life implementations involve a very large k, studies need a suitable algorithm to avoid this problem. One way to reduce the computational time and cost is to reduce the number of candidate item sets. A second way is to reduce the number of combinations. The Apriori algorithm addresses the problem of a high number of data attributes by reducing the number of candidate item sets used to produce association rules. According to the Apriori principle, an item set that is not frequent cannot be part of a larger frequent item set. In other words, all subsets of a frequent item set must also be frequent [7]. Because of this principle, the approach is also known as support-based pruning.
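The two counting formulas can be checked by brute force for small k. The following is a minimal sketch, not part of the original study; the function name `count_itemsets_and_rules` is illustrative, and a rule is assumed to be an ordered pair of nonempty, disjoint item sets.

```python
from itertools import combinations


def count_itemsets_and_rules(k):
    """Brute-force count of nonempty item sets and of rules X -> Y over k items."""
    items = range(k)
    itemsets = [set(c) for r in range(1, k + 1) for c in combinations(items, r)]
    # A rule is an ordered pair (X, Y) of nonempty item sets with no shared item.
    rules = sum(1 for x in itemsets for y in itemsets if not (x & y))
    return len(itemsets), rules


# Compare the enumeration with the closed-form expressions for small k.
for k in range(1, 7):
    n_itemsets, n_rules = count_itemsets_and_rules(k)
    assert n_itemsets == 2**k - 1
    assert n_rules == 3**k - 2 ** (k + 1) + 1
```

The enumeration itself grows exponentially, which is why it is only feasible for very small k and why pruning is needed in practice.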

The big data era has produced a vast amount of data. As a result, many data points overlap between datasets, and dataset boundaries are more ambiguous than in the past. Boolean association rules handle Boolean data and are therefore not suitable for overlapping and boundary data points. Fuzzy association rules, on the other hand, can generate more reliable solutions even when data overlap or lie on boundaries. If fuzzy association rules can be obtained, they offer a well-suited solution to the problem of product associations. Fuzzy Association Rule Mining (FARM) is an extension of the ARM method based on fuzzy set theory. In many real-life datasets, one product may belong to more than one class [9]. For example, a T-shirt can be placed under both the sportswear class and the clothing class. Traditional methods cannot handle these multiclass cases. A fuzzy extension of the traditional ARM method can close this gap [10]. Recent studies have shown how fuzzy set theory can improve the evaluation of online customers’ demand for a B2C company [6, 11,12,13].

This study aims to present the potential of using the FARM approach with the Apriori algorithm for e-commerce sales. The sales volume of each product is the primary attribute in this research. The main contribution of the study is to recommend more relevant products to online customers by considering not only the items sold but also the sales amounts. Previous e-commerce studies such as [1, 14] focused on items sold together in a certain period. However, the amounts of the sold items should also be considered to produce more meaningful association rules. The results of the study support decision-makers in deciding how many and which kinds of products to offer to attract the most attention from online customers. To this end, this research proposes a methodology that extends association rule mining with fuzzy set theory. Although there are various fuzzy-based rule mining approaches in the literature, handling sales amounts distinguishes this study from others. Moreover, this study analyzes products specifically for Muslim women, because the company sells products that meet modest women’s desires. To the best of the authors’ knowledge, the fuzzy-based rule mining methodology is applied to Islamic products for the first time.

The remainder of the paper is organized as follows: “Literature review” presents related works on association rules. “Preliminaries and methodology” explains the proposed FARM methodology. “Evaluation of the proposed methodology” shows the real-world numerical application of the FARM approach. “Conclusion and limitations” concludes the study by discussing the results and giving the limitations of the proposed methodology.

Literature review

E-commerce platforms produce a significant part of a company’s income, since they provide personalized recommendations for online customers. The purchasing decisions, experiences, and motivation of customers may change their preferences, purchasing process, and final decision [15,16,17,18,19,20,21]. To increase people’s eagerness for online shopping, recommendation mechanisms and personalized systems are broadly utilized on current e-commerce websites [17, 22,23,24,25,26]. Several data mining techniques can be used to examine e-commerce sales data and recommend related products to users. Recommendation systems were originally applied to e-commerce websites to offer products to customers.

The choice of recommendation method plays a vital role in generating recommendations. Association rule mining can enable recommendations, and Apriori is a well-known ARM algorithm. Reference [27] used clickstream data to develop a recommender system using association rule mining. The developed methodology measures the confidence degrees between clicked products, between the products placed in the basket, and between purchased products, respectively. Reference [28] proposed a personalized recommender system employing ARM and classification. Buyer demands were obtained from text documents and then converted into meaningful expressions. Association rules were mined from the transformed phrases using the Apriori algorithm.

Rule-based theory, another common approach, has been extensively implemented in association rule mining [29]. Fuzzy logic is one of the most popular rule-based methods in the literature. More importantly, several studies on fuzzy extensions of traditional ARM models were conducted to create effective association rules. Reference [30] created efficient association rules using FARM. They improved the fuzzy Apriori algorithm for uncertain data, so that it produces fuzzy association rules. Reference [31] introduced an ARM approach based on fuzzy set theory and demonstrated that the fuzzy extension of ARM is more effective. Reference [32] proposed a fuzzy reasoning approach and applied it to assess the created fuzzy association rules. Some recent studies combined the FARM method with machine learning techniques. Reference [31] combined fuzzy association rule mining and the fuzzy c-means clustering algorithm to discover association rules. Reference [33] introduced a recommendation system using fuzzy association rules. In that study, a regression model was integrated into the recommendation system of fuzzy association rules to extend the analysis statistically. Combinatorial optimization methods can also be combined with recommendation systems. Reference [34] suggested a particle swarm optimization algorithm to recommend multiple products simultaneously.

FARM has been a popular method in recommendation systems in recent years [6]. It has been extended with additional properties to bring the advantages of fuzziness to new problems, for example in the profile-based fuzzy association rule mining (PB-FARM) approach [35] and the class-based fuzzy soft associative (CBFSA) approach [36]. FARM studies in the literature fall into various categories such as classification [36,37,38], optimization [39], and regression [33].

Reference [37] developed a fuzzy association rule-based classifier and applied it to imbalanced classification problems, which pose additional challenges for learning methods. Heuristic approaches were combined with the FARM method in some studies. Reference [39] used a differential evolution algorithm to create important fuzzy association rules. Reference [33] proposed a fuzzy recommendation system considering association rules. They analyzed the data statistically and applied a regression model.

Studies based on association rules in the e-commerce environment have focused on various research objectives. In the e-commerce domain, customers exhibit highly variable shopping behaviors [28]. Reference [40] adopted an FP-growth algorithm to analyze orders and develop a batching algorithm for massive customer orders with small batches and high frequency. In large datasets, ARM is a time-consuming process. Therefore, [28] developed a recommendation system that includes ARM and clustering. They applied hierarchical clustering techniques to divide a large dataset into a tree of clusters.

In summary, the above literature review shows that the FARM method has been applied to various business problems involving overlapping data and ambiguous boundaries. In e-commerce in particular, the FARM method has been implemented because much of the data lacks a clear boundary. At present, the advancement of FARM has become a popular research topic [6]. The use of association rules can yield more meaningful information that can be used in the decision-making process.

The reviewed literature indicates that no studies examine the e-commerce product associations purchased by particular customers with an effective method. This study focuses on Muslim women’s online shopping transactions with a fuzzy-based association rule mining methodology. A real-life dataset was used to verify the applied methodology.

Preliminaries and methodology

Association rules

An association rule can be expressed as \(X\rightarrow Y\), where \(X,Y \subseteq I\), \(X,Y\ne \emptyset \), and \(X\cap Y= \emptyset \). I refers to a finite set of items, and T indicates a finite set of transactions with items in I. The rule \(X\rightarrow Y\) states that every transaction in T that includes X also contains Y.

The standard criteria for evaluating association rules are the support and confidence parameters. Both are defined on subsets of items. The support of an item set \(I_0 \subseteq I\) is the probability that a transaction in T includes \(I_0\) (Eq. 1)

$$\begin{aligned} \text {supp }(I_0, T) = \frac{\left| \left\{ \tau \in T\mid I_0\subseteq \tau \right\} \right| }{\left| T\right| }. \end{aligned}$$
(1)

The support and confidence of an association rule \(X\rightarrow Y\) in T is given in Eqs. 2 and 3, respectively. Note that supp is used for support value for item set, and Supp refers to support value for the association rule

$$\begin{aligned}&\text {Supp }(X\rightarrow Y, T) = \text {supp } (X \cup Y) \end{aligned}$$
(2)
$$\begin{aligned}&\text {Conf }(X\rightarrow Y, T) = \frac{\text {Supp }(X\rightarrow Y, T)}{\text {supp }(X)} = \frac{\text {supp }(X \cup Y)}{\text {supp }(X)}. \end{aligned}$$
(3)

The ARM method aims to mine association rules whose support and confidence values are greater than predefined thresholds called minsupp and minconf, respectively. Such rules are described as strong rules.
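Equations 1 to 3 and the notion of a strong rule can be illustrated on a toy transaction set. This is a minimal sketch under stated assumptions: the item names and transactions are invented for illustration, and the thresholds are treated as inclusive (`>=`), which is a choice made here rather than something the definitions above fix.

```python
def supp(itemset, transactions):
    """Eq. 1: fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def conf(x, y, transactions):
    """Eq. 3: supp(X u Y) / supp(X); returns 0.0 if X never occurs."""
    denom = supp(x, transactions)
    return supp(set(x) | set(y), transactions) / denom if denom > 0 else 0.0


def is_strong(x, y, transactions, minsupp, minconf):
    """A rule is strong if it meets both thresholds (inclusive here, by assumption)."""
    rule_supp = supp(set(x) | set(y), transactions)  # Eq. 2
    return rule_supp >= minsupp and conf(x, y, transactions) >= minconf


# Hypothetical toy data: four transactions over items A, B, C.
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}]

print(supp({"A", "B"}, transactions))    # 0.5
print(conf({"A"}, {"B"}, transactions))  # 0.5 / 0.75
print(is_strong({"A"}, {"B"}, transactions, minsupp=0.4, minconf=0.6))
```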

Fuzzy transactions and fuzzy association rules

A fuzzy transaction \({\widetilde{\tau }}\) is a nonempty fuzzy subset of the item set I. \({\widetilde{\tau }}(i)\) is the membership degree of i in a fuzzy transaction \({\widetilde{\tau }}\) for every \(i \in I\). A crisp transaction is a particular form of fuzzy transaction. Similarly, a \(T{\text {-}}\)set is a set of crisp transactions, whereas an \({FT}{\text {-}}\)set indicates a set of fuzzy transactions. Fuzzy support (\(F{\text {-}}\)supp) values of the fuzzy item sets are calculated with the minimum operator given in Eq. 4. For each data point \(t_k\) in \({\widetilde{\tau }}\), the minimum membership value over the products (\(z_i\)) in the item set is taken; these minima are summed and divided by N, the total number of data points

$$\begin{aligned} F{\hbox {-supp}~}(X\rightarrow Y, \tau ) = \frac{\sum _{k=1}^{N}{\text {min}} \{t_k(z_i)\}}{N}. \end{aligned}$$
(4)

A fuzzy association rule \(X\rightarrow Y\) holds if and only if \({\widetilde{\tau }}(X) \le {\widetilde{\tau }}(Y)\) for each \({\widetilde{\tau }} \in T\). It means that the inclusion degree of Y is at least that of X for every fuzzy transaction \({\widetilde{\tau }}\). An association rule is a specific form of fuzzy association rule, because a transaction is a specific form of fuzzy transaction. The confidence value of a fuzzy rule is computed by Eq. 5

$$\begin{aligned} F{\hbox {-conf}}(X\rightarrow Y, \tau ) = \frac{F{\hbox {-supp}}(X \cup Y)}{F{\hbox {-supp}}(X)}. \end{aligned}$$
(5)
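Equations 4 and 5 can be sketched in a few lines. This is an illustrative sketch, not the authors’ implementation: each fuzzy transaction is assumed to be a dictionary mapping an item label to its membership degree, with missing items treated as membership 0, and the labels and values below are invented toy data.

```python
def f_supp(itemset, fuzzy_transactions):
    """Eq. 4: average over transactions of the minimum membership in `itemset`."""
    total = 0.0
    for t in fuzzy_transactions:
        # Items absent from a transaction have membership degree 0.
        total += min(t.get(item, 0.0) for item in itemset)
    return total / len(fuzzy_transactions)


def f_conf(x, y, fuzzy_transactions):
    """Eq. 5: F-supp(X u Y) / F-supp(X); returns 0.0 if X has no support."""
    denom = f_supp(x, fuzzy_transactions)
    return f_supp(set(x) | set(y), fuzzy_transactions) / denom if denom > 0 else 0.0


# Hypothetical fuzzy transactions (item label -> membership degree).
fuzzy_transactions = [
    {"1078M": 1.0, "1087L": 0.25, "1087M": 0.75},
    {"1078M": 0.5, "1087M": 1.0},
    {"1087M": 0.5},
    {"1078M": 1.0},
]

print(f_supp({"1078M"}, fuzzy_transactions))             # (1 + 0.5 + 0 + 1) / 4
print(f_supp({"1078M", "1087M"}, fuzzy_transactions))    # (0.75 + 0.5 + 0 + 0) / 4
print(f_conf({"1078M"}, {"1087M"}, fuzzy_transactions))
```

With crisp memberships (only 0 or 1), `f_supp` reduces to the support of Eq. 1, which matches the remark that a crisp transaction is a special case of a fuzzy one.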

Proposed FARM model

This study uses a three-stage methodology given in Fig. 1. The first stage given in green is about the database activities. It begins with collecting related sales data from databases by queries. Then, sales amounts are converted into a fuzzy number using predefined fuzzy membership sets. The second stage given in orange is counting frequent products in the fuzzy transactions. First, fuzzy support numbers are counted, and then, the Apriori algorithm runs for all combinations of frequent products to find out frequencies of products sold together. The third stage given in blue discovers fuzzy association rules and stores them in repositories for future usage.

Fig. 1 Proposed FARM methodology

Evaluation of the proposed methodology

A case scenario is presented in this section to illustrate the feasibility of the proposed methodology. The algorithm was applied to 101,250 online transactions including 100 different products. Data were collected from a B2C company, one of the leaders in the sector with more than 16 million visitors from all over the world. The company sells over 500 brands and 70 thousand products to five continents. In this study, special products for Muslim women were selected to verify the proposed FARM methodology.

The first stage of the methodology starts with data preparation steps. Transactions that have only one kind of product were ignored, because this study considers associations among products. Each transaction includes at least two types of product given in the Products column in Table 1. The proposed FARM methodology considers the sales amount, whereas the traditional ARM method takes into account only a binary variable, sold or not.

Table 1 Data description

All sales amounts in each transaction were transformed into fuzzy sets. The fuzzy membership functions in Fig. 2 were determined by company managers considering products’ sales volume. For example, one of the customers purchased three kinds of products in transaction ID 679. The customer bought 4, 12, and 3 items of products 1078, 1086, and 1087, respectively. The amounts 4 and 12 correspond directly to the “Medium” and “High” classes, respectively, whereas 3 lies in both the “Low” and “Medium” classes. The amount 3 was therefore transformed into the fuzzy set (\(0.25/Low + 0.75/Medium\)).

Fig. 2 Fuzzy membership sets for sales amounts
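The fuzzification step can be sketched as follows. The exact membership functions of Fig. 2 are not reproduced in the text, so the breakpoints below (0, 4, and 8 items) are assumptions chosen only so that the amounts 3, 4, and 12 from transaction ID 679 map to the memberships stated above; the function name `fuzzify` is likewise illustrative.

```python
def fuzzify(amount):
    """Map a sales amount to {class: membership}, dropping zero memberships.

    ASSUMED shapes (not taken from Fig. 2):
      Low    falls linearly from 1 at 0 items to 0 at 4 items,
      Medium is a triangle with feet at 0 and 8 and peak at 4 items,
      High   rises linearly from 0 at 4 items to 1 at 8 items and stays at 1.
    """
    low = max(0.0, 1.0 - amount / 4.0)
    medium = max(0.0, 1.0 - abs(amount - 4.0) / 4.0)
    high = min(1.0, max(0.0, (amount - 4.0) / 4.0))
    memberships = {"Low": low, "Medium": medium, "High": high}
    return {cls: mu for cls, mu in memberships.items() if mu > 0.0}


# Amounts from transaction ID 679 (products 1078, 1086, and 1087).
for product, amount in [(1078, 4), (1086, 12), (1087, 3)]:
    print(product, amount, fuzzify(amount))
```

With these assumed shapes, the memberships of any positive amount sum to 1, which keeps each purchased product’s total weight in a transaction equal to one.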

Table 2 Fuzzification of sales amounts
Table 3 Fuzzy support values of frequent 1-item set

Table 2 presents the sales amounts converted into fuzzy sets for each product over all transactions. After the transformation process, the total fuzzy score (TFS) was computed by aggregating all fuzzy membership values for each class. The final fuzzy class (FFC) of a product was defined as the class with the maximum TFS. For example, product 1075 has a TFS of \(\{0.75, 0.50, 1.75\}\) for the Low, Medium, and High classes, respectively. In this case, product 1075 was assigned to the “High” class and renamed as 1075H.
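The TFS aggregation and the FFC assignment can be sketched as below. The per-transaction memberships are hypothetical values chosen only so that they sum to the TFS quoted for product 1075; the function names are illustrative and not taken from the study.

```python
CLASSES = ("Low", "Medium", "High")


def total_fuzzy_score(memberships):
    """Sum the membership values of one product per class over its transactions."""
    return {cls: sum(m.get(cls, 0.0) for m in memberships) for cls in CLASSES}


def final_fuzzy_class(tfs):
    """Return the class with the largest total fuzzy score."""
    return max(CLASSES, key=lambda cls: tfs[cls])


# HYPOTHETICAL memberships of product 1075 in three transactions,
# constructed to reproduce the TFS {0.75, 0.50, 1.75}.
memberships_1075 = [
    {"Low": 0.75, "Medium": 0.25},
    {"Medium": 0.25, "High": 0.75},
    {"High": 1.0},
]

tfs = total_fuzzy_score(memberships_1075)
label = "1075" + final_fuzzy_class(tfs)[0]  # class initial appended to the product ID
print(tfs, label)
```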

In the second stage, first, fuzzy support values were computed for the 1-itemset of products and given in Table 3. Minimum support (minsupp) value was determined by managers as 500 items in all transactions. This means that a product is assigned to the frequent 1-item set if the maximum TFS of the product is larger than minsupp value.

According to the Apriori principle, the algorithm eliminates items with a lower support value than the minsupp from the frequent item set. The frequent 2-item set was created by computing all possible combinations of the frequent 1-item set using the Apriori algorithm. Table 4 shows the fuzzy support values of the 2-item set.

Table 4 Fuzzy support values of frequent 2-item set

Similarly, the frequent 3-item set was created for all possible combinations of the frequent 2-item set. Table 5 gives the results of frequent item sets that include three products. The algorithm terminated, because none of the 4-item set could meet the minsupp threshold.
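The level-wise search described above can be sketched as a small fuzzy Apriori loop. This is a simplified illustration under stated assumptions, not the authors’ code: support is expressed as a count (the sum of per-transaction minima, without dividing by N) because minsupp is given in items, item sets below minsupp are eliminated, and the transactions are invented toy data.

```python
from itertools import combinations


def fuzzy_count(itemset, fuzzy_transactions):
    """Sum over transactions of the minimum membership among the items."""
    return sum(min(t.get(i, 0.0) for i in itemset) for t in fuzzy_transactions)


def fuzzy_apriori(fuzzy_transactions, minsupp):
    """Return {frozenset: fuzzy count} for all item sets with count >= minsupp."""
    items = sorted({i for t in fuzzy_transactions for i in t})
    level = [frozenset([i]) for i in items]  # candidate 1-item sets
    frequent = {}
    k = 1
    while level:
        counts = {c: fuzzy_count(c, fuzzy_transactions) for c in level}
        current = {c: s for c, s in counts.items() if s >= minsupp}
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must be frequent.
        level = [
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical fuzzy transactions over three labelled items.
toy = [
    {"A": 1.0, "B": 0.5, "C": 1.0},
    {"A": 0.5, "B": 1.0},
    {"A": 1.0, "C": 0.5},
    {"B": 0.5, "C": 0.5},
]
result = fuzzy_apriori(toy, minsupp=1.0)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this toy run the loop stops after the 3-item level because the only candidate, {A, B, C}, falls below minsupp, mirroring the termination condition described for the 4-item sets.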

Table 5 Fuzzy support values of frequent 3-item set

The third stage creates and stores fuzzy association rules. All potential association rules for each frequent item set were extracted, and the minconf threshold, 35%, was checked for each created rule. The top 15 fuzzy association rules and the corresponding confidence values are presented in Table 6. For example, the confidence value of the rule “\(if \; \{1078M, 1121M\}\; then\; \{1150M\}\)” was calculated as

$$\begin{aligned} \frac{n(1078M\cap 1121M\cap 1150M)}{n(1078M\cap 1121M)}=\frac{3018.75}{8625.00}= 0.35. \end{aligned}$$
Table 6 Fuzzy association rules with 35% of confidence value
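The rule-extraction step can be sketched as follows. Only the two fuzzy counts quoted in the worked example (8625.00 and 3018.75) are taken from the text; because the counts of the other item sets are not given here, antecedents with unknown counts are simply skipped. The function name `generate_rules` and the inclusive threshold (`>=`) are assumptions of this sketch.

```python
from itertools import combinations


def generate_rules(frequent, minconf):
    """Split each frequent item set into X -> Y and keep rules with conf >= minconf.

    `frequent` maps frozenset -> fuzzy count. Antecedents whose count is not
    available in `frequent` are skipped (an assumption made for this sketch).
    """
    rules = []
    for itemset, count_xy in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                x = frozenset(antecedent)
                if x not in frequent:
                    continue
                confidence = count_xy / frequent[x]
                if confidence >= minconf:
                    rules.append((x, itemset - x, confidence))
    return rules


# The two fuzzy counts from the worked example in the text.
frequent = {
    frozenset({"1078M", "1121M"}): 8625.00,
    frozenset({"1078M", "1121M", "1150M"}): 3018.75,
}

rules = generate_rules(frequent, minconf=0.35)
for x, y, c in rules:
    print(sorted(x), "->", sorted(y), round(c, 2))
```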

Comparison of FARM, ARM, and discrete ARM

Table 7 shows the contribution of the proposed FARM approach through a comparison with traditional ARM. In both methods, the support and confidence thresholds were set to 500 and 0.35, respectively. Traditional ARM created 58 rules, whereas the proposed FARM approach created 22 fuzzy rules. Although ARM presented more rules, FARM included more information in fewer rules, because it gives details about sales volume instead of “sold or not (binary)” information. For example, for one of the common rules that include \(X=\{1127\; and\; 1150\}\) and \(Y=\{1078\}\), ARM indicates that if products 1127 and 1150 are sold, then product 1078 is sold. However, FARM shows that if product 1127 is sold over 8 items and product 1150 is sold up to 8 items, then product 1078 is sold up to 8 items. These results confirm that the proposed FARM approach produces more information about e-commerce sales for decision-makers. Furthermore, the FARM method eliminates some traditional rules considering their sales amount and can produce some rules different from ARM, such as “\(If\; \{1150M,\; 1100M\}\; then\; \{1121M\}\)” and “\(If\; \{1078M,\; 1121M\}\; then\; \{1150M\}\)”.

Table 7 ARM association rules with 35% of confidence value

Table 8 gives the run times of ARM and FARM for different support and confidence threshold values. For example, ARM ran for 206.7 s, whereas FARM ran for 112.13 s, when the support threshold was 200 items and the confidence threshold was 20%. FARM was faster than ARM in all cases except when the support threshold was 600 items and the confidence threshold was 50%. Bold cells in the table emphasize the maximum difference between the two methods based on the support values. When the support threshold value of 500 items, determined by managers, was considered, FARM produced results faster for the confidence threshold value of 0.35.

Table 8 Run time of ARM and FARM (in seconds)

This study also compares the results of discrete ARM and FARM. In discrete ARM, each sales amount is considered Low, Medium, or High. One customer transaction can belong to only one of these classes. FARM also considers Low, Medium, and High classes, but it calculates membership values for each class.

Discrete ARM requires expanding the product types based on predefined classes. For example, the product 1075 is divided into 1075 low, 1075 medium, and 1075 high sub-products. Therefore, it usually produces fewer association rules than traditional ARM and FARM. However, this study shows that discrete ARM is not a proper method for a large number of products in the e-commerce sector. Table 9 shows the generated number of rules for the corresponding threshold values.

Table 9 Comparison table

Conclusion and limitations

In this study, products intended specifically for Muslim women were considered to investigate their associations. To do this, a FARM methodology was proposed and tested on e-commerce sales. Some strong rules among various products were discovered. The results of such research help decision-makers decide how many and which kinds of products to offer to attract the most attention from online customers. An example is the rule “if products 1078 and 1121 are sold up to 8 items, then product 1150 is sold up to 8 items.” This rule shows that the company should recommend product 1150, and the offered amount should not exceed eight items if the company wants to increase customer attraction. The company may present a special discount for this item and amount.

This study proposes a methodical process of applying the FARM method in the field of e-commerce sales analysis. The proposed methodology presents valuable support to the company managers, because discovered fuzzy rules are straightforward and can simply be adopted to design customer relations. Moreover, the FARM approach generates rules with more information than traditional ARM, because it considers not only “sold or not (binary)” variables but also sales amount to create fuzzy rules. Managers can have information about which and how many products should be offered to online customers using the proposed approach to increase customer interest.

The proposed methodology was compared with traditional ARM to demonstrate its effectiveness. The results were interesting, because the FARM approach yielded more information with fewer rules. In a common example, ARM indicates that if products 1127 and 1150 are sold, then product 1078 is sold. However, FARM shows that if product 1127 is sold over 8 items and product 1150 is sold up to 8 items, then product 1078 is sold up to 8 items. Consequently, the company can offer product 1078 up to 8 items to the customers buying these two products with the corresponding amounts. These results confirm that the proposed FARM approach produces more information about e-commerce sales for decision-makers. Furthermore, the FARM method eliminates some traditional rules considering their sales amount and can produce some rules different from ARM.

There are some important notes for readers. As the number of transactions increases, support and confidence values should be decreased. Therefore, the support and confidence threshold values are expressed in numbers instead of rates. Additionally, a rule is created from item sets that satisfy the minimum support (minsupp) and minimum confidence (minconf) values. It does not imply causality, because establishing causality requires knowledge of the cause and effect behind the rule.

For future studies, researchers can focus on methodological setup and experimental effects. From the methodological setup perspective, the described fuzzy sets and threshold values can be changed. From the experimental effects perspective, the dataset can be enlarged with respect to the data attributes, or the same data attributes can be used for various countries.

From the methodological setup perspective, (i) the fuzzy sets used in the study may affect the analysis results. The described fuzzy sets use triangular fuzzy numbers. Researchers can redesign the fuzzy sets with different representation types, such as trapezoidal fuzzy numbers. (ii) Although managers decided on the minimum support and minimum confidence values, researchers can investigate the effects of changes in these two critical thresholds to enhance the reliability of the proposed methodology.

From the experimental effects perspective, (i) this study considers the items and amounts in the transaction data. However, in further research, several e-commerce features such as income, cost, profit, and sales time can be taken into account. The proposed model can be adapted for these features. (ii) Customers in different regions may have different shopping behaviors. Therefore, applying the methodology to different countries and revealing cross-cultural differences would be an interesting study.