
1 Introduction

With recent advances in the internet environment, the electronic commerce (EC) market, in which products are traded online, is expanding rapidly. At the same time, competition for customers is intensifying and the cost of acquiring new customers is rising. Acquiring repeat customers who use an EC site continually is therefore regarded as important. A feature of the EC market is that the customer defection rate is highest between the first and second purchase, and that the defection rate of customers who have made a second purchase subsequently decreases [1]. Hence, to acquire repeat customers, it is important to prevent defection between the first and second purchase. Fig. 1 shows the transition of the customer defection rate from the first purchase through subsequent purchases [2].

Fig. 1. Transition of customer defection rate

It is therefore important to understand the behavior of repeat customers, since doing so makes it possible to reduce the number of defecting customers [3,4,5,6]. In particular, the EC site targeted in this study offers golf course reservations in addition to sales of golf supplies. Retaining customers on the site, regardless of the number or price of the items bought at the second purchase, therefore increases sales as a whole.

The purpose of this study is to clarify the factors specific to customers who repurchase by analyzing their behavior at the first purchase on the EC site.

Using the results of the analysis, we also propose marketing measures for the time of the first purchase to encourage repurchase.

2 Data Sets

In this study, we target a general EC site related to golf. The site provides services such as sales of golf equipment, golf course reservations, and golf score management. We used the following data.

  • Customer information data (age, sex, registration date, etc.)

  • Purchase data (category of purchase items, purchase date, whether purchased item is used item or not, etc.)

  • Access history data (login date and time, URL of access page, URL of referrer page, etc.)

    * Period covered by each dataset: 8 months, from January to August 2014.

The category names of the products included in the purchase data are shown in Table 1.

Table 1. Category name of item

The browsing page names and landing route names included in the access history data are shown in Tables 2 and 3, respectively.

Table 2. Browsing page name
Table 3. Landing route name

In this study, we analyzed the customers listed below. These customers were chosen because we defined the first three months of the data period as the first purchase period and the six months from the month of the first purchase as the repurchase period.

[Customers]

  • Customers who bought for the first time between January and March 2014

    * We excluded customers for whom more than 2 years had passed since registration.

As a result, 8,181 customers were analyzed, of whom 3,228 repurchased within 6 months.

The purpose of this study was to predict whether a customer repurchases within a certain period after the first purchase. When the objective variable to be predicted is binary, binomial logistic regression models are often used [7].

The binomial logistic regression model is a classifier that discriminates between two classes. By interpreting the significant explanatory variables in the constructed model, it is possible to clarify the characteristics that affect whether a customer repurchases. In binomial logistic regression analysis, the repurchase probability \( p_{i} \) of customer \( i \) is expressed by the following equation [8].

$$ p_{i} = \frac{\exp \left\{ \sum_{j = 0}^{m} \beta_{j} X_{ij} \right\}}{1 + \exp \left\{ \sum_{j = 0}^{m} \beta_{j} X_{ij} \right\}} \tag{1} $$
  • \( X_{ij} \): Factors affecting repurchase \( (X_{i0} = 1) \)

  • \( \beta_{j} \): Parameter for each explanatory variable (\( \beta_{0} \) is the intercept).
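As a minimal sketch of how Eq. (1) turns a coefficient vector and one customer's explanatory variables into a repurchase probability, the following Python function evaluates the logistic function directly. The function name, the coefficient values and the example customer are hypothetical and are not taken from the study.

```python
import math


def repurchase_probability(beta, x):
    """Evaluate Eq. (1) for one customer.

    beta[0] is the intercept, so x[0] must be 1 (X_i0 = 1).
    """
    if len(beta) != len(x):
        raise ValueError("beta and x must have the same length")
    # Linear predictor: sum over j of beta_j * X_ij.
    z = sum(b * v for b, v in zip(beta, x))
    # exp(z) / (1 + exp(z)), written so that exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients and customer (illustration only).
beta = [-0.5, 0.8, -1.2]   # intercept, then two explanatory variables
x = [1.0, 1.0, 0.0]        # X_i0 = 1, then the two variable values
p = repurchase_probability(beta, x)
```

A customer would typically be classified as a repurchaser when `p` exceeds a chosen threshold; the threshold used in the study is not stated, so any value such as 0.5 is an assumption.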

As explanatory variables for model construction, we created three variables from the membership information data, nine variables from purchasing behavior at the first purchase, and 27 variables from Web-browsing behavior on the first purchase date. Details of the explanatory variables are shown in Tables 4 and 5.

  • Demographic variables were created from the membership information data.

  • Purchasing behavior variables were created from the purchase data.

  • Access history variables were created from the Web browsing data.

Table 4. Demographic variables and purchasing behavior variables used in the model construction
Table 5. Access history variables used in the model construction

Although the number of target customers in this research was 8,181, at the time of model construction we randomly sampled the non-repurchasing customers so that their number equaled the number of repurchasing customers.

Furthermore, to verify the prediction accuracy of the model, we used 70% of the data as training data and 30% as test data, splitting the non-repurchasing customers and the repurchasing customers separately. The resulting datasets used in the model construction are shown in Table 6.

Table 6. Datasets used in the model construction
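The balancing and splitting procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the random seed, the integer customer IDs and the rounding rule at the 70% cut are hypothetical, so the resulting counts need not match Table 6 exactly.

```python
import random


def balance_and_split(repurchased, non_repurchased, train_ratio=0.7, seed=0):
    """Undersample the larger class, then split each class 70/30."""
    rng = random.Random(seed)
    # Draw the same number of customers from both classes at random.
    n = min(len(repurchased), len(non_repurchased))
    classes = {
        1: rng.sample(repurchased, n),      # label 1: repurchased
        0: rng.sample(non_repurchased, n),  # label 0: did not repurchase
    }
    train, test = [], []
    for label, ids in classes.items():
        # Split within each class so both sets stay balanced.
        cut = round(len(ids) * train_ratio)  # rounding rule is an assumption
        train += [(cid, label) for cid in ids[:cut]]
        test += [(cid, label) for cid in ids[cut:]]
    return train, test


# Customer counts from the text: 8,181 analyzed, 3,228 repurchased.
repurchased = list(range(3228))             # hypothetical customer IDs
non_repurchased = list(range(3228, 8181))   # hypothetical customer IDs
train, test = balance_and_split(repurchased, non_repurchased)
```

Splitting within each class, rather than over the pooled data, keeps the proportion of repurchasing customers identical in the training and test sets.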

In addition, to grasp the characteristics of repurchasing customers more precisely, we constructed a repurchase prediction model for each item category purchased at the first purchase: wear items, golf club items, and accessory items. This is because behavior at the first purchase is considered to differ depending on the purchase category. The purchasing behavior variables and the sizes of the datasets (training data and test data) used in constructing these models are shown in Tables 7 and 8.

Table 7. Purchasing behavior variables used in model construction for each purchase category
Table 8. Datasets used in repurchase prediction model for each purchase category

To confirm the prediction accuracy of the constructed models, we performed hold-out validation using the training data and test data. Specifically, we created a confusion matrix such as the one in Table 9 and calculated the prediction accuracy of each model using the following measures.

Table 9. Confusion matrix

Accuracy (ACC): the proportion of all predictions that are correct.

$$ {\text{ACC}} = \frac{TP + TN}{FP + FN + TP + TN} $$

Precision (PRE): the proportion of cases predicted as the positive class that actually belong to the positive class.

$$ {\text{PRE}} = \frac{TP}{TP + FP} $$

Recall (REC): the proportion of cases actually in the positive class that are predicted as the positive class.

$$ {\text{REC}} = \frac{TP}{FN + TP} $$

F-measure: harmonic mean of PRE and REC

$$ {\text{F-measure}} = 2 \times \frac{PRE \times REC}{PRE + REC} $$
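The four measures above can be computed from the confusion matrix counts as in the following sketch. It assumes that the positive class (label 1) means "repurchased"; the function names and the small label lists are hypothetical and are not data from the study.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN, treating label 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(tp, fp, fn, tn):
    """Return ACC, PRE, REC and F-measure as fractions in [0, 1]."""
    acc = (tp + tn) / (fp + fn + tp + tn)
    # Guard against empty denominators (no predicted or actual positives).
    pre = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (fn + tp) if fn + tp else 0.0
    f = 2 * pre * rec / (pre + rec) if pre + rec else 0.0
    return {"ACC": acc, "PRE": pre, "REC": rec, "F": f}


# Hypothetical test labels and predictions (illustration only).
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
scores = evaluate(*confusion_counts(actual, predicted))
```

Multiplying each value by 100 gives percentages of the kind reported in the evaluation tables.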

3 Analysis of Repeat Customer

We built a model that predicts repurchase for all customers using binomial logistic regression analysis with the stepwise selection method. We retained the explanatory variables whose coefficients had a significance probability (p-value) below 0.05.

Table 10 shows that many of the selected variables were created from the Web browsing data. The confusion matrix for the test data of this model and the evaluation indicators of its prediction accuracy are shown in Tables 11 and 12, respectively.

Table 10. Estimated value of selected partial regression coefficient
Table 11. Confusion matrix of model for entire customer
Table 12. Evaluation indicator of model for entire customer (%)
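The sign of each partial regression coefficient in Table 10 can be read through the log-odds form of Eq. (1). The following rearrangement is a standard property of the logistic model rather than an additional result of this study. Since \( 1 - p_{i} = 1 / (1 + \exp \{ \sum_{j = 0}^{m} \beta_{j} X_{ij} \}) \), dividing \( p_{i} \) by \( 1 - p_{i} \) and taking logarithms gives

$$ \log \frac{p_{i}}{1 - p_{i}} = \sum_{j = 0}^{m} \beta_{j} X_{ij} $$

Increasing one variable \( X_{ik} \) by one unit, with all other variables held fixed, therefore multiplies the odds of repurchase by \( \exp (\beta_{k}) \):

$$ \frac{p_{i}^{\prime} / (1 - p_{i}^{\prime})}{p_{i} / (1 - p_{i})} = \exp (\beta_{k}) $$

where \( p_{i}^{\prime} \) is the repurchase probability after the increase. A positive \( \beta_{k} \) thus raises the odds of repurchase and a negative \( \beta_{k} \) lowers them. For a binary variable, a one-unit increase corresponds to switching the variable from 0 to 1.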

Subsequently, we built discrimination models restricted to customers who purchased each product category (wear items, golf club items, and accessory items) at the first purchase. Table 13 shows the explanatory variables selected in the model construction for each purchase category.

Table 13. Estimated value of selected partial regression coefficient for each purchase category

Table 13 shows that three variables were selected in all three models: whether the customer landed from a bookmark, the average number of page views on the first purchase date, and the number of logins on the first purchase date. The confusion matrices for the test data of these three models are shown in Tables 14, 15 and 16, and the evaluation indicators of their prediction accuracy are shown in Table 17.

Table 14. Confusion matrix of model for customers who purchased wear item
Table 15. Confusion matrix of model for customers who purchased golf club item
Table 16. Confusion matrix of model for customers who purchased accessory item

Compared with the accuracy of the model for all customers, there is no appreciable difference in accuracy among the models (Table 17).

Table 17. Evaluation indicator of model for customers who purchased each category (%)

4 Discussions

First, we consider the model predicting repurchase for all customers. Customers who made their first purchase soon after membership registration tended to repurchase. To acquire repeat customers, it is therefore considered important to promote golf equipment soon after membership registration and to shorten the number of days until the first purchase. Moreover, since the partial regression coefficient for the purchase of used items is negative, it may be possible to encourage repurchase by recommending new items at the first purchase. Furthermore, since the partial regression coefficients for e-mail magazine registration, browsing frequency of pages other than shopping pages, and landing from a news site are positive, customers who are highly interested in golf on a daily basis appear to be the ones who repurchased. This suggests that continuing to attract customers’ interest by periodically distributing e-mail magazines and golf-related news after membership registration will reduce the defection rate. Since the estimated partial regression coefficient for landing from another EC site is negative and that for landing from a bookmark is positive, we infer that repurchasing customers do not use other EC sites and use only this EC site. These customers appear to have already settled on the EC site for purposes other than purchasing.

Second, we consider the model constructed using only customers who purchased wear items. Since the partial regression coefficient for whether the customer purchased discount items is positive, recommending discounted wear items at the first purchase is considered an effective measure to encourage repurchase. Since the partial regression coefficient for the browsing frequency of the golf score management page on the first purchase date is positive, we infer that customers who used the site’s score management function before the first purchase, or who were interested in it, repurchased. Considering that the EC site provides a score management app, concentrating product recommendations in the app may reduce the defection rate.

Third, we consider the model constructed using only customers who purchased club items. Customers who purchased high-priced clubs or who did not purchase used clubs appear to have repurchased. In other words, for club purchases, a reduction in the defection rate can be expected by making recommendations without limiting price.

Finally, we consider the model constructed using only customers who purchased accessory items. Customers who made their first purchase soon after membership registration tended to repurchase. For accessory items, whose average price is low compared with other item categories, urging an early purchase after membership registration is therefore considered an effective measure for reducing the defection rate. In addition, since the partial regression coefficient for the browsing frequency of the golf course reservation page is positive, customers who bought inexpensive accessory items while reserving a golf course appear to have repurchased. In other words, customer retention can be expected by recommending consumable items such as golf balls to customers who are likely to reserve a golf course. Moreover, since customers who frequently browsed pages other than shopping pages repurchased, we infer that customers who purchased on impulse while visiting the EC site for purposes other than purchasing repurchased. Therefore, considering the low price of accessory items, prompting unplanned purchases appears to promote the acquisition of repeat customers.

In addition, since the partial regression coefficient for the average number of page views on the first purchase date is positive in all four models, measures that keep customers on the EC site as long as possible appear to promote repurchase. Since the partial regression coefficient for the number of logins on the first purchase date is also positive, we consider that customers who took a long time to decide on a purchase repurchased. This suggests that recommending similar items promotes repurchase.

5 Conclusion

In this study, we extracted the characteristics of customers who repurchase and proposed marketing measures. Specifically, we built a model that predicts repurchase within a certain period using binomial logistic regression analysis. From this model, we clarified the characteristics related to repurchase. Moreover, we built models predicting repurchase restricted to customers who purchased each product category (wear items, golf club items, and accessory items) at the first purchase. These models showed that the characteristics of customers who repurchase differ by category, which allowed us to propose detailed marketing measures to promote repurchase. However, the prediction accuracy of the models constructed in this research is not sufficient, and there is room for improvement. We expect that a more precise repurchase prediction model can be built by incorporating variables describing behavior before and after the first purchase.