1 Introduction

A recommendation system mines valuable information for users from massive data by feeding user preferences into a recommendation model, and is therefore widely used in fields such as e-commerce, advertising, and movie recommendation. The quality of the input data has a great influence on the recommendation results: sparse feature data not only increases the difficulty of model training but also tends to trap the recommendation results in a local optimum. Effectively solving the data sparsity problem and accurately predicting user needs therefore remain urgent open problems in the recommendation field.

Most widely used recommendation systems are still based on traditional recommendation models, including content-based models [1], collaborative filtering models [2], association rule-based models [3], matrix factorization-based models [4], and hybrid models [5]. Traditional recommendation models do not require large computing and storage resources, are suitable for small-scale datasets, and are easy to implement and deploy; however, they rely heavily on users' historical behavior data, require substantial computation and storage when measuring the similarity between products or the correlation between users and products, and analyze the data only in a relatively shallow way, which leads to low recommendation efficiency, data sparsity, and low recommendation accuracy [6]. In recent years, deep learning [7] has made great progress and, owing to its strong learning ability, has been widely applied in computer vision, speech recognition, natural language processing, and other fields [8]. Applying deep learning to recommendation models, and using graph neural networks [9], convolutional neural networks [10], deep neural networks [11], and other architectures, gives the model more powerful learning capability and allows it to mine deep relationships among features. As a result, more and more researchers are studying the application of deep learning in the recommendation field.

In order to dig deeper into the feature relationships among data and to alleviate the data sparsity problem, this article proposes a fusion recommendation model based on LightGBM and deep learning, named DCLGM. The model uses LightGBM [12] to transform and fuse the features in the dataset and to perform feature selection, yielding effective integer leaf vectors; it then uses a cross network [13] and a deep neural network to obtain the linear cross-combination relationships and the nonlinear association relationships of high-order features, fully mining the hidden relationships among them to improve recommendation accuracy.

The main contributions of this article are as follows:

  1. A fusion recommendation model, DCLGM, based on LightGBM and deep learning is proposed. LightGBM is combined with the cross network and the deep neural network for training, which alleviates the data sparsity problem and mines the hidden relationships among features.

  2. The integer leaf node index values generated by LightGBM are used as the trees' prediction results, replacing the high-dimensional sparse OneHot data that trees have traditionally produced; this alleviates data sparsity and further improves the efficiency of data processing.

  3. The cross network and the deep neural network are used to process complex high-order feature combinations, obtaining the linear cross-combination relationships and the nonlinear correlation relationships of high-order features, which further improves recommendation accuracy and the overall recommendation effect.

  4. Simulation experiments on the public datasets Criteo and Avazu compare the proposed model with four other typical recommendation models. The experimental results show that the DCLGM model achieves a better recommendation effect.

2 Related Work

The recommendation model proposed in this paper is an integrated framework based on LightGBM and deep learning. This chapter discusses the work related to the DCLGM model from two aspects: recommendation based on the gradient boosting decision tree (LightGBM) and recommendation based on deep learning.

2.1 Recommendation Based on LightGBM

In research on recommendation systems, the LightGBM algorithm has attracted increasing attention and has become a research hotspot. A large body of work has applied LightGBM to the recommendation field, where it is widely used in tasks such as multi-class classification, click-through rate prediction, and search ranking.

Yun et al. [14] proposed a prediction model based on a convolutional neural network (CNN) and LightGBM. The model uses the CNN to extract information from the input data and innovatively integrates the LightGBM classifier into the prediction model, improving prediction accuracy and robustness. Liu et al. [15] proposed a model combining a deep convolutional neural network (DCNN) and the LightGBM algorithm. The model uses the DCNN to extract information from the input data and uses LightGBM as a strong classifier to overcome the limitations of the DCNN's fully connected layer when handling the high dimensionality and complexity of sensor data in RUL prediction. Jiang et al. [16] proposed FP-GBDT, a feature-parallel distributed gradient boosting tree model. The model designs an efficient distributed dataset transposition algorithm that converts a dataset originally partitioned by rows into a column-partitioned representation, together with a sparsity-aware method to speed up the construction of gradient histograms and a bitmap compression method to transmit the location information of the data samples. Hong et al. [17] proposed a hybrid recommendation model combining LightGBM, a CNN, and an IPNN to improve the performance of default prediction. The core idea is to learn new feature-interaction representations from the original features with LightGBM, use the deep neural network as a feature generator to produce deeper feature interactions, and use the inner product-based neural network (IPNN) as a deep learning classifier to learn feature interactions.

In the related work above, using LightGBM improves the accuracy of recommendation prediction to a certain extent. However, although LightGBM alone can improve the interpretability of features, it cannot mine the relationships among high-order features. Many researchers have noticed this problem and have begun to study the integration of LightGBM with neural networks to improve the recommendation model's ability to interpret high-order features.

2.2 Recommendation Based on Deep Learning

In recent years, deep learning has developed rapidly in speech recognition, computer vision, and natural language processing. Researchers have therefore considered how to apply deep learning to recommendation systems, using deep models to strengthen the training of user features and alleviate the data sparsity problem, thereby improving the accuracy of prediction and recommendation.

Liu et al. [18] proposed a recommendation method based on an attribute-aware attention graph convolutional network (A-GCN). The model uses a graph neural network to express the complex interactions between features, adopts a message passing strategy to aggregate messages from directly linked node types, and uses an attention mechanism over attribute information to filter the information passed from an item to the target user. Bernardis et al. [19] proposed the neural feature combiner NFC, an item-based cold-start recommendation method built on deep learning. Its central idea is to use a neural network to map an item's content features into a low-dimensional hybrid embedding space and then combine the features that make up the embedding so as to reproduce collaborative item similarity values. Cui et al. [20] proposed S3Rec, a sparsity-aware secure cross-platform social recommendation framework. The model improves the recommendation performance of the rating platform by integrating sparse social data from a social platform, and proposes two secure sparse matrix multiplication protocols based on homomorphic encryption and private information retrieval to protect the data privacy of the platforms. Liu et al. [21] proposed a new interest-aware message-passing GCN that performs high-order graph convolutions inside subgraphs, where each subgraph consists of users with similar interests and the items they interact with; an unsupervised subgraph generation module exploits user features and the graph structure to effectively identify users with common interests. Xie et al. [22] proposed CCDR, a new cross-domain contrastive recommendation framework for CDR matching. They built a large diversified preference network to capture information reflecting users' diverse interests and designed an intra-domain contrastive learning (intra-CL) task and three inter-domain contrastive learning (inter-CL) tasks to better perform representation learning and knowledge transfer.

The recommendation models in the related work above improve recommendation performance to a certain extent, but their mining and extraction of data features is generally not deep enough, and they neglect the linear and nonlinear relationships of high-order features. To address these problems, this paper uses LightGBM, a cross network, and a deep neural network to alleviate data sparsity and mine the cross-correlation relationships of data features.

3 Proposed Model

In order to better alleviate the data sparsity problem, deeply mine the relationships among data features, and improve the accuracy of the recommendation system, this article proposes DCLGM, a fusion recommendation model based on LightGBM, a cross network, and a deep neural network. This section describes the proposed DCLGM model in detail. First, the overall structure of the DCLGM model and the dataset preprocessing procedure are described; second, the LightGBM component of DCLGM is introduced, including how the LightGBM algorithm processes the dataset; then, the cross network and deep neural network of DCLGM are introduced, describing how the cross network mines the linear cross-combination features of the data and how the deep neural network extracts their nonlinear correlations; finally, the output layer of the model is described.

3.1 Problem Description

This paper mainly addresses the product recommendation problem in the context of big data: using the network information generated in a big data environment to mine user preferences and provide users with more accurate product recommendations. The model in this paper focuses on how to effectively alleviate the data sparsity problem and how to mine the deep feature-combination relationships between users and transaction items, so as to provide users with more accurate recommendations. Table 1 lists some symbols and definitions used in the remainder of this article.

Table 1 Description of the main symbols and definitions of the article

3.2 Structure of the DCLGM Model

The architecture of the DCLGM model proposed in this paper is shown in Fig. 1. The model is divided into a data processing layer, a neural network layer, and a model output layer. The data processing layer includes data preprocessing and data processing by the LightGBM model; the neural network layer includes the cross network and the deep neural network; the model output layer concatenates the neural network outputs and produces the final result.

Fig. 1
figure 1

DCLGM model structure diagram

The specific algorithm of the DCLGM model is shown in Algorithm 1.

figure a

3.3 DCLGM Data Processing Layer

In this chapter, the dataset U is taken as an example. Because the dataset contains a large number of vacant values, the DCLGM model must preprocess U before further processing. First, the vacancies in U are filled with reasonable values in the input layer. The fill value for each dataset should be chosen by analyzing the values of the dataset itself: it should be a value or character that does not already appear in the data, the number of distinct fill values should be kept small, and the fill value should be close to the real data values so as to reduce its impact on the dataset. By analyzing the dataset used in this chapter, we found that the minimum value of the continuous data is −2 and that the character “−1” does not appear among the discrete values. Therefore, −3 and −1 are used to fill the vacant values of the continuous and discrete data, respectively. After filling, dataset U becomes a vacancy-free dataset A. Next, the continuous data in dataset A are normalized. This paper uses linear (min-max) normalization to apply a linear transformation to the sparse data and map the result to the interval [0, 1]; the formula is shown in Eq. (1). Label encoding [23] is applied to the discrete data in dataset A, and the data are discretized [24]. Finally, dataset A becomes a fully numerical, vacancy-free dataset B.

$$ { }X_{norm} = \frac{{X - X_{min} }}{{X_{max} - X_{min} }} $$
(1)

where \(X_{max}\) is the maximum value of the data, \(X_{min}\) is the minimum value of the data, \(X\) is the data value, and \(X_{norm}\) is the data value obtained by linear normalization.
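
As an illustration, the preprocessing steps above can be sketched as follows, assuming a pandas DataFrame whose continuous and discrete columns follow the Criteo naming convention (I1–I13 and C1–C26); the fill values −3 and −1 are those chosen in this chapter, while the column names and helper function are only assumptions for the example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical column layout following the Criteo convention used later in the paper.
continuous_cols = [f"I{i}" for i in range(1, 14)]
categorical_cols = [f"C{i}" for i in range(1, 27)]

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Fill vacant values: -3 for continuous fields, "-1" for discrete fields,
    # the values chosen in the text after inspecting the dataset.
    df[continuous_cols] = df[continuous_cols].fillna(-3)
    df[categorical_cols] = df[categorical_cols].fillna("-1")

    # Linear (min-max) normalization of the continuous fields, Eq. (1).
    for col in continuous_cols:
        col_min, col_max = df[col].min(), df[col].max()
        df[col] = (df[col] - col_min) / (col_max - col_min)

    # Label-encode the discrete fields so that dataset A becomes the
    # fully numerical dataset B.
    for col in categorical_cols:
        df[col] = LabelEncoder().fit_transform(df[col].astype(str))
    return df
```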

The fully numerical dataset B obtained from the input-layer preprocessing is passed to the LightGBM layer of the DCLGM model. The LightGBM layer fuses and extracts the features in dataset B, takes the feature set with the highest classification accuracy as the result of feature selection, and produces effective integer leaf vectors.

First, the histogram algorithm of the LightGBM model is applied in the LightGBM layer. The histogram algorithm divides continuous features into discrete values by constructing a feature histogram; this discretization not only reduces the computational cost of feature discretization but also preserves the feature information well. Each feature column in dataset B is converted into a histogram: the feature values are divided into K bins, each bin is assigned an integer, the values falling into a bin are replaced by that integer, and a histogram of width K is formed from the bin integers. While traversing the data, the histogram statistics are accumulated; after one pass over the data, the histogram holds the accumulated statistics, and the discrete values of the histogram are then traversed to find the optimal split point. The overall process is shown in Fig. 2.

Fig. 2
figure 2

LightGBM’s histogram algorithm
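
In the LightGBM library the histogram behaviour is controlled mainly through the max_bin parameter; the minimal NumPy sketch below only illustrates the binning idea itself (values replaced by integer bin indices) and is not LightGBM's internal implementation.

```python
import numpy as np

def to_histogram_bins(feature: np.ndarray, k: int = 255) -> np.ndarray:
    """Replace continuous values by integer bin indices in 0..k-1,
    mirroring the histogram discretization described above."""
    # Bin edges taken from quantiles, so each of the K bins holds a
    # comparable number of samples.
    edges = np.quantile(feature, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(feature, edges)

# Example with K = 255, the default max_bin of LightGBM.
bins = to_histogram_bins(np.random.randn(10_000))
```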

Second, the EFB algorithm sorts the sparse data in dataset B by feature value, constructs a dense new feature, and uses this new feature to replace the original features, so that the feature dimension is reduced without information loss and unnecessary computation on zero values is avoided. This improves the processing efficiency of sparse features. The process is shown in Fig. 3.

Fig. 3
figure 3

LightGBM’s EFB algorithm

Finally, the gradient-based one-side sampling algorithm GOSS is used. GOSS allows the training data to be divided into multiple subsets so that multiple decision trees can be trained in parallel, and its random sampling means that each decision tree node splits on only part of the samples, which reduces the risk of overfitting, reduces the number of samples drawn from dataset B, and excludes most of the small-gradient samples. Specifically, all values of the feature to be split in dataset B are sorted in descending order of absolute gradient; the m ∗ 100% of samples with the largest absolute gradients are selected, and n ∗ 100% of the remaining small-gradient samples are selected at random; the randomly selected samples are then multiplied by the constant (1 − m)/n, so that the algorithm pays more attention to under-trained samples without changing the distribution of dataset B too much; finally, the information gain is computed using the (m + n) ∗ 100% of sampled data.
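
A minimal NumPy sketch of this GOSS sampling step is given below; m and n are the large-gradient and small-gradient sampling ratios described above, and the gradients array is assumed to come from the current boosting iteration.

```python
import numpy as np

def goss_sample(gradients: np.ndarray, m: float = 0.2, n: float = 0.1):
    """Return sample indices and weights selected by GOSS."""
    order = np.argsort(-np.abs(gradients))       # sort by descending |gradient|
    n_total = len(gradients)
    top_k = int(m * n_total)                     # keep the m*100% largest gradients
    rand_k = int(n * n_total)                    # sample n*100% of the remainder

    top_idx = order[:top_k]
    rest_idx = np.random.choice(order[top_k:], size=rand_k, replace=False)

    idx = np.concatenate([top_idx, rest_idx])
    weights = np.ones(len(idx))
    weights[top_k:] = (1.0 - m) / n              # re-weight the small-gradient samples
    return idx, weights

idx, w = goss_sample(np.random.randn(100_000))   # (m + n)*100% of the data is kept
```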

The additive prediction of the LightGBM model and its objective function are shown in Eqs. (2) and (3).

$$ { }\hat{y}_{i}^{k} = \hat{y}_{i}^{{\left( {k - 1} \right)}} + f_{k} \left( {x_{i} } \right) $$
(2)
$$ \begin{aligned} O_{{{\text{bj}}}}^{*} = & \mathop \sum \limits_{i} L\left( {y_{i} ,\hat{y}_{i}^{k} } \right) + {\Omega }\left( {f_{k} } \right) + e^{{\left( {k - 1} \right)}} \\ = & \;\mathop \sum \limits_{i} L\left( {y_{i} ,\hat{y}_{i}^{{\left( {k - 1} \right)}} + f_{k} \left( {x_{i} } \right)} \right) + {\Omega }\left( {f_{k} } \right) + e^{{\left( {k - 1} \right)}} \\ \end{aligned} $$
(3)

where \(y_{i}\) is the true value of the label, \({\hat{\text{y}}}_{i}^{{\left( {k - 1} \right)}}\) is the result of the (k − 1)th learning round, \(e^{{\left( {k - 1} \right)}}\) is the regularization [25] term of the first k − 1 trees, \({\Omega }\left( {f_{k} } \right){ }\) is the regularization term of the kth tree, and \(O_{{{\text{bj}}}}^{*}\) is the objective function.

The second-order Taylor expansion of the loss function is shown in Eq. (4).

$$ \sum\limits_{i} {L\left( {y_{i} ,\hat{y}_{i}^{{\left( {k - 1} \right)}} + f_{k} \left( {x_{i} } \right)} \right)} = \sum\limits_{i} {\left[ {L\left( {y_{i} ,\hat{y}_{i}^{{\left( {k - 1} \right)}} } \right) + L^{\prime}\left( {y_{i} ,\hat{y}_{i}^{{\left( {k - 1} \right)}} } \right)f_{k} \left( {x_{i} } \right) + \frac{1}{2}L^{\prime\prime}\left( {y_{i} ,\hat{y}_{i}^{{\left( {k - 1} \right)}} } \right)f_{k}^{2} \left( {x_{i} } \right)} \right]} $$
(4)

where \(y_{i}\) is the true value of the label, \({\hat{\text{y}}}_{i}^{{\left( {k - 1} \right)}}\) is the result of the (k − 1)th learning round, and \(L\left( {y_{i} ,\hat{y}_{i}^{k} } \right)\) is the training error of the sample; the goal is to find a suitable tree \(f_{k}\) that minimizes the value of the objective function.

LightGBM has the advantages of handling large-scale data, fast training, high model accuracy, and good robustness and generalization. The data processing layer of the model uses LightGBM to process large-scale data efficiently: it discretizes feature values, converts continuous features into discrete features, and reduces the number of distinct feature values; it handles nonlinear relationships between features well, improves feature expressiveness, and supports feature-parallel training; and it can process multiple features simultaneously, further improving the accuracy of the model.

After dataset B is processed by the LightGBM model, an integer leaf node index value is generated for each tree instead of the high-dimensional sparse OneHot data traditionally produced. All the generated leaf node index values are then concatenated by row, each column is given a unique name and treated as discrete data, and a new discrete dataset T is formed. Dataset T alleviates the data sparsity problem, enhances the validity and interpretability of the original dataset, improves the utilization of data features, and increases the accuracy of the recommendation model. The complexity of LightGBM depends on the depth of the trees and the number of parameters and is m * log(n), where m is the tree depth and n is the number of parameters.
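
The sketch below shows one way to obtain these integer leaf indices with the LightGBM Python API (pred_leaf=True) and to relabel them as discrete columns forming dataset T; the synthetic data, parameter values, and column names are placeholders, not the settings used in the experiments.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb

# Synthetic stand-in for the fully numerical dataset B and its click labels.
X_b = pd.DataFrame(np.random.rand(1000, 39))
y = np.random.randint(0, 2, size=1000)

params = {"objective": "binary", "num_leaves": 31, "learning_rate": 0.05, "verbose": -1}
booster = lgb.train(params, lgb.Dataset(X_b, label=y), num_boost_round=100)

# One integer leaf index per sample and per tree, instead of a OneHot expansion.
leaf_idx = booster.predict(X_b, pred_leaf=True)          # shape: (n_samples, n_trees)

# Give each tree's column a unique name and treat it as discrete data,
# forming the new discrete dataset T described above.
dataset_T = pd.DataFrame(leaf_idx,
                         columns=[f"leaf_{i}" for i in range(leaf_idx.shape[1])])
```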

The discrete data in the preprocessed dataset B and the discrete dataset T obtained from the LightGBM model are combined and spliced by row according to the labels in the first column of the dataset to obtain dataset V; the continuous data in dataset B and dataset V are then fed into the cross network and the DNN of the neural network layer as the input dataset for training.

3.4 DCLGM Neural Network Layer

The neural network layer of the DCLGM model consists of the cross network and the deep neural network in parallel. Its input adopts the principle of input sharing [26]: each part of the neural network processes the same dataset, so the outputs can be associated and concatenated.

The cross network is composed of cross layers, through which the input data from the embedding layer are crossed layer by layer. The input formula is shown in Eq. (5).

$$ x_{1} = f\left( {x_{0} } \right) $$
(5)
$$ { }f\left( x \right) = x_{0} x_{0}^{T} w_{c,0} + b_{c,0} + x_{0} $$
(6)

where w, b are parameters, and x0 is the initial input value.

The cross layers in the cross network perform layer-by-layer crossing of the input features with the previous crossing results: the input data of each layer are crossed with the current layer's data, and the linear cross-combination features in the data are extracted to obtain new cross features. Taking layer \(l\) as an example, dataset V undergoes feature crossing in the cross layer; after one crossing is completed, layer \(l + 1\) adds its input back, using the residual idea to alleviate network performance degradation. Such crossing operations can better capture the nonlinear interactions between features and improve the model's ability to model sparse features. The structure of a cross layer is shown in Fig. 4.

Fig. 4
figure 4

Cross network cross-layer structure

The output result of layer \(l + 1\) is shown in Eq. (7).

$$ x_{l + 1} = x_{0} x_{l}^{T} w_{l} + b_{l} + x_{l} = f\left( {x_{l} ,w_{l} ,b_{l} } \right) + x_{{l{ }}} ,x_{l + 1} ,x_{l} ,x_{0} \in {\mathbb{R}}^{d} { } $$
(7)

The output of the cross network is shown in Eq. (8).

$$ y_{Cross} = x_{0} *x^{\prime}*\omega + b + x $$
(8)

where x represents the input data, b represents the bias, \(x_{l}\) and \(x_{l + 1}\) represent the outputs of cross layers \(l\) and \(l + 1\), \(w_{l}\) and \(b_{l}\) represent the connection parameters between the two layers, d represents the feature dimension, and \(x_{0}\) represents the vector formed by concatenating the embedding vectors and the continuous feature vector.
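
A minimal NumPy sketch of the cross layer in Eq. (7) is given below; the dimension d, the number of layers Lc, and the random parameter values are placeholders, and the loop only illustrates the forward computation (no training).

```python
import numpy as np

def cross_layer(x0: np.ndarray, xl: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """One cross layer: x_{l+1} = x0 * (x_l^T w) + b + x_l, as in Eq. (7)."""
    return x0 * (xl @ w) + b + xl          # (xl @ w) is a scalar, so each layer costs O(d)

d, Lc = 16, 3                              # feature dimension and number of cross layers
x0 = np.random.randn(d)                    # stacked embedding + continuous feature vector
x = x0
for _ in range(Lc):                        # layer-by-layer crossing with the residual term
    w, b = np.random.randn(d), np.zeros(d)
    x = cross_layer(x0, x, w, b)
y_cross = x                                # output of the cross network, Eq. (8)
```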

In the cross network, suppose there are Lc cross layers in total and the dimension of the input x0 is d; then the parameter complexity of the entire cross network is d * Lc * 2. Since the W and b of each layer are d-dimensional, the complexity is a linear function of the input dimension d. The "cross computation" used by the cross network does not need to explicitly store the crossed results of sparse features; it performs the computation directly using the non-zero positions of the sparse features and their corresponding weights. This not only reduces the computational complexity but also lets the cross network automatically learn a bounded set of high-order cross features and their weight parameters, with the degree of the cross features growing with the layer depth; its time and space complexity both grow linearly with the input dimension, and it generalizes well. Therefore, compared with the deep neural network, the additional complexity introduced by the cross network is negligible, which keeps the complexity of the neural network layer at the same level as that of the deep neural network alone.

Cross networks have the advantages of solving the feature crossing problem, improving feature representation, reducing the number of parameters, and enhancing feature adaptability. Introducing cross layers better captures the linear interaction relationships between features and improves the expressive ability and prediction performance of the model. By reducing the number of parameters and learning adaptively, the cross network lowers model complexity and improves generalization; and because the cross layers are adaptive and automatically adjust how features interact for different data and tasks, the cross network adapts well to different data distributions and feature combination patterns, improving its adaptability and prediction performance.

Precisely because the crossing parameters are relatively few, the expressive ability of the cross network is limited. To learn highly nonlinear combined features, the deep neural network is introduced in parallel.

The deep neural network is a fully connected feed-forward neural network in which each data feature implicitly interacts with the others; it can extract nonlinear high-order features from the dataset. Its structure is a nonlinear mapping, applying a composition of nonlinear units for nonlinear processing. The deep neural network consists of three parts: the input layer, the hidden layers, and the output layer. The network structure is shown in Fig. 5.

Fig. 5
figure 5

Network structure of deep neural network

In the deep neural network, the input layer takes the discrete feature data and continuous feature data in the embedding layer as input. The input formula is shown in Eq. (9).

$$ h_{1} = g\left( {x_{0} } \right) $$
(9)
$$ g\left( x \right) = Relu\left( {W_{h,0} x_{0} + b_{h,0} } \right) $$
(10)

where \(W_{h,0}\) and \(b_{h,0}\) are parameters, \(x_{0}\) is the initial input value, and \(h_{1}\) is the output of the first hidden layer.

When the neurons of the input layer receive the features of a discrete feature vector, the feature information is propagated through the fully connected hidden layers, and the features are interactively combined at each layer. The calculation formula of a hidden layer is shown in Eq. (11).

$$ h_{l + 1} = f\left( {w_{l} h_{l} + b_{l} } \right) $$
(11)

Finally, the nonlinear high-order features are extracted at the output layer, and the output formula is shown in Eq. (12).

$$ y_{DNN} = \sigma \left( {\mathop \sum \limits_{i = 1}^{k} w_{i} h_{i} + b} \right) $$
(12)

where \(\sigma\) represents the nonlinear transfer function, \(w_{i}\) and \(b\) are parameters, and h is the output value of the previous layer.
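
A matching NumPy sketch of the DNN tower in Eqs. (9)–(11) is shown below; it performs only the forward pass with placeholder random weights, and the layer sizes are illustrative.

```python
import numpy as np

def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)

def dnn_tower(x0: np.ndarray, layer_sizes=(128, 128)) -> np.ndarray:
    """Fully connected feed-forward tower: h_{l+1} = ReLU(W_l h_l + b_l)."""
    h = x0
    rng = np.random.default_rng(0)
    for m in layer_sizes:
        W = rng.standard_normal((m, h.shape[0])) * 0.01   # placeholder weights
        b = np.zeros(m)
        h = relu(W @ h + b)                               # Eqs. (9)-(11)
    return h                                              # h_ld, the last hidden output

h_ld = dnn_tower(np.random.randn(16))
```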

In the deep neural network, assuming the input x0 has dimension d, there are Ld layers in total, and each layer has m neurons, the complexity is d * m + m + (m² + m) * (Ld − 1). Because the cross network and the deep neural network run in parallel, the time overhead of processing the input information is low, and for most current GPUs/CPUs the computational load is not large. In practical applications, the depth of the trees in LightGBM is usually not large, and the number of iterations needed to process the input data through LightGBM, the cross network, and the deep neural network is greatly reduced, so the proposed model has good scalability.

DNN networks have relatively low requirements on data diversity and quantity and can make good use of large-scale data for training; by stacking multiple hidden layers, they gradually extract the deep-level features of the data and thereby better capture its internal structure and the relationships between features; moreover, a DNN learns robust feature representations, reducing its sensitivity to noise and changes in the input data and adapting well to them.

3.5 DCLGM Output Layer

The final output of the DCLGM model is obtained by first concatenating the outputs of the cross network and the DNN in the fully connected layer and then computing the logits through a weighted summation. The model uses equal weights; the formula is shown in Eq. (13).

$$ y_{CD} = \left[ {x_{lc}^{T} ,h_{ld}^{T} } \right]w_{logits} $$
(13)

Finally, after the Sigmoid activation function [27] is applied, the final output result is obtained, and its output formula is shown in Eq. (14).

$$ \hat{y} = Sigmoid\left( {y_{CD} } \right) $$
(14)
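
The output layer of Eqs. (13)–(14) can be sketched as follows; the tower outputs and the equal-weight logit vector are placeholders standing in for the cross-network output x_lc and the last DNN hidden layer h_ld.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder tower outputs (in the model they come from the cross network and the DNN).
x_lc = np.random.randn(16)                     # cross-network output
h_ld = np.random.randn(128)                    # last DNN hidden layer

# Concatenate the two outputs and apply an equal-weight logit vector, Eq. (13).
stacked = np.concatenate([x_lc, h_ld])
w_logits = np.full(stacked.shape[0], 1.0 / stacked.shape[0])
y_cd = stacked @ w_logits

# Sigmoid activation gives the final predicted probability, Eq. (14).
y_hat = sigmoid(y_cd)
```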

The DCLGM model enhances the validity and interpretability of the input data through the gradient boosting decision tree, alleviates the data sparsity problem, and uses the cross network and deep neural network to extract linear cross-combination features and nonlinear correlation features. This realizes efficient and reasonable utilization of the input data and addresses the weak recommendation effect of traditional recommendation algorithms.

4 Experimental Analysis

4.1 Experimental Setup and Dataset

The experimental environment is a Windows 10 (64-bit) operating system with an Intel Core i7-11800H processor and 16 GB of memory; the programming language is Python 3.6, the deep learning framework is TensorFlow 1.6.0, and Python libraries such as Sklearn, Numpy, LightGBM, and DeepCTR are used.

The experiments use the public advertising click-through rate datasets Criteo and Avazu. The Criteo dataset, released by Criteo Labs, contains millions of feature values and click feedback records of display ads; it has 40 columns: a Label indicating whether the advertisement was clicked (0 or 1), 13 integer features I1–I13, and 26 categorical features C1–C26. The Avazu dataset is a 10-day online advertising dataset released by Avazu; it has 24 feature attributes, including advertisement ID, advertiser ID, application ID, device information, and user information.

4.2 Experiment Evaluation Metrics

The DCLGM model proposed in this paper predicts the probability of recommendation, and the model output is a predicted probability between 0 and 1. The evaluation metrics used in the experiments are the cross-entropy logarithmic loss Logloss [28] and AUC [29], the area under the ROC curve [30].

The Logloss calculation formula is shown in Eq. (15).

$$ Logloss = - \frac{1}{N}\mathop \sum \limits_{i = 1}^{N} \left( {y_{i} log\left( {p_{i} } \right) + \left( {1 - y_{i} } \right)log\left( {1 - p_{i} } \right)} \right) $$
(15)

where \(y_{i}\) is the real label of the sample, and \(p_{i}\) is the probability that the i-th sample is predicted to be a positive sample.

The calculation formula of AUC is shown in Eq. (16).

$$ AUC = \frac{{\sum\nolimits_{{ins_{i} \in positiveclass}} {rank_{{ins_{i} }} - \frac{{M \times \left( {M + 1} \right)}}{2}} }}{M \times N} $$
(16)

where \(rank_{{ins_{i} }}\) is the rank of the i-th positive sample when the predicted probability scores are sorted in ascending order, M and N are the numbers of positive and negative samples, respectively, and \( \mathop \sum \limits_{{ins_{i} \in positiveclass}} { }\) adds up the ranks of the positive samples.
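
Both metrics are available in scikit-learn; the snippet below, with illustrative labels and predicted probabilities, shows how they would be computed.

```python
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0])                 # illustrative ground-truth labels
y_pred = np.array([0.9, 0.2, 0.7, 0.6, 0.4])       # illustrative predicted probabilities

print("Logloss:", log_loss(y_true, y_pred))        # Eq. (15)
print("AUC:", roc_auc_score(y_true, y_pred))       # Eq. (16)
```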

AUC is the area enclosed by the ROC curve and the coordinate axis, and its value generally ranges between 0.5 and 1. The closer the AUC is to 1, the better the model; the closer it is to 0.5 or below, the poorer the model; and the smaller the cross-entropy log loss, the better the model.

4.3 Experimental Results and Analysis

4.3.1 DCLGM Performance Analysis

Because the Criteo dataset is large and computing resources are limited, this paper uses Sklearn's grid search algorithm to find the optimal parameters of the LightGBM part of the data processing layer: by setting different parameters and value ranges as input and specifying the result evaluation criterion, the parameters giving the best classification result are obtained. In the neural network layer, this paper uses a single-variable, step-by-step optimization method to find the optimal hyperparameter values of the DCLGM model. This section studies four hyperparameters of the DCLGM model (the L2 regularization parameter, the dropout parameter, the embedding vector dimension, and the optimizer) and explores their impact on the recommendation results.
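
A sketch of the grid search step for the LightGBM part is given below, using scikit-learn's GridSearchCV with AUC as the scoring criterion; the parameter grid is illustrative and is not the search space actually used in the experiments.

```python
from lightgbm import LGBMClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative parameter grid; the actual search space is not specified in the text.
param_grid = {
    "num_leaves": [31, 63, 127],
    "learning_rate": [0.01, 0.05, 0.1],
    "n_estimators": [100, 200],
}

search = GridSearchCV(
    estimator=LGBMClassifier(objective="binary"),
    param_grid=param_grid,
    scoring="roc_auc",        # the specified result evaluation criterion
    cv=3,
)
# search.fit(X_b, y)          # X_b, y: the preprocessed features and click labels
# best_params = search.best_params_
```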

The effect of the L2 regularization parameter on the performance of the DCLGM model is shown in Fig. 6. As the L2 regularization parameter decreases, the AUC value gradually increases, the Logloss value gradually decreases, and the model performance gradually improves; when the L2 regularization parameter is 1e-7, AUC and Logloss reach their optimal values. Therefore, to prevent overfitting, this article selects 1e-7 as the L2 regularization parameter.

Fig. 6
figure 6

Experimental results under different L2 regularization parameters

The impact of the embedding vector dimension on model performance is shown in Fig. 7. The model is tested with embedding dimensions of 2, 4, 6, 8, and 10. When the embedding dimension is 4, both AUC and Logloss reach their optimum and the recommendation effect is best; as the dimension continues to increase, model performance first declines and then levels off. Therefore, this paper chooses 4 as the embedding vector dimension.

Fig. 7
figure 7

Experimental results under different embedding parameters

The choice of optimizer has a great influence on the recommendation effect of the model. In this paper, four optimizers (SGD, Adam, Adagrad, and Adadelta) are compared experimentally; the results are shown in Fig. 8. With the SGD optimizer the recommendation effect is poor; with the Adam optimizer, the AUC and Logloss of the model reach their optimal values and the recommendation effect is better than with the other three optimizers. Therefore, this article chooses Adam as the optimizer of the model.

Fig. 8
figure 8

Experimental results under different optimizer parameters

Through the performance analysis of the DCLGM model, the parameter configuration of the model is shown in Table 2.

Table 2 DCLGM model parameter settings

4.3.2 Performance Ablation Experiment and Analysis of Each Module of the DCLGM Model

To further verify the function and effectiveness of each module of the model, ablation experiments are designed from three perspectives: LightGBM, the cross network, and the DNN. The experiments use the Criteo dataset as training data and AUC and Logloss as evaluation metrics to compare the effects of the different variants. The experimental results are shown in Table 3.

  (1) DCLGM–LightGBM: the LightGBM module of the data processing layer is removed from the original model, and the preprocessed data are transmitted directly to the neural network module.

  (2) DCLGM–Cross: the cross module of the neural network layer is removed, and the spliced dataset obtained after processing is transmitted directly to the DNN.

  (3) DCLGM–DNN: the DNN module of the neural network layer is removed, and the spliced dataset obtained after processing is transmitted directly to the cross network.

Table 3 Performance comparison of each functional module of DCLGM

As can be seen from Table 3, the recommendation performance of DCLGM–LightGBM on the Criteo dataset is the worst, which shows that LightGBM contributes greatly to the DCLGM model's recommendation effect in mining the data and handling data sparsity, and also proves the importance of using LightGBM to process the data in DCLGM. The recommendation effects of DCLGM–DNN and DCLGM–Cross are better than that of DCLGM–LightGBM because, after the dataset is processed by LightGBM, the resulting dataset has better validity and interpretability, so the utilization of data features during neural network training is improved. Therefore, the recommendation effect of the full DCLGM model that fuses LightGBM, the cross network, and the deep neural network is stronger than that of the variants without LightGBM or with only a single neural network.

4.3.3 Comparison of Different Recommendation Models in Data Sparse Environment

To further evaluate the effectiveness of the DCLGM model in alleviating the data sparsity problem, the experiment uses the Criteo dataset as experimental data. Within the same dataset, different selected data scales exhibit different sparsity, and the data sparsity increases with the data scale. To reflect data environments of different sparsity, this paper randomly selects data at scales of 10%, 40%, 70%, and 90% of the Criteo dataset to simulate training datasets [31] with different degrees of sparseness, and compares the recommendation performance of the different models in these environments. The experimental results are shown in Table 4.

Table 4 Performance comparison of different models on sparse datasets

As can be seen from Table 4, for every recommendation model, as the data sparsity increases from 10 to 70% the recommendation performance of the compared models improves correspondingly, while when it increases from 70 to 90% the performance decreases. This is because the compared models can alleviate data sparsity to a certain degree, but when the sparsity exceeds a certain range their ability to handle it declines. Among them, the LGBM model, which uses LightGBM, alleviates data sparsity better than the DCN and DeepFM models, indicating that LightGBM plays a role in alleviating the data sparsity problem. Finally, compared with the other recommendation models, the DCLGM model has an obvious advantage in the sparse data environment, which shows that combining LightGBM with the cross network and the deep neural network is effective in alleviating the data sparsity problem.

4.3.4 Experimental Comparison of Different Recommendation Models

To verify the recommendation performance and effectiveness of the proposed DCLGM model, this section carries out extensive experimental analysis. The proposed model is compared with representative recommendation models, including LGBM, DeepFM [32], XDeepFM [33], and DCN. The experiments select 70% of the samples from the Criteo and Avazu datasets as the training set and 30% as the test set. Each experiment is run for 10 rounds, and the average of the results is taken as the experimental record.

To illustrate the recommendation performance of the DCLGM model more vividly, this paper uses the Criteo and Avazu datasets to examine its learning process and obtains the curves of the loss versus the training epoch shown in Fig. 9. Comparing the two line graphs in Fig. 9, the LGBM model using LightGBM alone performs better than the XDeepFM model in the second half of training, which shows that LightGBM has a certain effect in alleviating data sparsity. It can also be observed that on both datasets the DCLGM model trains more stably than the other models, converges faster, quickly descends to and then stabilizes in the optimal region, and its loss decreases faster and to a lower value than the other models, further highlighting the good recommendation performance of the DCLGM model.

Fig. 9
figure 9

a The change curve of the cross-entropy Logloss function of each model in the dataset Criteo; b The change curve of the cross-entropy Logloss function of each model in the dataset Avazu

To further verify the recommendation effect of the DCLGM model, the proposed model is compared with the other four recommendation models using the evaluation metrics in Sect. 4.2. The experimental results of the different recommendation models on the Criteo and Avazu datasets are shown in Table 5.

Table 5 Comparison of experimental results

Comparing the AUC and Logloss values of each model in Table 5, it can be concluded that LGBM outperforms XDeepFM on the Criteo dataset but not on the Avazu dataset, which suggests that LightGBM is well suited to sparse data environments. DCN performs better than LGBM, which shows that the cross network and deep neural network have clear advantages in handling sparse data and extracting feature relationships. DCN also performs better than DeepFM because it adds the cross network, which more effectively captures the linear cross-combination relationships between high-order features. In addition, DCN and DeepFM outperform the LGBM model, which further demonstrates the potential of deep learning techniques in recommender systems.

Finally, the DCLGM model proposed in this paper performs best on both datasets among all compared models: its AUC is notably higher and its Logloss error smaller than those of the other recommendation models. Its AUC increases by about 10% and 7% on the Criteo and Avazu datasets, respectively, and its Logloss decreases by about 20% and 10%. Compared with the LGBM model, which uses only LightGBM, and the DCN model, which uses only the cross network and deep neural network, the DCLGM model integrates LightGBM with the cross network and deep neural network, improving the utilization of data features, strengthening the interpretability of the dataset, and effectively alleviating data sparsity; at the same time, it effectively mines high-order combination features, fully extracts the cross-combination features of the data, and improves the overall recommendation performance.

5 Conclusion and Prospect

This paper integrates LightGBM, the cross network, and the deep neural network and, according to the characteristics of the three, proposes DCLGM, a fusion recommendation model based on LightGBM and deep learning. The model uses the gradient boosting decision tree to fuse and extract the features in the dataset and performs feature selection to obtain effective integer leaf vectors, thereby enhancing the validity and interpretability of the data and alleviating the data sparsity problem; at the same time, the model uses the cross network and the deep neural network to mine the linear cross-combination relationships and nonlinear correlation relationships of the data features, fully mining the hidden relationships between features and improving recommendation accuracy. The proposed model is compared with currently popular recommendation models such as DeepFM, DCN, and xDeepFM on the public Criteo and Avazu datasets, using AUC and Logloss as evaluation metrics. The simulation results show that the recommendation effect of the DCLGM model is stronger than that of the other recommendation models.

Since extracting too many data features increases the computational complexity and memory overhead and makes it difficult to give a reasonable explanation for the recommendation results, future work will try to introduce the attention mechanism into the recommendation model and explore its application in recommendation systems.