1 Introduction

Internet and mobile technologies make it possible for people to access information anytime and anywhere [1,2,3]. People’s lifestyles have been profoundly altered by online systems, including social media, e-commerce, and various lifestyle applications [4,5,6,7]. When friends get together for dinner, they might share pictures of the food they enjoy on social media, and users might turn to apps for recommendations when deciding what to eat [8,9,10]. Food recommendation systems play a vital role across a wide range of lifestyle applications and have become an essential component of many lifestyle services; in turn, they can also be used to steer people towards healthier lifestyles. Typically, food recommendation systems predict user preferences in terms of recipes, eating habits, and behavior. Hence, for a food recommendation system to be effective, it must understand users’ food preferences accurately. Even for health-oriented food services, users can only be persuaded to follow a suggestion if the recommended food matches their preferences [11, 12].

In recent years, with advances in online food applications, many food recommendation systems have been designed to help users find relevant foods that match their tastes [13,14,15]. However, considerable challenges in this domain remain to be addressed. For instance, foods served in one community may not suit another community because of environmental, cultural, or economic differences. Traditional \(K\)-nearest neighbor selection approaches fail to capture user communities that reflect food patterns and their relationships [8, 16,17,18]. By contrast, applying community detection approaches in food recommendation systems can effectively identify the natural groupings of users, which can then serve as the users’ nearest neighbor sets. The user–food interaction matrix is one of the main data resources in food recommendation systems; it is used to model users’ previous behavior and infer their future preferences [15, 19]. An important property of this matrix is the dynamic behavior of users, whose preferences may change over time [20,21,22,23]. Therefore, to account for these dynamics, the time factor should be considered when making recommendations to users.

Daily diet is another key factor affecting people’s health, and unhealthy foods can pose a significant threat to it [9]. For this reason, food recommendation is arguably more critical than other recommendation domains such as movies, music, and books. Moreover, the foods recommended to users by a food recommendation system should be sufficiently reliable. In other words, the reliability of recommended foods is an important issue that previously developed food recommendation systems have largely ignored. Some previous works mapped the recommendation process to a classification procedure in which unknown ratings are predicted by applying well-known classification approaches [24]. A number of earlier works also viewed recommendation as a pattern recognition task [25]. In these works, a feature vector was inferred from user and/or item data as latent features, and these features were then used instead of the original data to make recommendations. An effective mechanism that models the reliability of predictions by recognizing the patterns of users’ daily diets, while emphasizing their tastes, can help food recommendation systems improve both the reliability and the accuracy of recommendations [26,27,28].

In this paper, we address the above-mentioned issues by developing a novel food recommendation system based on health-aware reliability measurement. Compared to previously developed food recommendation systems, the proposed model makes the following contributions:

  • A novel community detection approach is developed based on the Laplacian node centrality measure to identify the natural groupings of users. This approach automatically determines an appropriate number of communities, so the number of communities does not need to be set manually.

  • A novel time-aware similarity function is introduced to capture the temporal information of ratings and consider it in calculating user-similarity scores. With this time-aware function, we can incorporate the dynamic feature of users’ preferences into the recommendation process.

  • An effective health-aware reliability measurement is introduced, which considers both the accuracy and the health factors of predicted ratings simultaneously to evaluate their reliability. To the best of our knowledge, this is the first work that incorporates a health-aware reliability measurement into food recommendation systems [29,30,31]. Moreover, we introduce a new health metric to measure how healthy the provided recommendations are.

  • A novel mechanism is developed to recalculate predictions whose reliability falls below a threshold. To this end, a neighborhood refinement approach is proposed in which ineffective neighbors are identified based on a confidence criterion and removed from the neighborhood.

The remainder of this paper is organized as follows. Sect. 2 reviews the related works. Sect. 3 presents the developed system in detail. Sect. 4 reports and discusses the experimental results. Finally, Sect. 5 concludes the paper.

2 Related Works

This section reviews the related works by briefly describing their main contributions and core conceptual models. We categorize these works into four main classes: food recommendation systems, community detection-based recommendation systems, time-aware recommendation systems, and reliability-based recommendation systems. Each group is reviewed in its corresponding subsection. Moreover, Table 1 summarizes the related works in terms of four distinguishing factors (community detection, time, reliability, and health) alongside their main contributions. As the table shows, the proposed food recommendation system is the only approach among the related works that considers all four aspects in its recommendation procedure.

Table 1 The summarized descriptions of the related works in terms of different aspects compared to the proposed method

2.1 Food Recommendation Systems

In recent years, food recommendation systems have gained increasing attention due to their applicability to healthy living. Most studies in the food domain consider users’ preferences and/or health issues when recommending foods. In [41], a novel food recommendation system is proposed that considers both food ingredients and food images. The authors draw on three different resources: users’ historical ratings, the ingredients of each food, and food images. This research presents a hierarchical attention-based food recommendation approach to prioritize users’ preferences. The authors of [14] developed a model that suggests foods to users based on the raw foods in their refrigerator. In [42], the authors proposed an algorithm for extracting customer food preferences from online restaurant reviews, in which natural language processing techniques are used to extract food names from user comments. Sentiment analysis was then conducted to determine whether users’ opinions of each food were reasonable or unreasonably negative. Lastly, a local eatery recommendation system was developed that matches food tastes with local restaurants. According to [43], ingredient–ingredient, ingredient–recipe, and recipe–user interactions play a vital role in food recommendation models. In that paper, the authors proposed a food recommendation algorithm based on a graph convolutional network with embedding propagation layers that improve representation learning and model high-order connections.

Although learning users’ preferences is vital, food recommendation systems should also optimize users’ health and nutritional needs by analyzing their health issues, past eating behaviors, and nutritional requirements [19]. An innovative food recommendation system on the Android platform is developed in [9] that takes users’ preferences and health into account. Trattner et al. [44] investigated the healthiness of recipes on Allrecipes.com using nutrition measures. They then analyzed user activities to determine whether these corresponded to healthy foods. Finally, they tested their model with both recommendation systems. Rehman et al. [45] emphasized that a diet recommended to patients must meet their nutritional needs. To address this issue, the researchers developed an online food recommendation system called Diet-Right, in which an ant colony optimization algorithm produces a food list and recommends suitable foods based on pathological values. In [46], a context-aware food recommendation system based on matrix factorization and feature engineering is developed for well-being care applications using mobile devices and medical records. With this system, users can access food recommendations from nearby food establishments and order healthy foods in real time. The authors of [47] presented a diet recommendation system that suggests healthy menus and dishes. Individuals with hypertension who use it regularly can follow the suggested foods to control their diet and avoid health complications. Moreover, Wang et al. [15] developed a healthy food recommendation model called Market2Dish, which people can use to discover customized foods and maintain a healthy lifestyle, preventing diseases caused by unhealthy habits.

A review of these food recommendation models reveals a research gap: the reliability of recommendations is not taken into consideration. This issue strongly affects the level of trust users place in the recommendations provided by food recommendation models. The above-mentioned models, however, consider only the accuracy of recommendations and neglect their reliability. In this paper, we address this issue by introducing a health-aware reliability measurement and incorporating it into the recommendation process.

2.2 Community Detection-Based Recommendation Systems

Community detection has emerged as a primary method for understanding how network structure affects behavioral patterns in recommendation systems. Communities are an effective way of identifying underlying structures that reveal unforeseen patterns, such as potential friends with similar interests in social media or trends in social opinion. Community detection approaches mainly capture the tendency of nodes in a graph to form groups based on their similarity [54]. Using community detection algorithms, recommendation systems can group similar users or items. As an added advantage, these systems can use community detection to increase their rating prediction accuracy and to better handle the cold start and data sparsity challenges. For instance, in [39], the authors integrated community detection and association rule mining into a novel recommendation system that addresses the cold start issue of context-aware recommender systems. In [37], a personalized recommendation system is developed by applying a user community detection algorithm. To this end, a time-aware matrix is generated to capture users’ interests accurately, and highly correlated users and items are grouped together using the difference equation. Deebak et al. [38] proposed a community-based trust-aware recommendation system for big data cloud service networks, in which the trusted neighbors within the target user’s community are merged to express the user’s preferences. In [40], an e-commerce recommendation system is proposed using a deep learning-based clustering representation. In this study, a neighbor factor and a time function are employed to choose the adjacent item set. To process large-scale user data, the authors designed a collaborative filtering-based model using a stacked denoising autoencoder and time mean models. In [55], a graph embedding-based method is developed to identify group shilling attacks in collaborative filtering recommendation systems. The authors first analyzed user ratings to build a user relationship graph and then embedded each node of this graph into a low-dimensional vector. Finally, they used a clustering method to identify candidate communities based on the resulting user representations.

2.3 Time-Aware Recommendation Systems

Time-aware recommendation systems focus primarily on modeling users’ preferences while taking temporal dynamics into consideration. Temporal models emphasize the fact that users’ preferences often drift over time as their behavior changes [32]. Kefalas et al. [21] proposed a time-aware recommendation system using two unified methods that consider spatial, textual, and temporal elements simultaneously. Their method also considers the temporal dimension explicitly and assesses the impact of time over different time intervals. In [34], a collaborative filtering-based personalized recommendation system is developed for Internet of Things (IoT) services. The authors integrated a time correlation coefficient with a clustering technique to increase the efficiency of the final recommendations. By modeling the evolution of implicit user preferences over time, the authors of [36] developed a time-aware music recommendation system in which collaborative filtering captures users’ daily listening patterns to characterize their listening habits and provide better recommendations. To capture changes in users’ preferences over time, the user bias in [20] is treated as a quantity that varies over time. The authors first examined the time-varying effect on user bias and item bias in context-aware recommendation and then developed a time-varying bias tensor factorization-based recommendation method. Ngaffo et al. [35] developed a time-aware recommendation system to deal with the problem of service targeting, which improves recommendation reliability by inferring implicit trust relationships. To this end, both direct and indirect trust relationships are calculated to alleviate the data sparsity problem while addressing the recommendation credibility issue. Ahmadian et al. [33] developed a time-aware recommendation system by integrating temporal reliability and confidence criteria. In their study, predictability is assessed using a temporal reliability criterion that takes changes in users’ preferences over time into consideration.

2.4 Reliability-Based Recommendation Systems

In addition to the accuracy of recommendations, their reliability also plays a vital role in the effectiveness of a recommendation system, as it determines how much users can trust the recommendations provided by the underlying system [56,57,58]. Hernando et al. [48] investigated the correlation between the accuracy of recommendations and their reliability scores; their analysis revealed a significant correlation between the two and showed that incorporating a reliability measure into a recommendation system can improve its effectiveness. In [49], the authors proposed a social recommender model that incorporates reliable implicit connections between users into the recommendation procedure to address the data sparsity issue. For this purpose, they introduced a reliability criterion to assess the correctness of the implicit relationships. The authors of [50] attempted to fill the gap in research on reliability-based recommendation systems by proposing two reliability measures, one concerning prediction quality and the other concerning recommendation quality. They showed experimentally that the efficiency of recommendation models is enhanced when the developed reliability measures are associated with the predictions. In [51], a reliability-based deep recommendation method is introduced to improve the representation learning of users’ preferences. The main contribution of that work is to expand each user’s rating profile with additional implicit ratings whose correctness is assessed by a reliability measure. This yields a denser user–item interaction matrix, which can considerably increase the efficiency of deep learning models in deriving users’ hidden features. Margaris et al. [52] showed that implicit ratings obtained by analyzing textual reviews may not exactly reflect users’ preferences, because these ratings carry a level of uncertainty. To address this issue, they proposed a reliability measure that exploits the features of textual reviews to evaluate the uncertainty of an implicit rating. This reliability measure is employed in both the user-similarity function and the rating prediction procedure. A Bernoulli distribution is employed within a matrix factorization methodology to build an effective model-based collaborative filtering recommendation system named Bernoulli Matrix Factorization (BeMF) [53]. Unlike previous matrix factorization models, BeMF generates prediction and reliability scores simultaneously. Extensive experiments showed that more reliable predictions lead to higher-quality recommendations for users.

It is worth mentioning that none of the above-mentioned reliability-based recommendation models consider health factors when calculating the reliability score of recommendations; they mainly aim to enhance recommendation accuracy. In the food recommendation context, however, considering health factors is critical because it yields healthier recommendations. Therefore, in this paper, we propose a health-aware reliability measurement that is based on both health and accuracy factors and is incorporated into the food recommendation process.

3 Proposed System

This section describes the developed system, referred to as Food Recommendation System based on Health-aware Reliability measurement (FRSHR for short), which is organized into five main steps: (1) Community detection, (2) Rating prediction, (3) Health-aware reliability measurement, (4) Neighborhood refinement, and (5) Making recommendations. In the first step, a community detection approach based on the Laplacian node centrality measure clusters users into communities such that users belonging to the same community have similar preferences. In the second step, the obtained communities are used as the users’ nearest neighbor sets to predict unknown ratings. In the third step, a reliability measurement based on both accuracy and health criteria is proposed to evaluate the effectiveness of the predicted ratings. This measurement is essential for determining how accurate and healthy the recommendations are. Predicted ratings whose reliability value is lower than a threshold are considered unreliable. For these ratings, in the fourth step, a neighborhood refinement mechanism removes ineffective neighbors from the target user’s neighborhood based on a confidence measure. Finally, in the fifth step, the unreliable predictions are recalculated based on the refined neighborhood to enhance the accuracy and healthiness of the recommendations. Figure 1 illustrates the general schema of the proposed FRSHR method, and the notations and their brief descriptions are tabulated in Table 2. We discuss the details of this method in the following subsections.

Fig. 1 The general schema of the developed FRSHR model

Table 2 The notations used in this work and their brief descriptions

3.1 Community Detection

Identifying communities of nodes that share some similarities is a vital social network analysis task. A community detection algorithm typically identifies clusters of nodes with high intracommunity similarity and low intercommunity similarity: nodes within a community have high pairwise similarity scores, whereas nodes in different communities have low pairwise similarity scores. Formally, let \(G=(V, E,W)\) be a graph, where \(V\) is the set of nodes, \(E\) is the set of edges denoting the relationships between the nodes, \(W\in {\mathbb{R}}^{\left|V\right|\times \left|V\right|}\) is the matrix of edge weights, and \(|V|\) is the total number of nodes. In this section, a novel community detection algorithm is developed to group users into appropriate communities based on the user–food interaction matrix. To this end, we consider each user as a node in the graph \(G\), so that \(V\) and \(E\) represent the set of all users and the links between them, respectively. Moreover, we use a time-aware similarity measure to construct the weight matrix \(W\), where \({w}_{u,v}\in W\) is the time-aware similarity between users \(u\) and \(v\). The main advantage of this measure is that it accounts for changes in users’ preferences over different time periods. It is calculated as follows:

$${w}_{u,v}=\frac{\sum_{i\in {I}_{u,v}}(({r}_{u,i}-{\overline{r} }_{u})\times ({r}_{v,i}-{\overline{r} }_{v})\times {TW}_{u,v,i})}{\sqrt{\sum_{i\in {I}_{u,v}}{(({r}_{u,i}-{\overline{r} }_{u})}^{2}\times {TW}_{u,v,i})}\sqrt{\sum_{i\in {I}_{u,v}}({({r}_{v,i}-{\overline{r} }_{v})}^{2}\times {TW}_{u,v,i})}},$$
(1)

where \({I}_{u,v}\) indicates a set of foods that have been rated by both users \(u\) and \(v\), \({r}_{u,i}\) is the rating of food \(i\) assigned by user \(u\), the average of ratings of user \(u\) is denoted by \({\overline{r} }_{u}\), and \({TW}_{u,v,i}\) represents a time weight associated with the ratings of users \(u\) and \(v\) assigned to food \(i\), which is defined as follows:

$${TW}_{u,v,i}=\sqrt{{e}^{-\lambda (TL-{t}_{u,i})}\times {e}^{-\lambda (TL-{t}_{v,i})}},$$
(2)

where \({t}_{u,i}\) is the timestamp of the rating of food \(i\) assigned by user \(u\), \(TL\) is the maximum timestamp in the system, and \(\lambda\) is a preset parameter that adjusts the importance of the time factor in the time-aware similarity. Because older ratings receive exponentially smaller weights, the greater (resp. lower) the value of \(\lambda\), the higher (resp. lower) the impact of the time factor on the similarity score.
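A minimal Python sketch of Eqs. (1) and (2) follows, assuming ratings and timestamps are stored in nested dictionaries keyed by user and then by food (a hypothetical layout, not taken from the paper). \(TL\) is taken as the latest timestamp in the supplied data, and returning 0 when a denominator vanishes is our own assumption, since the text does not define that case.

```python
import math


def time_weight(t_u: float, t_v: float, t_latest: float, lam: float) -> float:
    """Eq. (2): geometric mean of the exponential decays of two rating timestamps."""
    return math.sqrt(math.exp(-lam * (t_latest - t_u)) * math.exp(-lam * (t_latest - t_v)))


def time_aware_similarity(ratings, times, u, v, lam):
    """Eq. (1): Pearson-style correlation with each co-rated food weighted by TW.

    ratings[user][food] -> rating; times[user][food] -> timestamp (hypothetical layout).
    """
    t_latest = max(t for per_user in times.values() for t in per_user.values())  # TL
    mean_u = sum(ratings[u].values()) / len(ratings[u])  # average over all of u's ratings
    mean_v = sum(ratings[v].values()) / len(ratings[v])
    common = ratings[u].keys() & ratings[v].keys()       # I_{u,v}: foods rated by both
    num = den_u = den_v = 0.0
    for i in common:
        tw = time_weight(times[u][i], times[v][i], t_latest, lam)
        du = ratings[u][i] - mean_u
        dv = ratings[v][i] - mean_v
        num += du * dv * tw
        den_u += du * du * tw
        den_v += dv * dv * tw
    if den_u == 0.0 or den_v == 0.0:
        return 0.0  # assumption: similarity undefined, treated as 0
    return num / (math.sqrt(den_u) * math.sqrt(den_v))
```

Setting `lam=0` makes every time weight equal to 1, so the measure reduces to a Pearson-style correlation over co-rated foods, with user means taken over all of each user's ratings as in the definition of \({\overline{r} }_{u}\).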

After constructing the graph \(G=(V, E,W)\), the proposed community detection algorithm can be applied to generate appropriate communities of users. This algorithm consists of four main steps: (1) Graph sparsing, (2) Node centrality calculation, (3) Community expansion, and (4) Community merging. These four steps are detailed in the remainder of this subsection.

3.1.1 Graph Sparsing

The time-aware similarity measure maps the user space of the recommender system into a fully connected weighted graph \(G=(V, E,W)\). Edges with low weights can degrade the performance of the developed system, because making recommendations based on the opinions of users with low similarity values reduces recommendation accuracy. Moreover, removing ineffective edges from the original graph improves the scalability of the proposed community detection algorithm. Accordingly, in this step, edges whose weights are lower than \(\theta\) are removed from the original graph to make it sparser, where \(\theta\) is an adjustable parameter in the unit interval \([0, 1]\). A high value of \(\theta\) removes more edges from the graph (i.e., yields a sparser graph), while a low value of \(\theta\) keeps more edges (i.e., yields a denser graph). Therefore, a sensitivity analysis is needed to determine the value of this parameter that leads to the most accurate recommendations.
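The thresholding rule can be sketched in a few lines of Python. The sketch assumes the graph is a symmetric adjacency dictionary `adj[u][v] = w_uv` (an illustrative representation of our own); edges with weight below \(\theta\), including negative similarities, are dropped, while every user remains a node, possibly isolated.

```python
def sparsify(adj: dict, theta: float) -> dict:
    """Graph sparsing: keep only edges whose weight is at least theta."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in the unit interval [0, 1]")
    # Every node is kept; only its low-weight edges are removed.
    return {u: {v: w for v, w in nbrs.items() if w >= theta} for u, nbrs in adj.items()}


def edge_count(adj: dict) -> int:
    """Number of undirected edges in a symmetric adjacency dictionary."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2
```

For the sensitivity analysis, one could sweep several values of \(\theta\) and record `edge_count(sparsify(adj, theta))` alongside the resulting recommendation accuracy.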

3.1.2 Node Centrality Calculation

Central nodes, as the core nodes of communities, play a major role in the efficiency of the community detection algorithm. Given the network structure, a node with high centrality tends to exert a strong influence on its neighbors. Nodes that occupy a central position in a community and have many connections to other members are likely to affect the cohesion of the group, whereas nodes on the borders between communities may act as mediators between them. Therefore, node centrality is a good way to detect the central nodes of a graph. To this end, we use the Laplacian centrality measure to calculate the centrality of each node [59]. Formally, for the graph \(G=(V, E, W)\), \(X(G)\) is defined as the following diagonal matrix:

$$X\left(G\right)=\left(\begin{array}{cccc}{X}_{1}& 0& \cdots & 0\\ 0& {X}_{2}& \cdots & 0\\ \vdots & \vdots & \ddots & \vdots \\ 0& 0& \cdots & {X}_{\left|V\right|}\end{array}\right),$$
(3)

where \({X}_{i}\) is the weighted degree of node \({v}_{i}\), i.e., the sum of the weights of its incident edges, \({X}_{i}=\sum_{j=1}^{\left|V\right|}{w}_{i,j}\), and \({w}_{i,j}\) is the weight of the edge connecting nodes \(i\) and \(j\) calculated using Eq. (1). The Laplacian energy of the graph \(G\) is then computed as follows:

$${E}_{L}\left(G\right)=\sum\limits_{i=1}^{\left|V\right|}{{X}_{i}}^{2}+2\sum\limits_{i<j}{w}_{i,j}^{2}.$$
(4)

Eventually, the Laplacian centrality \({C}_{L}({v}_{i},G)\) of node \({v}_{i}\) can be computed as below:

$${C}_{L}\left({v}_{i},G\right)=\frac{(\Delta {\rm E}{)}_{i}}{{E}_{L}\left(G\right)}=\frac{{E}_{L}\left(G\right)-{E}_{L}\left({G}_{i}\right)}{{E}_{L}\left(G\right)},$$
(5)

where \({G}_{i}\) is the graph obtained by removing \({v}_{i}\) from graph \(G\).
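The Laplacian centrality of Eqs. (3) to (5) can be sketched by recomputing the Laplacian energy with each node deleted. This direct recomputation costs roughly \(O(|V|\cdot|E|)\) and is chosen for readability rather than speed; the symmetric adjacency-dictionary layout is an illustrative assumption, and self-loops are assumed absent.

```python
def laplacian_energy(adj, removed=None):
    """Eq. (4), optionally on the subgraph G_i obtained by deleting node `removed`."""
    energy = 0.0
    for u, nbrs in adj.items():
        if u == removed:
            continue
        kept = [w for v, w in nbrs.items() if v != removed]
        x_u = sum(kept)                      # X_u: diagonal entry of X(G) in Eq. (3)
        energy += x_u ** 2                   # contributes to sum_i X_i^2
        energy += sum(w ** 2 for w in kept)  # each edge seen from both ends: 2 * sum_{i<j} w_ij^2
    return energy


def laplacian_centrality(adj):
    """Eq. (5): relative drop in Laplacian energy when each node is removed."""
    total = laplacian_energy(adj)
    if total == 0.0:
        return {v: 0.0 for v in adj}  # edgeless graph: every centrality is 0
    return {v: (total - laplacian_energy(adj, removed=v)) / total for v in adj}
```

On the path a-b-c with unit weights, \(E_L(G)=10\); removing the middle node deletes both edges, so its centrality is 1, while each end node scores 0.6.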

3.1.3 Community Expansion

After the Laplacian centrality of all nodes in \(G\) has been calculated, the node with the highest Laplacian centrality value is selected as the center of a community \(c\). This center is then expanded into a new community by adding the remaining nodes that have strong dependencies on it. For this purpose, our community detection method employs a gain function to determine how strongly a node \({v}_{i}\) belongs to a community \(c\). More specifically, the gain in modularity \(\Delta Q\) obtained by moving an isolated node \({v}_{i}\) into the community \(c\) is calculated as follows [60]:

$$\Delta Q = \left[ {\frac{{\mathop \sum \nolimits_{in} + k_{i,in} }}{2m} - \left( {\frac{{\mathop \sum \nolimits_{tot} + k_{i} }}{2m}} \right)^{2} } \right] - \left[ {\frac{{\mathop \sum \nolimits_{in} }}{2m} - \left( {\frac{{\mathop \sum \nolimits_{tot} }}{2m}} \right)^{2} - \left( {\frac{{k_{i} }}{2m}} \right)^{2} } \right] ,$$
(6)

where \(\sum_{in}\) is the total weight of the edges within community \(c\), \(\sum_{tot}\) is the total weight of the edges incident to nodes in \(c\), \({k}_{i}\) is the total weight of the edges incident to node \({v}_{i}\), \({k}_{i,in}\) is the total weight of the edges from node \({v}_{i}\) to the nodes in \(c\), and \(m\) is the total weight of all edges in the graph.

If the gain of moving node \({v}_{i}\) into community \(c\) is greater than zero, the node is added to this community, and its neighbors become candidates for expansion. This procedure is repeated until no candidate nodes remain, at which point the community is complete and its nodes are removed from the list of unassigned nodes. Then, among the remaining nodes, the node with the highest Laplacian centrality value is selected as the next center and expanded in the same way. This process is iterated until every node has been assigned to a community.
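The expansion loop can be sketched as follows. This is an illustration under assumptions: the graph is a symmetric adjacency dictionary, per-node centrality scores (for instance, the Laplacian centrality of Eq. (5)) are passed in precomputed, and \(\Delta Q\) is evaluated exactly as written in Eq. (6). Candidate order and tie-breaking are not specified in the text, so the choices below are arbitrary.

```python
def modularity_gain(sigma_in, sigma_tot, k_i, k_i_in, m):
    """Eq. (6): modularity change when an isolated node joins community c."""
    two_m = 2.0 * m
    after = (sigma_in + k_i_in) / two_m - ((sigma_tot + k_i) / two_m) ** 2
    before = sigma_in / two_m - (sigma_tot / two_m) ** 2 - (k_i / two_m) ** 2
    return after - before


def expand_communities(adj, centrality):
    """Grow communities around the most central unassigned node (Sect. 3.1.3)."""
    m = sum(w for nbrs in adj.values() for w in nbrs.values()) / 2.0  # total edge weight
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}        # k_i
    unassigned = set(adj)
    communities = []
    while unassigned:
        center = max(unassigned, key=lambda v: centrality[v])
        community = {center}
        unassigned.discard(center)
        sigma_in, sigma_tot = 0.0, degree[center]
        candidates = [v for v in adj[center] if v in unassigned]
        while candidates:
            v = candidates.pop()
            if v not in unassigned:
                continue  # already placed in this or an earlier community
            k_i_in = sum(w for x, w in adj[v].items() if x in community)
            if m > 0 and modularity_gain(sigma_in, sigma_tot, degree[v], k_i_in, m) > 0:
                community.add(v)
                unassigned.discard(v)
                sigma_in += k_i_in       # edges from v into c become internal
                sigma_tot += degree[v]   # v's incident weight now counts toward c
                candidates.extend(x for x in adj[v] if x in unassigned)
        communities.append(community)
    return communities
```

Algebraically, Eq. (6) reduces to \(\frac{k_{i,in}}{2m}-\frac{\sum_{tot}k_{i}}{2m^{2}}\), so a node joins \(c\) only if its links into \(c\) outweigh what the community's total degree would predict. On two unit-weight triangles joined by a single edge of weight 0.1, the sketch returns the two triangles as separate communities.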

3.1.4 Community Merging

Trivial communities are communities that do not contain a sufficient number of members. Using these communities as the nearest neighbor sets of their members reduces recommendation performance. Therefore, in this step, trivial communities are detected and merged with other communities. In the merging procedure, each community is represented by a single node whose weight is the sum of the weights of the edges within the community. Whenever two communities are connected by at least one edge, a weighted edge is created between their representative nodes, with a weight equal to the total weight of the edges between the two communities. If the edge weight of a representative node is less than the edge weights of its neighbors, the node is merged with its neighbors and the graph is updated. The merging process is repeated until no node can be merged with its neighbors. Merging trivial communities in this way improves the quality of the identified communities. The pseudo-code of the proposed community detection algorithm is given in Algorithm 1.

3.2 Rating Prediction

In this step, the user communities detected by the proposed community detection method are used to predict unknown ratings. Specifically, the members of each community are treated as nearest neighbors, and their opinions are used in the rating prediction process. Let \({K}_{u,i}\) be the set of users who belong to the same community as user \(u\) and have rated food \(i\). These users are considered the nearest neighbors of user \(u\) when predicting the unknown rating of food \(i\) using the following equation:

$${P}_{u,i}={\overline{r} }_{u}+\frac{\sum_{v\in {K}_{u,i}}{w}_{u,v}\times ({r}_{v,i}-{\overline{r} }_{v})}{\sum_{v\in {K}_{u,i}}{w}_{u,v}},$$
(7)

where \({P}_{u,i}\) is the predicted rating of food \(i\) for user \(u\), \({\overline{r} }_{u}\) corresponds to the average of ratings assigned by user \(u\), \({r}_{v,i}\) denotes the rating of food \(i\) assigned by user \(v\), and \({w}_{u,v}\) refers to the similarity value between users \(u\) and \(v\) calculated using Eq. (1).
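Eq. (7) can be sketched as below, assuming a nested-dictionary rating layout, a community label per user, and a precomputed similarity table `sim[u][v]` holding \({w}_{u,v}\) (all hypothetical structures). The fallback to \({\overline{r} }_{u}\) when no neighbor has rated the food, or when the similarity weights sum to zero, is our own assumption, because Eq. (7) is undefined in that case.

```python
def user_mean(ratings, user):
    """Average rating of a user over all foods they have rated."""
    return sum(ratings[user].values()) / len(ratings[user])


def predict_rating(ratings, sim, community_of, u, i):
    """Eq. (7): mean-centered, similarity-weighted average over K_{u,i}."""
    # K_{u,i}: other members of u's community who have rated food i
    neighbors = [v for v in ratings
                 if v != u and community_of[v] == community_of[u] and i in ratings[v]]
    num = sum(sim[u].get(v, 0.0) * (ratings[v][i] - user_mean(ratings, v)) for v in neighbors)
    den = sum(sim[u].get(v, 0.0) for v in neighbors)
    base = user_mean(ratings, u)
    if den == 0.0:
        return base  # assumption: fall back to the user's mean rating
    return base + num / den
```

Users outside the target user's community are ignored even if they are highly similar, which is exactly what restricting the neighborhood to \({K}_{u,i}\) implies.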


3.3 Health-Aware Reliability Measurement

Food recommendation systems may be more critical than other types of recommendation models, such as movie, music, and book recommenders, because their recommendations have a significant impact on users’ health. Evaluating the reliability of recommendations with appropriate health factors is therefore a crucial task in food recommendation systems, yet previous food recommendation methods did not consider such factors. In addition to health, the accuracy of recommendations must also be considered, as it is very important for user satisfaction. To address these issues, in this step, we develop a novel health-aware reliability measurement that simultaneously takes the health and accuracy factors into account to evaluate the predicted ratings [i.e., \({P}_{u,i}\) calculated using Eq. (7)]. The proposed measurement is defined based on three factors: the first relates to the health degree of recommendations, while the second and third relate to recommendation accuracy. These three factors are discussed below.

First factor: This factor models the health degree of a recommendation by considering seven nutrition components of food: proteins, carbohydrates, sugars, sodium, fat, saturated fats, and fibers. The World Health Organization (WHO) provides a recommended range for each of these components, indicating how much of it a healthy food should contain. These healthy ranges are shown in Table 3. The main idea of the first factor is that neighbors who are interested in healthy foods can improve the predicted ratings from the health point of view. Therefore, we calculate the health degree of the foods rated by each neighbor of the target user. In other words, the health degree of each neighbor \(v\in {K}_{u,i}\) is measured as the average health degree of all foods rated by user \(v\):

$$\mathrm{HU}\left(v\right)=\frac{\sum_{i\in {I}_{v}}\mathrm{HF}\left(i\right)}{\left|{I}_{v}\right|},$$
(8)

where \(\mathrm{HU}\left(v\right)\) denotes the health degree of neighbor \(v\in {K}_{u,i}\), \({I}_{v}\) is the set of items (foods) rated by user \(v\), and \(\mathrm{HF}\left(i\right)\) refers to the health degree of food \(i\) which is calculated as follows:

$$\mathrm{HF}\left(i\right)=\mathrm{LabelWHO}\left(\mathrm{proteins}\right)+\mathrm{LabelWHO}\left(\mathrm{carbohydrates}\right)+\mathrm{LabelWHO}\left(\mathrm{sugars}\right)+\mathrm{LabelWHO}\left(\mathrm{sodium}\right)+\mathrm{LabelWHO}\left(\mathrm{fat}\right)+\mathrm{LabelWHO}\left(\mathrm{saturated\ fats}\right)+\mathrm{LabelWHO}\left(\mathrm{fibers}\right),$$
(9)

where, for each nutrition component, \(\mathrm{LabelWHO}\left(\dots \right)=1\) if the amount of that component lies in the ideal range shown in Table 3, and \(\mathrm{LabelWHO}\left(\dots \right)=0\) otherwise. Hence, \(\mathrm{HF}\left(i\right)\) is an integer in the range \([0\left(\mathrm{unhealthy}\right),7\left(\mathrm{healthy}\right)]\).
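The counting in Eq. (9) and the averaging in Eq. (8) can be sketched as below. The healthy ranges of Table 3 are not reproduced in this section, so they are a required argument; the toy ranges in the usage lines are invented for illustration only and are not WHO values. Treating both range bounds as inclusive is also an assumption, and the function names are hypothetical.

```python
from statistics import mean

COMPONENTS = ("proteins", "carbohydrates", "sugars", "sodium",
              "fat", "saturated fats", "fibers")

def health_of_food(nutrition, ranges):
    """HF(i) of Eq. (9): how many of the seven components fall inside their healthy range.

    nutrition: dict component -> amount in the food
    ranges:    dict component -> (low, high), to be filled from Table 3
    """
    # Each in-range component contributes LabelWHO = 1; bounds treated as inclusive (assumption).
    return sum(1 for c in COMPONENTS if ranges[c][0] <= nutrition[c] <= ranges[c][1])

def health_of_user(v, ratings, nutrition_of, ranges):
    """HU(v) of Eq. (8): mean HF over the foods I_v rated by user v."""
    return mean(health_of_food(nutrition_of[i], ranges) for i in ratings[v])

# Toy ranges and amounts, NOT the WHO values of Table 3.
toy_ranges = {c: (0, 10) for c in COMPONENTS}
f1 = {c: 5 for c in COMPONENTS}        # every component in range -> HF = 7
f2 = dict(f1, sugars=20, sodium=50)    # two components out of range -> HF = 5
print(health_of_user("v", {"v": {"f1": 4, "f2": 3}}, {"f1": f1, "f2": f2}, toy_ranges))
```

Note that HU depends only on which foods a user rated, not on the rating values themselves, which matches Eq. (8).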

Table 3 The healthy ranges of nutrition components provided by WHO (https://www.who.int/health-topics/nutrition)

After calculating the health degree of neighbors of the target user, the first factor of the reliability of the predicted rating \({P}_{u,i}\) is determined using the following equation:

$${f}_{H}\left({H}_{u,i}\right)=1-\frac{\overline{H} }{\overline{H }+{H}_{u,i}},$$
(10)

where \(\overline{H }\) corresponds to the median of the \({H}_{u,i}\) values, and \({H}_{u,i}\) denotes the summation of the health degree of all neighbors of the target user \(u\) who have rated food \(i\), which is calculated as follows:

$${H}_{u,i}=\sum_{v\in {K}_{u,i}}\mathrm{HU}\left(v\right),$$
(11)

where \({K}_{u,i}\) denotes the set of users who belong to the same community as the target user \(u\) and have rated food \(i\). The factor \({f}_{H}\left({H}_{u,i}\right)\) has a positive effect on the reliability of the predicted rating \({P}_{u,i}\) from the health point of view: the larger \({f}_{H}\left({H}_{u,i}\right)\), the higher the reliability value.

Second factor: Although the health degree of recommended foods is important, the relevance of recommendations to users' tastes should also be considered in the proposed reliability measurement, as it directly affects how satisfied users are with the system. To this end, we use the sum of the similarity values between the target user \(u\) and the users in his/her neighbor set as a positive factor in the proposed reliability measurement. A higher similarity value between two users indicates that their opinions are close. Therefore, when an unknown rating is predicted from the opinions of neighbors who are highly similar to the target user, the prediction is expected to be more accurate, leading to a more reliable recommendation from the accuracy point of view. To obtain the second factor, we first calculate the overall similarity between the target user \(u\) and all his/her neighbors using the following equation:

$${S}_{u,i}=\sum_{v\in {K}_{u,i}}{w}_{u,v},$$
(12)

where \({w}_{u,v}\) refers to the similarity value between users \(u\) and \(v\) calculated using Eq. (1). Then, the second factor of the proposed reliability measurement is defined as follows:

$${f}_{S}\left({S}_{u,i}\right)=1-\frac{\overline{S} }{\overline{S }+{S}_{u,i}},$$
(13)

where \(\overline{S }\) denotes the median of the \({S}_{u,i}\) values. A higher value of \({f}_{S}\left({S}_{u,i}\right)\) increases the reliability of the predicted rating \({P}_{u,i}\).
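Eqs. (10) and (13) share one saturating form, \(1-m/(m+x)=x/(m+x)\), which is 0 at \(x=0\), exactly 0.5 when \(x\) equals the median \(m\), and approaches 1 as \(x\) grows. The sketch below computes \({H}_{u,i}\), \({S}_{u,i}\), and both factors for a batch of predicted pairs. Taking the medians over exactly the batch of pairs being assessed is an assumption, since the text does not state over which pairs the medians are computed; the function names and dict layouts are likewise illustrative.

```python
from statistics import median

def saturating_factor(x, x_median):
    """Common form of Eqs. (10) and (13); assumes x >= 0 and x_median > 0."""
    return 1 - x_median / (x_median + x)

def health_and_similarity_factors(pairs, K, HU, sim):
    """Return {(u, i): (f_H, f_S)} for each predicted pair.

    pairs: list of (u, i) pairs whose predictions are being assessed
    K:     dict (u, i) -> neighbour list K_{u,i}
    HU:    dict v -> health degree HU(v) from Eq. (8)
    sim:   callable (u, v) -> w_{u,v} from Eq. (1)
    """
    H = {p: sum(HU[v] for v in K[p]) for p in pairs}          # Eq. (11)
    S = {p: sum(sim(p[0], v) for v in K[p]) for p in pairs}   # Eq. (12)
    H_med, S_med = median(H.values()), median(S.values())     # medians over the batch (assumption)
    return {p: (saturating_factor(H[p], H_med), saturating_factor(S[p], S_med))
            for p in pairs}

pairs = [("a", "x"), ("a", "y"), ("d", "x")]
K = {("a", "x"): ["b", "c"], ("a", "y"): ["b"], ("d", "x"): ["b", "c", "e"]}
HU = {"b": 2, "c": 4, "e": 6}
print(health_and_similarity_factors(pairs, K, HU, lambda u, v: 0.5))
```

Here the pair `("a", "x")` has the median \({H}_{u,i}\) and the median \({S}_{u,i}\) of the batch, so both of its factors come out as 0.5.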

Third factor: Another factor that can affect the quality of a predicted rating is the variance of the ratings assigned to the target item by the neighbors of the target user. We use this variance as the third factor, which has a negative effect on the proposed reliability measurement: the quality of the predicted rating decreases as the variance increases, and vice versa. The idea is that when the neighbors hold closer opinions on the target item, the predicted rating can be expected to be more reliable, whereas highly dispersed opinions reduce its reliability. Accordingly, we calculate the third factor as:

$${f}_{V}\left({V}_{u,i}\right)={\left(\frac{\mathrm{max}R-\mathrm{min}R-{V}_{u,i}}{\mathrm{max}R-\mathrm{min}R}\right)}^{\gamma },$$
(14)

where \(\mathrm{min}R\) and \(\mathrm{max}R\) denote the lower bound and upper bound of the ratings provided in the system, respectively, and \({V}_{u,i}\) is defined as:

$${V}_{u,i}=\frac{\sum_{v\in {K}_{u,i}}{w}_{u,v}\times {({r}_{v,i}-{\overline{r} }_{v}-{P}_{u,i}+{\overline{r} }_{u})}^{2}}{\sum_{v\in {K}_{u,i}}{w}_{u,v}},$$
(15)

and \(\gamma\) is calculated as below:

$$\gamma =\frac{\mathrm{ln}0.5}{\mathrm{ln}\frac{\mathrm{max}R-\mathrm{min}R-\overline{V}}{\mathrm{max }R-\mathrm{min}R}},$$
(16)

where \(\overline{V }\) denotes the median of the \({V}_{u,i}\) values.
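Eqs. (14) to (16) can be sketched as follows. Since \(\gamma\) satisfies \(\ln 0.5=\gamma \ln\frac{\mathrm{max}R-\mathrm{min}R-\overline{V}}{\mathrm{max}R-\mathrm{min}R}\), the factor equals exactly 0.5 at the median variance, matching the behavior of the first two factors at their medians. The clamp of the base at 0 is our own safeguard, not part of the paper: the weighted variance can in principle exceed the rating span, and a negative base raised to a non-integer power has no real value. The function names are hypothetical.

```python
from math import log

def weighted_variance(u, i, K, ratings, sim, P_ui):
    """V_{u,i} of Eq. (15): similarity-weighted spread of the neighbours' mean-centred
    ratings around the prediction's own mean-centred value P_{u,i} - r_bar_u."""
    def r_bar(x):
        return sum(ratings[x].values()) / len(ratings[x])
    den = sum(sim(u, v) for v in K)
    num = sum(sim(u, v) * (ratings[v][i] - r_bar(v) - P_ui + r_bar(u)) ** 2 for v in K)
    return num / den

def gamma_from_median(V_median, min_r, max_r):
    """Eq. (16); requires 0 < V_median < max_r - min_r so the logarithm is finite and nonzero."""
    span = max_r - min_r
    return log(0.5) / log((span - V_median) / span)

def variance_factor(V, gamma, min_r, max_r):
    """Eq. (14); the max(0, ...) clamp is an added safeguard, not from the paper."""
    span = max_r - min_r
    return max(0.0, (span - V) / span) ** gamma

ratings = {"a": {"x": 4, "y": 2}, "b": {"x": 5, "y": 3, "z": 4}, "c": {"x": 2, "z": 4}}
V = weighted_variance("a", "z", ["b", "c"], ratings, lambda u, v: 1.0, P_ui=3.5)
gamma = gamma_from_median(V_median=1.0, min_r=1, max_r=5)
print(V, variance_factor(V, gamma, 1, 5))
```

In this toy case both neighbours deviate from the prediction's centred value by 0.5 in opposite directions, giving \({V}_{u,i}=0.25\), which is well below the assumed median of 1.0, so the factor lies above 0.5.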

The proposed health-aware reliability measurement combines the three factors using a weighted geometric mean, so the weight of each factor must be determined first. The first factor \({f}_{H}\left({H}_{u,i}\right)\) and the second factor \({f}_{S}\left({S}_{u,i}\right)\) do not depend on any other factor, so their weights are set to 1. In contrast, because the value of the third factor \({f}_{V}({V}_{u,i})\) depends on the value of the second factor \({f}_{S}\left({S}_{u,i}\right)\), its weight is set to \({f}_{S}\left({S}_{u,i}\right)\). As a result, the proposed health-aware reliability measurement for a predicted rating \({P}_{u,i}\) is formulated as follows:

$${\mathrm{Rel}}_{u,i}={\left({f}_{H}\left({H}_{u,i}\right)\cdot {f}_{S}\left({S}_{u,i}\right)\cdot {{f}_{V}\left({V}_{u,i}\right)}^{{f}_{S}\left({S}_{u,i}\right)}\right)}^{\frac{1}{2+{f}_{S}\left({S}_{u,i}\right)}}.$$
(17)

As Eq. (17) shows, the proposed health-aware reliability measurement combines three factors: the first, \({f}_{H}\left({H}_{u,i}\right)\), reflects the health degree of the predicted rating, while the other two, \({f}_{S}\left({S}_{u,i}\right)\) and \({f}_{V}({V}_{u,i})\), reflect its quality. Hence, the proposed measurement appraises both the health and the quality of recommended foods simultaneously.
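Eq. (17) reduces to a one-line weighted geometric mean. A useful sanity check, which follows directly from the formula, is that if all three factors share a value \(c\), then \({\mathrm{Rel}}_{u,i}={({c}^{2+c})}^{1/(2+c)}=c\); and if \({f}_{S}=0\), the reliability is 0 regardless of the other factors. The function name below is hypothetical.

```python
def health_aware_reliability(f_H, f_S, f_V):
    """Rel_{u,i} of Eq. (17): geometric mean of f_H and f_S (weight 1 each)
    and f_V (weight f_S), normalised by the total weight 2 + f_S."""
    return (f_H * f_S * f_V ** f_S) ** (1.0 / (2.0 + f_S))

print(health_aware_reliability(0.5, 0.5, 0.5))  # all three factors at their medians
```

Since every factor equals 0.5 at its median, a prediction that is "typical" on all three criteria gets a reliability of about 0.5; with \(\alpha =0.7\) (Section 4.3) such a prediction would be treated as unreliable and recalculated.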

3.4 Neighborhood Refinement

Some neighbors of a user negatively affect prediction accuracy because they do not provide enough information to produce recommendations. Predictions can be improved by assessing their effectiveness and recalculating those of low quality. To this end, this study proposes a novel measurement of the confidence of each neighbor in a user's neighborhood, built on the health-aware reliability measurement. Specifically, the reliability measurement [see Eq. (17)] is used to assess the predicted ratings, and the predicted ratings whose reliability value is lower than a threshold value \(\alpha\) are regarded as unreliable. To recalculate these unreliable ratings, ineffective neighbors within the target user's neighborhood must be ignored, so that the recalculated recommendations are more accurate and healthier. For this purpose, we introduce a confidence score for a given pair of users, calculated as follows:

$${C}_{u,v}=\frac{\sum_{i\in {I}_{u,v}}({r}_{u,i}-{\overline{r} }_{u})\times ({r}_{v,i}-{\overline{r} }_{v})\times {\mathrm{Rel}}_{u,i}}{\sqrt{\sum_{i\in {I}_{u,v}}{\mathrm{Rel}}_{u,i}\times {({r}_{u,i}-{\overline{r} }_{u})}^{2}}\sqrt{\sum_{i\in {I}_{u,v}}{\mathrm{Rel}}_{u,i}\times {({r}_{v,i}-{\overline{r} }_{v})}^{2}}},$$
(18)

where \({C}_{u,v}\) is the confidence score between users \(u\) and \(v\), \({\mathrm{Rel}}_{u,i}\) is the reliability score defined in Eq. (17), and \({I}_{u,v}\) is the set of foods rated by both users \(u\) and \(v\). The confidence measurement indicates how much a user positively contributes to the quality and health of the foods recommended from the neighborhood. Concretely, it is a correlation over the co-rated foods \({I}_{u,v}\) in which each food's contribution is weighted by the reliability of the corresponding rating. Neighbors whose confidence scores are less than a threshold value \(\beta\) are considered ineffective for the prediction task, and ignoring them when predicting unknown ratings is expected to make the predictions more accurate. Accordingly, each predicted rating \({P}_{u,i}\) whose reliability score is less than \(\alpha\) is recalculated using a new neighborhood defined as follows:

$${K}_{u,i}^{\mathrm{new}}=\left\{v\in {K}_{u,i}|{C}_{u,v}>\beta \right\},$$
(19)

where \({K}_{u,i}^{\mathrm{new}}\) is the new neighborhood containing the most confident neighbors of user \(u\) for food \(i\), \({K}_{u,i}\) is the initial neighbor set of the target user (i.e., the members of the target user's community who have rated food \(i\)), \({C}_{u,v}\) is the confidence value between users \(u\) and \(v\) calculated using Eq. (18), and \(\beta\) is a predefined threshold. According to Eq. (19), the new neighborhood contains only users whose confidence values exceed \(\beta\). Because only highly confident users are retained, the new neighborhood is expected to be more reliable than the initial neighbor set, yielding more accurate and healthier food recommendations. The main advantage of this process is that ineffective, low-confidence neighbors are removed from users' neighborhoods before the unreliable ratings are recalculated.
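The sketch below implements Eq. (18) as a reliability-weighted Pearson correlation over the co-rated foods and then applies the filter of Eq. (19). It assumes a reliability value \({\mathrm{Rel}}_{u,i}\) is available for every food in \({I}_{u,v}\), supplied as a dict; returning zero confidence when either weighted denominator vanishes (for example, when there are no co-rated foods) is our own choice, and the function names are hypothetical.

```python
from math import sqrt

def confidence(u, v, ratings, rel):
    """C_{u,v} of Eq. (18): Pearson-style correlation over the co-rated foods I_{u,v},
    with every food's terms weighted by Rel_{u,i}.

    ratings: dict user -> dict food -> rating
    rel:     dict (u, food) -> Rel_{u,i} from Eq. (17)
    """
    common = ratings[u].keys() & ratings[v].keys()          # I_{u,v}
    r_bar_u = sum(ratings[u].values()) / len(ratings[u])
    r_bar_v = sum(ratings[v].values()) / len(ratings[v])
    num = den_u = den_v = 0.0
    for i in common:
        du, dv, w = ratings[u][i] - r_bar_u, ratings[v][i] - r_bar_v, rel[(u, i)]
        num += w * du * dv
        den_u += w * du * du
        den_v += w * dv * dv
    if den_u == 0 or den_v == 0:   # assumption: no usable overlap means zero confidence
        return 0.0
    return num / (sqrt(den_u) * sqrt(den_v))

def refine_neighbourhood(u, K_ui, ratings, rel, beta):
    """K^new_{u,i} of Eq. (19): keep only neighbours whose confidence exceeds beta."""
    return [v for v in K_ui if confidence(u, v, ratings, rel) > beta]

ratings = {"a": {"x": 5, "y": 1}, "b": {"x": 4, "y": 2}, "c": {"x": 2, "y": 4}}
rel = {("a", "x"): 1.0, ("a", "y"): 1.0}
print(refine_neighbourhood("a", ["b", "c"], ratings, rel, beta=0.6))  # ['b']
```

User `b` orders the two foods the same way as `a` (confidence 1), while `c` reverses the order (confidence \(-1\)), so only `b` survives the threshold.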

3.5 Making Recommendation

The final step of the proposed method uses the updated neighborhoods of users to recalculate the unreliable predicted ratings. By using highly confident neighbors as predictors, this step aims to improve both the accuracy and the healthiness of the recommended foods. Specifically, the updated predicted rating of food \(i\) for user \(u\) is defined as follows:

$${P}_{u,i}^{\mathrm{new}}={\overline{r} }_{u}+\frac{\sum_{v\in {K}_{u,i}^{\mathrm{new}}}{w}_{u,v}\left({r}_{v,i}-{\overline{r} }_{v}\right)}{\sum_{v\in {K}_{u,i}^{\mathrm{new}}}{w}_{u,v}},$$
(20)

where \({P}_{u,i}^{\mathrm{new}}\) is the updated rating of food \(i\) for user \(u\), \({\overline{r} }_{u}\) and \({\overline{r} }_{v}\) are the average ratings provided by users \(u\) and \(v\), respectively, \({K}_{u,i}^{\mathrm{new}}\) is the new neighbor set of the target user \(u\) calculated using Eq. (19), \({w}_{u,v}\) is the similarity between users \(u\) and \(v\) computed using Eq. (1), and \({r}_{v,i}\) denotes the rating of food \(i\) given by user \(v\). The essential difference between the initial predicted rating obtained by Eq. (7) and the updated rating obtained by Eq. (20) is that the latter is computed over the new neighborhood \({K}_{u,i}^{\mathrm{new}}\) rather than the initial neighbor set \({K}_{u,i}\). Since \({K}_{u,i}^{\mathrm{new}}\) contains more confident users than \({K}_{u,i}\), recalculating the unreliable predicted ratings is expected to yield more reliable and accurate recommendations. Finally, a list of foods is recommended to the target user by sorting the predicted ratings and selecting those with the highest values. Algorithm 2 presents the pseudo-code of the developed FRSHR recommendation system.
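Putting Eqs. (7), (17), (19), and (20) together, the final step can be sketched as: keep each initial prediction whose reliability is at least \(\alpha\), recompute the others over the refined neighborhood, and return the \(L\) highest-scoring foods. The inputs are assumed to have been produced by the earlier steps and are passed in as dicts. Skipping a food whose refined neighborhood is empty is our assumption, reflecting the remark in Section 4.3 that such ratings cannot be predicted; the function names are hypothetical.

```python
def mean_centred_prediction(u, i, K, ratings, sim):
    """Eqs. (7) and (20): r_bar_u + sum_v w_uv (r_vi - r_bar_v) / sum_v w_uv over K.
    Returns None when K yields no similarity mass (no prediction possible)."""
    def r_bar(x):
        return sum(ratings[x].values()) / len(ratings[x])
    den = sum(sim(u, v) for v in K)
    if den == 0:
        return None
    return r_bar(u) + sum(sim(u, v) * (ratings[v][i] - r_bar(v)) for v in K) / den

def recommend(u, predictions, reliability, refined_K, ratings, sim, alpha, L):
    """Recompute predictions with Rel < alpha over K^new (Eq. 20), then return the top-L foods.

    predictions: dict food -> initial P_{u,i} from Eq. (7)
    reliability: dict food -> Rel_{u,i} from Eq. (17)
    refined_K:   dict food -> K^new_{u,i} from Eq. (19)
    """
    final = {}
    for i, p in predictions.items():
        if reliability[i] < alpha:
            p = mean_centred_prediction(u, i, refined_K[i], ratings, sim)
        if p is not None:  # assumption: foods left without neighbours are skipped
            final[i] = p
    return sorted(final, key=final.get, reverse=True)[:L]

ratings = {"a": {"x": 4, "y": 2}, "b": {"x": 5, "y": 3, "z": 4}}
print(recommend("a", predictions={"z": 3.5, "w": 3.2, "q": 4.0},
                reliability={"z": 0.4, "w": 0.9, "q": 0.1},
                refined_K={"z": ["b"], "w": [], "q": []},
                ratings=ratings, sim=lambda u, v: 1.0, alpha=0.7, L=2))
```

In the toy run, `w` is reliable and keeps 3.2, `z` is recomputed from `b` alone and drops to 3.0, and `q` loses all neighbours and is skipped, so the list is `['w', 'z']`.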

3.6 Computational Complexity

Computational complexity analysis determines how an algorithm's execution time grows with the size of its input, and thus helps clarify whether the algorithm is practical in real applications. To perform this analysis, we calculate the complexity of each step of the proposed method and then sum them to obtain the total complexity. Let \(\left|U\right|\) and \(\left|I\right|\) denote the number of users and foods in the system, respectively. The first step of FRSHR applies the community detection approach with a complexity of \(O\left({\left|U\right|}^{2}\left|I\right|\right)\). The second step predicts the unknown ratings with a complexity of \(O\left(\left|U\right|\left|I\right|\right)\). The third step computes the proposed health-aware reliability measurement for the predicted ratings with a complexity of \(O\left({\left|U\right|}^{2}\left|I\right|\right)\). The fourth step recalculates the unreliable predictions using the neighborhood refinement strategy, also with a complexity of \(O\left({\left|U\right|}^{2}\left|I\right|\right)\). Finally, the fifth step builds the recommendation list for the target user with a complexity of \(O\left(\left|U\right|\left|I\right|\right)\). Summing the complexities of all steps gives \(O\left(3{\left|U\right|}^{2}\left|I\right|+2\left|U\right|\left|I\right|\right)\), which simplifies to \(O\left({\left|U\right|}^{2}\left|I\right|\right)\) because the first term dominates and constant factors are dropped in asymptotic notation.


4 Experimental Results and Discussion

In this section, we design extensive experiments on a real-world dataset to verify the effectiveness of the proposed FRSHR model, and we report and discuss the results in terms of several evaluation metrics. For comparison, several state-of-the-art food recommendation approaches are selected as baselines; their main ideas are briefly described below:

  • LDA [44]: A food recommendation approach that analyzes the healthiness of recipes and attempts to make healthier recommendations to users.

  • HPRR [61]: A health-aware food recommendation approach that introduces a healthy bias and a healthy tag to be utilized in the recommendation process to produce both healthy and accurate recommendations.

  • CNNFR [13]: This approach uses a convolutional neural network (CNN) to recognize the ingredients of recipes from their images and employs these ingredients to build a food recommendation system.

  • HAFR [41]: This model is based on a hierarchical attention network that uses users' preferences, food ingredients, and recipe images to build a food recommendation system.

  • CFRR [62]: A hybrid health-aware food recommendation model that integrates collaborative filtering and content-based approaches in its recommendation procedure. It also filters out foods whose calories exceed the target user's calorie intake requirement from the recommendations list to generate healthier recommendations.

  • FGCN [43]: A food recommendation method that uses a graph convolutional network to propagate information deeply in three networks constructed based on ingredient–ingredient, ingredient–recipe, and recipe–user relations.

  • HGAT [63]: This food recommendation method employs a hierarchical graph attention network to exploit information from the user’s preferences and recipe’s ingredients. Moreover, a ranking-based objective function is introduced to optimize the food recommendation approach.

4.1 Dataset

To evaluate our model, we crawled a dataset from the www.Allrecipes.com food social network and extracted 52,821 foods for the period 2000–2018. For each food, users' ratings, nutrition information, and timestamps were crawled. Ratings lie in the range \([1, 5]\), where a rating of 1 indicates that the user has no interest in the food and a rating of 5 indicates the highest interest. The ratings are also used to generate implicit feedback indicating whether a user interacted with a food. After preprocessing, the dataset contains 68,768 users, 45,630 foods, and 1,093,845 ratings in total.

4.2 Evaluation Metrics

Five evaluation metrics are used to assess the recommendation results of the compared models: precision, recall, F1, Normalized Discounted Cumulative Gain (NDCG), and health. Precision is the proportion of recommended items that are relevant, and recall is the proportion of all relevant items that appear in the recommendations list. These metrics are defined as follows:

$$\mathrm{precision}=\frac{\left|\left\{\text{recommended items that are relevant}\right\}\right|}{L},$$
(21)
$$\mathrm{recall}=\frac{\left|\left\{\text{recommended items that are relevant}\right\}\right|}{\left|\left\{\text{all relevant items}\right\}\right|},$$
(22)

where \(L\) is the length of the recommendations list. The F1 metric is the harmonic mean of precision and recall:

$$F1=\frac{2\times \mathrm{precision}\times \mathrm{recall}}{\mathrm{precision}+\mathrm{recall}}.$$
(23)
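A per-user sketch of Eqs. (21) to (23) follows. How relevance is decided (for instance, which test-set ratings count as relevant) and how per-user scores are averaged are not specified in this section, so relevance is simply passed in as a set; the function name is hypothetical. Precision divides by \(L\), the list length, as in Eq. (21).

```python
def precision_recall_f1(recommended, relevant):
    """Eqs. (21)-(23) for one user's top-L list.

    recommended: ordered list of the L recommended foods
    relevant:    set of foods regarded as relevant for this user
    """
    hits = sum(1 for f in recommended if f in relevant)
    precision = hits / len(recommended)                 # Eq. (21): divide by L
    recall = hits / len(relevant) if relevant else 0.0  # Eq. (22)
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0  # Eq. (23)
    return precision, recall, f1

print(precision_recall_f1(["a", "b", "c", "d"], {"a", "c", "e"}))
```

Two of the four recommended foods are relevant and two of the three relevant foods are recommended, so precision is 1/2, recall is 2/3, and F1 is 4/7.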

It is also important to assess whether a recommendation model places the relevant items at the top of the recommendations list. NDCG is a useful metric for this purpose and is calculated as follows:

$$\mathrm{NDCG}=\frac{\mathrm{DCG}}{{\mathrm{DCG}}_{\mathrm{max}}},$$
(24)

where

$$\mathrm{DCG}={\mathrm{rel}}_{1}+\sum_{i=2}^{L}\frac{{\mathrm{rel}}_{i}}{{\mathrm{log}}_{2}(i+1)}$$
(25)

and \({\mathrm{rel}}_{i}\) indicates whether the food recommended at position \(i\) is relevant: \({\mathrm{rel}}_{i}=1\) if it is relevant and \({\mathrm{rel}}_{i}=0\) otherwise. \({\mathrm{DCG}}_{\mathrm{max}}\) is calculated as follows:

$${\mathrm{DCG}}_{\mathrm{max}}=1+\sum_{i=2}^{L}\frac{1}{{\mathrm{log}}_{2}(i+1)}.$$
(26)
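The sketch below follows Eqs. (24) to (26) exactly as written: the first position is undiscounted, position \(i\ge 2\) is discounted by \({\mathrm{log}}_{2}(i+1)\), and \({\mathrm{DCG}}_{\mathrm{max}}\) is the DCG of a list whose \(L\) positions are all relevant. The function name is hypothetical.

```python
from math import log2

def ndcg(recommended, relevant):
    """NDCG of Eqs. (24)-(26) for one ranked list of length L."""
    L = len(recommended)
    rel = [1 if f in relevant else 0 for f in recommended]      # rel_1 ... rel_L
    dcg = rel[0] + sum(rel[i - 1] / log2(i + 1) for i in range(2, L + 1))
    dcg_max = 1 + sum(1 / log2(i + 1) for i in range(2, L + 1))
    return dcg / dcg_max

print(ndcg(["a", "b", "c"], {"a", "c"}))
```

Moving a relevant food to an earlier position always raises the score, which is the ranking sensitivity that precision and recall lack.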

In addition to the above accuracy metrics, we define a novel metric that evaluates the recommendations list by the health level of the recommended foods, i.e., how healthy the recommended foods are:

$$\mathrm{Health}=\frac{\sum_{i=1}^{L}\mathrm{HF}\left({f}_{i}\right)}{7\times L},$$
(27)

where \(\mathrm{HF}({f}_{i})\) is the health degree of the \(i\)-th recommended food \({f}_{i}\), calculated using Eq. (9). The health metric lies in the interval \([0, 1]\), where \(\mathrm{Health}=0\) means that the recommendations list is completely unhealthy and \(\mathrm{Health}=1\) means that it is completely healthy.
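Eq. (27) is a normalised average of the per-food health degrees from Eq. (9). The sketch below takes the \(\mathrm{HF}\) values in a dict (an assumed layout; the function name is hypothetical). Dividing by \(7\times L\) maps a list of all-healthy foods to 1 and a list of all-unhealthy foods to 0.

```python
def health_metric(recommended, HF):
    """Health of Eq. (27): sum of HF(f_i) over the list, divided by 7 * L."""
    return sum(HF[f] for f in recommended) / (7 * len(recommended))

HF = {"a": 7, "b": 0, "c": 5, "d": 2}
print(health_metric(["a", "b", "c", "d"], HF))  # (7 + 0 + 5 + 2) / 28 = 0.5
```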

4.3 Experimental Setup and Parameter Analysis

The proposed FRSHR model has four input parameters whose values must be set before the experiments are performed. The first parameter, \(\lambda\), determines the importance of the time factor in calculating the time-aware similarity values [see Eq. (2)]. The second parameter, \(\theta\), is used in the graph sparsing step of the proposed community detection approach: edges whose weights are lower than \(\theta\) are removed from the original graph to make it sparser. The third parameter, \(\alpha\), is used in the neighborhood refinement step to identify unreliable predicted ratings, namely those whose reliability values are less than \(\alpha\). Finally, the fourth parameter, \(\beta\), is used in the neighborhood refinement step [i.e., Eq. (19)] to identify ineffective neighbors, namely those whose confidence values are less than \(\beta\). In the comparisons, we set \(\lambda =2.5\), \(\theta =0.6\), \(\alpha =0.7\), and \(\beta =0.6\), which yield promising results for the proposed method. These values were selected through a sensitivity analysis over different possible parameter values, whose results are shown in Figs. 2, 3, 4, 5. To conduct the experiments, the earliest 80% of the ratings in the dataset (ordered by timestamp) form the training set, and the remaining 20% most recent ratings form the test set.
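The chronological 80/20 split described above can be sketched as follows, assuming each rating is a `(user, food, rating, timestamp)` tuple. The split is global over the whole dataset, as the text describes; keeping ties in input order (Python's sort is stable) is our own choice, and the function name is hypothetical.

```python
def temporal_split(ratings, train_fraction=0.8):
    """Earliest train_fraction of the ratings (by timestamp) -> training set, the rest -> test set.

    ratings: iterable of (user, food, rating, timestamp) tuples
    """
    ordered = sorted(ratings, key=lambda r: r[3])  # stable: ties keep input order
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

data = [("u%d" % k, "f%d" % k, 3, 100 - k) for k in range(10)]  # newest first
train, test = temporal_split(data)
print(len(train), len(test))  # 8 2
```

Unlike a random split, this guarantees that every test rating is at least as recent as every training rating, which is what lets the time-aware similarity be evaluated on future behavior.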

Fig. 2 Experimental results regarding the sensitivity analysis of the parameter \(\lambda\)

Fig. 3 Experimental results regarding the sensitivity analysis of the parameter \(\theta\)

Fig. 4 Experimental results regarding the sensitivity analysis of the parameter \(\alpha\)

Fig. 5 Experimental results regarding the sensitivity analysis of the parameter \(\beta\)

The efficiency of the proposed FRSHR model can change with the values of its input parameters, so it is necessary to analyze how the efficiency varies with each of them. To do so, separate experiments are designed for each input parameter, in which its value is varied over a defined range. The results are reported in terms of precision, recall, NDCG, and health. F1 is omitted from these evaluations because it is the harmonic mean of precision and recall and therefore follows their trends. Figure 2 shows the effect of the parameter \(\lambda\) on the effectiveness of FRSHR. This parameter is varied over \([0, 4]\) with a step size of 0.5, where a higher value of \(\lambda\) gives the time factor a greater effect in the similarity calculation. According to Fig. 2, all metrics improve as \(\lambda\) increases from 0 to 2.5, meaning that the effectiveness of FRSHR is enhanced, whereas for values of \(\lambda\) greater than 2.5 the effectiveness declines. Thus, \(\lambda =2.5\) is the best-performing value for FRSHR among those tested.

Another input parameter of FRSHR is \(\theta\), which is varied over \([0, 0.9]\) with a step size of 0.1; a higher value removes more edges in the graph sparsing step of the proposed community detection approach. Figure 3 illustrates the performance of FRSHR for different values of \(\theta\). As the figure shows, increasing \(\theta\), which removes a larger number of edges from the graph, boosts the performance of FRSHR in terms of all metrics. However, the performance declines once the number of removed edges exceeds a certain point: setting \(\theta\) to a value higher than 0.6 diminishes the performance of FRSHR. Therefore, we use \(\theta =0.6\) for FRSHR when comparing it with other recommendation models.

To perform the sensitivity analysis for the other two parameters, \(\alpha\) and \(\beta\), similar experiments are designed, and their results are reported in Figs. 4 and 5, respectively. Recall that \(\alpha\) and \(\beta\) are the thresholds for the reliability and confidence criteria, respectively. As Fig. 4 shows, the performance of FRSHR increases as \(\alpha\) varies from 0.1 to 0.7 and declines when \(\alpha\) exceeds 0.7. A larger \(\alpha\) marks more predicted ratings as unreliable and thus increases the effect of the proposed reliability measurement, which suggests that this measurement is very helpful in boosting the performance of FRSHR. However, values of \(\alpha\) higher than 0.7 degrade performance, so \(\alpha =0.7\) is used when FRSHR is compared with other models. The main reason is that when \(\alpha\) is higher than 0.7, most of the predicted ratings must be recalculated by refining the target user's neighborhood, which removes a large number of neighbors from it. As a result, the proposed method becomes less able to make predictions for all unknown ratings.

The results of the sensitivity analysis for \(\beta\) are shown in Fig. 5. This parameter is varied from 0.1 to 0.9 with a step size of 0.1; a higher value of \(\beta\) removes a larger portion of neighbors from the target user's neighborhood, and vice versa. The results show that, in most cases, increasing \(\beta\) boosts the performance of FRSHR according to all evaluation metrics, while values of \(\beta\) above 0.6 have a negative impact. This is because when \(\beta\) is higher than 0.6, a considerable number of neighbors are removed from the target user's neighborhood because their confidence values are lower than \(\beta\). In this condition, the recommendation system cannot predict all unknown ratings due to a lack of sufficient neighbors for the target user. Consequently, we set \(\beta =0.6\) for the proposed FRSHR model to achieve promising recommendation results compared with other recommendation models.

4.4 Performance Comparison

This section compares the performance of the developed FRSHR model with that of the other recommendation models. Since the evaluation measures are computed based on the length of the recommendations list (\(L\)), we consider several values of this parameter (i.e., \(L=10, 15, 20\)) for a more thorough comparison. Table 4 contains the experimental results of the compared recommendation models in terms of precision, recall, F1, NDCG, and health. These results can be interpreted from two perspectives: accuracy (i.e., precision, recall, F1, and NDCG) and health. The results in Table 4 show that the proposed FRSHR method outperforms the other models in terms of all accuracy metrics and for all lengths of the recommendations list. HGAT ranks second, obtaining more promising results than the remaining baselines. The improvement of FRSHR over HGAT indicates the strong ability of the proposed method to produce accurate recommendations. We attribute this mainly to the use of the reliability measurement in FRSHR and to the neighborhood refinement mechanism, which removes ineffective neighbors from the target user's neighborhood. From the health perspective, FRSHR obtains health scores of 0.4985, 0.4729, and 0.4401 for \(L=10\), \(L=15\), and \(L=20\), respectively, whereas the second-best scores, 0.4291, 0.4113, and 0.3846, are obtained by the HPRR model.
To sum up, the results show the superiority of FRSHR over the other recommendation models in both accuracy and health criteria. This is mainly because the proposed health-aware reliability measurement considers both accuracy and health factors, leading to recommendations that are both accurate and healthy.
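To put the health scores above in relative terms, the short calculation below computes FRSHR's relative improvement over HPRR, \((\mathrm{FRSHR}-\mathrm{HPRR})/\mathrm{HPRR}\), from the numbers reported in this section only; it adds no new measurements.

```python
# Health scores reported above for L = 10, 15, 20
frshr = {10: 0.4985, 15: 0.4729, 20: 0.4401}
hprr = {10: 0.4291, 15: 0.4113, 20: 0.3846}
relative_gain = {L: (frshr[L] - hprr[L]) / hprr[L] for L in frshr}
for L, g in relative_gain.items():
    print(f"L={L}: +{100 * g:.1f}%")
```

This gives roughly 16.2%, 15.0%, and 14.4% for \(L=10\), 15, and 20, respectively.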

Table 4 The efficiency of the developed FRSHR model compared to other recommendation models in terms of different lengths of recommendations list (\(L\))

4.5 Ablation Study

In this section, we conduct an ablation study to analyze the contributions of the proposed FRSHR method in more detail. FRSHR provides three main contributions: a time-aware similarity function that takes the time factor into account, a community detection algorithm, and a novel health-aware reliability measurement. The ablation study shows how much each contribution improves performance. To investigate the effect of a contribution, we remove it from FRSHR and compare the resulting model with the original version of FRSHR. The results of comparing the original FRSHR with the model obtained by removing the reliability measurement are shown in Fig. 6. These results show that the proposed health-aware reliability measurement considerably increases the performance of FRSHR in terms of all evaluation measures and for all lengths of the recommendations list, so it benefits FRSHR in terms of both accuracy and health simultaneously. This is because both accuracy and health factors are taken into account in the design of the measurement: according to Eq. (17), it considers not only the health factor but also two accuracy-based factors. It is therefore expected that the model with the reliability measurement achieves better results in terms of both health and accuracy (i.e., precision, recall, and NDCG) than the model without it. As a result, using the reliability measurement in food recommendation systems makes a significant improvement in producing accurate and healthy recommendations.

Fig. 6

The comparison of the proposed FRSHR method with the model obtained by ignoring the reliability measurement

Figure 7 compares the developed FRSHR model with the model obtained by removing the time factor. These results show that considering the time factor in the proposed time-aware similarity function (Eq. 1), one of the main contributions, improves the efficiency of FRSHR: the model with the time factor outperforms the model without it. This outcome is expected, because user preferences usually vary across time periods, and accounting for this dynamism yields a better recommendation procedure.

Another contribution is the community detection approach used to find appropriate user communities, whose members serve as the nearest neighbors of each user. To analyze how much this approach improves the performance of FRSHR, we conducted experiments whose results are shown in Fig. 8. In these experiments, the original version of FRSHR is compared with a variant that does not apply community detection and instead uses the pure \(K\)-nearest neighbors strategy in its prediction mechanism. The results show that the original FRSHR (i.e., the model with community detection) outperforms the variant without community detection. Consequently, the proposed community detection approach is a considerable contribution of this paper to the effectiveness of food recommendation systems. The results also suggest that users in the same community largely share common tastes.
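To illustrate what "time-aware" can mean for a user–user similarity, the sketch below discounts each co-rated item by the time gap between the two users' ratings. This is an assumption for illustration only: the exponential decay, the decay rate `lam`, the cosine base measure, and the history format are hypothetical and are not the paper's Eq. (1).

```python
import math
from typing import Dict, Tuple

# Hypothetical history format: item id -> (rating, timestamp in days).
History = Dict[str, Tuple[float, float]]


def time_weighted_similarity(hu: History, hv: History, lam: float = 0.01) -> float:
    """Cosine similarity over co-rated items in which each item's contribution
    to the numerator is scaled by exp(-lam * |t_u - t_v|).

    Ratings given far apart in time therefore count less. With lam = 0 this
    reduces to plain cosine similarity. `lam` is a hypothetical decay rate.
    """
    common = hu.keys() & hv.keys()
    if not common:
        return 0.0
    num = norm_u = norm_v = 0.0
    for item in common:
        ru, tu = hu[item]
        rv, tv = hv[item]
        weight = math.exp(-lam * abs(tu - tv))  # 1.0 when rated at the same time
        num += weight * ru * rv
        norm_u += ru * ru
        norm_v += rv * rv
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return num / math.sqrt(norm_u * norm_v)


alice = {"pasta": (5.0, 0.0), "salad": (3.0, 0.0)}
bob = {"pasta": (5.0, 100.0), "salad": (3.0, 100.0)}
print(time_weighted_similarity(alice, alice))  # identical histories: 1.0
print(time_weighted_similarity(alice, bob))    # same ratings, 100 days apart: about 0.368
```

Keeping the norms unweighted means the decay can only lower the similarity; weighting the norms as well would cancel the decay whenever two users' ratings are proportional.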

Fig. 7

The comparison of the proposed FRSHR method with the model obtained by ignoring the time factor

Fig. 8

The comparison of the proposed FRSHR method with the model obtained by ignoring the community detection approach
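The neighbor-selection change evaluated in Fig. 8 can be illustrated with a sketch in which only the neighbor set differs between the two variants. The prediction rule is the standard mean-centred user-based collaborative filtering formula, used here as an assumption rather than the paper's Eq. (20); the community assignment `community_of` is written by hand instead of being produced by FRSHR's community detection algorithm, and all ratings and similarities are hypothetical.

```python
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]      # user -> item -> rating
Similarity = Dict[str, Dict[str, float]]   # user -> other user -> similarity


def mean_rating(user_ratings: Dict[str, float]) -> float:
    return sum(user_ratings.values()) / len(user_ratings) if user_ratings else 0.0


def predict(user: str, item: str, ratings: Ratings, sim: Similarity,
            neighbors: List[str]) -> float:
    """Mean-centred, similarity-weighted average over the neighbours that rated
    `item` (standard user-based CF; not necessarily the paper's Eq. 20)."""
    base = mean_rating(ratings[user])
    num = den = 0.0
    for v in neighbors:
        if v == user or item not in ratings.get(v, {}):
            continue
        w = sim[user].get(v, 0.0)
        num += w * (ratings[v][item] - mean_rating(ratings[v]))
        den += abs(w)
    return base + num / den if den > 0 else base


def knn_neighbors(user: str, sim: Similarity, k: int) -> List[str]:
    """Ablation baseline: the k users most similar to `user`."""
    others = [v for v in sim[user] if v != user]
    return sorted(others, key=lambda v: sim[user][v], reverse=True)[:k]


def community_neighbors(user: str, community_of: Dict[str, int]) -> List[str]:
    """Community variant: all other members of the user's community.
    `community_of` would come from a community detection step (not shown)."""
    c = community_of[user]
    return [v for v, cv in community_of.items() if cv == c and v != user]


ratings = {
    "u1": {"a": 4.0, "b": 2.0},
    "u2": {"a": 5.0, "b": 3.0, "c": 4.0},
    "u3": {"a": 2.0, "b": 4.0, "c": 1.0},
}
sim = {"u1": {"u2": 0.8, "u3": 0.5}}
community_of = {"u1": 0, "u2": 0, "u3": 0}

print(predict("u1", "c", ratings, sim, knn_neighbors("u1", sim, k=1)))
print(predict("u1", "c", ratings, sim, community_neighbors("u1", community_of)))
```

Because `predict` takes the neighbor list as an argument, the two variants share every other step, which is the property an ablation of this kind relies on.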

4.6 Discussion

Owing to its key innovations, our food recommendation system is more effective than the other state-of-the-art food recommendation systems compared in this study. These innovations can be summarized as follows:

1. The previous food recommendation systems we compared against, namely LDA [44], HPRR [61], CNNFR [13], HAFR [41], CFRR [62], FGCN [43], and HGAT [63], ignore the time factor of historical ratings. A highly effective food recommendation system should account for the fact that users' preferences, including their diets and tastes, can change over time; by ignoring the time of user ratings, these systems become less effective when such changes occur. In this study, a novel time-aware similarity function (Eq. 1) is developed that accommodates changes in dietary preferences and tastes over time. This gives our model an edge over state-of-the-art systems, which often neglect the time factor. The ablation study reported in Fig. 7 showed that considering the time factor has a clear positive effect on the performance of the proposed FRSHR model.

2. Unlike all the compared food recommenders, FRSHR introduces an effective health-aware reliability measurement (Eq. 17), which evaluates the reliability of predicted ratings by considering accuracy and health factors simultaneously. Our review of previous food recommender systems revealed a research gap concerning the reliability of recommendations, which is a crucial issue for improving trust in food recommendation models. All of the compared models (i.e., LDA, HPRR, CNNFR, HAFR, CFRR, FGCN, and HGAT) neglect this issue and consider only the accuracy of recommendations, not their reliability. To provide more accurate and healthy recommendations, FRSHR recalculates each unreliable predicted rating using a newly obtained neighborhood. The ablation study indicated that the health-aware reliability measurement considerably improves the performance of our model on all accuracy metrics and on the health metric simultaneously (Fig. 6). To the best of our knowledge, FRSHR is the first model that incorporates a health-aware reliability measurement into the main procedure of a food recommendation system.

3. In the compared food recommendation models, user communities are ignored, and user–user interactions receive little analysis. In contrast, we develop a novel community detection approach to identify the natural groups of users. This approach determines the appropriate number of communities automatically, without any manually set parameters. When predicting unknown ratings (Eq. 20), the members of the target user's community are used as that user's nearest neighbor set. According to the ablation study reported in Fig. 8, the developed model (i.e., the model with community detection) outperforms the variant without community detection in terms of precision, recall, NDCG, and the health metric.
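The check-and-recalculate step described in item 2 can be sketched as follows. The reliability score below is hypothetical: it mirrors only the structure stated above (two accuracy-based factors plus a health factor), with neighbor support and rating agreement chosen as illustrative accuracy factors, an equal-weight average, and a threshold of 0.6, none of which are taken from Eq. (17). The `recompute` callback stands in for re-predicting the rating from a refined neighborhood, which is not shown.

```python
import statistics
from typing import Callable, List


def reliability(neighbor_ratings: List[float], n_neighbors: int, health: float,
                r_min: float = 1.0, r_max: float = 5.0) -> float:
    """Hypothetical reliability score in [0, 1] (NOT the paper's Eq. 17).

    support   : share of the neighbourhood that actually rated the item
    agreement : 1 - (population std of those ratings / largest possible std)
    health    : health score of the food, assumed already scaled to [0, 1]
    """
    if n_neighbors == 0 or not neighbor_ratings:
        return 0.0
    support = len(neighbor_ratings) / n_neighbors
    max_std = (r_max - r_min) / 2.0  # largest population std on [r_min, r_max]
    agreement = 1.0 - statistics.pstdev(neighbor_ratings) / max_std
    return (support + agreement + health) / 3.0


def refine_if_unreliable(prediction: float, score: float,
                         recompute: Callable[[], float],
                         threshold: float = 0.6) -> float:
    """Keep a reliable prediction; otherwise call `recompute`, a placeholder for
    re-predicting the rating from a refined neighbourhood."""
    return prediction if score >= threshold else recompute()


# Hypothetical example: 4 of 5 neighbours rated a fairly healthy dish similarly.
s1 = reliability([4.0, 4.0, 5.0, 4.0], n_neighbors=5, health=0.8)
# Only 2 of 10 neighbours rated a less healthy dish, and they disagree.
s2 = reliability([1.0, 5.0], n_neighbors=10, health=0.2)

print(round(s1, 3), refine_if_unreliable(4.2, s1, lambda: 3.9))
print(round(s2, 3), refine_if_unreliable(4.2, s2, lambda: 3.1))
```

An equal-weight average is the simplest combination; a weighted or multiplicative one would let a single weak factor, such as a very low health score, mark a prediction as unreliable on its own.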

Although our system outperforms all of the compared state-of-the-art food recommendation models, as shown in the results section, it could still be improved further. Several directions may help:

1. Dataset: Publicly available food datasets, such as Food-101 [64] and Yummly [65], do not contain user–food rating information and are therefore unsuitable for evaluating food recommendation systems. In this study, we therefore collected a user–food rating dataset by crawling the allrecipes.com website. Although this website is one of the largest food social networks, an analysis of its visitors shows that 85% of them are North Americans, who share a particular food culture. The reported results may therefore be biased to some extent toward this food culture, and evaluating the system on data from sites hosted in other countries with different food cultures could lead to different results. To compensate for this shortcoming, we plan to crawl other food social networks and analyze diet styles across different cultures and countries.

2. Specificity: Like previous recommendation systems, our model ignores user characteristics (e.g., age, weight, gender, location, allergies, and medical history) and considers only food contents or user ratings when recommending foods. Recommendations that ignore individual characteristics can cause health issues and reduce the overall effectiveness of the system. Although collecting such data may be complicated, future studies should consider user characteristics to further improve performance and provide more tailored, individualized food recommendations.

3. Novelty/Diversity: The proposed system, like previous research, focuses on improving the accuracy of recommendations and the health factor of recommended foods. Overemphasizing accuracy may produce inappropriate recommendations and lead to over-fitting. Future work should therefore address novelty and diversity, both to mitigate over-fitting and to improve the quality of the user's experience with the recommender system.

5 Conclusions

One of the challenges in the field of recommendation systems is designing intelligent tools that identify users' tastes regarding available foods and help them adopt a better, healthier diet. This paper proposed a novel food recommendation system that incorporates a health-aware reliability measurement into the recommendation procedure. The main contribution of this paper is a novel reliability measurement defined on the basis of both accuracy and health factors. This measurement is used to assess how reliable the predicted ratings are, and unreliable predictions are recalculated using a neighborhood refinement mechanism. In addition, a time-aware similarity function is used to calculate user similarity values, which are then employed in a community detection approach. The effectiveness of the proposed method was investigated through various experiments comparing its performance with other state-of-the-art food recommendation models. The results demonstrated the superiority of the proposed method over the other models in terms of both accuracy and health metrics. Moreover, an ablation study was performed to examine the effect of each contribution of the proposed method, and it showed that each contribution has a considerable effect on the effectiveness of the proposed food recommendation model.