1 Introduction

The internet has transformed the food and dining industry, giving customers a wealth of information and tools to enhance their dining experiences [6]. Social media, food blogs, and review websites let customers share their opinions and experiences with a large online community, and let restaurants present their menus and services to a broader audience. The primary goal of this paper is to introduce a content-based recommender system (RS) for restaurants that relies on textual data and requires no user-specific information, and to demonstrate its efficacy. We use a German-language list of more than 6000 restaurants, bars and cafés in Austria, grouped by category and compiled by Falter Verlagsgesellschaft m.b.H. (Falter; see Footnote 1) for its column “WIEN, WIE ES ISST” (Footnote 2). This digital guide is well established and is used by both residents and tourists.

Related research has explored various approaches to restaurant RSs. Gupta et al. [5] use a user’s geolocation and visit history to recommend similar restaurants, while another approach [1] applies sentiment analysis to user reviews to derive food preferences for personalized suggestions. In addition, user-entered favorite amenities can be used to recommend restaurants based on their offerings [4], and restaurant descriptions and photos can be combined with user data to build a hybrid recommendation system [2]. These strategies show how versatile RSs are for restaurant recommendation.

In this work, we highlight the importance of domain expert interviews for revealing crucial aspects that a content-based RS must incorporate to compensate for the lack of user reviews. We also show that baseline approaches can yield results comparable to, and in specific cases better than, those of state-of-the-art models.

1.1 Example of Falter’s Recommendations

Falter’s restaurant guide (Footnote 3) lists each restaurant with a description, metadata, and tags that group restaurants by predefined properties, e.g., whether a restaurant review is available (see the example restaurant “Das Bootshaus”, Footnote 4). The recommender system currently in place is purely rule-based: it filters by tags and location. The limitation of this basic approach becomes apparent with “Das Bootshaus”, which offers seafood as a specialty even though this is not indicated in its tags. There is thus a need for a recommender system that can use a restaurant’s descriptive text to provide more meaningful recommendations.

2 Method

TF-IDF serves as the baseline method for text-based recommendations. Reimers et al. [7] report that Sentence-BERT (SBERT) can outperform previous approaches in this field. We used the T-Systems Roberta model (Footnote 5), which is based on the work of Reimers et al. [7] and supports both German and English. We evaluated our text models both qualitatively and quantitatively. The following two paragraphs outline the results of the TF-IDF and SBERT approaches. The recommendations shown are derived using the restaurant discussed in Subsect. 1.1 as a reference point.

For the TF-IDF baseline, the text data was run through a preprocessing pipeline. It first removes prices from the descriptions, as they provide no contextual information, then removes standard German stopwords, and finally lemmatizes the text with the HannoverTagger lemmatizer [9]. Beyond these standard steps, we identified words that occur often in descriptions but carry little meaningful information, such as the German words for “kitchen”, “dish”, “air conditioning”, and “sidewalk tables” (Schanigarten), and removed them as well. This step was important and greatly improved the recommendations, since it removes unnecessary noise from the data. The top-5 recommendations using cosine similarity are Ufertaverne, Pizzeria Adamo, Neuzeit, Gästhaus Käpt’n Otto, and Cafe Restaurant Denito.
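The baseline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in the paper: the stopword sets, the price pattern, the smoothed IDF variant, and the four toy venues are all hypothetical, and lemmatization (done with the HannoverTagger in the paper) is omitted to keep the example dependency-free.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword sets for illustration only. The paper uses
# standard German stopwords plus hand-picked domain words.
GERMAN_STOPWORDS = {"und", "der", "die", "das", "mit", "am", "an", "in", "im"}
DOMAIN_STOPWORDS = {"küche", "gericht", "klimaanlage", "schanigarten"}

# Assumed price formats such as "14,90 €" or "€ 9"; real descriptions may differ.
PRICE_PATTERN = re.compile(r"€\s*\d+(?:[.,]\d+)?|\d+(?:[.,]\d+)?\s*(?:€|euro)")


def preprocess(text):
    """Strip prices, lowercase, tokenize, and drop stopwords (no lemmatization)."""
    text = PRICE_PATTERN.sub(" ", text.lower())
    tokens = re.findall(r"[a-zäöüß]+", text)
    return [t for t in tokens
            if t not in GERMAN_STOPWORDS and t not in DOMAIN_STOPWORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Relative term frequency times a smoothed IDF (an assumed variant).
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(names, descriptions, query_name, k=5):
    """Return the k venues whose descriptions are most similar to the query venue."""
    vectors = tfidf_vectors([preprocess(d) for d in descriptions])
    q = names.index(query_name)
    scored = [(cosine(vectors[q], vectors[i]), names[i])
              for i in range(len(names)) if i != q]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Invented toy data, not taken from the Falter guide.
NAMES = ["Fish Place", "Pizza Place", "River Grill", "Pasta Bar"]
DESCRIPTIONS = [
    "Frischer Fisch und Meeresfrüchte direkt am Wasser, Hauptspeisen 14,90 €",
    "Pizza und Pasta aus dem Holzofen, Schanigarten",
    "Gegrillter Fisch am Wasser mit Blick auf die Donau",
    "Hausgemachte Pasta und Pizza, Küche bis 22 Uhr",
]

print(recommend(NAMES, DESCRIPTIONS, "Fish Place", k=1))
```

In this toy setup the fish venue is matched to the other venue that mentions fish and water, which is exactly the kind of word-overlap signal a TF-IDF baseline relies on.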

SBERT is our state-of-the-art model of choice. The self-attention mechanism of the BERT architecture [3, 8] allows for improved performance and understanding of context, which in turn produces more novel recommendations when compared to the baseline. For this model, we decided to feed the text as-is (without preprocessing) as the preprocessing steps used for the baseline seemed to worsen the performance. The top-5 recommendations are Landtmann’s Jausen Station, Mühlwasser Platz’l, Klyo, Zur Alten Kaisermühle, and propeller.

The novelty lies in the fact that SBERT was able to recommend restaurants near the Danube, like the restaurant from our example (see Subsect. 1.1), while also recommending restaurants from the same franchise, such as Landtmann’s Jausen Station. This is a distinction that the simple word-frequency calculation of our baseline could not achieve. Further insights for both recommenders are discussed in Sect. 3.

2.1 Evaluation

Quantitative Evaluation. We consider only the 2300 restaurants that have a kitchen tag, since not all venues have a food specialty (e.g., Viennese or Chinese). We used the three largest kitchen types and their subtypes: Italian; Asian, together with its subtypes Japanese, Chinese, Korean, Thai, and Vietnamese; and Viennese, for which we also include Austrian, Tyrolean, and Styrian cuisines as subtypes. While it might seem counter-intuitive to categorize Austrian cuisine as a subtype of Viennese, this was done for practical reasons, since more restaurants carry the tag Viennese than Austrian. Together, these cuisines cover more than 70% of all restaurants that have a kitchen attribute. To demonstrate the performance on restaurants from smaller kitchen groups, we additionally chose the Indian kitchen. The evaluation metric is the hit rate, defined as the number of recommendations in a top-10 list that have the same kitchen type (or subtype) as the given restaurant, divided by the total number of recommendations (ten).

Qualitative Evaluation (Domain Expert Interview). The qualitative evaluation took the form of an interview with an expert from Falter. For the interview, we divided the restaurants into three categories: (i) restaurants with a given kitchen attribute; (ii) restaurants with a focus on certain food but no specified kitchen tag in the data set; and (iii) restaurants with no particular focus on any food and no specified kitchen tag in the data set. We chose two restaurants from each of the first two categories and one restaurant from the last category, and generated recommendations for each. For each of these five restaurants, we had one top-5 list from TF-IDF and one from SBERT, leading to a total of 50 recommendations. The interviewee was asked to rate the recommendations and give feedback based on their own opinion, and did not learn which model had generated each list until the interview had concluded.
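The hit rate defined in the quantitative evaluation can be written down as a short sketch. The subtype grouping follows the text; the function names and the example list are hypothetical, and treating any member of the same kitchen group as a hit is one reading of "same kitchen type (or subtype)".

```python
# Subtype -> parent kitchen group, following the grouping described in the text.
KITCHEN_GROUP = {
    "Japanese": "Asian", "Chinese": "Asian", "Korean": "Asian",
    "Thai": "Asian", "Vietnamese": "Asian",
    "Austrian": "Viennese", "Tyrolean": "Viennese", "Styrian": "Viennese",
}


def group_of(kitchen):
    """Map a kitchen tag to its group; top-level tags map to themselves."""
    return KITCHEN_GROUP.get(kitchen, kitchen)


def hit_rate(query_kitchen, recommended_kitchens):
    """Share of recommendations that fall in the query restaurant's kitchen group."""
    if not recommended_kitchens:
        raise ValueError("recommendation list must not be empty")
    target = group_of(query_kitchen)
    hits = sum(1 for kitchen in recommended_kitchens if group_of(kitchen) == target)
    return hits / len(recommended_kitchens)


# Invented top-10 list for a Viennese query restaurant.
EXAMPLE_TOP10 = ["Austrian", "Viennese", "Italian", "Styrian", "Viennese",
                 "Chinese", "Viennese", "Tyrolean", "Indian", "Viennese"]

print(hit_rate("Viennese", EXAMPLE_TOP10))  # 7 of 10 recommendations match
```

A per-kitchen score could then be obtained by averaging this value over all query restaurants of that kitchen type; whether the paper aggregates in this way is an assumption.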

3 Results

Table 1 shows the quantitative results for the hit rate metric. We observe that TF-IDF outperforms SBERT on the Italian and Asian kitchens but greatly underperforms on the Viennese kitchen. A possible explanation is the use of more common words in the first two cases (Italian: pizza, pasta, etc.; Asian: sushi, maki, bento, etc.), whereas Viennese restaurants may be characterised only by the adjective “Viennese” and by less common dish names, which lowers their TF-IDF similarity scores. SBERT, however, picks up semantic similarities and thus performs better in this case. For the Indian kitchen, we see a marked drop in hit rate. This could be due to the low number of Indian restaurants, which make up only 78 of the 2300 restaurants with a kitchen attribute.
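To put the size of the Indian group in perspective, a rough chance-level reference can be computed from the two counts given above. This is a sketch under an assumption the paper does not state: that recommendations are drawn uniformly at random from the other 2299 restaurants with a kitchen attribute. The function name is hypothetical.

```python
def chance_hit_rate(n_same_kitchen, n_total):
    """Expected hit rate when recommending uniformly at random among the other venues."""
    # Exclude the query restaurant itself from both the matches and the pool.
    return (n_same_kitchen - 1) / (n_total - 1)


indian_chance = chance_hit_rate(78, 2300)
print(f"{indian_chance:.1%}")  # about 3.3%
```

With so few matching venues available, even a modest hit rate for the Indian kitchen lies well above this random reference.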

Table 1. Hit Rate of TF-IDF and SBERT

The domain expert interview yielded the following key qualitative findings. TF-IDF tends to provide less diverse suggestions than SBERT, which we attribute to SBERT’s contextual understanding. While TF-IDF performs better when specific kitchen attributes are given, SBERT excels in general cases and offers more novel recommendations (see the example in Sect. 2). Ratings are context-dependent: a restaurant may receive a lower rating when it appears alongside better suggestions. In addition, the atmosphere of a restaurant significantly influences ratings, even when restaurant descriptions are similar. This point matters because TF-IDF recommendations can be rated worse than SBERT recommendations when TF-IDF finds places that offer similar food but a different atmosphere, an aspect that SBERT seems to capture better. Lastly, the expert emphasized the importance of considering price range and location, and suggested recommending restaurants with similar price ranges.

4 Conclusion

In this work, we showed that baseline RSs such as TF-IDF can outperform state-of-the-art algorithms in special cases, whereas state-of-the-art algorithms offer more novel recommendations by using contextual understanding to their advantage. Where a baseline performs well, it brings the practical advantage of cheaper and faster integration into the system architecture of an already running platform. Furthermore, we showed that domain expert interviews provide crucial insight for improving domain-dependent RSs. During the interview, the expert highlighted the importance of grasping the atmosphere of a restaurant, which is not a metric that can be calculated out of the box. The knowledge gained during the qualitative evaluation in particular points to future work: latent features, such as the atmosphere of a restaurant, should be incorporated into modern state-of-the-art restaurant recommender systems. The next step would be to conduct a user study and compare users’ perceptions and needs with those of the domain expert. Finally, generative AI could be beneficial for improving recommendations within this domain.