Automatic Generation of Restaurant Reviews Using Natural Language Processing

Maldonado Castillo, Idalia; Aguirre Miranda, Ignacio Adrián; Olvera Mendoza, Alexis

doi:10.1007/978-3-031-51038-0_96

Part of the book series: Springer Proceedings in Business and Economics ((SPBE))

Included in the following conference series:

The International Conference on Strategic Innovative Marketing and Tourism

1532 Accesses

Abstract

Currently, there is plenty of information available on restaurant rating platforms, but sometimes it can be contradictory or difficult to analyze in depth. An important challenge for consumers is searching for useful opinions and making decisions based on reviews usually obtained from different social networks or rating platforms. This project addresses this issue through the design, development, and implementation of a system that generates global recommendations for a set of restaurants based on the analysis of reviews using Natural Language Processing (NLP) techniques. The system is based on a corpus of restaurant reviews in Mexico City, from which relevant aspects are extracted and synthesized to generate comprehensive reviews from various sources. The system can also evaluate customer satisfaction by identifying the positive and negative aspects mentioned in their reviews. In this way, it provides comprehensive information that helps diners make informed decisions. By gathering data from various sources, the system classifies and analyzes the information, providing an analysis rather than just displaying data. Another important aspect is that the project contributes to the promotion of the gastronomic offer in Mexico City, supporting tourism in a more informed way. By integrating customer perspectives, a more complete and realistic view of the restaurant experience is obtained. The importance of this project lies in the empirical evidence showing that consumer reviews are influenced by the average rating and the number of reviews. Given the overwhelming number of options and the need to provide relevant information efficiently, this project offers a solution by generating detailed reviews based on aggregated information from multiple sources, including consumer reviews and influencer critiques.

You have full access to this open access chapter, Download conference paper PDF

Keywords

1 Introduction

On the web, there is an overwhelming number of options when searching for restaurant´s reviews, leading to the need to filter, prioritize, and deliver relevant information efficiently to alleviate the problem of information overload [1]. To address this situation, the present work proposes the generation of a comprehensive review from a collection of reviews gathered from restaurant evaluation platforms like Yelp, Google Reviews, and TripAdvisor.

2 Theoretical Framework

2.1 Artificial Intelligence in Consumer Decision Journey

Artificial Intelligence (AI) has transformed the marketing landscape by enabling companies to deliver personalized experiences, make data-driven decisions, and improve customer engagement. The role of AI technology in marketing and customer decision-making is expected to shape the future of customer interactions and business strategies. AI can analyze customer feedback and social media interactions to identify sentiment and customer satisfaction levels.

Companies that use advanced technology can also collect information about consumer preferences through digital data analysis and consumption patterns promoted by social media. Big data and experiments with machine learning are bringing together consumers’ personal values to determine their behavior and preferences in the markets [2].

In the modern consumer decision journey, consumer outreach has become more crucial than traditional push-style marketing. Word-of-mouth, internet reviews, and consumer interactions are significant touchpoints during the active-evaluation phase.

With the rise in popularity of digital platforms and social media, consumers are relying more on online reviews and recommendations from other consumers to shape their perceptions and to make purchasing decisions. Marketers must engage actively with consumers, manage brand reputation, and leverage user-generated content to build trust, credibility, and loyalty in this new consumer-centric landscape [3].

2.2 Aspect Extraction Module

Natural Language Processing (NLP) refers to the branch of computer science, and more specifically to the branch of AI, which deals with giving computers the ability to understand text and speech in the same way that humans do [4].

We make use of NLP by training a machine learning model for the extraction of aspect-opinion-sentiment triplets from a restaurant review. This model is trained with a corpus of labelled reviews obtained by processing data from restaurant review sites and social media.

For aspect extraction, an Aspect-Opinion-Sentiment Triplet Extraction (ASTE) model was used, focusing on the span-level approach. ASTE generates triplets consisting of an aspect target, the corresponding opinion term, and its associated polarity sentiment. The span-level approach explicitly considers the interaction between complete spans of aspects and opinions when predicting their sentiment relationship. As a result, it can make predictions with the semantics of complete spans, ensuring better sentiment consistency [5].

For example, in Fig. 1, the spans highlighted in orange are aspect target terms, and the interval in blue is the opinion term. From the same figure, the aspects are “food,” “service,” and “decoration”; there are three triplets: (food, wasn’t great, negative), (service, really nice, positive), and (decoration, really nice, positive).

A text box of an A S T E example. The text reads the food wasn't great, but the service and the decoration were really nice. An arrow from food is mapped to wasn't great and arrows from service and decoration are mapped to really nice. Food, wasn't great is negative. Service really nice is positive. — **Fig. 1**

When considering only word-by-word interactions, it is easy to mistakenly predict that “great” expresses a positive sentiment about “food.” For this reason, a segment-based model for ASTE (Span-ASTE) is implemented, which directly captures span-to-span interactions when predicting the sentiment relationship between an aspect and a pair of opinions.

Span-ASTE consists of three modules: sentence encoding, mention module, and triplet module. For the given example, the sentence is first introduced into the sentence encoding module to obtain token-level representations, from which interval-level representations are derived for each enumerated interval, such as “wasn’t great” “food”. Then, aspect category detection, Aspect Term Extraction (ATE), and Opinion Term Extraction (OTE) tasks are adopted to supervise the proposed dual-channel span reduction strategy, which obtains reduced aspect and opinion candidates, such as “food” and “not delicious,” respectively. Finally, each aspect candidate and opinion candidate are paired to determine the sentiment relationship between them [6].

For word embeddings, BETO was used, a language model based on the Transformer architecture specifically designed for natural language processing in Spanish [7]. BETO is an initiative to enable the use of pre-trained BERT models for natural language processing tasks in Spanish.

2.3 Review Generation

Using multiple reviews from different rating sites, the goal is to generate a single general review that integrates the positive and negative aspects of the products and services for each restaurant. To achieve this, it is necessary to group the significant criteria and from these groups identify the most relevant aspects.

The review generation module consists of two relevant implementations, the classification of the extracted triplets. For the first approach, the use of the unsupervised machine learning algorithm k-means was proposed to achieve a better organization of the extracted triplets. A k = 3 was chosen, where k represents the number of clusters the algorithm creates.

The algorithm identifies similar patterns or features among a set of elements and works by calculating the minimization of the sum of distances between each element and the proposed centroids. This process is done iteratively, updating the centroids by taking the position of the average of the objects belonging to that group as the new centroid [8].

We notice that, for all the triplets, usually one cluster corresponds to general aspects and opinions of the restaurant, for example, [restaurant, good], [place, clean], [restaurant, would return]. Another cluster corresponds to semi-general aspects related to the restaurant, such as [taste, spectacular], [quality, excellent], [service, impeccable]. Finally, the last cluster corresponds to specific dishes of the restaurant, for example, [scrambled eggs, delicious], [lemonade, tasty], [chicken, juicy].

Once having this classification, the probabilistic Latent Dirichlet Allocation (LDA) model was applied to each cluster concerning their aspects to find the three most relevant aspects in each group [9, 10]. Subsequently, the same method is applied again, this time to all the opinions for each found aspect to obtain the most relevant opinion for that aspect.

Once the most relevant aspect-opinion pairs have been obtained, they are sent to the GPT-4 language model through the API provided by OpenAI.

In this way, a review-formatted text is obtained from the n most relevant aspect-opinion pairs from the set of reviews and stored in a database. The process is automated, extracting reviews from established restaurant review sites (Yelp, Google Reviews, and TripAdvisor), and the generated reviews can be accessed through a web tool.

3 Methodological Considerations

This project was based on an exploratory methodology that focused on analyzing similar products and expanding on traditional restaurant evaluation sites. Machine learning, generative AI models, and NLP techniques were used to develop an innovative solution in restaurant review generation [6, 11].

Projects with a similar focus usually only filter or categorize reviews, but this project, on the other hand, is capable of synthesizing information from multiple reviews into a single, more detailed review that compiles the most important aspects. This is done by obtaining information from different restaurant evaluation sites such as Yelp, TripAdvisor, and Google Reviews. Reviews of these sites are attained to be later analysed automatically, and finally the aspects obtained are extracted and their sentiments are classified as positive or negative, constructing an aspect-opinion-sentiment triplet (e.g. [chicken, tasty, positive], [drinks, bad smell, negative]). In doing so, it facilitates the decision-making process for those consulting it, allowing them to obtain a result based on their own criteria. The complexity of the project lies in building the labelled training corpus of reviews, extracting restaurant aspects, assigning specific relevance to those aspects, and generating the review itself.

Research took place in January 2022, the review collection began in August 2022 and ended in June 2023, using data extracted from APIs provided by the rating platforms themselves.

Reviews in Spanish of restaurants in Mexico City were collected on the following rating sites: TripAdvisor 425 reviews, Google Reviews 1415 reviews, Yelp 7160 reviews; as well as 1000 reviews of various posts on Instagram.

For the elaboration of this model, the analysis of information from the paper.

“Centralization of Information to Understand the Consumer Within the Restaurant Sector” was used, as well as the use of a tool to label the information, since the system requires to be trained with data in a specific format.

4 Results and Discussion

A corpus with 10,000 manually labelled reviews was obtained to train the Span-ASTE triplet extraction model through the developed labelling tool. The performance achieved through training with this corpus and k-fold cross-validation is reflected in the metric values: precision of 0.7, recall of 0.63, and F1 score of 0.66.

The model extracts and leverages existing resources, allowing users to make purchasing decisions based on the experiences of other customers. The information is dynamic and constantly updated based on the opinions of all diners, unlike other tools like Google Maps or Yelp, which offer a general view of services without direct customer feedback. With the model, users save research time, as relevant aspects of the place are provided.

In terms of managerial implications, this solution can be beneficial for both companies and customers. For companies, analysing and generating global reviews enables them to better understand customer opinions and preferences, helping them improve their services and make more informed strategic decisions. Additionally, by providing more detailed and relevant reviews, companies can positively influence customer perception and increase satisfaction.

For customers, this solution saves them time and effort by providing a comprehensive review that succinctly summarizes the most important aspects of the restaurant. This helps them make informed and reliable decisions when choosing a restaurant or service, thus enhancing their overall experience.

A similar algorithm can be applied to any other tourist attraction or sector, such as accommodation. This allows the evaluation and perception of services based on customer experiences.

Below are screenshots of the developed system, Fig. 2 shows the selected Restaurant and the reference from the review sites where the reviews were obtained. The following figures (Figs. 3, 4, 5, 6 and 7) show relevant information about the restaurant such as the generated review, a graph with a summary of positive and negative reviews, relevant aspects found in the total reviews, etc.:

A screenshot of the restaurant search screen includes the space to enter restaurants, name, I D google, I D yelp, and I D trip Advisor and the google map of Mexico City. — **Fig. 2**

A screenshot of the restaurant screen presents the reviews on Pujol Restaurant of Mexico city. — **Fig. 3**

A semicircular graph presents the number of positives as 224 and negative aspects as negative 61. — **Fig. 4**

A pie chart presents the aspects of a restaurant screen. The words restaurant, experience, service, comida, and so on are mentioned. The number of the aspect restaurant is the highest. — **Fig. 5**

A donut chart with every aspect of the restaurant listed. — **Fig. 6**

Fig. 7

Restaurant screen, menu showing all reviews that were analyzed for the generated review. In each review it is possible to see the aspects highlighted in orange and the opinions that were expressed about it in blue. Hovering the pointer over an aspect highlights its corresponding opinion

Full size image

5 Conclusion

By including longer reviews in the training, there is a greater diversity of data, improving the extraction of aspect-opinion-sentiment triplets for longer reviews. However, it is necessary to increase the number of such reviews to further improve the metrics.

Contrary to our initial hypothesis that a larger corpus would improve the results, the performance decreased in all metrics, and none of the folds surpassed the threshold of 0.70 in the F1 score.

References

Isinkaye FO, Folajimi YO, Ojokoh BA (2015) Recommendation systems: Principles, methods and evaluation. Egyptian Informatics Journal, 16:261-273 Isinkaye FO, Folajimi YO, Ojokoh BA (2015) Recommendation systems: Principles, methods and evaluation. Egyptian Informatics Journal, 16:261–273.https://doi.org/10.1016/j.eij.2015.06.005
Cortés M (2019) Machine Learning, esencial en el análisis del comportamiento del consumidor. Retrieved from CIO México: https://cio.com.mx/machine-learning-esencial-en-el-analisis-del-comportamiento-delconsumidor/
McKinsey Quarterly (2009) The consumer decision journey. Retrieved from McKinsey & Company: https://www.mckinsey.com/capabilities/growth-marketingand-sales/our-insights/the-consumer-decision-journey
IBM (n.d.) ¿Qué es el procesamiento del lenguaje natural (PLN)? Retrieved from IBM: https://www.ibm.com/mx-es/topics/natural-language-processing
Peng H, Xu L, Bing L, Ewi L, Si L (2020) Knowing What, How and Why: A Near Complete Solution for Aspect-Based Sentiment Analysis. Proceedings of the AAAI Conference on Artificial Intelligence
Google Scholar
Xu L, Chia YK, Bing L (2021) Learning Span-Level Interactions for Aspect Sentiment Triplet Extraction. arXiv preprint arXiv:2107.12214
DCC UChile (2022) BETO: Spanish BERT. https://github.com/dccuchile/beto
Education Ecosystem (LEDU) (2018) Understanding K-means Clustering in Machine Learning. (Medium) https://towardsdatascience.com/understanding-k-means-clustering-in-machinelearning-6a6e67336aa1
Kulshrestha R (20n19) A Beginner’s Guide to Latent Dirichlet Allocation(LDA). Retrieved from Medium: https://towardsdatascience.com/latentdirichlet-allocation-lda-9d1cd064ffa2
Blei D M, Ng AY, Jordan MI (2003) Latent Dirichlet Allocation. Journal of Machine Learning Research 3:993–1022. https://doi.org/10.5555/944919.944937
OpenAI. (2023) GPT-4 Technical Report. arXiv:2303.08774

Download references

Acknowledgements

The results of this work were developed within the framework of the research project: “Promotion of tourism through the analysis of recommendations or reviews of tourist sites and restaurants using mobile applications,” with registration number assigned by SIP: 20232546. Developed at the Instituto Politécnico Nacional, Escuela Superior de Cómputo in México.

Author information

Authors and Affiliations

Escuela Superior de Cómputo-Instituto Politécnico Nacional, ciudad de México, México
Idalia Maldonado Castillo, Idalia Maldonado Castillo, Ignacio Adrián Aguirre Miranda & Alexis Olvera Mendoza
Escuela Superior de Computo, Instituto Politecnico Nacional, Mexico City, México
Idalia Maldonado Castillo

Authors

Idalia Maldonado Castillo
View author publications
You can also search for this author in PubMed Google Scholar
Ignacio Adrián Aguirre Miranda
View author publications
You can also search for this author in PubMed Google Scholar
Alexis Olvera Mendoza
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Idalia Maldonado Castillo .

Editor information

Editors and Affiliations

University of West Attica, Athens, Greece
Androniki Kavoura
University of the Azores, Ponta Delgada, Portugal
Teresa Borges-Tiago
University of the Azores, Ponta Delgada, Portugal
Flavio Tiago

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Reprints and permissions

Copyright information

About this paper

Cite this paper

Maldonado Castillo, I., Aguirre Miranda, I.A., Olvera Mendoza, A. (2024). Automatic Generation of Restaurant Reviews Using Natural Language Processing. In: Kavoura, A., Borges-Tiago, T., Tiago, F. (eds) Strategic Innovative Marketing and Tourism. ICSIMAT 2023. Springer Proceedings in Business and Economics. Springer, Cham. https://doi.org/10.1007/978-3-031-51038-0_96

Download citation

DOI: https://doi.org/10.1007/978-3-031-51038-0_96
Published: 01 June 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-51037-3
Online ISBN: 978-3-031-51038-0
eBook Packages: Business and ManagementBusiness and Management (R0)

Publish with us

Policies and ethics