Introduction

Due to the proliferation of the internet, e-commerce, and social media, text data are readily available on the web. These text data can be a great resource for marketers who are eager to listen to customers in order to better manage the marketing process. Traditionally, marketers sent surveys to customers; nowadays, product reviews, blogs, and other digitized communication provide information on attributes relevant to marketing decision making, such as adoption of a new product or the possible composition of a consideration set (Lee and Bradlow 2011). Because of the large volume and unstructured nature of text data, sophisticated modeling is warranted, and researchers have been using data analytics, specifically machine learning techniques, to uncover patterns that help businesses (Mikalef et al. 2020b).

A large number of studies have examined the effect of big data analytics on firm performance, and the evidence quite overwhelmingly suggests that data analytics improves a firm's decision making and innovation (Branda et al. 2018; Gupta and George 2016), customer relationship management, management of operational risk and efficiency, market performance (Wamba et al. 2017) and, ultimately, overall performance (Kiron 2013). By providing accessible information to managers, data analytics creates a competitive advantage (Mikalef 2020a). Studies have also shown a positive association between customer analytics and firm performance (German et al. 2014). Customer analytics may tap into different areas of customer experience, ranging from purchasing behavior and prediction of buying trends to product recommendation, co-creation (Acharya et al. 2018), and opinion summarization about a specific feature of a product.

The task of summarizing opinions or reviews has become one of the central research areas in the text mining community, mainly in the information retrieval literature (Mudasir et al. 2020; Hu et al. 2017). The techniques are becoming more sophisticated, and studies increasingly report methods for extracting aspects/topics, textual summaries, and the like (Mudasir et al. 2020). Different formats and techniques provide different levels of understanding or precision of the content. Therefore, users need to adapt these methods to their own needs.

In their influential paper on automated text analysis in marketing, Berger et al. (2020) emphasized that regardless of the focus (to make a prediction, to assess impact, or to understand a phenomenon), "doing text analysis well requires integrating skills, techniques, and substantive knowledge from different areas of marketing" (Berger et al. 2020, p. 6). Text analysis yields its best results when positivist analysis (factual knowledge gained by a scientific, usually quantitative, method) is combined with qualitative and interpretive analysis. For example, Kubler et al. (2017) used a tailored marketing dictionary that allows the analysis to be interpreted in a marketing context rather than a general one, since a word may carry different interpretations depending on the context in which it is used. The authors then utilized this exclusive dictionary (tailored for the marketing domain) inside an automatic text analysis-based sentiment extraction tool, namely a support vector machine (Cui and Curry 2005), to uncover marketing metrics from user-generated content. Berger et al. (2020) further elaborated this point: quantitative skill helps build the right mathematical model; behavioral skill relates the findings to underlying psychological processes; and, most importantly for marketers, strategy skill, the ability to convert findings from big data into a firm's actionable items and outcomes, helps reach the firm's goals. Text data can therefore ultimately aid a firm's marketing decision making and be a great resource, but the combination of the above-mentioned skills appears to be necessary. In this light, it is very important that marketers build their tools using machine learning techniques together with other skills, especially marketing-specific knowledge, to get the most out of the data (Ma and Sun 2020). However, the marketing analytics literature is still immature in providing guidance about the suitability of a particular analytics tool for crafting overall firm strategy (Vollrath and Villegas 2022). Machine learning and text mining experts are skilled in building accurate and precise mathematical models, and often their goal is to improve prediction; the "right answer" differs across objectives. Therefore, when the goal is to improve overall marketing metrics, the literature recommends incorporating domain knowledge into the process (Hair and Sarstedt 2021). In a recent paper, Huang and Rust (2021) argued that artificial intelligence in marketing operates in three stages: "Mechanical AI" for repetitive tasks, "Thinking AI" for analyzing data and making decisions, and "Feeling AI" for understanding consumers and interacting with them. The latter two need domain knowledge as input to optimize the goal of improving marketing metrics. This paper responds to that call to integrate quantitative models with goal-specific domain knowledge to better assist managers in taking action. A firm's marketing decision making through text analysis is better served when domain knowledge is incorporated than when a predefined model invented for a different purpose is borrowed.

As mentioned before, opinion summarization has been an active area of research in the information retrieval literature for over a decade; marketers need to tailor these methods to their objectives and needs in order to leverage the strength of this huge body of textual data. Statistical power and marketers' goals need to come together to fully utilize this opportunity. With this in mind, the current research compares two common text mining techniques from a marketer's perspective. Analyzing the text of reviews posted on Amazon.com, the current study compares Latent Semantic Analysis (LSA) (Deerwester et al. 1990) and Probabilistic Latent Semantic Analysis (PLSA) in extracting useful summarizing information in the form of common themes. In the first context, the reviews are taken from the kitchen appliances category, which contains different brands and several kitchen products. In the second, reviews of only one brand of product in the handbag category are examined. The objectives of review summarization are fundamentally different in these two scenarios when the analysis is intended to inform market research. The first scenario provides information about the whole market in that product category. It can be considered market surveillance, where a particular seller monitors important aspects of the product category to find out which characteristics are of main concern. These are also the key aspects of the whole customer experience that determine customer satisfaction or dissatisfaction. The insight can be used to improve an offering through innovation or by adding a digital aspect to it, also known as digital innovation (Sahut et al. 2020). Since the information comes from consumers' "organic" text, it is less subject to the topic restrictions and framing biases that are common even in well-crafted surveys (Savage and Burrows 2009). The second scenario examines a single brand. This is useful for brand managers when an in-depth analysis of consumers' opinions is sought. There have been studies that compare the performance of LSA and PLSA (Ke and Luo 2015; Kim and Lee 2020). However, to the best of our knowledge, no study compares these two methods in two scenarios with different marketing goals and evaluates the suitability of the techniques in these differing contexts.

The rest of the paper is organized as follows: we review the literature on user-generated content, customer analytics, opinion summarization, and some text mining tools. Experimentation is presented next along with findings, followed by the discussion and managerial implications.

Literature review

User-generated content

User-generated content (UGC) refers to any content created by users or consumers of a product or service, such as product reviews, social media posts, blog articles, and videos. In recent years, UGC has exploded, often in the form of text data such as blogs, reviews, or social media interactions. Scholars have examined a range of issues (Iacobucci et al. 2019), such as how and why people contribute UGC (Braune and Dana 2021; Moe and Schweidel 2012; Ransbotham et al. 2012) and the impacts of UGC (Zhang et al. 2012), including review rating and text (Sallberg et al. 2022), among others.

UGC can benefit firms in several ways, including increased customer engagement (Bijmolt et al. 2010), improved brand loyalty (Llopis-Amorós et al. 2019), and brand co-creation (Koivisto and Mattila 2020). Constantinides and Fountain (2008) found that UGC can positively affect the credibility and perceived quality of a brand, leading to increased brand loyalty and purchase intentions. Additionally, UGC can enhance the authenticity of a brand by providing real-life examples of product usage and customer experiences. More importantly, UGC can also provide valuable insights into customer preferences, needs, and touch points, which can help firms improve their products and services. Bernoff and Li (2008) found that UGC can help firms identify customer needs and trends, leading to improved innovation and product development.

In a study of UGC and its impact, Li et al. (2021) modeled the consumer purchase decision process and found evidence that UGC affects every stage of this process. UGC can also provide valuable insights and ideas that firms can use to develop new products, services, or marketing strategies (Hanna et al. 2011).

Although UGC can take various forms, product review ratings and content are very influential in terms of sales (Mudambi and Schuff 2010). The impact of review ratings on product sales has been thoroughly studied across product categories (Chevalier and Mayzlin 2006; Liu 2006); the sales of books (Chevalier and Mayzlin 2006) and movies (Liu 2006) were affected by user-generated review ratings. Research has also explored the impact of review content on marketing parameters such as helpfulness votes (Ghose and Ipeirotis 2011), consumer engagement (Yang et al. 2019), and digital innovation (Sahut et al. 2020). Although these data can provide valuable information about markets and customers, it can be hard to decipher the actual information in unstructured data (Zhu et al. 2013), which gave rise to customer analytics.

Customer analytics

As mentioned before, a large number of studies have investigated the relationship between big data or customer analytics and firm outcomes, and the results suggest that data analytics enhances a firm's decision making and innovation (Branda et al. 2018; Gupta and George 2016). To analyze customer-generated text data, which most commonly occur across the web, marketing scholars are using text analysis tools and methods to analyze these data automatically (Kamal 2015). These data types and analytical methods vary widely across different branches of marketing analytics (Iacobucci et al. 2019). Many cutting-edge methods have been used by marketing scholars to analyze UGC, and consumer reviews in particular.

Ghose and Ipeirotis (2011) showed strong evidence that consumer reviews affect economic outcomes such as product sales, and that aspects of reviews such as subjectivity, informativeness, readability, and linguistic correctness affect potential sales and perceived usefulness. They used a random forest model and text mining to uncover these insights. Netzer et al. (2012) derived a market-structure perceptual map from consumer review data on diabetes drugs and sedan cars, combining text mining techniques and network analysis.

With a somewhat different focus, Hou et al. (2022) studied the driving factors of web-platform switching behavior using a dataset of both blogging and microblogging activities of the same set of users, applying a sophisticated analysis technique, multistate survival analysis, to the text data. Skeen et al. (2022) took a very innovative approach, combining qualitative analysis with natural language processing to design a customer-centered mobile health app.

Given this huge amount of user-generated content, it is quite useful to summarize consumers' opinions at the aggregate level and derive marketing information from there. Li and Li (2013) summarized a large volume of microblogs to discover market intelligence. Since our study is closely related to this area, we next review the literature on opinion summarization and sentiment analysis.

Opinion summarization and sentiment analysis in marketing

As the name implies, opinion summarization provides a brief picture of a whole document collection. There is a vast body of research investigating summarization algorithms using different technical methods (Moussa et al. 2018). Among marketing-related opinion summarization techniques, Vorvorean et al. (2013) introduced a social media analytics method that can decipher the topics of UGC, assess a major event, and ultimately inform a marketing campaign.

Sentiment classification is one of the important steps in analyzing text data and can be used as part of opinion summarization. In this process, the orientation of sentences or whole documents is identified. This results in an overall summarization of the documents, as users get an idea of what is being said (positive or negative). Several approaches identify sentiment by finding the adjectives in the text and thereby gauging its positivity or negativity (Li et al. 2018; Salehan and Kim 2016). Salehan and Kim (2016) used sentiment analysis to study the impact of online consumer reviews in terms of their readership and helpfulness.

Sentiment classification can be used as a simple summary; this method is very useful when a large collection of data is involved and aggregate-level opinion is sought. Some methodological studies (Jimenez et al. 2019; Kamps and Marx 2001) used a WordNet-based approach, taking the semantic distance from a word to "positive" and "negative" as the criterion for classifying sentiments. Ku et al. (2006) used term frequency for feature identification and used sentiment words to assign opinion scores. Lu et al. (2009) used natural language processing techniques to identify K interesting aspects (for any chosen K) and utilized a Bayes classifier for sentiment prediction.

As mentioned before, extracting common themes along with their sentiment from user-generated content can be considered summarizing the content, since the themes tend to reflect the whole content. Next, we review some of the text analysis techniques that have been used in prior research.

Text analysis tools and methods

Studies have used a wide variety of techniques to analyze texts, and especially to extract themes from texts. One of the foundational techniques for extracting themes from a body of text is Latent Semantic Analysis (LSA). Many studies have used LSA for the purpose of opinion summarization (Steinberger and Ježek 2009). Sidorova et al. (2008) used LSA to uncover the intellectual core of information systems research from published journal papers. The method relies mainly on word co-occurrence and is not based on statistical modeling. Cosine distance can also be used in the latent semantic space to measure topics in the text (Turney and Littman 2003).

Another stream of theme-extraction techniques comprises generative probabilistic models, which rest on a solid statistical foundation. Vocabulary distributions are used to find the topics of texts: the model captures word frequencies and the relations between words (co-occurrences). There are several topic modeling approaches in this family; Probabilistic Latent Semantic Analysis (PLSA) (Hofmann 1999) and Latent Dirichlet Allocation (LDA) are the important ones. Table 1 identifies some key literature using these methods:

Table 1 Text analysis tools in prior research

Comparative studies between LSA and PLSA

Some studies have compared these two techniques (LSA vs. PLSA) in various contexts. Kim et al. (2020) compared two text mining techniques for predicting blockchain trends by analyzing 231 paper abstracts and their topics. The techniques were W2V-LSA, an improved version of LSA, and PLSA. The study concluded that the newer W2V-LSA worked better at finding proper topics and showing a trend. Ke and Luo (2015) compared LSA and PLSA as automated essay scoring tools; the results showed that the two methods' performances were correlated and that both did well at the task. Somewhat differently, Cvitanic et al. (2016) compared the suitability of LDA and LSA for analyzing the textual content of patents and suggested that more work is needed before recommending one method over the other for analyzing and categorizing patents.

Although along the same lines, the current study does not focus on summary presentation; instead, it focuses on the features and their sentiment orientation that are visible in the topics. Summary presentation is often used to make review summaries more understandable to customers; managers, by contrast, need to know in detail what is being said about a particular feature. Therefore, the current study examines topic extraction and the suitability of these two techniques from a managerial perspective. As mentioned in the previous paragraph, there have been studies comparing the performance of these two techniques: some found evidence of the superiority of one method, some reported similar efficiency, and some recommended further study. However, to the best of our knowledge, no study has examined these methods in two different contexts with varying objectives. Given the new understanding of automatic text analysis, in which quantitative skill is to be combined with domain knowledge, and the fact that retrieval accuracy is not the focus in marketing, the current study tries to fill this void.

Methods and data

For the purpose of this study, as a starting point for domain-specific tool adaptation, we use two fundamental topic modeling techniques, LSA and PLSA. The basic assumptions of this type of modeling are that (a) each document consists of a mixture of topics, and (b) each topic consists of a collection of words. LSA is one of the foundational techniques in topic modeling. It takes a document-term matrix and decomposes it into two reduced-dimension matrices: a document-topic matrix and a topic-term matrix. The whole technique rests on singular value decomposition (SVD) and dimension reduction. pLSA, on the other hand, belongs to another stream of techniques within topic modeling. It is based on a probabilistic method; instead of the SVD used in LSA, pLSA fits a probabilistic model with latent topics that can ultimately reproduce the data. Other topic modeling techniques build on pLSA, such as Latent Dirichlet Allocation (LDA), which is essentially a Bayesian version of pLSA and therefore uses Dirichlet priors. Next, we describe the methods in detail.

Latent semantic analysis

Latent Semantic Analysis (LSA) is a text mining technique that extracts concepts hidden in text data. It is based solely on word usage within the documents and does not use an a priori model. The goal is to represent the terms and documents with fewer dimensions in a new vector space (Han and Kamber 2006). Mathematically, this is done by applying singular value decomposition (SVD) to a term-by-document matrix $X$ that holds the frequency of terms in all the documents of a given collection. The new vector space is created by retaining a small number of significant factors $k$, and $X$ is approximated by $X \approx T_k S_k D_k^{T}$ (Landauer et al. 1998). Term loadings ($L_T = T_k S_k$) are rotated (varimax rotation is used) to obtain meaningful concepts of the document collection. The algorithm is shown in Fig. 1. It is implemented using Matlab.

Fig. 1 Algorithm flow chart (LSA)
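To make the pipeline concrete, the following is a minimal sketch in Python (the paper's own implementation is in Matlab; the toy corpus, the variable names, and the use of scikit-learn's CountVectorizer are illustrative assumptions, not the authors' code):

```python
# Minimal LSA sketch: term-by-document matrix -> truncated SVD -> term loadings.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the bag is large and roomy",
    "great quality bag for the price",
    "the strap broke after a week",
]

# Term-by-document frequency matrix X (rows = terms, columns = documents)
vec = CountVectorizer()
X = vec.fit_transform(docs).toarray().T

# SVD: X = T S D^T; retain k significant factors
k = 2
T, s, Dt = np.linalg.svd(X, full_matrices=False)
Tk, Sk, Dk = T[:, :k], np.diag(s[:k]), Dt[:k, :].T

# Rank-k approximation and term loadings L_T = T_k S_k
# (a varimax rotation would be applied to the loadings before interpretation)
X_k = Tk @ Sk @ Dk.T
term_loadings = Tk @ Sk
print(dict(zip(vec.get_feature_names_out(), term_loadings[:, 0].round(2))))
```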

Probabilistic latent semantic analysis

Probabilistic Latent Semantic Analysis (pLSA) is another text mining method, developed after LSA (Hofmann 1999). Unlike LSA, it is based on a probabilistic method, namely a maximum likelihood model, instead of a singular value decomposition. The goal is to recreate the data in the term-document matrix by finding the latent topics. A model P(d,w) is put forward, where document d and word w are in the corpus and P(d,w) corresponds to that entry in the document-term matrix. In this scenario, a document is sampled first; within that document, a topic z is sampled; and based on the topic z, a word w is chosen. Therefore, d and w are conditionally independent given a hidden topic z. This is represented in Fig. 2:

Fig. 2 PLSA model

A document can be selected from the corpus with probability P(d). Within the selected document, a topic z is chosen from a conditional distribution with probability P(z|d), and a word is selected with probability P(w|z). The model makes two assumptions: first, each joint observation (d,w) is sampled independently; second, and more importantly, words and documents are conditionally independent given the topic.

$$P(d,w) = P(d)\,P(w \mid d)$$

Marginalizing over the latent topic z and using the conditional-independence assumption $P(w \mid z, d) = P(w \mid z)$, this can be written in the following form:

$$P(d,w) = P(d)\sum_{z} P(z \mid d)\,P(w \mid z)$$

The model parameters are commonly trained using the Expectation-Maximization (EM) algorithm. The equation lets us estimate the probability of finding a certain word within a chosen document: the probability of observing the document and then, based on the distribution of topics in that document, the probability of finding that word within each topic. In flowchart form (Fig. 3):

Fig. 3 Algorithm flow chart (PLSA)
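As a companion to the LSA sketch above, the following is a minimal EM loop for pLSA in Python; the toy count matrix, the number of topics K, and all variable names are illustrative assumptions, not the authors' implementation:

```python
# Minimal pLSA sketch: EM updates for P(z|d) and P(w|z) on a toy count matrix.
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_words, K = 6, 12, 3
N = rng.integers(0, 5, size=(n_docs, n_words))     # document-term counts n(d, w)

# Random initialization of P(z|d) and P(w|z), normalized to distributions
Pz_d = rng.random((n_docs, K));  Pz_d /= Pz_d.sum(axis=1, keepdims=True)
Pw_z = rng.random((K, n_words)); Pw_z /= Pw_z.sum(axis=1, keepdims=True)

for _ in range(100):
    # E-step: responsibilities P(z|d,w), proportional to P(z|d) * P(w|z)
    joint = Pz_d[:, :, None] * Pw_z[None, :, :]                 # shape (d, z, w)
    Pz_dw = joint / np.clip(joint.sum(axis=1, keepdims=True), 1e-12, None)
    # M-step: re-estimate from expected counts n(d,w) * P(z|d,w)
    nz = N[:, None, :] * Pz_dw                                  # shape (d, z, w)
    Pw_z = nz.sum(axis=0)
    Pw_z /= np.clip(Pw_z.sum(axis=1, keepdims=True), 1e-12, None)
    Pz_d = nz.sum(axis=2)
    Pz_d /= np.clip(Pz_d.sum(axis=1, keepdims=True), 1e-12, None)

# After convergence, topic z's highest-probability words are the largest
# entries of Pw_z[z], and P(z) can be estimated from the expected counts.
```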

Differences between LSA and PLSA

Both LSA and PLSA can recreate the data content based on the model, but there are important differences between the two methods.

First, LSA's SVD is a matrix decomposition that yields the best Frobenius-norm approximation of the term frequency matrix, while PLSA relies on the likelihood function and the prior probability of the latent class (the probability of seeing this class in the data for a randomly chosen record, ignoring all attribute values), and finds the parameters that maximize the likelihood of the observed data under the model.

Second, in LSA the reconstructed matrix does not contain a normalized probability distribution, while in PLSA the reconstructed co-occurrence table is a well-defined probability distribution. Both LSA and PLSA perform dimensionality reduction: LSA keeps only K singular values, and PLSA keeps K aspects.

For the purpose of the comparison, in the subsequent sections we need to find the comparable parameters of both models. From a mathematical and interpretive standpoint, the three matrices from SVD correspond to three probability distributions of PLSA (a toy sketch of this side-by-side extraction follows the list):

(a) The T matrix corresponds to P(w|z) (term-to-aspect loadings).

(b) The D matrix corresponds to P(d|z) (document-to-aspect loadings).

(c) The S matrix corresponds to P(z) (aspect strength).
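A toy illustration of this correspondence in Python follows; the arrays stand in for fitted parameters from the sketches above, and the vocabulary and numbers are invented for illustration only:

```python
# Side-by-side top-term extraction: LSA term loadings vs. PLSA P(w|z).
import numpy as np

vocab = ["large", "roomy", "price", "quality", "strap", "broke"]

def top_terms(weights, vocab, n=3):
    """Return the n highest-weighted terms for one aspect/topic."""
    idx = np.argsort(np.abs(np.asarray(weights, dtype=float)))[::-1][:n]
    return [vocab[i] for i in idx]

# Toy stand-ins for fitted parameters (assumed, not estimated from data):
term_loadings = np.array([[0.9, 0.1],   # column k of T_k S_k ~ aspect k (LSA)
                          [0.8, 0.0],
                          [0.1, 0.7],
                          [0.2, 0.6],
                          [0.0, 0.1],
                          [0.0, 0.2]])
Pw_z = np.array([[0.40, 0.35, 0.10, 0.10, 0.03, 0.02],   # row k ~ topic k (PLSA)
                 [0.05, 0.05, 0.40, 0.35, 0.10, 0.05]])

print("LSA aspect 0: ", top_terms(term_loadings[:, 0], vocab))
print("PLSA topic 0: ", top_terms(Pw_z[0, :], vocab))
```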

Performance measure

To compare the two techniques, one needs to evaluate the performance of each method. In the analysis section, both quantitative evaluation and qualitative observations (Mei et al. 2007; Titov and McDonald 2008) are used to analyze the results. Among quantitative measures, the precision/recall curve is the most widely used (Titov and McDonald 2008). Precision is defined as the number of relevant words retrieved divided by the number of all words retrieved; this provides a measure of accuracy. The number of irrelevant words is counted to evaluate the lack of accuracy.

$$\text{Precision} = \frac{\#(\text{relevant items retrieved})}{\#(\text{retrieved items})} = P(\text{relevant} \mid \text{retrieved})$$

Moreover, the following classification helps in the measurement of accuracy:

                  Relevant               Nonrelevant
  Retrieved       True positives (tp)    False positives (fp)
  Not retrieved   False negatives (fn)   True negatives (tn)

Here, we counted the false positives (irrelevant words) and compared the two techniques; ideally, false positives should be as low as possible. The recall measure is applicable only when the total number of relevant words is known. Since, for conversational text, it is difficult to compile such a complete list of relevant words, we did not use recall as a performance measure in this analysis.
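The scoring procedure used later in the Results can be sketched as follows in Python; the word lists are hypothetical stand-ins for one topic's top terms and for the manually coded relevant and neutral sets:

```python
# Illustrative precision and irrelevant-word count for one extracted topic.
retrieved = ["large", "roomy", "price", "quality", "nice", "review", "thank"]
relevant = {"large", "roomy", "price", "quality"}   # manually coded (assumed)
neutral = {"nice"}                                  # excluded from both counts

scored = [w for w in retrieved if w not in neutral]
precision = sum(w in relevant for w in scored) / len(scored)
irrelevant = sum(w not in relevant for w in scored)
print(f"precision = {precision:.2f}, irrelevant words = {irrelevant}")
```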

Data

To begin, we utilized a dataset containing reviews of kitchen appliances, sourced from the publicly available dataset collected by Blitzer et al. (2007). This dataset also contained book reviews, which we excluded because the book content described in a review may confound the topics of the review. In total, 406 kitchen appliance reviews were included, 148 positive and 258 negative. Additionally, we analyzed a second dataset consisting of reviews for a specific brand of handbag, "Rose Handbag by FASH," obtained from Amazon.com in 2011. This dataset contained a total of 389 reviews. We used LSA and pLSA to extract hidden topics and associated words from both datasets and subsequently compared the accuracy of the two methods.

Results

First, we analyze the brand-specific handbag reviews. Reviews with a star rating of 3 or more were classified as positive, while reviews with star ratings of 1 and 2 were classified as negative. In the LSA model, three dimensions were retained after SVD. To compare the extracted topics with those from PLSA, we kept three topic groups for PLSA as well (dimensions in LSA are comparable to topics in PLSA, shown in Table 2). For the positive reviews, the three topics/factors are named "Leading positive attributes of the product", "Core functionalities", and "Affective", based on the associated words retrieved by both methods. For the negative reviews, the three topics are "Not leather", "Problems", and "Service failure" (shown in Table 3).

Table 2 Comparison of PLSA and LSA factors (and associated words) of the positive reviews of handbag
Table 3 Comparison of PLSA and LSA factors (and associated words) of the negative reviews of handbag

The comparison of the words associated with each positive topic (Table 2) shows that the topics extracted by PLSA have more interpretability and contain more information. For example, for the positive reviews, the words with a high probability of belonging to the topic "Leading positive attributes of the product" are "large", "roomy", "price", and "quality" (colored in pink). These important terms (important because they imply the competitive advantage of the brand and define the topic) were not picked up by LSA. Moreover, among the words picked up by LSA, "review", "purse", "thank", and "shoulder" (colored in orange) are not relevant to this topic. The remaining words in both LSA and PLSA (colored black) contribute to the meaning of the factors; in both methods they are either relevant or neutral words. By neutral, we mean words that are relevant and contribute to a better interpretation of the factor but do not have unique power like the pink words in PLSA. For example, "amazing", "beautiful", "nice", etc., contribute to the meaning of the "leading positive attributes" topic and support the interpretation that customers are happy with these attributes of the product, but they do not describe any specific leading attribute. The results show the top 10 terms (by probability for PLSA and by loadings for LSA). Comparisons of the relevant and irrelevant words picked up by both methods are presented in Tables 4 and 5, respectively.

Table 4 Positive reviews relevant words extracted by both methods
Table 5 Positive reviews irrelevant words extracted by both methods

To quantify the performance superiority of one method over the other, the precision of the two methods is calculated and shown graphically in Figs. 4 and 5. The number of irrelevant words picked up by a method indicates its inferiority; this is shown in Table 5. A method needs to yield high precision as well as few irrelevant words to be considered the superior technique. As mentioned before, some words are neutral: neither uniquely relevant nor irrelevant. They do not yield additional information about a topic but help one understand its meaning. For example, in the positive reviews of the handbag, the words "nice", "beautiful", or "bag" do not provide additional information but do aid comprehension of the sentiment and topic. Hence, these are not counted toward the relevancy or irrelevancy of the topic.

Fig. 4 Precision curve of positive reviews

Fig. 5 Irrelevant words of positive reviews

For the negative reviews, the same pattern emerges (Tables 6 and 7). The words associated with the first topic are almost identical across the two methods. In the next topic ("Problems"), PLSA extracts more unique words that represent specific problems, such as "rough", "thread", and "material", which are not present in the LSA extraction. Both models convey the information that the product does not "look" like the "picture/photo". Moreover, the "Service failure" topic of PLSA also contains more specifics than that of LSA.

Table 6 Negative reviews relevant words
Table 7 Negative reviews irrelevant words

The precision of the two techniques for negative reviews is calculated; the graphical representation of the precision curve is provided in Fig. 6:

Fig. 6 Precision curve of negative reviews

The percentage of irrelevant words retrieved by the two techniques is shown in Fig. 7; the graph shows that PLSA has a much lower percentage of irrelevant words than LSA.

Fig. 7 Percentage of retrieved irrelevant words in negative reviews

Fig. 8 Example of positive and negative reviews of the handbag

It is quite clear from these figures that LSA performs less efficiently than PLSA when analyzing reviews of a particular brand; LSA was not able to extract the specifics to the extent that PLSA did. The real examples of positive and negative reviews (Fig. 8) provide support for the superiority of PLSA in this context: LSA was not able to effectively extract the complaints in the negative review or the "large and spacious" component of the positive reviews.

With this in mind, we proceed to the next analysis to see whether this pattern holds in another context. We extracted topics from a broader category, "Kitchen Appliances", which contains reviews of various brands and appliances. As before, we divided the reviews into positive and negative groups based on their star ratings and then extracted topics from the reviews. The results are shown in Table 8. A careful examination of the topics reveals that PLSA formed the topics according to the specific appliances: for example, oven, pan, and skillet; baking needs; then knives. The topics from LSA, on the other hand, provide an overall summarization of the important aspects and attributes of this product category.

Table 8 Comparison of PLSA and LSA factors (associated words) of the positive reviews of kitchen appliances

LSA extracts topics that provide information about the attributes of the product category. For example, it can be inferred from the factors extracted by LSA that customers talk about core functionalities, aesthetics, branding, technical aspects, and affective content in the reviews. The PLSA topics, by contrast, are organized by appliance: the first topic relates to "oven, pan, skillet", the second to "baking", the third to "knives", and then "kettle and tea". Unlike the LSA topics, these do not express the core themes of the reviews. Therefore, from a managerial perspective, the information in the topics extracted by PLSA has little use, whereas the LSA topics convey what customers generally look for in this broader product category. For example, customers are happy if the appliances have aesthetic appeal in addition to core functionality and technical superiority. Moreover, this category appears to be a popular choice for gift giving, and customers compare different brands when buying in this category. All of this information helps a manager decide which attributes to include in a new product in this category or how to improve an existing one. Therefore, in this scenario, LSA works better in terms of interpretability. The following reviews support the results obtained from LSA that were not visible with PLSA.

“An elegantly designed LONG WIDE toaster…..Very clean, modern appearance. Looks great sitting on the kitchen counter, whereas many of the other toaster models today look like ugly chrome spaceships from the 1950's. Personally, I'm not into that kind of retro look……….” (aesthetics).

Or “This ice cream Maker is "GREAT". The fact that I can use an industrial motor (my kitchen Aid mixer) is fantastic…..” (technical aspect).

“….Also makes a fabulous wedding, shower, or housewarming gift. Forget expensive wedding registries—buy the bride a lodge dutch oven and skillet. She'll hand them down to the next generation…..” (gift giving/affective).

Therefore, depending on the objective of topic extraction, either PLSA or LSA becomes the superior method, and the superior performance of PLSA exhibited in the brand-specific reviews does not carry over to every scenario. This result can be attributed to the fact that PLSA finds the highest-probability terms for each document, whereas LSA infers topics from word co-occurrences.

We do not produce a performance measure curve for this section. As discussed above, the groupings of words are completely different, and a performance measure curve (or a table of relevance measurements) would not provide a meaningful comparison since there is no overlap of relevant and irrelevant words.

Discussion

User-generated content is everywhere. These data contain information on sentiment and customer experiences with products and services, and for market researchers they are very useful and important. The use of content analysis goes back several decades in marketing: qualitative content analysis reveals patterns and has been used in the field for a long time (Bourassa et al. 2018; Phillips and Pohler 2019). However, the content found on the web is huge in size, and manually analyzing such unstructured text is usually very cumbersome; an intelligent, automated method is needed to analyze large amounts of data. Research has shown that a firm's competencies in big data analysis predict better performance as measured by innovation, customer relationship management, and the like. Big data analysis can assist in knowledge co-creation, which in turn supports better decision making (Acharya et al. 2018). More specifically, research points to the need to incorporate domain knowledge when crafting the model and interpreting the results (Berger et al. 2020). Only by breaking the silos between different knowledge bases can marketing analytics achieve its best results (Petrescu and Krishen 2021).

The current study tries to find the best method for extracting managerial information in two different marketing scenarios. Every technique has its own advantages and disadvantages, and its suitability depends on the context in which it is used. Although computer science researchers have been studying this area for a long time, the marketing discipline began investigating it only about a decade ago. The knowledge and performance measures of the techniques cannot be directly transferred to the marketing domain, since performance is context specific. For example, from a retrieval perspective (in the information technology literature), success is the system's ability to retrieve similar words or documents containing the same topic when a query word is provided; the higher the performance, the higher the rate of finding relevant (similar) words. In this marketing context, by contrast, higher performance means higher retrieval of the terms/documents that carry information important to the marketing manager. The current study supports the idea that the choice of a text mining approach should be domain specific and augmented with domain knowledge.

As mentioned before, the two contexts differed in specificity: one contained customer reviews of only one brand of handbag, and the other contained reviews of different brands and appliances in the "Kitchen Products" category. The results show that, in the former case, PLSA extracted topics that were more meaningful and concrete; they were more interpretable and contained more information. LSA extracted topics reasonably well, but they were not as complete as the PLSA topics. There were cross-words, meaning that one word belonged to more than one factor, and there was a higher number of irrelevant words per topic compared to PLSA. Based on the precision and the number of irrelevant words extracted by the two techniques, it can be concluded that in this context PLSA works better in achieving the goal.

In the second context, where the goal was to learn the important topics in a product category with many brands and products, LSA outperformed PLSA. Here, too, PLSA extracted meaningful topics, but they were not aligned with important marketing interests: each topic represented one appliance in the "kitchen appliances" category. More importantly, PLSA did not group the topics according to the discussion themes of the product category (hence product attributes), which are of main interest from a marketing manager's perspective. For example, the PLSA-extracted topics (oven, baking, knives, etc.) may not provide a marketing manager with useful insights. It should be noted that from an information retrieval perspective, PLSA might have done a fair or even superior job; however, given the kind of information needed here, PLSA is not the superior technique in this context. LSA, on the contrary, grouped the topics according to the discussion themes of the reviews: core functionalities, technical aspects, branding, etc. This information is of interest to the marketing manager. Therefore, the study concludes that if the goal is to learn about a specific brand and its positive and negative attributes, PLSA reveals more specific information; if the goal is to learn about the important aspects of a broader product category, LSA works better. The current study contributes in two ways: first, it responds to the recent call for marketing-specific data analytics tools in which marketing knowledge and goals are incorporated into sophisticated machine learning tools; second, by experimenting in two different marketing scenarios, it examines the suitability and superiority of two data analytics techniques.

Managerial implications

Managers can benefit greatly from understanding the topics of positive and negative reviews because these topics provide valuable insights into customer perceptions and preferences. By analyzing the topics that customers mention in their reviews, managers can identify areas of strength and weakness in their products, services, and overall customer experience. Using the right text mining tools, managers can identify areas for improvement; for example, the handbag should be improved in terms of its look (customers were disappointed that it did not look like leather). They can also identify areas of strength: the handbag was stylish and spacious. Managers can highlight these in their marketing messages and product descriptions, potentially driving sales and customer loyalty. Managers may also compare their product with competitors by evaluating competitors' brands. In the broader product category, the topics may reveal important aspects of the category; for example, LSA revealed that aesthetics and gift giving were important in kitchen appliances, which might not otherwise be evident. By tracking the topics mentioned in positive and negative reviews over time, managers can identify changes in customer perceptions and preferences.

Limitations and future research

Like any other study, this study is not without limitations. First, for performance measurement, the study uses a precision measure, which looks at the number of relevant words among all retrieved words. However, some words are relevant to the topic but not useful; for example, in the positive handbag reviews, the words "nice" and "favorite" do not provide any additional information, yet they are not irrelevant either. To be conservative, the present study kept these words out of both the "relevant" and "irrelevant" word counts so that the results are not biased. A count of irrelevant words provides the other performance measure used in the current study. The main criticism of this kind of performance measure is the subjectivity of meaning: the precision measure is a binary approach that fails to capture the fuzziness in word meaning. Although the present study uses manual inspection to measure precision, subjectivity can become a problem and may bias the results. To mitigate this, words with ambiguous meaning were left out when measuring irrelevant words. Another limitation is that the dataset was small. However, the size of the dataset aided the manual coding of relevant/irrelevant words needed to compute the precision measure; a big dataset would introduce more noise, and the results might lack objectivity. As recommended in the literature, automatic text analysis can learn from the manual coding of a small dataset, and the model can then be applied to a big dataset for real-life use (Chen et al. 2018).

The application of text mining in the marketing domain is a rising phenomenon. A text mining technique that is superior in terms of information retrieval (for representing data, retrieving similar documents, or search purposes) might not be superior from a marketer's point of view. This insight calls for marketing researchers to experiment with techniques and establish their suitability for different marketing contexts and needs.