1 Introduction

User intent modeling is a fundamental process in natural language processing, with the primary aim of discerning a user’s underlying purpose or objective (Carmel et al. 2020). Understanding and predicting user goals and motivations through user intent modeling is of great significance in optimizing search engines and recommender systems (Zhang et al. 2019). Aligning the user experience with preferences and needs contributes to enhanced user satisfaction and engagement (Oulasvirta and Blom 2008). Notably, state-of-the-art models such as ChatGPT have generated substantial interest for their potential in search engines and recommender systems (Cao et al. 2023), as they exhibit the capability to understand user intentions and engage in meaningful interactions.

The realm of user intent modeling finds extensive applications across diverse domains, spanning e-commerce, healthcare, education, and social media. In e-commerce, it plays a pivotal role in providing personalized product recommendations and detecting fraudulent product reviews (Tanjim et al. 2020; Wang et al. 2020; Guo et al. 2020; Paul and Nikolaev 2021). The healthcare sector leverages user intent modeling for delivering personalized health recommendations and interventions (Zhang et al. 2016; Wang et al. 2022). Similarly, within the educational sphere, it contributes to the tailoring of learning experiences to individual student goals and preferences (Liu et al. 2021; Bhaskaran and Santhi 2019). Furthermore, user intent modeling proves invaluable in comprehending user interests, preferences, and behaviors on social media, driving personalized content and targeted advertising delivery (Ding et al. 2015; Wang et al. 2019). Additionally, it plays a pivotal role in virtual assistants by aiding in the understanding of user queries and the provision of relevant responses (Penha and Hauff 2020; Hashemi et al. 2018).

User intent modeling approaches generally encompass a blend of models, including machine learning algorithms, to analyze various aspects of user input, such as words, phrases, and context. This approach enables the delivery of personalized responses within conversational recommender systems (Khilji et al. 2023). As a result, the domain of user intent modeling encompasses a diverse array of machine learning models, comprising Support Vector Machine (SVM) (Xia et al. 2018; Hu et al. 2017), Latent Dirichlet Allocation (LDA) (Chen et al. 2013; Weismayer and Pezenka 2017), Naive Bayes (Hu et al. 2018; Gu et al. 2016), as well as deep learning models like Bidirectional Encoder Representations from Transformers (BERT) (Yao et al. 2022), Word2vec (Da’u and Salim 2019; Ye et al. 2016), and Multilayer Perceptron (MLP) (Xu et al. 2022; Qu et al. 2016). A comprehensive examination of these models and their characteristics provides a holistic understanding of their advantages and limitations, thereby offering valuable insights for future research and development.
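As a minimal, self-contained illustration of one of the model families listed above, the following sketch implements a multinomial Naive Bayes intent classifier in pure Python; the toy utterances and intent labels are invented for this example and are not drawn from any of the cited studies.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesIntentClassifier:
    """Multinomial Naive Bayes over bag-of-words user utterances."""

    def __init__(self):
        self.class_counts = Counter()            # documents per intent
        self.word_counts = defaultdict(Counter)  # word frequencies per intent
        self.vocab = set()

    def fit(self, utterances, intents):
        for text, intent in zip(utterances, intents):
            self.class_counts[intent] += 1
            for word in text.lower().split():
                self.word_counts[intent][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_intent, best_score = None, float("-inf")
        for intent, doc_count in self.class_counts.items():
            # log prior + log likelihood with Laplace smoothing
            score = math.log(doc_count / total_docs)
            denom = sum(self.word_counts[intent].values()) + len(self.vocab)
            for word in words:
                score += math.log((self.word_counts[intent][word] + 1) / denom)
            if score > best_score:
                best_intent, best_score = intent, score
        return best_intent

clf = NaiveBayesIntentClassifier()
clf.fit(
    ["find cheap flights to rome", "book a flight tomorrow",
     "play some jazz music", "play the next song"],
    ["book_flight", "book_flight", "play_music", "play_music"],
)
print(clf.predict("book a cheap flight"))  # book_flight
```

In practice, the surveyed approaches replace this bag-of-words representation with richer features (embeddings, context), but the structure — estimate per-intent likelihoods, pick the highest-scoring intent — is the same.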

However, selecting the most suitable models within this domain presents challenges due to the multitude of available models and approaches (Zhang et al. 2019; Ricci et al. 2015). This challenge is further compounded by the absence of a clear classification scheme (Portugal et al. 2018), making it challenging for researchers and developers to navigate this diverse landscape and leading to uncertainties in model selection for specific requirements (Allamanis et al. 2018; Hill et al. 2016). Although user intent modeling in conversational recommender systems has been extensively explored, the research is dispersed across various sources, making it difficult to gain a cohesive understanding. The extensive array of machine learning models, concepts, datasets, and evaluation measures within this domain adds to the complexity.

To streamline and synthesize this wealth of information, we conducted a systematic literature review, adhering to established protocols set forth by Kitchenham et al. (2009), Xiao and Watson (2019), and Okoli and Schabram (2015). Furthermore, we developed a decision model derived from the literature review data, designed to assist in selecting user intent modeling methods. The utility of this decision model was evaluated through two academic case studies, structured in accordance with Yin’s guidelines (Yin 2009).

The study is structured as follows: we begin by defining the problem statement and research questions, followed by an outline of the employed research methods in Sect. 2. Subsequently, we delve into the methodology of the Systematic Literature Review (SLR) in Sect. 3, with findings and analysis presented in Sect. 4. In Sect. 5, we shift our focus to the practical application of the collected data through the introduced decision model. Furthermore, we present academic case studies validating our research in Sect. 6. The outcomes, lessons learned, and implications of our findings are discussed in Sect. 7, with a comparative analysis of related studies provided in Sect. 8. Finally, Sect. 9 offers a summary of our contributions and outlines future research directions.

2 Research approach

This study uses a systematic research approach, combining an SLR and Case Study Research, to investigate approaches to user intent modeling for conversational recommender systems. The SLR was instrumental in collating relevant information from existing literature, and the case studies provided a means to evaluate the application of our findings.

2.1 Problem statement

Conversational recommender systems can be considered a multidisciplinary field, residing at the intersection of various domains including Human–Computer Interaction (HCI) (de Barcelos Silva et al. 2020; Rapp et al. 2021), Conversational AI (Zaib et al. 2022; Saka et al. 2023), Conversational Search Systems (Keyvan and Huang 2022; Yuan et al. 2020), User Preference Extraction & Prioritization (Pu et al. 2012; Liu et al. 2022), and Contextual Information Retrieval Systems (Tamine-Lechani et al. 2010; Chen et al. 2015). To develop an effective conversational recommender system empowered by user intent modeling, a comprehensive understanding of models and approaches recognized in other domains such as “topic modeling” (Vayansky and Kumar 2020), “user intent prediction” (Qu et al. 2019), “conversational search” (Zhang et al. 2018), “intent classification” (Larson et al. 2019), “intent mining” (Huang et al. 2018), “user response prediction” (Qu et al. 2016), “user behavior modeling” (Pi et al. 2019), and “concept discovery” (Sun et al. 2015) is essential. These concepts are discussed prominently within the context of search engines and recommender systems. This study acknowledges the evolving nature of modern search engines, increasingly incorporating conversational features, blurring the lines between traditional search engines and conversational recommender systems (Beel et al. 2016; Jain et al. 2015). Consequently, the research scope extends to encompass the analysis of search engines as conversational recommender systems, enabling exploration of how they engage with users, with a particular emphasis on accurately modeling and predicting user intent. It is essential to emphasize that developing conversational recommender systems requires a holistic understanding of key domains contributing to their efficacy and relevance. 
This study centers on integrating distinct yet interconnected fields pivotal for advancing conversational recommender systems, with each domain playing a unique role in shaping their design, functionality, and user engagement aspects.

An analysis of current approaches in user intent modeling reveals significant challenges, necessitating a systematic literature review and the development of a decision model:

Scattered knowledge: Concepts, models, and characteristics of intent modeling are widely dispersed in academic literature (Portugal et al. 2018), requiring systematic consolidation and categorization to advance conversational recommender systems.

Model integration: Combining different user intent modeling models is complex, demanding an analysis of their compatibility and synergistic potential (von Rueden et al. 2020).

Trends and emerging patterns: Understanding the evolving field of user intent modeling requires a comprehensive review of current trends and emerging patterns (Chen et al. 2015; Jordan and Mitchell 2015).

Assessment criteria: Selecting suitable evaluation measures for intent modeling is complex, necessitating tailored metrics for effective assessment (Telikani et al. 2021; Singh et al. 2016).

Dataset selection: Identifying appropriate datasets reflecting diverse intents and user behaviors for training and evaluating intent modeling approaches is a significant challenge (Yuan et al. 2020).

Decision-making framework: The absence of a decision model in current literature covering various intent modeling concepts and offering guidelines for model selection and evaluation highlights the critical need for such a model (Farshidi et al. 2020; Farshidi 2020).

These challenges form the basis of our research, leading to our systematic literature review and the development of a decision model to assist researchers in addressing these complexities in conversational recommender systems and user intent modeling.

2.2 Research questions

The research questions, formulated in response to the identified challenges, are as follows:

\(RQ_1\): What models are commonly used in intent modeling approaches for conversational recommender systems?

\(RQ_2\): What are the key characteristics and features supported by these models used in intent modeling?

\(RQ_3\): What trends are observed in employing models for intent modeling in conversational recommender systems?

\(RQ_4\): What evaluation measures and quality attributes are commonly used to assess the performance and efficacy of intent modeling approaches?

\(RQ_5\): Which datasets are typically considered in the literature for training and evaluating machine learning models in intent modeling approaches?

\(RQ_6\): How can a decision model be developed to assist researchers in selecting appropriate models for intent modeling approaches?

2.3 Research methods

A mixed research method was utilized to address these research questions, combining SLR and Case Study Research (Jansen 2009; Johnson and Onwuegbuzie 2004). The SLR provided an in-depth understanding of user intent modeling approaches, and the case studies evaluated the practical application of the proposed decision model.

The SLR followed guidelines by Kitchenham et al. (2009), Xiao and Watson (2019), and Okoli and Schabram (2015) to identify models, their definitions, combinations, supported features, potential evaluation measures, and relevant concepts. A decision model was developed from the SLR findings, influenced by previous studies on multi-criteria decision-making in software engineering (Farshidi 2020).

Two case studies were conducted to evaluate the decision model’s practicality, following Yin’s guidelines (Yin 2017). These studies tested whether the decision model effectively aided researchers in selecting models for their projects.

This mixed research method, encompassing SLR and case studies, provided insights and practical solutions for advancing intent modeling in conversational recommender systems.

3 Systematic literature review methodology

This section outlines the review protocol employed in this study to systematically collect data from the literature on user intent modeling approaches for conversational recommender systems. The SLR review protocol, as depicted in Fig. 1, systematically collects and analyzes data relevant to this area.

Fig. 1 The review protocol used in this study, following the guidelines by Kitchenham et al. (2009), Xiao and Watson (2019), and Okoli and Schabram (2015). The protocol consists of 12 elements for systematically collecting and extracting data from relevant studies, ensuring rigorous investigation and adherence to scientific standards. For details on how the protocol aligns with the guidelines, see Appendix A.

(1) Problem formulation: The review protocol began by defining the problem and formulating research questions, followed by identifying research methods suitable for these questions. The procedures of Xiao and Watson (2019) were followed for defining the problem statement and formulating research questions, as detailed in Sects. 2.1 and 2.2. Analysis showed the first five research questions were suitable for exploration via an SLR. The outcomes of this SLR informed the development of a decision model. The final research question, focusing on the decision model’s development and application, was addressed through case study research.

(2) Initial hypotheses: A set of keywords was initially selected to locate primary studies relevant to the research questions. These keywords helped identify potential seed papers, marking the start of our literature review and facilitating a systematic exploration of relevant publications.

(3) Initial data collection: Primary studies’ characteristics, including source, URL, title, keywords, abstract, venue, venue quality, publication type, number of citations, publication year, and relevancy level, were manually collected. This process aided in focusing the review and establishing inclusion/exclusion criteria.

(4) Query string definition: The search query was developed by analyzing keywords, abstracts, and titles from primary studies, focusing on terms prevalent in relevant and high-quality papers. This method refined our search to include pertinent publications.

(5) Digital library exploration: Digital libraries, including ACM DL, IEEE Xplore, ScienceDirect, and Springer, were searched using the formulated query. This exploration ensured thorough coverage of relevant publications.

(6) Relevancy Evaluation: Publications’ characteristics were evaluated for relevance to our research questions and challenges, confirming the inclusion of pertinent publications in our review.

(7) The pool of publications: The selected papers formed the basis of our review. This collection was expanded through the snowballing process, providing a thorough examination of the literature.

(8) Publication pruning process: Inclusion/exclusion criteria were strictly applied to the pool of publications, filtering out irrelevant content and focusing on relevant and high-quality studies.

(9) Quality assessment process: The quality of remaining publications was evaluated based on criteria like clarity of research questions and findings, ensuring the inclusion of only high-quality studies.

(10) Data extraction and synthesizing: Systematic data extraction from selected publications facilitated the identification and summarization of key information.

(11) Knowledge base: The final selection of publications formed a knowledge base, with extracted data linking findings and sources. This base serves as a resource for future research and further analysis.

(12) Snowballing process: Additional relevant papers were identified by reviewing references in selected publications, enhancing the review’s comprehensiveness.

This systematic review protocol ensured rigorous standards in collecting and analyzing literature on user intent modeling approaches, ensuring the validity and reliability of our study.

3.1 Review protocol

This section details the implementation of the review protocol, as depicted in Fig. 1, for our SLR.

3.1.1 Pool of publications

In our systematic literature review, the manual search phase, comprising the Initial Hypothesis and Initial Data Collection stages, preceded the automatic search phase. During the manual phase, we initially collected publications and extracted keywords indicative of common terms in noteworthy high-quality papers. These keywords were subsequently used as the foundation for the automatic search phase, beginning with the Query String Definition step.

In the manual search phase, we initially gathered a set of primary studies using search terms to identify relevant publications addressing our research questions. These terms were refined based on our domain understanding, considering publication abstracts, keywords, and titles. This process led to the identification of 314 highly relevant and high-quality publications. A publication was deemed “relevant” if it addressed at least one of our research questions (Sect. 2.2). We evaluated quality based on criteria such as publication venue reputation (CORE Rankings Portal and Scimago Journal & Country Rank (SJR)), citation count, and recency.

Subsequently, we employed Sketch Engine (Kilgarriff et al. 2014), a corpus analysis tool, to extract frequently mentioned keywords from these 314 primary studies. We considered keywords that appeared at least three times and used them to formulate our search query for the automatic search phase, focusing on topics related to user intent modeling in search engines and recommender systems, including intent detection, prediction, interactive modeling, conversational search, classification, and user behavior modeling. Our search query combined keywords using the logical operators “AND” and “OR,” resulting in the following query:

(“user intent” OR “user intent modeling” OR “topic model” OR “user intent detection” OR “user intent prediction” OR “interactive intent modeling” OR “conversational search” OR “intent classification” OR “intent mining” OR “conversational recommender system” OR “user response prediction” OR “user behavior modeling” OR “interactive user intent” OR “intent detection” OR “concept discovery”) AND (“search engine” OR “recommender system”)
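The query above follows a simple two-group Boolean pattern: a disjunction of topic terms conjoined with a disjunction of context terms. A hypothetical helper illustrating how such a query string can be assembled (the function name and the shortened term lists are illustrative, not part of the protocol):

```python
def build_query(topic_terms, context_terms):
    """Join two keyword groups into a (t1 OR t2 ...) AND (c1 OR c2 ...) query."""
    def group(terms):
        return " OR ".join(f'"{t}"' for t in terms)
    return f"({group(topic_terms)}) AND ({group(context_terms)})"

query = build_query(
    ["user intent", "user intent modeling", "topic model"],
    ["search engine", "recommender system"],
)
print(query)
# ("user intent" OR "user intent modeling" OR "topic model") AND ("search engine" OR "recommender system")
```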

In the automatic search phase, we assessed the relevance of these papers by examining their titles, abstracts, keywords, and conclusions, classifying them as “highly relevant” (addressing at least three research questions), “medium relevant” (addressing two questions), “low relevant” (addressing one question), or “irrelevant” (not addressing any questions). After this evaluation, we excluded irrelevant publications from the pool, leaving 3,828 relevant publications out of the initial 13,168 search results.
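The relevance classification just described amounts to a simple threshold rule over the number of research questions a paper addresses, which can be sketched as:

```python
def relevance_label(questions_addressed):
    """Map the number of research questions a paper addresses (out of RQ1-RQ5)
    to the relevance classes used in the automatic search phase."""
    if questions_addressed >= 3:
        return "highly relevant"
    if questions_addressed == 2:
        return "medium relevant"
    if questions_addressed == 1:
        return "low relevant"
    return "irrelevant"

print(relevance_label(4))  # highly relevant
```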

The publications underwent rigorous screening, adhering strictly to our predefined inclusion and exclusion criteria, ensuring the selection of only the most pertinent and high-quality publications for data extraction and analysis. We assessed the effectiveness of our search query by comparing results with those from a manual search, confirming the consistency and accuracy of our approach. This evaluation verified that our query included publications identified as high-quality and highly relevant during the manual phase, affirming the successful retrieval of publications relevant to user intent modeling in search engines and recommender systems.

3.1.2 Publication pruning process

In systematic literature reviews or meta-analyses, inclusion/exclusion criteria play a pivotal role as definitive guidelines for determining study relevance and eligibility. These criteria guarantee the selection of high-quality studies that directly address the research question.

For our study, we implemented stringent inclusion and exclusion criteria to eliminate irrelevant and low-quality publications. These criteria considered several factors, including the publication venue’s quality, publication year, citation counts, and relevance to our research topic. We precisely defined and consistently applied these criteria to include only high-quality and relevant publications.

We categorized publications based on their quality using assessments from the CORE Rankings Portal and SJR:

  • Publications with “A*” or “Q1” indicators were classified as “Excellent.”

  • Those with “A” or “Q2” were deemed “Good.”

  • “B” or “Q3” were categorized as “Average.”

  • “C” or “Q4” were labeled as “Poor.”

  • Publications without quality indicators on these platforms were marked as “N/A.”

Publications classified as “Poor” or “N/A” were excluded from further consideration. Additional exclusion criteria encompassed publications with low citation counts, older publication dates, or classification as Gray literature (e.g., books, theses, reports, and short papers).
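The categorization and the exclusion rule above can be summarized as a small lookup, sketched below; the indicator strings follow the CORE/SJR labels just listed, while the function names are illustrative.

```python
# Mapping of CORE / SJR indicators to the quality labels used in pruning.
QUALITY = {
    "A*": "Excellent", "Q1": "Excellent",
    "A": "Good", "Q2": "Good",
    "B": "Average", "Q3": "Average",
    "C": "Poor", "Q4": "Poor",
}

def quality_label(indicator):
    """Return the quality label for a venue indicator, or 'N/A' if unranked."""
    return QUALITY.get(indicator, "N/A")

def keep(indicator):
    """Retain a publication unless its venue is rated 'Poor' or unranked."""
    return quality_label(indicator) not in ("Poor", "N/A")

print(quality_label("Q1"), keep("Q1"))  # Excellent True
```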

After applying our predefined inclusion/exclusion criteria, we identified and selected 1067 publications from the initial pool of 3828 publications.

3.1.3 Quality assessment process

During the SLR, we assessed the quality of the selected publications after applying the inclusion/exclusion criteria. Several factors were taken into consideration to evaluate the quality and suitability of the publications for our research:

Research method: We evaluated whether the chosen research method was appropriate for addressing the research question. The clarity and transparency of the research methodology were also assessed.

Research type: We considered whether the publication presented original research, a review article, a case study, or a meta-analysis. The relevance and scope of the research in the field of machine learning were also taken into account.

Data collection method: We evaluated the appropriateness of the data collection method in relation to the research question. The adequacy and clarity of the reported data collection process were also assessed.

Evaluation method: We assessed whether the chosen evaluation method was suitable for addressing the research question. The transparency and statistical significance of the reported results were considered.

Problem statement: We evaluated whether the publication identified the research problem and provided sufficient background information. The clarity and definition of the research question were also taken into account.

Research questions: We assessed the relevance, clarity, and definition of the research questions in relation to the research problem.

Research challenges: We considered whether the publication identified and acknowledged the challenges and limitations associated with the research.

Statement of findings: We evaluated whether the publication reported the research results and whether the findings were relevant to the research problem and questions.

Real-world use cases: We assessed whether the publication provided real-world use cases or applications for the proposed method or model.

Based on these assessment factors, a team of five researchers involved in the SLR evaluated the publications’ quality. Each researcher independently assessed the publications based on the established criteria. In cases where there were discrepancies or differences in evaluating a publication’s quality, the researchers engaged in discussions to reach a consensus and ensure a consistent assessment.

Through this collaborative evaluation process, a final selection of 791 publications was made from the initial pool of 1,067 publications. These selected publications demonstrated high quality and relevance to our research question, meeting the predefined inclusion/exclusion criteria. The consensus reached by the research team ensured a rigorous and reliable selection of publications for further analysis and data extraction in the SLR.

3.1.4 Data extraction and synthesizing

During the data extraction and synthesis phase of the SLR, our primary objective was to address the identified research questions and gain insights into the foundational models commonly employed by researchers in their intent modeling approaches. We aimed to understand the features of these models, the associated quality attributes, and the evaluation measures utilized by research modelers to assess their approaches. Furthermore, we explored the potential combinations of models that researchers incorporated into their research papers.

We extracted relevant data from the papers included in our review to achieve these objectives. In our perspective, evaluation measures encompassed a range of measurements and key performance indicators (KPIs) used to evaluate the performance of the models. Quality attributes represent the characteristics of models that are not easily quantifiable and are typically assigned values using Likert scales or similar approaches. For example, authors may assess the performance of a model as high or low compared to other models. On the other hand, features encompassed any characteristics of models that authors highlighted to demonstrate specific functionalities. These features played a role in the selection of models by research modelers. Examples of features include ranking and prediction capabilities.

In this study, “models” are conceptualized as structured, mathematical, or computational frameworks employed for simulating, predicting, or classifying phenomena within user intent modeling in conversational recommender systems. These models are organized into a variety of categories, reflecting diverse methodologies. This includes Supervised Learning, where models are trained on labeled data for accurate predictions; Unsupervised Learning, which uncovers patterns in unlabeled data; and Collaborative Filtering, among others, each offering unique insights into user interactions. Furthermore, the study emphasizes the critical role of development metrics such as Cosine similarity (Gunawan et al. 2018) and KL Divergence (Bigi 2003), which are not just evaluation tools but are fundamental in refining and optimizing the functionality of these models. Algorithmic and computational techniques like ALS (Takács and Tikk 2012) and BM25 (Robertson et al. 2004) also play an integral part in the implementation and efficacy of these categorized models (refer to Sect. 4.1).

By extracting and analyzing this data, we aimed to comprehensively understand the existing literature, including popular open-access datasets used for training and evaluating the models. This knowledge empowered us to contribute insights and recommendations to the academic community, supporting them in selecting appropriate models and approaches for their intent modeling research endeavors.

3.2 Search process

In this study, we followed the review protocol presented in this section (see Fig. 1) to gather relevant studies.

Table 1 An overview of the systematic search process for identifying relevant publications on user intent modeling for conversational recommender systems

The search process involved an automated search phase, which utilized renowned digital libraries such as ACM DL, IEEE Xplore, ScienceDirect, and Springer. However, Google Scholar was excluded from the automated search due to its tendency to generate numerous irrelevant studies. Furthermore, Google Scholar overlaps significantly with the other digital libraries considered in this SLR. Table 1 provides an overview of the sequential phases of the search process, outlining the number of studies encompassed within each stage. It provides insights into the search process conducted in four phases:

Phase 1 (Pool of Publications): We initially performed a manual search, resulting in 314 relevant publications from Google Scholar. Additionally, automated searches from ACM DL, IEEE Xplore, ScienceDirect, and Springer contributed to the pool of publications with 586, 82, 921, and 1,896 relevant papers, respectively.

Phase 2 (Publication pruning process): In this phase, the inclusion/exclusion criteria were applied to the collected publications, ensuring the selection of high-quality and relevant studies. The numbers were reduced to 311 in ACM DL, 9 in IEEE Xplore, 246 in ScienceDirect, and 379 in Springer.

Phase 3 (Quality assessment process): Quality assessment was conducted for the publications based on several criteria, reducing the pool of 1,067 studies to a final selection of 791 studies from all sources.

Phase 4 (Data extraction and synthesizing + Snowballing process): During this phase, data extraction and synthesis were performed to gain insights into foundational intent modeling models, quality attributes, evaluation measures, and potential combinations of models used by researchers. Additionally, snowballing, involving reviewing references of selected publications, led to an additional 20 relevant papers. Applying the review protocol and snowballing, we retrieved 791 high-quality studies for our comprehensive analysis and synthesis in this systematic literature review.

4 Findings and analysis

In this section, we present the SLR results and provide an overview of the collected data, which were analyzed to address the research questions identified in our study.

4.1 Models

This study defines a “model” as a structured, mathematical, or computational framework specifically designed for simulating, predicting, or classifying phenomena within user intent modeling in conversational recommender systems. These models have been organized into distinct categories, each representing a unique approach to comprehending and interpreting user interactions.

Model categories: Our categorization includes various methodologies such as Supervised Learning, Unsupervised Learning, Collaborative Filtering, and others. For instance, models under Supervised Learning rely on labeled data for training, enabling them to make informed predictions or classifications. Unsupervised Learning models, in contrast, derive insights autonomously from unlabeled data, revealing underlying patterns without explicit guidance.

Development metrics: To measure and refine model performance during development, metrics such as Cosine similarity (Gunawan et al. 2018) and Kullback–Leibler (KL) Divergence (Bigi 2003) are employed. It is essential to distinguish these development metrics from the measures applied for model evaluation. During development, Cosine similarity and KL Divergence help fine-tune the system by assessing similarity measures and information loss, contributing significantly to system functionality and optimization. The evaluation of model performance, by contrast, relies on a distinct set of measures that capture the efficacy and accuracy of models in real-world applications, particularly regarding user intent prediction and recommendation accuracy; these evaluation measures are detailed in Sect. 4.5.
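Both development metrics can be computed directly from their definitions; a minimal sketch in plain Python, using invented toy vectors and distributions:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

print(cosine_similarity([1, 0, 1], [1, 1, 1]))  # ~0.816
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))    # ~0.511
```

Note the asymmetry of KL Divergence (KL(P||Q) generally differs from KL(Q||P)), which matters when choosing which distribution plays the role of the reference during tuning.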

Algorithmic and computational techniques: The study also underscores the importance of various algorithmic and computational techniques, such as ALS (Takács and Tikk 2012) and BM25 (Robertson et al. 2004). These techniques are integral to the practical implementation of the categorized models, aiding in critical tasks like data processing and system optimization.
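As an illustration of one such technique, a minimal Okapi BM25 scorer can be sketched as follows; the tiny tokenized corpus and the default parameters (k1 = 1.5, b = 0.75) are illustrative choices, not taken from the cited work.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Okapi BM25 score of `doc` (a token list) for `query_terms`, given the
    corpus (a list of tokenized documents) for IDF and length statistics."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)   # smoothed IDF
        numer = tf[term] * (k1 + 1)
        denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * numer / denom
    return score

corpus = [["user", "intent"], ["intent", "modeling"], ["search", "engine"]]
print(bm25_score(["intent"], corpus[0], corpus))  # positive; higher = more relevant
```

In a retrieval component of a conversational recommender, documents would be ranked by this score for each user query before intent-aware re-ranking.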

The SLR conducted reveals a multifaceted landscape of models used in user intent modeling, each marked by its distinct methodology and application. Detailed information about these models and their categorizations can be found in the appendix (Appendix C).

Key categories such as Classification (Qu et al. 2019; Zhang et al. 2016) and Clustering (Zhang et al. 2021; Agarwal et al. 2020) models, Convolutional Neural Network (CNN)(Wang et al. 2020; Zhang et al. 2016), Deep Belief Networks (DBN)(Zhang et al. 2018; Hu et al. 2017), and Graph Neural Networks (GNN) (Yu et al. 2022; Lin et al. 2021) are highlighted. These categories, detailed in our SLR, represent a spectrum of techniques and approaches within user intent modeling.

Table 2 The mapping of 59 models to their respective categories in user intent modeling, showcasing models featured in at least six publications

Table 2 presents an overview of the 59 most frequently mentioned models in the SLR on user intent modeling. The table showcases the models appearing in at least six publications (columns) and their corresponding 18 categories (rows). Each model in user intent modeling can often be categorized into multiple categories, highlighting their versatility and diverse functionalities. For example, GRU4Rec (Hidasi and Karatzoglou 2018), a widely recognized model in the field (cited in 10 publications included in our review), exhibits characteristics that align with various categories. GRU4Rec falls under Supervised Learning, as it uses labeled examples during training to predict user intent. Additionally, it incorporates Collaborative Filtering techniques by analyzing user behavior and preferences to generate personalized recommendations, associating it with the Collaborative Filtering category (Latifi et al. 2021). Moreover, GRU4Rec can be classified as a Classification model as it categorizes input data into specific classes or categories to predict user intent (Park et al. 2020). It also demonstrates traits of Regression models by estimating and predicting user preferences or ratings based on the available data. Considering its reliance on recurrent connections, GRU4Rec can be associated with the Recurrent Neural Networks (RNN) category, enabling it to process sequential data and capture temporal dependencies (Ludewig and Jannach 2018). Lastly, GRU4Rec’s ability to cluster similar users or items based on their behavior and preferences places it within the Clustering category. This clustering capability provides valuable insights and recommendations to users based on their respective clusters.

4.2 Features

In our research, we analyzed user intent modeling within conversational recommender systems and identified 74 distinct features, each mentioned in at least six publications. These features provide an alternative means of categorizing models based on the specific functions they are designed to serve, as described by the authors in their studies. We then categorized the models utilized in these systems based on the features they support, presenting the results systematically in Table 3.

Table 3 Maps 74 features to corresponding models in user intent modeling, dividing them into 20 categories. For comprehensive definitions and explanations, see Appendix D

We grouped these features into 20 categories, each reflecting specific contexts and applications. Features such as historical data references (Zhou et al. 2020; White et al. 2013; Zou et al. 2022) enable models to leverage past interactions for future predictions, while algorithm-agnostic models (Zhou et al. 2019; Musto et al. 2019; Mandayam Comar and Sengamedu 2017) offer flexibility in selecting the most suitable algorithms for specific tasks. Model-based features (Ding et al. 2022; Pradhan et al. 2021; Yu et al. 2018), which rely on statistical methods (Schlaefer et al. 2011; Kim et al. 2017) and semantic analysis (Zhang and Zhong 2016; Xu et al. 2015), are used to provide predictions based on predefined models.

The categorization includes various focus areas: ’Rule-Based Approaches’ use pattern and template methods to interpret user intent, while ’Query Processing’ models specialize in refining user queries to improve interaction quality. In ’Predictive Modeling’, the focus is on forecasting user preferences using techniques such as Prediction and Ratings Prediction. ’Text Analytics’ involves models that perform Topic Modeling, Text Similarity, and Semantic Analysis, which are crucial for analyzing user dialogues. Personalization features, ranging from ’User-Based Personalization’ and ’Temporal Personalization’ to ’Content-Based Personalization’ and ’Interaction-Based Personalization’, adapt recommendations according to user activity, time factors, content characteristics, and user interactions. Finally, ’Recommendation Techniques’ cover a broad spectrum of models optimized for tasks like Item Recommendation, Hybrid Recommendation, and Ranking.

Table 3 not only illustrates the mapping of features to models in user intent modeling but also highlights the frequency of explicit mentions of these features in relevant publications. The color coding in each cell indicates the level of support for each feature by the models, with gray cells denoting an absence of evidence supporting the feature’s compatibility with a particular model, based on our comprehensive review of 791 papers. For example, LDA is frequently mentioned in the context of pattern-based approaches within rule-based methods (Tang et al. 2010; Li et al. 2014), underscoring its applicability in scenarios where patterns are analyzed to extract meaningful insights.

The process of mapping features to models in user intent modeling requires an in-depth understanding of the particular features and the capabilities of the available models. For instance, in text analysis and natural language processing, models like LDA, TF-IDF, and BERT are often chosen for their effectiveness in semantic analysis and topic modeling. Similarly, for predictive modeling tasks, SVM, Random Forest, and Gradient Boosted Decision Trees (GBDT) are preferred due to their accuracy in classification and regression tasks. In cases where temporal dynamics are significant, models like LSTM, GRU, and Markov Chains are utilized for their ability to handle sequential data effectively. Furthermore, for tasks involving recommendation systems, models like Matrix Factorization (MF), Collaborative Filtering (CF), and Neural Collaborative Filtering (NCF) are often employed for their efficiency in capturing user preferences and generating personalized recommendations.
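Of the text-analysis models mentioned above, TF-IDF is simple enough to sketch with the standard library alone. The tokenized utterances below are hypothetical, and production systems would typically rely on an established library rather than this illustrative implementation.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a {term: tf-idf weight} dict."""
    n = len(docs)
    # Document frequency: number of docs containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return vectors

# Hypothetical tokenized user utterances.
docs = [["book", "flight", "to", "paris"],
        ["book", "hotel", "in", "paris"],
        ["weather", "in", "paris"]]
vecs = tfidf_vectors(docs)
```

A term appearing in every document ("paris") receives zero weight, while terms distinctive to one utterance ("flight") are up-weighted; this is precisely why TF-IDF is useful for separating user intents.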

4.3 Model combinations

The data extraction and synthesis phase of the SLR identified 59 models, each referenced in a minimum of six publications. These models were often integrated to address various research considerations, such as feature requirements, quality attributes, and evaluation measures, as illustrated in Fig. 3. The selected publications discussed combinations of models based on the authors’ research and evaluated the outcomes of these combinations.

Fig. 2
figure 2

Shows the matrix representation of model combinations in user intent modeling research. The matrix shows the combinations of 59 models, with each cell indicating the number of publications discussing the model combination. Diagonal cells show the count of publications discussing each model individually. Green cells represent a higher number of research articles, yellow and red cells indicate a lower number, and gray cells show areas with no evidence of valid combinations. The last row indicates the frequency of publications where models were combined with others. For example, 451 publications mentioned LDA in combination with other models. This combination matrix offers insights into the frequency and popularity of model combinations, helping researchers identify existing combinations and potential research areas

To analyze model combinations, a matrix similar to a symmetric adjacency matrix was created, with models as nodes and combinations as edges in a graph. This matrix, shown in Fig. 2, includes 59 models. Diagonal cells indicate the count of publications discussing each model independently, such as 205 papers on LDA (Chen et al. 2013; Weismayer and Pezenka 2017) and 122 on TF-IDF (Binkley et al. 2018; Izadi et al. 2022).

Matrix cells show the number of papers discussing model combinations. For instance, 57 papers explored the LDA and TF-IDF combination (Venkateswara Rao and Kumar 2022), and 35 examined SVM and LDA (Yu and Zhu 2015).
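A matrix of this kind can be derived mechanically from per-paper model mentions. The sketch below, using hypothetical paper data, counts solo mentions on the diagonal and pairwise combinations off it.

```python
from collections import Counter
from itertools import combinations

def combination_matrix(papers):
    """Count individual mentions (diagonal) and pairwise co-mentions
    of models across a list of per-paper model sets."""
    counts = Counter()
    for models in papers:
        for m in models:
            counts[(m, m)] += 1                 # diagonal: solo mentions
        for a, b in combinations(sorted(models), 2):
            counts[(a, b)] += 1                 # off-diagonal: combinations
    return counts

# Hypothetical per-paper model mentions.
papers = [{"LDA", "TF-IDF"}, {"LDA", "SVM"}, {"LDA", "TF-IDF", "SVM"}, {"BERT"}]
matrix = combination_matrix(papers)
```

Absent keys correspond to the gray cells in Fig. 2: pairs with no observed evidence of combination.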

The matrix uses color coding to indicate the research volume associated with each combination. Green cells represent higher research volumes, yellow and red lower volumes, and gray cells indicate areas lacking evidence of valid combinations. These gray areas present opportunities for future research.

The combination matrix provides an overview of model combinations in user intent modeling research, highlighting the frequency of their use in literature and serving as a resource for identifying existing combinations and potential research areas.

Combining various models, often termed ’ensemble’ or ’hybrid’ modeling (Sagi and Rokach 2018), can enhance the predictive power (Beemer et al. 2018) and accuracy of conversational recommender systems. However, this approach is subject to certain constraints and requires careful consideration.

Firstly, it’s crucial to acknowledge that while combining models is possible, it’s not always straightforward or advantageous. The feasibility of integrating multiple models depends on several factors:

Fig. 3
figure 3

The decision-making process researchers employ in selecting intent modeling approaches within the academic literature

Compatibility: The models to be combined must be compatible in terms of input and output data formats, scale, and the nature of predictions they make (Srivastava et al. 2020). For instance, combining a probabilistic model with a neural network requires a harmonious interface where the output of one can effectively serve as the input for another.

Complexity and overfitting: Increasing model complexity can lead to overfitting, where the model performs well on training data but poorly on unseen data (Sagi and Rokach 2018). It is essential to balance the complexity with the generalizability of the model.

Computational resources: More complex ensembles demand greater computational power and resources. This can be a limiting factor, especially in real-time applications (Bifet et al. 2009).

Interpretability: Combining models can sometimes lead to a loss of interpretability, making it challenging to understand how predictions are made, which is crucial for certain applications (Wang and Lin 2021).

A more in-depth analysis of model combinations necessitates a thorough evaluation of their individual and collective performance, including how the models complement each other, their synergistic potential, and the trade-offs involved.
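As one minimal illustration of a hybrid design, a weighted blend of two component models' normalized scores makes such trade-offs explicit; the component scores and the weight below are hypothetical placeholders, and real hybrid systems use many other aggregation schemes.

```python
def hybrid_scores(content_scores, collab_scores, alpha=0.5):
    """Weighted blend of two per-item score dicts (0 <= alpha <= 1).
    alpha=1 uses only the content-based component, alpha=0 only the
    collaborative one; items missing from a component score 0 there."""
    items = set(content_scores) | set(collab_scores)
    return {i: alpha * content_scores.get(i, 0.0)
               + (1 - alpha) * collab_scores.get(i, 0.0)
            for i in items}

# Hypothetical normalized scores from two component models.
content = {"item_a": 0.9, "item_b": 0.2}
collab = {"item_b": 0.8, "item_c": 0.6}
blended = hybrid_scores(content, collab, alpha=0.6)
```

The compatibility constraint discussed above appears here in miniature: the blend is only meaningful if both components emit scores on a comparable scale, which often requires normalization first.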

4.4 Model trends

In recent years, machine learning models have advanced significantly across various fields, leading to notable trends in their development and application. However, our study is not limited to recent years: by the term “models,” we refer to the wide range of models that research modelers can employ in user intent modeling.

Table 4 The trend of models mentioned in user intent modeling research over publication years, highlighting the popularity and emergence of various models

To gain insights into the usage patterns of these models, we organized the 59 selected models (mentioned in at least six publications) based on the publication years of the studies that referenced them. The span of these publications ranges from 2002 to 2023. Table 4 provides an overview of these trends.

Among the selected models, LDA, TF-IDF, SVM, CF, and MF emerged as the top five most frequently mentioned models, appearing in over 500 papers. It is important to note that while some models, such as BERT (Yao et al. 2022), CF (Yadav et al. 2022), LSTM (Xu et al. 2022; Gozuacik et al. 2023), DNN (Yengikand et al. 2023), and GRU (Chen and Wong 2020; Elfaik 2023), have recently gained substantial attention, our study encompasses models from various time periods.

These trends shed light on the popularity and usage patterns of different models in user intent modeling. By identifying frequently mentioned models and observing shifts in their prevalence over time, researchers and practitioners can stay informed about the evolving landscape of user intent modeling and make informed decisions when selecting models for their specific applications (Zaib et al. 2022; Ittoo and van den Bosch 2016).

4.5 Quality models and evaluation measures

In AI-based projects, selecting high-quality models and applying appropriate evaluation measures are crucial. Quality attributes, defined in prior studies (de Barcelos Silva et al. 2020; Hernández-Rubio et al. 2019), reflect a model’s performance, effectiveness, and user-centric features in conversational recommender systems. These attributes are essential for a comprehensive evaluation but are not straightforward to measure empirically; they often require subjective assessment or indirect methods. “Novelty,” for example, relates to the uniqueness of recommendations (Cremonesi et al. 2011). Although challenging to quantify, methods such as user studies or item distribution analysis can offer insights into a model’s novelty. Conversely, evaluation measures, as discussed in the literature (Zaib et al. 2022), provide a quantitative assessment of model outputs. These attributes and measures are pivotal in delivering accurate and reliable results, as various studies demonstrate (Pan et al. 2022; Pu et al. 2012; Hernández-Rubio et al. 2019).

While accuracy is a commonly employed evaluation measure, it may not adequately represent a model’s performance, especially when classes are imbalanced. Alternative measures such as precision (Salle et al. 2022; Baykan et al. 2011), recall (Wang et al. 2022; Phan et al. 2010), and F1-score (Yu et al. 2019; Ashkan et al. 2009) are used to evaluate model performance in such settings. Additionally, the receiver operating characteristic (ROC) curve (Wu et al. 2019; Wang et al. 2020) and the area under it (AUC) (Xu et al. 2016; Liu et al. 2022) are frequently used to assess binary classifiers. These measures provide insights into a model’s ability to differentiate between positive and negative instances, particularly when the costs of false positives and false negatives differ.
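As a minimal, self-contained illustration of why accuracy can mislead on imbalanced data, the sketch below computes precision, recall, and F1 on a hypothetical label set where accuracy is 0.8 yet F1 is only 0.5.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for a binary classifier's predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical imbalanced data: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
```

A classifier that predicted the majority class everywhere would score 0.8 accuracy on this data while finding no positives at all, which is exactly the failure mode precision, recall, and F1 expose.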

For ranking problems, evaluation measures such as mean average precision (MAP) (Mao et al. 2019; Ni et al. 2012) and normalized discounted cumulative gain (NDCG) (Liu et al. 2020; Kaptein and Kamps 2013) are commonly employed. These measures evaluate the quality of the ranked lists generated by the model and estimate its effectiveness in predicting relevant instances.
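As a sketch, NDCG can be computed directly from a ranked list's relevance grades. This uses the linear-gain DCG formulation (the exponential-gain variant is also common), and the grades below are hypothetical.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain for a ranked list of relevance grades."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

# Hypothetical relevance grades of a ranked recommendation list.
score = ndcg([3, 2, 0, 1])
```

Here the score falls just below 1 because one relevant item is ranked too low; a perfectly ordered list scores exactly 1.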

When evaluating regression models, measures such as root mean squared error (RMSE) (Cai et al. 2014; Colace et al. 2015) and mean absolute error (MAE) (Yao et al. 2017; Yadav et al. 2022) are used to quantify the discrepancy between predicted values and actual values of the target variable.
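Both regression measures follow directly from their definitions; the rating values below are hypothetical examples on a 1-5 scale.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical actual vs. predicted ratings.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 2.5]
```

Because the errors are squared before averaging, RMSE penalizes large individual errors more heavily than MAE; by the power-mean inequality, RMSE is never smaller than MAE on the same data.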

The selection of appropriate evaluation measures is crucial to ensure the accuracy and reliability of machine learning models. The choice of suitable measure(s) depends on the specific problem domain, data type, and project objectives, and these same factors guide the selection of quality attributes. Table 5 presents the quality attributes and evaluation measures identified in at least six publications. Performance, Effectiveness, Diversity, Usefulness, and Stability are the top five quality attributes; precision, recall, F1-score, accuracy, and NDCG are the top five evaluation measures identified in the SLR. For detailed explanations of the identified quality attributes and evaluation measures, please refer to Appendix E.

Table 5 An overview of quality models and evaluation measures used in machine learning, including performance metrics such as accuracy, precision, recall, F1-score, and AUC-ROC, as well as other evaluation techniques such as cross-validation, holdout validation, and confusion matrices

4.6 Datasets

Datasets are fundamental to machine learning and data science research, as they provide the raw material for training and testing models and enable the development of solutions to complex problems. They come in various forms and sizes, ranging from small, well-curated collections to large, messy datasets with millions of records. The quality of datasets is crucial (Pan et al. 2022), as high-quality data ensure the accuracy and reliability of models, while poor-quality data can introduce biases and inaccuracies. Data quality encompasses completeness, accuracy, consistency, and relevance, and ensuring data quality involves cleaning, normalization, transformation, and validation.

The size and complexity of datasets pose challenges in terms of storage, processing, and analysis. Big datasets require specialized tools and infrastructure to handle the volume and velocity of data. On the other hand, complex datasets, such as graphs, images, and text, may require specialized techniques and models for extracting meaningful information and patterns.

Furthermore, the availability of datasets is a vital consideration in advancing machine learning research and applications. Open datasets that are freely accessible and well-documented foster collaboration and innovation, while proprietary datasets may restrict access and impede progress (Zhang et al. 2016; Teevan et al. 2008; Ittoo and van den Bosch 2016). Data sharing and ethical considerations in data use are increasingly recognized, leading to efforts to promote open-access and responsible data practices.

In this study, we identified 80 datasets that researchers have utilized in the context of intent modeling approaches, and these datasets were mentioned in at least two publications. Table 6 provides an overview of these datasets and their frequency of usage from 2005 to 2023. Notably, TREC, MovieLens, Amazon, Yelp, and AOL emerged as the top five datasets commonly used in evaluating intent modeling approaches for recommender systems (Wang et al. 2021; Papadimitriou et al. 2012; Wang et al. 2020) and search engines (Fan et al. 2022; Liu et al. 2022; Konishi et al. 2016). These datasets have been utilized in over 200 publications, highlighting their significance and wide adoption in the field.

The datasets selected for this study cover a broad range of scenarios in user intent modeling for conversational recommender systems. This diversity aligns with the comprehensive nature of the research. Each dataset contributes unique insights into user behaviors, preferences, and interactions, which are crucial for understanding and effectively modeling user intent within conversational interfaces.

Table 6 Datasets commonly used for user intent modeling approaches

The variety of datasets reflects the complexity of conversational recommender systems, which need to address varied user needs, contexts, and interaction modes. Including datasets that differ in size, structure, and origin ensures the study captures a broad spectrum of user interactions and system responses, providing a solid foundation for developing and evaluating intent modeling approaches.

5 Decision-making process

This section describes how researchers make decisions when selecting intent modeling approaches. It illustrates a systematic approach to choosing intent modeling methods based on academic literature.

5.1 Decision meta-model

Research modelers face the challenge of selecting the most suitable combination of models to develop an intent modeling approach for a conversational recommender system. In this section, we present a meta-model for the decision-making process in the context of intent modeling. The meta-model is grounded in the principles outlined in the ISO/IEC/IEEE standard 42010 (ISO 2011), which provides a framework for the conceptual modeling of Architecture Descriptions. The process requires a systematic approach to ensure that the chosen models effectively capture and understand users’ intentions. Consider a scenario where research modelers encounter this challenge and go through the decision-making process:

Goal and concerns: The research modelers aim to build an intent modeling approach for a conversational recommender system. Their goal is to accurately determine the underlying purposes or goals behind users’ requests, enabling personalized and precise responses. The modelers have concerns regarding quality attributes and functional requirements, and they aim to achieve an acceptable level of quality based on their evaluation measures.

Identification of models and features: To address this problem, the modelers consider various models that can capture users’ intentions in the conversational context. They identify essential features, such as user intent prediction or context analysis based on their concerns. They explore the available models and techniques, such as Supervised Learning, Unsupervised Learning, Recurrent Neural Networks, Deep Belief Networks, Clustering, and Self-Supervised Learning Models. The modelers also consider the recent trends in employing models for intent modeling.

Evaluation of models: The modelers review the descriptions and capabilities of several models that align with capturing users’ intentions in conversational interactions. They analyze each model’s strengths, limitations, and applicability to the intent modeling problem. They consider factors such as the models’ ability to handle natural language input, understand context, and predict user intents accurately. This evaluation allows them to shortlist a set of candidate models that have the potential to address the intent modeling challenge effectively.

In-depth analysis: The research modelers conduct a more detailed analysis of the shortlisted models. They examine the associated techniques for each model to ensure their suitability in the conversational recommender system. They assess factors such as training data requirements, model complexity, interpretability, and scalability. Additionally, they explore the possibility of combining models to identify compatible combinations or evaluate the existing literature on such combinations. If necessary, further study may be conducted to assess the feasibility of model combinations. This step helps them identify the optimal combination of models that best capture users’ intentions in the conversational setting and address their concerns.

5.2 A decision model for intent modeling selection

Decision theories have wide-ranging applications in various fields, including e-learning (Garg et al. 2018) and software production (Xu and Brinkkemper 2007; Fitzgerald and Stol 2014; Rus et al. 2003). In the literature, decision-making is commonly defined as a process involving problem identification, data collection, defining alternatives and selecting feasible solutions with ranked preferences (Fitzgerald et al. 2017; Kaufmann et al. 2012; Garg 2020; Garg et al. 2017; Sandhya et al. 2018; Garg 2019). However, decision-makers approach decision problems differently, as they have their priorities, tacit knowledge, and decision-making policies (Doumpos and Grigoroudis 2013). These differences in judgment necessitate addressing them in decision models, which is a primary focus in the field of multiple-criteria decision-making (MCDM).

MCDM problems involve evaluating a set of alternatives and considering decision criteria (Farshidi et al. 2023). The challenge lies in selecting the most suitable alternatives based on decision-makers’ preferences and requirements (Majumder 2015). It is important to note that MCDM problems do not have a single optimal solution, and decision-makers’ preferences play a vital role in differentiating between solutions (Majumder 2015). In this study, we approach the problem of model selection as an MCDM problem within the context of intent modeling approaches for conversational recommender systems.

Let \(Models=\{m_1,m_2, \dots , m_{|Models|}\}\) be a set of models found in the literature (the decision space), such as LDA, SVM, and BERT. Let \(Features=\{f_1,f_2, \dots , f_{|Features|}\}\) be a set of features associated with the models, such as ranking, prediction, and recommendation. Each model \(m \in Models\) supports a subset of the set Features and satisfies a set of evaluation measures (\(Measures=\{e_1,e_2, \dots , e_{|Measures|}\}\)) and quality attributes (\(Qualities=\{q_1,q_2, \dots , q_{|Qualities|}\}\)). The objective is to identify the most suitable models, or a combination of models, represented by the set \(Solutions \subset Models\), that address the concerns of researchers, denoted as Concerns, where \(Concerns \subseteq Features \cup Measures \cup Qualities\). Accordingly, research modelers can adopt a systematic strategy to select combinations of models by employing an MCDM approach. This approach takes Models and their associated Features as input and applies a weighted combination to prioritize the Features based on the preferences of decision-makers. The defined Concerns are then considered, and an aggregation method is utilized to rank the Models and propose fitting Solutions. Consequently, the MCDM approach can be formally expressed as follows:

$$\begin{aligned} MCDM: Models \times Features \times Concerns \rightarrow Solutions \end{aligned}$$
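This mapping can be sketched as a simple weighted-sum scoring over a Boolean model-feature matrix; the models, features, weights, and hard requirements below are hypothetical placeholders, and real MCDM work employs a variety of more sophisticated aggregation methods.

```python
def rank_models(supports, weights, required=()):
    """Rank models by the weighted sum of the features they support,
    after filtering out models that miss any hard requirement.

    supports: {model: set of supported features}
    weights:  {feature: decision-maker priority weight}
    required: features every feasible solution must support
    """
    feasible = {m: feats for m, feats in supports.items()
                if set(required) <= feats}
    scores = {m: sum(weights.get(f, 0.0) for f in feats)
              for m, feats in feasible.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical model-feature matrix and decision-maker weights.
supports = {"CF":   {"prediction", "ranking", "item recommendation"},
            "LDA":  {"topic modeling", "semantic analysis"},
            "BERT": {"semantic analysis", "prediction"}}
weights = {"prediction": 0.4, "ranking": 0.3,
           "item recommendation": 0.2, "semantic analysis": 0.1}
ranking = rank_models(supports, weights, required=("prediction",))
```

The `required` filter corresponds to the Concerns that act as hard constraints, while the weights capture the decision-makers' preferences over the remaining Features.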

The decision model developed for intent modeling, using MCDM theory and depicted in Fig. 3, is a valuable tool for researchers working on conversational recommender systems. This approach helps researchers explore options systematically, consider important factors for conversational interactions, and choose the best combination of models to create an effective intent modeling approach. The decision model suggests five steps for selecting a combination of models for conversational recommender systems:

  1. (1)

    Models: In this phase, researchers should gain insights into best practices and well-known models employed by other researchers in designing conversational recommender systems. Appendix B can be used to understand the definitions of models, while Appendix C can help become familiar with the categories used to classify these models. Table 2 illustrates the categorization of models in this study, and Table 4 presents the trends observed among research modelers in utilizing models to build their conversational recommender systems.

  2. (2)

    Feature requirements elicitation: In this step, researchers need to fully understand the core aspects of the intent modeling problem they are studying. They should carefully analyze their specific scenario to identify the key characteristics required in the models they seek, which may involve using a combination of models. For instance, researchers might consider prediction, ranking, and recommendation as essential feature requirements for their conversational recommender systems. Researchers can refer to Appendix D to gain a better understanding of feature definitions and model characteristics, which will help them select the most suitable features for their intent modeling project.

  3. (3)

    Finding feasible solutions: In this step, researchers should identify models that can feasibly fulfill all of their feature requirements. Table 3 can be used to determine which models support specific features. For example, the table shows that 99 publications explicitly mentioned Collaborative Filtering as a suitable model for applications requiring predictions, and 94 publications indicated CF’s applicability for ranking. Moreover, 46 studies employed CF for item recommendation. Based on these findings, if a conversational recommender system requires these three feature requirements, CF could be selected as one of the potential solutions. If the number of feature requirements increases, the selection problem can be converted into a set covering problem (Caprara et al. 2000) to identify the smallest sub-collection of models that collectively satisfy all feature requirements.

  4. (4)

    Selecting feasible combinations: In this phase, researchers need to assess whether the identified models can be integrated or combined. Figure 2 provides information on the feasibility of combining models based on the reviewed articles in this study. If the matrix does not indicate a potential combination, this does not necessarily imply that the combination is impossible; it means no evidence supports its feasibility, and researchers should investigate the combination independently.

  5. (5)

    Performance analysis: After identifying a set of feasible combinations, researchers should address their remaining concerns regarding quality attributes and evaluation measures. Table 5 and Appendix E can be used to understand the typical concerns addressed by other researchers in the field. Additionally, Table 6 provides insights into frequently used datasets across domains and applications. Researchers can then utilize off-the-shelf models from libraries such as TensorFlow and scikit-learn to build their solutions (pipelines). These solutions can be evaluated on the desired datasets to assess whether they meet all the specified concerns. This phase differs from the previous four, as it requires significant ad-hoc effort in developing, training, and evaluating the models. By employing this decision-making process, research modelers can develop an intent modeling approach that accurately captures and understands users’ intentions in the conversational recommender system, enabling personalized and precise responses and enhancing overall user experience and satisfaction.
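The set covering formulation mentioned in step 3 can be approximated with the standard greedy heuristic, which repeatedly picks the model covering the most still-unmet feature requirements; the model capabilities and requirements below are hypothetical placeholders.

```python
def greedy_cover(supports, requirements):
    """Greedy approximation of minimum set cover over model capabilities.

    supports: {model: set of supported features}
    requirements: features the combination must collectively satisfy
    """
    uncovered = set(requirements)
    chosen = []
    while uncovered:
        # Pick the model that covers the most uncovered requirements.
        best = max(supports, key=lambda m: len(supports[m] & uncovered))
        gained = supports[best] & uncovered
        if not gained:
            raise ValueError("requirements cannot be covered: %r" % uncovered)
        chosen.append(best)
        uncovered -= gained
    return chosen

# Hypothetical model capabilities and feature requirements.
supports = {"CF": {"prediction", "ranking", "item recommendation"},
            "LDA": {"topic modeling"},
            "GRU4Rec": {"prediction", "sequential modeling"}}
solution = greedy_cover(supports, {"prediction", "ranking", "topic modeling"})
```

The greedy heuristic does not guarantee the minimum combination, but it yields a small feasible one, which step 4 can then check against the combination matrix for integration feasibility.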

Table 7 An overview of the feature requirements considered by the case study participants (Manzoor and Jannach 2022; Tanjim et al. 2020) during their decision-making process for developing their conversational recommender systems

6 Evaluation of findings: case studies

In this section, we detail the evaluation of our proposed decision model (see Sect. 5) through two scientific case studies. These studies were conducted by a team of eight researchers from the University of California San Diego, USA, and the University of Klagenfurt, Austria. The primary goal was to test the decision model’s applicability in the participants’ projects and to understand their decision-making processes better.

During the case studies, participants specified their unique feature requirements, which we recorded in Table 7. Essentially, the participants defined their requirements after reviewing the features listed in Table 3. Using these data, we pinpointed suitable models from the extensive information in Tables 2 and 3. We then examined potential combinations of these models, as depicted in Fig. 2.

To evaluate the significance and recognition of the chosen models in academic circles, we undertook a detailed analysis, referencing Table 4. This examination yielded insights into the models’ popularity and relevance over time within the research community. The most notable and trending combinations were then presented to the case study participants. Figure 3 provides a schematic of the typical decision-making process researchers follow when selecting models for intent modeling.

Table 7 offers a thorough summary of the conducted case studies. This table outlines the specific contexts of each study, the feature requirements identified by the participants, the model selections made by the researchers based on these requirements, and the outcomes from applying our decision model in each scenario. The following sections will delve deeper into these case studies, discussing the addressed concerns, the results achieved using the decision model, and the conclusions drawn from our comprehensive analysis.

6.1 Case study method

Case study research is an empirical research method (Jansen 2009) that investigates a phenomenon within a particular context in the domain of interest (Yin 2017). Case studies can be employed to describe, explain, or evaluate a hypothesis; they involve collecting data about a specific phenomenon, often through interviews, and applying a tool to evaluate its efficiency and effectiveness. In our study, we followed the guidelines outlined by Yin (2009) to plan and conduct the case studies.

Objective: The main aim of this research was to conduct case studies to evaluate the effectiveness of the decision model and its applicability in the academic setting for supporting research modelers in selecting appropriate models for their intent modeling approaches.

The cases: We conducted two case studies within the academic domain to assess the practicality and usefulness of the proposed decision model. The case studies aimed to evaluate the decision model’s effectiveness in assisting research modelers and researchers in selecting models for their intent modeling tasks.

Methods: For the case studies, we engaged with research modelers and researchers actively involved in intent modeling approaches. We collected data through expert interviews and discussions to gain a comprehensive understanding of their specific requirements, preferences, and challenges when selecting models. The case study participants provided valuable insights into the decision-making process and offered feedback on the suitability of the decision model for their intent modeling needs.

Selection strategy: In line with our research objective, we employed a multiple case study approach (Yin 2009) to capture a diverse range of perspectives and scenarios within the academic domain. This selection strategy aimed to ensure the credibility and reliability of our findings. We deliberately selected two publications from highly regarded communities with an A* CORE rank. We verified the expertise of the authors, who actively engage in selecting and implementing intent modeling models. Their knowledge and experience allowed us to consider various factors in different application contexts, including quality attributes, evaluation measures, and feature requirements.

By conducting these case studies, our research aimed to validate the practicality of the decision model and demonstrate its value in supporting research modelers and researchers in their intent modeling endeavors. The insights gained from the case studies provided valuable feedback for refining the decision model and contributed to advancing the intent modeling field within the academic community.

6.2 Case study 1

The first case study presented in our paper revolves around a research project conducted at the University of Klagenfurt in Austria. The study focused on investigating a retrieval-based approach for conversational recommender systems (CRS) (Manzoor and Jannach 2022). The primary objective of the researchers was to assess the effectiveness of this approach as an alternative or complement to language generation methods in CRS. They conducted user studies and carefully analyzed the results to understand the potential benefits of retrieval-based approaches in enhancing user intent modeling for conversational recommender systems.

Throughout the project, the case study participants made two key design decisions, selecting TF-IDF and BERT as the models for developing the CRS. They evaluated their approach on the MovieLens and ReDial datasets to measure its performance.

By applying the decision model presented in our paper (in Sect. 5.2), the case study participants identified six features that were crucial in guiding their decision-making process for selecting the most suitable models and datasets. These features provided valuable insights into designing and implementing an effective retrieval-based approach for conversational recommender systems, contributing to improved user intent modeling in this context.

6.2.1 Feature requirements

In this section, we outline the feature requirements that the case study participants considered during their decision-making process for the research project. Each feature requirement was carefully chosen based on its relevance and potential to enhance the retrieval-based approach for CRS. Below are the feature requirements and their rationale for selection:

Semantic analysis: The case study participants recognized the importance of analyzing the meaning and context of words and phrases in natural language data. Semantic analysis helps the model understand user intents more accurately, leading to more relevant and contextually appropriate recommendations.

Term weighting: Assigning numerical weights to terms or words in a document or dataset helps the machine learning model comprehend the significance of different terms in the data. The participants adopted term weighting to improve the model’s ability to identify relevant features and make better recommendations.
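
The weighting idea can be sketched in a few lines. Below is a minimal pure-Python implementation of one classic TF-IDF variant (raw term frequency times log inverse document frequency); it is an illustrative sketch only, not the participants' implementation, and production libraries differ in smoothing and normalization details.

```python
import math

def tf_idf(docs):
    """Compute TF-IDF weights for each term in each tokenized document.

    Uses normalized term frequency and idf = log(N / df); real systems
    (e.g. scikit-learn) apply additional smoothing and normalization.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = {}
    for doc in docs:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in docs:
        tf = {}
        for term in doc:
            tf[term] = tf.get(term, 0) + 1
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Hypothetical toy corpus of tokenized documents.
docs = [["great", "movie", "recommendation"],
        ["movie", "night", "recommendation"],
        ["great", "book"]]
w = tf_idf(docs)
# "night" occurs in only one document, so within document 1 it is
# weighted higher than the more common term "movie".
```

The effect matches the rationale given above: terms that discriminate between documents receive higher weights, which helps the model identify relevant features.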

Content-based recommendations: This feature involves utilizing item characteristics or features to recommend similar items to users. The participants valued this approach, allowing the system to tailor recommendations based on users’ past interactions and preferences.

Ranking: The case study participants sought a model capable of ranking items or entities based on their relevance to specific queries or users. By incorporating ranking, the system ensures that the most relevant recommendations appear at the top, enhancing user satisfaction.
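
Relevance ranking of this kind is commonly realized by scoring candidate items against a query vector and sorting. The following is a minimal cosine-similarity sketch with hypothetical vectors, not the participants' system:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query_vec, items):
    """Return item names sorted by descending similarity to the query."""
    return sorted(items, key=lambda name: cosine(query_vec, items[name]),
                  reverse=True)

# Hypothetical 3-dimensional term-weight vectors for three items.
items = {"item_a": [1.0, 0.0, 0.0],
         "item_b": [0.9, 0.1, 0.0],
         "item_c": [0.0, 1.0, 0.0]}
query = [1.0, 0.0, 0.0]
ranking = rank(query, items)  # item_a ranks first, item_c last
```

Sorting by the score places the most relevant items at the top, which is precisely the user-satisfaction goal described above.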

Transformer-based: Transformer-based neural network models excel at learning contextual relationships in sequential data such as natural language. The participants chose this approach to leverage the model’s ability to understand and process conversational context effectively.

End-to-end approach: The case study participants preferred an end-to-end modeling strategy, where a single model directly learns complex tasks from raw data inputs to desired outputs. By avoiding intermediate stages and hand-crafted features, the participants aimed to simplify the model and improve its performance in CRS tasks.

6.2.2 Results and analysis

During the expert interview session with the case study participants, we systematically followed the decision model presented in Sect. 5.2 to identify appropriate combinations of models that align with the defined feature requirements for their conversational recommender systems. In the initial steps (Steps 1 and 2), we collaboratively established the essential feature requirements for their CRS, carefully considering the critical aspects that would enhance their system’s performance. Subsequently, we referred to Table 3 (Steps 3 and 4) to evaluate which models could fulfill these specific feature requirements.

Upon analyzing the table, the case study participants and we found that BERT supports Semantic Analysis, Content-Based Recommendations, Ranking, Transformer-Based, and End-To-End Approaches, while TF-IDF supports Term Weighting, Content-Based Recommendations, and Ranking. Combining the two models would therefore address all six feature requirements for their CRS. Consequently, the case study participants confirmed that combining BERT and TF-IDF was a suitable choice to fulfill their CRS needs; this combination was validated as compatible and valid, consistent with the guidance provided by the decision model.
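
The matching step performed here (Steps 3 and 4) can be sketched as a coverage check: given each model's supported features, test whether a candidate combination covers every requirement. The feature sets below are abbreviated from the text of this case study, not the full Table 3:

```python
# Supported features per model, abbreviated from the case study text.
supported = {
    "BERT": {"semantic analysis", "content-based", "ranking",
             "transformer-based", "end-to-end"},
    "TF-IDF": {"term weighting", "content-based", "ranking"},
}

# The six feature requirements defined by the participants.
requirements = {"semantic analysis", "term weighting", "content-based",
                "ranking", "transformer-based", "end-to-end"}

def covers(models, requirements, supported):
    """True if the union of the models' feature sets covers every requirement."""
    covered = set().union(*(supported[m] for m in models))
    return requirements <= covered

assert covers(["BERT", "TF-IDF"], requirements, supported)
assert not covers(["TF-IDF"], requirements, supported)
```

TF-IDF alone fails the check, while the BERT and TF-IDF combination passes, mirroring the conclusion reached in the interview session.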

The data presented in Table 4 further reinforce the popularity and relevance of BERT and TF-IDF as widely used models for conversational recommender systems. The case study participants were well aware of these trends and acknowledged that their model choices aligned with prevailing practices. This alignment provides additional validation to their model selections, demonstrating their dedication to adopting the latest technologies in their research project to create an effective CRS.

Furthermore, Table 6 provides valuable insights into the popularity and significance of various datasets, including MovieLens and ReDial. These datasets have been cited and utilized in over 50 publications, underscoring their recognition within the research community. The case study participants acknowledged the widespread use of these datasets by other researchers, reflecting an interesting trend in dataset selection. This awareness further highlights their commitment to utilizing well-established and reputable datasets in their research, contributing to the credibility and reliability of their study findings.

6.3 Case study 2

The second case study presented in our paper focuses on a research project conducted at the University of California San Diego in the United States (Tanjim et al. 2020). The study introduces the Attentive Sequential model of Latent Intent (ASLI) to enhance recommender systems by capturing users’ hidden intents from their interactions.

Understanding user intent is essential for delivering relevant recommendations in conventional recommender systems. However, user intents are often latent: they are not directly observable from users’ interactions. ASLI addresses this challenge by uncovering and leveraging these latent intents.

Using a self-attention layer, the researchers (case study participants) designed a model that initially learns item similarities based on users’ interaction histories. They incorporated a Temporal Convolutional Network (TCN) layer to derive latent representations of user intent from their actions within specific categories. ASLI employs an attentive model guided by the latent intent representation to predict the next item for users. This enables ASLI to capture the dynamic behavior and preferences of users, resulting in state-of-the-art performance on two major e-commerce datasets from Etsy and Alibaba.

By utilizing the decision model presented in our paper (in Sect. 5.2), the case study participants identified eight features that were crucial in guiding their decision-making process for selecting the most suitable models and datasets.

6.3.1 Feature requirements

In this section, we present the feature requirements that were crucial considerations for the case study participants during their decision-making process for the research project. The following are the feature requirements and the reasons behind their selection:

Pattern-based: In the case study, the researchers aimed to improve conversational recommender systems by capturing users’ hidden intents from their interactions. By identifying user interactions and behavior patterns, the ASLI model can make informed guesses about users’ intents and preferences, leading to more accurate and relevant recommendations.

Prediction: The ASLI model predicts the next item for users based on their latent intents derived from their historical interactions within specific categories. The model can deliver personalized and effective recommendations by predicting users’ preferences and future actions.

Historical data-driven recommendations: The researchers used previously collected data from users’ interactions to train the ASLI model. By analyzing historical data, the model can identify patterns, relationships, and trends in users’ behaviors, which inform its predictions and recommendations for future interactions.

Click-through recommendations: In the case study, the ASLI model considers users’ clicks on items to understand their preferences and improve the relevance and ranking of future recommendations. The model can adapt and refine its recommendations by utilizing click-through data to meet users’ needs better.

Item recommendation: The ASLI model suggests items to users based on their previous interactions, enabling it to offer personalized recommendations tailored to individual users’ preferences and behaviors.

Transformer-based: ASLI is a neural network model based on the Transformer architecture. Transformers are well-suited for learning context and meaning from sequential data, making them suitable for capturing the dynamic behavior and preferences of users in conversational recommender systems.

Network architecture: The ASLI model’s network architecture is crucial in guiding information flow through the model’s layers. By designing an effective network architecture, the researchers ensure that the model can capture and leverage users’ latent intents to make accurate recommendations.

Attentive: ASLI utilizes attention mechanisms to focus on the most relevant parts of users’ interactions and behaviors. The model can better understand users’ intents and preferences by paying attention to critical information, leading to more attentive and accurate recommendations.
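
The attention mechanism referred to here can be illustrated with a minimal scaled dot-product attention over toy vectors. This is a generic sketch of the mechanism, not the ASLI implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention: weight each value by how well
    its key matches the query, then return the weighted sum."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Toy example: the query matches the first key most strongly, so the
# output is pulled toward the first value vector.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out = attention(query, keys, values)
```

The weights concentrate on the key most similar to the query, which is how an attentive model focuses on the most relevant parts of a user's interaction history.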

6.3.2 Results and analysis

During the expert interview session with the case study participants, we used the decision model (outlined in Sect. 5.2) to identify suitable combinations of models that align with the defined feature requirements for their conversational recommender systems. In Steps 1 and 2, we collaboratively established the essential feature requirements for the ASLI, carefully considering critical aspects to enhance system performance. Then, in Steps 3 and 4, we referred to Table 3 to evaluate models that could fulfill these specific feature requirements.

According to the table, the case study participants and we found that the GRU model supports the Prediction, Historical Data-Driven Recommendations, Click-Through Recommendations, Network Architecture, and Attentive features, while the LDA model supports the Pattern-Based and Item Recommendation features. We also found that BERT is the only model in our list supporting Transformer-Based features. The case study participants agreed with this combination and considered these models the baseline of their approach.

However, after performance analysis, they found GRU’s performance unsatisfactory in their setting. Consequently, they chose to develop their own model from scratch by modifying the self-attentive model. Notably, the self-attentive model supports only the Network Architecture and Attentive features, making it a suitable baseline in combination with other models. The case study participants mentioned considering LDA and BERT as potential models for an upcoming research project with similar requirements, although they had not previously been aware of this combination.

As per Step 5 of the decision model, researchers should address any remaining concerns about quality attributes and evaluation measures after identifying feasible combinations. The decision model thus provided valid models in this case study, but in real-world scenarios, model combinations may be modified based on researchers’ further concerns, such as quality attributes and evaluation measures.

The case study participants emphasized the value of the data presented in Table 4 and their intention to incorporate it into their future design decisions. Understanding trends in model usage is crucial to identify models that may perform well in conversational recommender systems, considering similar concerns and requirements from other researchers.

Furthermore, Table 6 indicates that Etsy and Alibaba datasets are not widely known in the context of user intent modeling, although the case study participants clarified that these datasets are well-known in e-commerce services, aligning with their project’s specific domain of focus. Nonetheless, they expressed their intention to utilize the data presented in this table to explore potential datasets for evaluating their approach and comparing their work against other approaches in the literature.

7 Discussion

7.1 SLR outcomes

Code sharing: Our review of 791 publications revealed that only 68 (8.59%) explicitly shared their code repositories, such as GitHub. This observation underscores a significant gap in code sharing among researchers, posing challenges to replicating experiments and advancing scientific knowledge. Open access to code is imperative for ensuring transparency and reproducibility in machine learning research (Haefliger et al. 2008).

Singleton models: The systematic literature review yielded 600 models, with 352 (58.66%) being singletons. This trend indicates a preference for developing unique models tailored to specific research questions. However, an overreliance on singletons might hinder the generalizability of findings and the ability to compare different methods. Promoting the use of common models or establishing standard evaluation benchmarks is essential to enhance reproducibility and comparability in machine learning research (Amershi et al. 2019).

Model combination: The methodology for combining models in some publications was not clearly articulated, making it difficult to understand the techniques employed and their efficacy. Clear documentation of model combination techniques and their underlying rationale is crucial for ensuring transparency and facilitating the replication and extension of research findings (Kuwajima et al. 2020). The challenge lies in determining the effectiveness of integrating different models without extensive contextual information. The current approach, based on literature and general requirements, provides a foundational framework but may not capture the specific nuances needed for particular applications. Future research should involve detailed analyses of model combinations in specific scenarios, using case studies or empirical evaluations to provide insights into the interactions and complementarity of different models, thereby enhancing the practical applicability of intent modeling methods in conversational recommender systems.

Model variations: Our analysis identified a diverse range of model variations, such as BERT4Rec (Chen et al. 2022), SBERT (Garcia and Berton 2021), BERT-NeuQS (Hashemi et al. 2020), BioBERT (Carvallo et al. 2020), ELBERT (Gao and Lam 2022), and RoBERTa (Wu et al. 2021), primarily derived from BERT (Devlin et al. 2018). Despite the utility of these variations in addressing different tasks, their extensive use complicates model comparison and experiment replication. Establishing standardized categories for model variations would aid researchers in discerning model differences and similarities, thereby promoting model sharing, reuse, and collaborative progress in machine learning research (Sarker 2021).

Trends: As depicted in Fig. 2, LDA is a predominant model in user intent modeling approaches (Table 4). Although traditional models like LDA have contributed significantly to the field and inspired the development of advanced models such as BERT (Devlin et al. 2018), their adoption has likely declined with the emergence of these more sophisticated models. BERT’s bidirectional contextual embeddings and transformer architecture have demonstrated remarkable performance across various NLP tasks, attracting considerable attention from the research community. The preference for modern models is also influenced by the trade-off between the interpretability of traditional models and the complexity of advanced models like BERT, as well as the diversity of NLP applications (Ribeiro et al. 2016).

Datasets: Only 394 out of 791 publications (49.81%) utilized public, open-access datasets, indicating a reliance on proprietary datasets by more than half of the publications. This limitation hinders data reuse and poses challenges to research reproducibility and credibility. While 253 public open-access datasets were identified, 173 (68.37%) were mentioned in only one publication and not reused, highlighting deficiencies in dataset-sharing practices. The limited availability of datasets impedes the reproduction and validation of results, comparison, and benchmarking of models, and identification of state-of-the-art techniques. Moreover, the lack of diverse and openly accessible datasets may result in biased model development and evaluation, limiting the applicability of models to real-world scenarios and diverse user populations. Addressing these issues necessitates fostering a culture of openness and collaboration within the research community.

7.2 Case study participants

The case study participants showed a careful and thorough approach to decision-making, conducting extensive research and literature reviews before selecting models for their research projects. This demonstrates the effectiveness of the decision model in helping researchers make well-informed and compatible model choices for developing conversational recommender systems.

Both case study participants emphasized the value of using the decision model and the knowledge gained during this study. They expressed their intention to use this information to make informed decisions when selecting the appropriate combinations of models for user intent modeling approaches.

Furthermore, the case study participants recognized that the decision model serves as a valuable tool for generating an initial list of models to develop their approaches. However, they acknowledged that Step 5 of the decision model highlights the importance of further analysis, such as performance testing, to identify the right combinations of models that work well for specific use cases. This recognition underscores the need for practical testing and validation to ensure the chosen model combinations are effective and suitable for their particular research goals.

The use of well-known datasets (MovieLens and ReDial in the first case study, Etsy and Alibaba in the second) underlines the researchers’ commitment to using credible data sources for evaluation. The decision model allowed the researchers to consider dataset popularity and relevance, enhancing the credibility and reliability of their study findings.

The decision model provided valuable insights into the trends in model usage, as presented in Table 4. Both case study participants expressed interest in incorporating these trends into their future research decisions, ensuring they stay up-to-date with the latest advancements in intent modeling approaches.

Throughout the case studies, the discussion highlighted the dynamic nature of the decision-making process. While the decision model offered feasible model combinations based on feature requirements, the final choices were influenced by additional factors such as model performance, quality attributes, and evaluation measures. This adaptability showcased the decision model’s flexibility in accommodating researchers’ unique priorities and preferences.

Both case studies effectively demonstrated that the decision model offers a systematic approach to model selection and helps researchers explore various options and combinations of models. This exploratory nature allowed researchers to consider novel solutions and build upon existing models, creating innovative intent modeling approaches.

The success of the decision model in assisting researchers in their model selection process holds promising implications for the broader academic community. By providing a structured and comprehensive methodology, the decision model can streamline the development of conversational recommender systems with accurate intent modeling capabilities, ultimately enhancing user experience and satisfaction.

7.3 Threats to validity

Validity evaluation is essential in empirical studies, encompassing SLRs and case study research (Zhou et al. 2016). This paper’s validity assessment covers various dimensions, including Construct Validity, Internal Validity, External Validity, and Conclusion Validity. Although other types of validity, such as Theoretical Validity and Interpretive Validity, are relevant to intent modeling, they are not explicitly addressed in this context due to their relatively limited exploration.

Construct validity pertains to the accuracy of operational measures or tests used to investigate concepts. In this research, we developed a meta-model (refer to Fig. 3) based on the ISO/IEC/IEEE standard 42010 (ISO 2011) to represent the decision-making process in intent modeling for conversational recommender systems. We formulated comprehensive research questions by utilizing the meta-model’s essential elements, ensuring an exhaustive coverage of pertinent publications on intent modeling approaches.

Internal validity concerns verifying cause-effect relationships within the study’s scope and ensures the study’s robustness. We employed a rigorous quasi-gold standard (QGS) (Zhang et al. 2011) to minimize selection bias in paper inclusion. Combining manual and automated search strategies, the QGS provided an accurate evaluation of sensitivity and precision. Our search spanned four major online digital libraries, widely regarded to encompass a substantial portion of high-quality publications relevant to intent modeling for conversational recommender systems. Additionally, we used snowballing to complement our search and mitigate the risk of missing essential publications. The review process involved a team of researchers, including three principal investigators and five research assistants. Furthermore, the findings were validated by real-world researchers in intent modeling to ensure their practicality and effectiveness.

External validity pertains to the generalizability of research findings to real-world applications. This study considered publications discussing intent modeling approaches across multiple years. Although some exclusions and inaccessibility of studies may impact the generalizability of SLR and case study results, the proportion of inaccessible studies (less than 2%) is not expected to affect the overall findings significantly. The knowledge extracted from this research can be applied to support the development of new theories and methods for future intent modeling challenges, benefiting both academia and practitioners in this field.

Conclusion validity ensures that the study’s methods, including data collection and analysis, can be replicated to yield consistent results. We extracted knowledge from selected publications, encompassing various aspects such as Models, Datasets, Evaluation Metrics, Quality Attributes, Combinations, and Trends in intent modeling approaches. The accuracy of the extracted knowledge was safeguarded through a well-defined protocol governing the knowledge extraction strategy and format. The authors proposed and reviewed the review protocol, establishing a clear and consistent approach to knowledge extraction. A data extraction form was employed to ensure uniform extraction of relevant knowledge, and the acquired knowledge was validated against the research questions. All authors independently determined quality assessment criteria, and crosschecking was conducted among reviewers, with at least three researchers independently extracting data, thus enhancing the reliability of the results.

8 Related work

The development of conversational recommender systems is significantly influenced by the findings from SLRs in various related research domains, each contributing to the collective understanding of user intent modeling. These SLRs are pivotal in gathering and analyzing data to interpret user needs within conversational interfaces.

In the field of Human–Computer Interaction, key SLRs conducted by de Barcelos Silva et al. (2020), Rapp et al. (2021), Iovine et al. (2023), Jiang et al. (2013), and Jindal et al. (2014) have systematically collected and analyzed data to understand how user-friendly interfaces can enhance user engagement and satisfaction in conversational systems, a core aspect of user intent modeling.

Similarly, in Conversational AI, the SLRs by Zaib et al. (2022) and Saka et al. (2023) have aggregated research findings focusing on simulating natural, human-like interactions, a key component in understanding and modeling user intent in conversational recommender systems.

The research in Conversational Search Systems, notably synthesized by Keyvan and Huang (2022), and Yuan et al. (2020), represents comprehensive reviews of the dynamics of user-system interaction for information retrieval. These studies align with user intent modeling by providing insights into how conversational systems can better parse and understand user queries.

For User Preference Extraction & Prioritization, SLRs by Pu et al. (2012), Liu et al. (2022), Zhang et al. (2019), and Hernández-Rubio et al. (2019) have methodically reviewed the literature to inform how conversational recommender systems can more accurately and contextually tailor their recommendations.

Table 8 Positioning of our study within the existing body of literature on conversational recommender systems, highlighting user intention understanding from various perspectives

In the realm of Contextual Information Retrieval Systems, the systematic reviews by Tamine-Lechani et al. (2010), Chen et al. (2015), and Latifi et al. (2021) have contributed to understanding the impact of explicit and implicit user queries and contextual factors, crucial for refining user intent modeling in conversational systems.

Our SLR encapsulates these efforts, covering a total of 791 publications. We highlight the collective contribution of these SLRs to the field of user intent modeling in conversational recommender systems. Table 8 summarizes these efforts, offering a comparative analysis and showcasing the contributions of our study. Notably, our review reveals that while there is a substantial amount of literature on individual aspects of user intent modeling, a comprehensive, integrated approach in the form of an SLR is less common. The synthesis of findings from HCI, Conversational AI, Conversational Search Systems, User Preference Extraction & Prioritization, and Contextual Information Retrieval forms the foundation for advancing user intent modeling in conversational recommender systems (Dodeja et al. 2024; Zhang et al. 2024).

In Table 8, the columns are as follows. Column 1 shows the authors of the studies; Column 2 the year of publication; Column 3 the type of publication, which could be either academic or gray literature; Column 4 the research methods the publications employed; Column 5 the main focus of the publications; and Column 6 the application or domain in which the publication conducted research. Column 7 (# Reviewed publications) gives the number of publications each study reviewed. Column 8 (Decision model) indicates whether a selected publication offered a decision model based on its findings from the data captured in the literature. Column 9 (Trend) shows whether the selected study reported on the trends it found in employing models. Column 10 (Datasets) shows whether the researchers reported on training or evaluation datasets. Column 11 (Model categories) indicates whether the publication categorized the models it reported. Column 12 (Model combinations) indicates whether it reported on model combinations and integration. Column 13 (Feature/Model Mapping) shows whether it identified the features that the models support. Columns 14, 15, 16, and 17 give the numbers of quality attributes, features, evaluation measures, and models that each study reported, and Columns 18, 19, 20, and 21 indicate how many of these are in common with the ones in our study. Finally, Column 22 (Coverage (%)) shows the percentage of concepts each selected publication has in common with our study.

Academic literature reviews dominate the selected studies, representing over 80 percent of the reviewed literature, aligning with our primary focus on academic sources. The research methods in these studies include SLR, Case Study, Survey, and Review. However, none of the reviewed SLRs employed case studies to evaluate their findings, relying solely on the SLR process. Our study adopts a more comprehensive approach by incorporating case studies into our research methods, offering a holistic perspective on decision-making in user intent modeling.

Our study places a significant emphasis on decision-making processes and decision models. Among the reviewed SLRs, only one paper (Pu et al. 2012) focused on this aspect, while our study introduces a decision model based on existing literature. This model serves as a valuable tool for research modelers to make informed decisions and identify suitable models or combinations for specific scenarios.

In terms of trends within models, four studies (Zaib et al. 2022; Pu et al. 2012; Chen et al. 2015; Zhang et al. 2019) (23.53%) reported on this aspect. Additionally, seven studies (Latifi et al. 2021; Yuan et al. 2020; Jindal et al. 2014; Hernández-Rubio et al. 2019; Zaib et al. 2022; Keyvan and Huang 2022; Pan et al. 2022) (41.18%) provided insights into open-access datasets, valuable for training or evaluating models.

Furthermore, our study categorizes models, as did eight other SLRs (de Barcelos Silva et al. 2020; Rapp et al. 2021; Pan et al. 2022; Liu et al. 2022; Zhang et al. 2019; Hernández-Rubio et al. 2019; Jindal et al. 2014; Yuan et al. 2020) (47.06%). However, only two publications (Hernández-Rubio et al. 2019; Yuan et al. 2020) (11.76%) reported on model combinations, suggesting a research gap in effective model integration.

9 Conclusion and future work

In this paper, we investigated the decision-making process involved in selecting intent modeling approaches for conversational recommender systems. Our primary aim was to address the challenge research modelers face in determining the most effective model combination for developing intent modeling approaches.

To ensure the credibility and reliability of our findings, we conducted a systematic literature review and carried out two academic case studies, meticulously examining various dimensions of validity, including Construct Validity, Internal Validity, External Validity, and Conclusion Validity.

Drawing inspiration from the ISO/IEC/IEEE standard 42010 (ISO 2011), we devised a meta-model as the foundational framework for representing the decision-making process in intent modeling. By formulating comprehensive research questions, we ensured the inclusion of relevant studies and achieved exhaustive coverage of pertinent publications.

Our study offers a holistic understanding of user intent modeling within the context of conversational recommender systems. The SLR analyzed over 13,000 papers from the last decade, identifying 59 distinct models and 74 commonly used features. These analyses provide valuable insights into the design and implementation of user intent modeling approaches, contributing significantly to the advancement of the field.

Building on the findings from the SLR, we proposed a decision model to guide researchers and practitioners in selecting the most suitable models for developing conversational recommender systems. The decision model considers essential factors such as model characteristics, evaluation measures, and dataset requirements, facilitating informed decision-making and enhancing the development of more effective and efficient intent modeling approaches.
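To make the selection step concrete, the ranking such a decision model supports can be sketched as a weighted coverage computation: candidates are filtered by required features and ranked by how many of the weighted quality attributes they satisfy. The model names, features, qualities, and weights below are hypothetical placeholders for illustration, not the actual contents of our knowledge base.

```python
def rank_candidates(candidates, required_features, weighted_qualities):
    """Rank models by weighted coverage of required features and qualities.

    candidates: dict model_name -> {"features": set, "qualities": set}
    required_features: feature names the scenario must support (hard filter)
    weighted_qualities: quality_name -> importance weight (soft score)
    """
    scores = {}
    for name, profile in candidates.items():
        # Hard requirement: drop models missing any required feature.
        if not required_features <= profile["features"]:
            continue
        # Soft score: sum the weights of the qualities the model satisfies.
        scores[name] = sum(
            w for q, w in weighted_qualities.items() if q in profile["qualities"]
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical knowledge-base excerpt (illustrative values only).
candidates = {
    "BERT": {"features": {"intent classification", "context handling"},
             "qualities": {"accuracy", "robustness"}},
    "SVM": {"features": {"intent classification"},
            "qualities": {"interpretability"}},
    "LDA": {"features": {"topic extraction"},
            "qualities": {"interpretability"}},
}
ranking = rank_candidates(
    candidates,
    required_features={"intent classification"},
    weighted_qualities={"accuracy": 3, "interpretability": 1},
)
```

In this sketch, LDA is filtered out for lacking the required feature, and the remaining candidates are ordered by their weighted quality coverage; the actual decision model additionally accounts for evaluation measures and dataset requirements.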

We demonstrated the practical applicability of the decision model through two case studies, showcasing its usefulness in real-world scenarios. The decision model aids researchers in identifying initial model sets and considering essential quality attributes and functional requirements, streamlining the process and enhancing its reliability.

User intent modeling occupies a central place in the current research landscape, with scientists both advancing its fundamentals and exploring its applications within their respective domains. At this juncture, our study consolidates the field's foundations. We envision our research becoming part of the essential literature for newcomers, promoting this vital field and streamlining researchers' efforts in selecting suitable models and techniques. By solidifying the understanding and relevance of user intent modeling, we aim to facilitate future advancements and innovation in this study area.

To keep the knowledge base constructed from our SLR current, we intend to take the steps necessary to maintain its relevance and value for future researchers embarking on similar projects. We plan to establish a collaborative platform or repository, inviting researchers to contribute their latest findings and studies pertaining to the addressed research challenges. By fostering a community-driven approach, we aim to create an engaging environment that encourages regular and meaningful contributions. To streamline the process, we intend to develop user-friendly interfaces and implement effective content moderation to safeguard the knowledge base's scientific integrity.

Additionally, we aim to extend the current methodology by introducing more detailed criteria and context-specific frameworks for the selection and integration of intent modeling methods in conversational recommender systems. This involves developing nuanced frameworks that assess model compatibility and integration potential, tailored to address the unique challenges and requirements of specific domains and conversational scenarios. By deepening the analysis of how different models interact and complement each other in varying contexts, future research will not only refine the decision-making process for method selection but also enhance the overall effectiveness and user-centricity of conversational recommender systems.

Moreover, we plan to implement an automated data crawling mechanism that periodically and systematically searches reputable literature sources and academic databases, enabling seamless integration of the latest research into the knowledge base. Additionally, we are committed to maintaining a record of changes and updates to the knowledge base, including precise timestamps and new information sources. This transparent documentation will allow future researchers to follow the knowledge base's evolution and confidently leverage it for their specific research needs. By embracing these proactive measures, we envision establishing a continuously updated and robust knowledge base that serves as a valuable resource for researchers in the dynamic domain of user intent modeling and recommender systems.
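One possible crawl step can be sketched against the public arXiv API, which supports keyword queries sorted by submission date. The endpoint and its parameters follow arXiv's published API; the query terms and the provenance log format below are our own illustrative assumptions, and a production crawler would cover additional databases.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def build_query_url(terms, start=0, max_results=50):
    """Build an arXiv API query for recent papers matching all terms."""
    search_query = " AND ".join(f'all:"{t}"' for t in terms)
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",   # newest submissions first
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"

def log_entry(url):
    """Record when and where a crawl ran, for transparent provenance."""
    return {"source": url, "crawled_at": datetime.now(timezone.utc).isoformat()}

# Example: fetch the latest papers on user intent in recommender systems.
url = build_query_url(["user intent", "recommender system"])
entry = log_entry(url)
```

Each returned Atom feed would then be parsed for new entries, with the timestamped log entries forming the change record described above.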