1 Introduction

The explosion of digital content has made it increasingly difficult for users to find and consume items that are relevant and of interest to them. Recommender systems have emerged as a solution to this problem by providing personalized recommendations to users. They have been utilized in many domains such as accommodation search in Airbnb [1], video suggestion in YouTube [2], and movie recommendation in Netflix [3].

The growing volume of news articles available on the internet has made it increasingly difficult for users to find articles that are relevant and interesting to them. Traditional recommender systems recommend articles according to how similar they are to articles in which a user has previously shown interest. As a general rule, similarity is measured using the distance between two pieces of text: a small distance indicates high similarity, and a large distance indicates low similarity. However, such systems often fail to capture the dynamic nature of users' preferences and the changing trends in news articles, because the news preferences of individuals are influenced by several factors, including the context and recent trends on social media. News recommender systems also face special challenges because items change rapidly, information about readers is limited, and the relevance of articles is highly context-dependent [4].

As a result, there has been growing interest in developing personalized news recommendation systems that provide users with news articles matching their preferences and interests. One approach to building such systems is to leverage social media trends to identify the most relevant and popular news articles for a given user. Contextual information is another important factor: users’ preferences and reading habits can vary depending on their location, the time of day, and other contextual factors. By incorporating contextual information, news recommender systems can tailor the recommendations to each user's current context and preferences [5].

Capturing context and trends from users can be achieved in several ways, such as analyzing the content of the articles that users click on, tracking the social media activity of users, using collaborative filtering to identify similar users based on their click behavior, and using contextual information such as the time of day, location, device, and user profile to personalize recommendations [5].

This paper proposes a context-aware personalized news recommendation system that incorporates contextual information to enhance the performance of the recommendation system. Specifically, click behavior is being investigated as a form of contextual information. For example, if a user frequently clicks on news articles related to technology during specific hours of the day or week, this information can be used to personalize the news recommendations for that user.

Our approach involves collecting, extracting, exploring, cleaning, and processing a large dataset of news articles from 19 distinct online news sources, totaling 22,657 articles before cleaning. Four different recommender systems were built: three content-based systems using TF-IDF, Bag-of-Words, and Word2Vec, and a collaborative filtering system based on click behavior. These models were evaluated using well-known performance measures, including precision and recall. To demonstrate the applicability of the model, a web interface was constructed, and RMSE and MAE were used to evaluate the performance of the collaborative filtering model. A comparative study was also carried out to assess the accuracy of the different algorithms against baseline methods.

The rest of the paper is organized as follows. Section 2 provides a background and review of related work on news recommender systems. In Sect. 3, we describe the methodology used to design and implement our Context-Aware News Recommender System. Section 4 presents the evaluation results of our experiments, including the performance of the four different recommender systems, as well as a comparison study of different algorithms with different baseline methods. In Sect. 5, we discuss the implications of our findings and highlight the importance of context and collaboration in personalizing news recommendations. Lastly, in Sect. 6, we conclude the paper by summarizing our main contributions and proposing avenues for future research in this area.

2 Background and Related Work

Machine learning algorithms are widely used in personalized recommendations to analyze user data, such as past purchases, browsing and search history, and clickstream data, to identify patterns and make predictions about what the user is likely to be interested in.

For more than a decade, researchers have studied recommendation systems in the news realm using a variety of methodologies [6, 7]. There are several types of machine learning algorithms that are commonly used in personalized news recommendation systems [8]:

  1. Collaborative filtering: This algorithm analyzes the behavior of similar users to make recommendations for a given user, based on the assumption that users who behaved similarly in the past are likely to behave similarly in the future [9]. Recommender systems can utilize item similarity in addition to user similarity (e.g., “Users who liked item X also liked item Y”) to perform collaborative filtering [10].

  2. Content-based filtering: This algorithm analyzes the content of items, such as text, images, and videos, to make recommendations for a given user, based on the assumption that users who have shown interest in certain types of content are likely to be interested in similar content in the future [11,12,13,14]. The system builds the user’s profile from feedback given explicitly (e.g., by clicking “like”) or implicitly (e.g., by clicking on links), together with the metadata of the items with which the user has engaged. The system becomes more accurate the more data it receives [15].

  3. Hybrid systems: These systems combine multiple recommendation algorithms to provide better recommendations. For example, a hybrid system can combine collaborative filtering and content-based filtering to provide more accurate and diverse recommendations [16, 17].

These machine learning algorithms are often used in combination with other techniques, such as natural language processing, sentiment analysis, and social network analysis, to provide more accurate and personalized recommendations to users.

The field of recommender systems initially focused on news forums, as pioneered by GroupLens [18]. Several studies have explored news recommender systems, such as Liu et al. [19], who implemented and analyzed a hybrid recommender system on the news aggregator Google News. This model takes into account both content information and the reading patterns of the community, extracting users’ “genuine” long-term interests by removing the community bias from their consumption behavior. The authors designed the model to consider short-term changes in users’ interests, as well as the location of the user and the reading behavior of the community in the past hour, to predict the relevance of new articles. Their comparison against Das et al. [9], who implemented a collaborative filtering system that only considered logged-in users for testing, showed a 30% improvement for the hybrid model.

Some studies have focused on user challenges or building models to address user needs. For example, a study by [15] highlighted the most urgent difficulties in News Recommender Systems (NRS) that influence reader responses at different stages of the news suggestion life cycle. The most common challenge in NRS is user modeling, followed by timeliness, with news content quality as an emerging challenge. The authors presented different models for the timeliness challenge and found that the timeliness models may be unable to deal with dynamic user actions. They also discussed different user modeling methods and found that these models need to include not only users’ history, but also their short-term, seasonal, diverse, and sequential interests to be more effective. The authors also discussed the effects of news recommendations on readers’ behavior.

To assist readers in locating relevant news items and reduce information overload, a study by [4] suggested using a hybrid of content-based and collaborative filtering techniques to extract emotion features from news phrases as a supplement to TF-IDF characteristics. The Ekman model, which has six emotion types, was used to depict emotion in their method. This model was evaluated for over a month, and the system’s emotional impact was tracked. The results showed that the suggested system had a positive influence on users’ emotions, with an 11-fold increase in positive emotional responses compared to a simple news app.

Moreover, using deep reinforcement learning, a paper by [10] proposed a sequential recommendation framework to capture users’ short- and long-term interests, including integrating social news with micro-learning recommendations that can help users derive valuable outcomes from their online presence. This recommendation algorithm was integrated into an application that measured the ability of the user to learn, and the results were assessed, and feedback was incorporated to make the online experience more efficient.

In addition, the authors in [20] proposed a personalized news recommendation system that combines collaborative filtering-based and content-based filtering methods to improve the accuracy of news recommendation. The proposed system aims to address the issues of scalability due to a large news corpus and enriching the user’s profile. Overall, the paper demonstrates the effectiveness of the hybrid approach in improving the accuracy and diversity of news recommendations.

Current news recommendation techniques commonly rely on inferring user interest from their click behavior on news articles. For example, the authors in [21] suggested learning user representations from clicked news articles by assessing their similarities to candidate news. Alternatively, some methods employ more advanced techniques to capture diverse and multi-grained user interest in news. For instance, the study in [22] uses hierarchical user interest modeling, where each user is represented in a three-level hierarchical interest tree instead of a single user embedding. However, users may click on a news article due to being attracted to its title, but they may not find the article’s content satisfying after reading it. Therefore, click behaviors based only on news titles and modeling user interest without considering the article content may not necessarily reflect user interest. Table 1 lists advantages and disadvantages of related works in news recommender systems.

Table 1 Advantages and disadvantages of related works in news recommender systems

In this paper, we propose a news recommendation method that models user interest by incorporating information about the user's past interactions with articles (click behaviors) and information about the articles themselves. By considering both user behavior and article content, our proposed method aims to generate personalized recommendations that are tailored to the user’s interests and provide more accurate and relevant recommendations to users.

3 Methodology

Our methodology starts with collecting, extracting, exploring, cleaning, and processing the data. A large dataset of news articles was collected, and this entire process together makes up the data preprocessing stage. The data are usually represented by many features, such as title, text, link, author, and keywords; hence, the feature extraction process aims to extract the relevant features. The preprocessed data are then used as input to the proposed model.

We have built four different news recommender systems to evaluate their performance, using various techniques for building the models, including:

  • A content-based recommender system using the TF-IDF method.

  • A content-based recommender system using the Bag-of-Words method.

  • A content-based recommender system using the Word2Vec method.

  • A collaborative filtering recommender system based on click behavior.

The models are evaluated using well-known performance measurements, including precision and recall. To demonstrate the applicability of the model, we constructed a web interface and used RMSE and MAE to analyze the collaborative filtering model. We also conducted a comparison study to compare the accuracy of different algorithms with different baseline methods, including random and recency. In this section, we will describe each component in detail.

3.1 Data

We collected data from 19 popular online news sources, including Cable News Network (CNN), British Broadcasting Corporation (BBC), Slate, Breitbart News, Politico, The Hill, Canadian Broadcasting Corporation (CBC), The Washington Post, The Globe and Mail, TechCrunch, GameSpot, Global News, Toronto Star, Channel NewsAsia (CNA), National Public Radio (NPR), The New Yorker, The New York Times, The Wall Street Journal (WSJ), and The Week. The news sources were selected based on their popularity and diversity in terms of their geographical location, political orientation, and coverage of different topics.

A Python script was written to automate the data collection process using Newspaper3k, a Python library for crawling and extracting news articles. The raw dataset includes 22,657 news articles, which were collected over a period of two months. The articles cover a wide range of domains and topics, including politics, sports, entertainment, health, and technology, among others. The distribution of articles across different domains and topics is roughly balanced, with the largest proportion of articles covering politics and international news. Table 2 shows the number of articles collected for each news source before cleaning.

Table 2 Number of collected data for each news source before cleaning

The data collected from news websites are often noisy and have different formats. We wrote a Python script to clean the data to ensure that it is in a structured format that can be analyzed by machine learning algorithms.

Initially, we discarded any row that contained empty text, summary, and keywords. We then filtered out all non-English content and removed any duplicated articles. After cleaning the data, the total number of collected articles decreased to 10,068. By applying these data cleaning techniques, we can improve the quality and relevance of the data and reduce the noise and redundancy in the data.
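The cleaning steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in this work: the record fields (in particular a "lang" field holding a language code), the rule that a row is dropped when any of text, summary, or keywords is empty, and the use of the normalized title and text as the duplicate key are all hypothetical choices.

```python
def clean_articles(rows):
    """Drop incomplete, non-English, and duplicated article records."""
    cleaned = []
    seen = set()
    for row in rows:
        # Assumption: a row is discarded if any of these fields is empty.
        if not row.get("text") or not row.get("summary") or not row.get("keywords"):
            continue
        # Assumption: each record carries a language code in a "lang" field.
        if row.get("lang") != "en":
            continue
        # Assumption: two records are duplicates if title and text match
        # after trimming whitespace (and lower-casing the title).
        key = ((row.get("title") or "").strip().lower(), row["text"].strip())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


# Hypothetical sample records.
sample_rows = [
    {"title": "A", "text": "body", "summary": "s", "keywords": ["k"], "lang": "en"},
    {"title": "a ", "text": "body ", "summary": "s", "keywords": ["k"], "lang": "en"},
    {"title": "B", "text": "texte", "summary": "s", "keywords": ["k"], "lang": "fr"},
    {"title": "C", "text": "body c", "summary": "s", "keywords": [], "lang": "en"},
]
cleaned = clean_articles(sample_rows)
```

In this toy input, only the first record survives: the second is a duplicate, the third is not in English, and the fourth has no keywords.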

We then extracted features from the data that can be used to train machine learning models. These features include the news article ID, title, text, link, authors, and keywords. Table 3 presents the dataset’s columns and their corresponding meanings.

Table 3 Dataset’s columns and their corresponding meanings

3.2 Model Building

3.2.1 Content-Based Recommender System Using the TF-IDF Method

The TF-IDF (Term Frequency-Inverse Document Frequency) method is a commonly used technique for text analysis. It assigns weights to words in a document based on their frequency in the document and their importance in the corpus. The weight of a word is calculated by multiplying its term frequency (TF), Eq. (1), by its inverse document frequency (IDF), Eq. (2). TF measures the frequency of the word in the document, while IDF measures the rarity of the word in the corpus. Using the TF-IDF weight, Eq. (3), we can give greater emphasis to words that are less frequent in the corpus, which can be useful in obtaining a more accurate vector representation of the data:

$$\mathrm{TF}(i,j) = \frac{\#\,\text{times word } i \text{ appears in document } j}{\#\,\text{words in document } j}$$
(1)
$$\mathrm{IDF}(i,D) = \log_{e}\left(\frac{\#\,\text{documents in the corpus } D}{\#\,\text{documents containing word } i}\right)$$
(2)
$$\mathrm{Weight}(i,j) = \mathrm{TF}(i,j) \times \mathrm{IDF}(i,D)$$
(3)

A word’s TF-IDF value is higher if it occurs more frequently in a document than in other documents. We used the Euclidean distance to calculate the distance between vectors. The smaller the Euclidean distance between the considered vectors, the more similar their corresponding documents are. Conversely, a larger Euclidean distance between the vectors indicates less similarity between their corresponding documents.
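As an illustration of the weighting and distance computation described above, the following is a minimal sketch in plain Python. It is not the implementation used in this work: it assumes the common form of the inverse document frequency, ln(N / df), where N is the number of documents and df the number of documents containing the word, and it uses toy headlines and whitespace tokenization that are purely hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return (vocabulary, vectors): one TF-IDF vector per tokenized document."""
    n_docs = len(docs)
    vocab = sorted({word for doc in docs for word in doc})
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([
            (counts[w] / len(doc)) * math.log(n_docs / df[w])  # TF * IDF
            for w in vocab
        ])
    return vocab, vectors


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Toy headlines (hypothetical), lower-cased and split on whitespace.
headlines = [
    "forever chemicals found in fast food wrappers",
    "chemicals found in drinking water",
    "oscars ceremony draws record audience",
]
docs = [h.split() for h in headlines]
vocab, vectors = tf_idf_vectors(docs)

# Rank the other headlines by distance to the first one, smallest first.
ranking = sorted(range(1, len(docs)),
                 key=lambda i: euclidean(vectors[0], vectors[i]))
```

Here the second headline, which shares three words with the query, ends up closer to the query than the third, which shares none.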

Figure 1 shows the results obtained with the TF-IDF model for the article “‘Forever Chemicals’ Found in Fast Food Wrappers”.

Fig. 1 Recommended articles using TF-IDF method

The figure displays the similarity scores of 10 articles to the target article “Forever Chemicals”, as calculated using the TF-IDF method. The articles are ranked from the most similar at the top to the least similar at the bottom; since the scores are Euclidean distances, smaller values indicate greater similarity.

3.2.2 Content-Based Recommender System Using the Bag-of-Words Method

The Bag-of-Words (BoW) method is a widely used approach in content-based recommendation systems that tracks the frequency of words in a document. In our case, each headline is considered an individual document, and the set of all headlines constitutes a corpus. With the BoW approach, each document is represented by a row vector of d dimensions, where d is the number of distinct words in the corpus, that is, the size of the vocabulary. The method does not consider the arrangement of the words; all that matters is how often each word occurs.

To calculate the similarity between two documents, we used the Euclidean distance measure, which is defined as the square root of the sum of the squared differences in the word counts between two vectors. The Euclidean distance measure is commonly used in content-based recommendation systems to calculate the similarity between two documents based on their content.

The Euclidean distance measure ranges from 0 to infinity, where a value of 0 indicates that the two documents are identical, and higher values indicate that they are increasingly dissimilar.

Let p and q be two documents represented as vectors using the BoW method. Then, the Euclidean distance between p and q can be computed as

$$d\,(p,q) = \sqrt {\sum\limits_{i = 1}^{n} {(q_{i} - p_{i} )^{2} } }$$
(4)

To recommend articles to a user based on their reading history, we computed the Euclidean distance between the user’s reading history and each article in the corpus. The articles were then ranked in ascending order of distance, with the most similar article at the top and the least similar at the bottom.
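The BoW representation and the distance-based ranking can be sketched as follows. This is a minimal illustration with hypothetical toy headlines and whitespace tokenization, not the code used in this work; the function names are assumptions introduced for the example. Candidates are sorted by ascending Euclidean distance, so the closest headline comes first.

```python
import math
from collections import Counter


def bow_vectors(docs):
    """Return (vocabulary, vectors): one word-count vector per tokenized document."""
    vocab = sorted({word for doc in docs for word in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


def euclidean_distance(p, q):
    """Square root of the summed squared differences between two vectors."""
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


def recommend(query_index, vectors, top_n=10):
    """Return (index, distance) pairs for the top_n closest documents."""
    distances = [
        (i, euclidean_distance(vectors[query_index], v))
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    distances.sort(key=lambda pair: pair[1])  # smallest distance first
    return distances[:top_n]


# Toy headlines (hypothetical).
headlines = [
    "forever chemicals found in fast food wrappers",
    "chemicals found in fast food",
    "what to expect from the oscars",
]
vocab, vectors = bow_vectors([h.split() for h in headlines])
recommendations = recommend(0, vectors)
```

The second headline differs from the query in only two word counts, so it is ranked ahead of the third, which shares no words with the query.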

Figure 2 shows the recommended articles for a user who read the article “Forever Chemicals” using the Bag-of-Words (BoW) method. The articles are ranked from the most similar at the top to the least similar at the bottom. The figure lists the title and similarity score of each recommended article.

Fig. 2 Recommended articles using the BoW method

3.2.3 Content-Based Recommender System Using Word2Vec Embedding

Word2Vec is a widely used approach in natural language processing (NLP) for evaluating and categorizing semantic similarities between words based on their distributional features in large samples of language data. The technique was introduced by researchers at Google in 2013. Word2Vec is a shallow neural network model that learns word associations from a large text corpus to produce word embeddings that efficiently convey semantic meaning and relationships.

To represent each headline as a vector, we used the average Word2Vec model, which involves adding the vector representation of all the available words in the headline and then calculating the average. This approach allows us to represent each headline as a dense vector in a high-dimensional space, where the distance between vectors reflects the semantic similarity between headlines.

During training, Word2Vec observes patterns and represents each word using a d-dimensional vector. The larger the corpus, the better the results. Due to the small size of our corpus, we used Google’s pre-trained model (about 1.5 GB), which was trained on Google News articles. This standard model was trained on roughly 100 billion words and contains 300-dimensional dense vectors for about 3 million words and phrases.
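The averaging step can be sketched as follows. This is a minimal illustration, not the code used in this work: the three-dimensional toy embedding table stands in for the pre-trained vectors, and skipping words that are missing from the table (and returning a zero vector when no word is known) are assumptions made for the example.

```python
def average_vector(tokens, embeddings, dim):
    """Average the embedding vectors of the tokens found in the lookup table."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # Assumption: a headline with no known words maps to the zero vector.
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


# Hypothetical 3-dimensional embeddings standing in for pre-trained vectors.
toy_embeddings = {
    "food": [1.0, 0.0, 0.0],
    "chemicals": [0.0, 1.0, 0.0],
    "wrappers": [1.0, 1.0, 0.0],
}

# "in" is not in the table and is skipped (an assumption of this sketch).
headline_vec = average_vector("chemicals in food wrappers".split(),
                              toy_embeddings, dim=3)
```

The resulting headline vectors can then be compared with the same distance measure used for the other content-based models.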

We created three systems using this model. The first system recommends articles based on their titles only. The second system recommends articles based on their titles and categories, which provides additional contextual information to enhance personalization. The third system recommends articles based on their titles, categories, and authors, which further enhances personalization by incorporating information about the authors of the articles.

Overall, our approach to content-based recommendation using Word2Vec Embedding provides an effective and scalable method for evaluating and categorizing semantic similarities between headlines. By leveraging the power of Word2Vec and incorporating additional contextual information, we can provide highly personalized recommendations to users based on their reading history and preferences.

3.2.3.1 Recommending Articles Based on Article Title

Figure 3 shows the results of using the Word2Vec embedding model to recommend articles based on their title only. The “Title” column lists the titles of ten articles, and the “Similarity” column shows a score indicating the similarity of each title to the original article title used as a query. The higher the score, the more similar the title is to the original query title.

Fig. 3 Recommended articles based on similarity scores of article titles using Word2Vec model

3.2.3.2 Recommending Articles Based on Article Title and Category

To take article categories into account alongside article titles, we used one-hot encoding to obtain a feature representation for categories. During the recommendation process, we assigned higher weights to certain features to increase their significance; conversely, features with lower weights were given less significance. The recommendation function therefore takes additional arguments for the weights of title similarity and category similarity. In Fig. 4, we present the results of the model using the article headline “‘Forever Chemicals’ Found in Fast Food Wrappers” and the article category “Food”. The figure lists recommended articles along with their respective similarity scores, weighted similarity scores, category similarity scores, and category names.

Fig. 4 Recommended articles based on similarity scores of article titles and categories using Word2Vec model

3.2.3.3 Recommending Articles Based on Article Title, Category, and Author

To improve the accuracy of article recommendations, we also used one-hot encoding for authors. During the recommendation process, we assigned higher weights to certain categories and authors to increase their significance. Conversely, features with lower weights were given less significance.

The updated function now includes new arguments for weightage of title similarity, category, and author. In Fig. 5, we present the results of the model using the article headline “4 Arrested After Men’s Dismembered Bodies Are Found In Pond,” the article category of “BLACK VOICES,” and the article author, David Lohr.

Fig. 5 Recommended articles based on similarity scores of article titles, categories, and authors using Word2Vec model
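One way to combine title, category, and author information with adjustable weights is sketched below. The exact combination rule is not specified in the text, so the weighted average of per-feature Euclidean distances used here is an assumption, as are the function names, the record fields, and the toy data.

```python
import math


def one_hot(value, values):
    """One-hot encode value against the ordered list of possible values."""
    return [1.0 if value == v else 0.0 for v in values]


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def weighted_distance(query, candidate, categories, authors,
                      w_title=1.0, w_category=1.0, w_author=1.0):
    """Weighted average of title, category, and author distances (assumed rule)."""
    total = w_title + w_category + w_author
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    d_title = euclidean(query["title_vec"], candidate["title_vec"])
    d_category = euclidean(one_hot(query["category"], categories),
                           one_hot(candidate["category"], categories))
    d_author = euclidean(one_hot(query["author"], authors),
                         one_hot(candidate["author"], authors))
    return (w_title * d_title + w_category * d_category
            + w_author * d_author) / total


# Hypothetical data: title_vec stands in for an averaged Word2Vec vector.
categories = ["Food", "Politics"]
authors = ["A", "B"]
query = {"title_vec": [0.0, 1.0], "category": "Food", "author": "A"}
same_category = {"title_vec": [1.0, 1.0], "category": "Food", "author": "B"}
same_title = {"title_vec": [0.0, 1.0], "category": "Politics", "author": "A"}

# Equal weights favor the candidate with the identical title vector.
d_equal = (weighted_distance(query, same_category, categories, authors),
           weighted_distance(query, same_title, categories, authors))

# A larger category weight favors the candidate in the same category.
d_category_heavy = (
    weighted_distance(query, same_category, categories, authors, w_category=5.0),
    weighted_distance(query, same_title, categories, authors, w_category=5.0),
)
```

The example shows the intended effect of the weights: raising the category weight changes which candidate is considered closest to the query.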

3.2.4 Collaborative-Based Recommender System Using Click Behavior

The Collaborative-Based Recommender System using click behavior is a widely used approach in news recommendation systems that leverages user click behavior to identify patterns and make personalized recommendations. To build the model, we created a simulated dataset that stored user IDs and the articles they visited, capturing the users’ browsing behavior on the website. Figure 6 provides a sample of the session dataset used to build the model, showing the article ID and user ID. This simulated dataset was generated based on a clickstream dataset obtained from a publicly available source.

Fig. 6 Sample of session dataset showing article IDs and user IDs

To generate the session dataset used to build the model, we used the clickstream dataset to calculate pairwise correlations between the users' clickstream and the news articles. This allowed the model to determine if the articles were similar based on highly correlated clickstream distributions among users.

Specifically, we used the Jaccard similarity coefficient to measure the similarity between the clickstream behavior of the users and the articles they visited.

The Jaccard similarity coefficient is a widely used measure of similarity between sets and is defined as the size of the intersection divided by the size of the union of two sets.

Let A and B be two sets of users who clicked on two different articles, then the Jaccard similarity coefficient between A and B can be calculated as follows:

$$J(A,B) = \frac{|A \cap B|}{|A \cup B|}$$
(5)

A high Jaccard similarity coefficient indicates that the two articles were frequently clicked by the same users, suggesting that they are related in some way. The results of the model are presented in Fig. 7, which shows the article ID, title, and correlation score for the article “Forever chemicals”. The correlation score is a numerical value between − 1 and 1 that measures the strength of the relationship between two variables; in this case, it measures the similarity between the clickstream behavior of the users and the articles they visited. Note that the Jaccard coefficient itself is non-negative and lies between 0 and 1.

Fig. 7 Recommended articles based on clickstream behavior of users method

To recommend articles to a user based on their click behavior, we used the correlation scores to identify the articles that were most similar to the articles the user had clicked on. We then ranked the similar articles in descending order of correlation scores, with the most similar article at the top and the least similar at the bottom. This approach allows us to provide highly personalized recommendations to users based on their click behavior.
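The Jaccard-based ranking can be sketched as follows. This is a minimal illustration with a hypothetical click log, not the code used in this work; the function names and the convention of returning 0 for two empty sets are assumptions.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    union = a | b
    if not union:
        return 0.0  # Assumption: two empty sets are treated as dissimilar.
    return len(a & b) / len(union)


def similar_articles(article_id, clicks, top_n=10):
    """Rank the other articles by Jaccard similarity of their user sets."""
    target = clicks[article_id]
    scores = [
        (other, jaccard(target, users))
        for other, users in clicks.items()
        if other != article_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scores[:top_n]


# Hypothetical click log: article ID -> set of user IDs who clicked on it.
clicks = {
    "a1": {"u1", "u2", "u3"},
    "a2": {"u2", "u3"},
    "a3": {"u4"},
    "a4": {"u1", "u4", "u5"},
}
ranked = similar_articles("a1", clicks)
```

For article a1, article a2 shares two of three users and is ranked first, a4 shares one of five users, and a3 shares none.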

Overall, using pairwise correlations between user clickstream behavior and news articles, we were able to simulate realistic user behavior and generate a simulated dataset for training and evaluating the recommendation system.

4 Evaluation and Results

We evaluated the models using two metrics: precision and recall. Precision measures the proportion of recommended articles that are relevant to the user, while recall measures the proportion of relevant articles that are recommended to the user. To demonstrate the applicability of the model, a web interface was constructed, and RMSE and MAE were used to analyze the collaborative filtering model. In addition, a comparison study was conducted to compare the accuracy of different algorithms with different baseline methods, including random and recency.
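The two metrics can be computed as in the sketch below. It is a minimal illustration with hypothetical article IDs, and it assumes that the list of recommended articles contains no duplicates.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one list of recommendations."""
    recommended = list(recommended)
    relevant = set(relevant)
    hits = sum(1 for item in recommended if item in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four recommendations, three relevant articles.
precision, recall = precision_recall(
    ["a1", "a2", "a3", "a4"],
    {"a2", "a4", "a7"},
)
```

Two of the four recommended articles are relevant, so the precision is 2/4, and two of the three relevant articles were recommended, so the recall is 2/3.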

4.1 Content-Based Recommender System Using the TF-IDF Method

Figure 1 shows the results of the TF-IDF method for the target article “Forever Chemicals”. The articles are ranked by their similarity scores. Because these scores are Euclidean distances (Sect. 3.2.1), the most similar article has the lowest score. The results suggest that the TF-IDF method recommends a diverse set of articles with similarity scores ranging from 1.27 to 1.31. The titles of the recommended articles cover a variety of topics, such as politics, entertainment, environment, and current events. Some of the titles, such as “The Future of Food Courts According to Food Junction” and “10 Years Ago, Screenwriters Went on Strike and Changed Television Forever,” are somewhat related to the original article's topic, while others, such as “Donald Trump Taunts Joe Biden” and “Killer Mike Tried to Call Out Joy Reid,” are not directly related.

It is worth noting that the TF-IDF method takes into account the importance of rare words in the corpus, which may explain why some of the recommended articles have a relatively low similarity score despite having some common keywords with the target article. Overall, the results indicate that the TF-IDF method can recommend articles that cover a diverse range of topics but still have some degree of relevance to the original article.

4.2 Content-Based Recommender System Using the Bag-of-Words Method

Figure 2 shows the recommended articles using the Bag-of-Words (BoW) method for a user who read the article “Forever Chemicals”. The articles are ranked by their similarity scores. Because these scores are Euclidean distances (Sect. 3.2.2), the most similar article has the lowest score. The results suggest that the BoW method recommends a diverse set of articles with similarity scores ranging from 2.83 to 3.00. The titles of the recommended articles cover a variety of topics, such as parenting, immigration, politics, and entertainment. Some of the titles, such as “Why Are We So Hard on Moms?” and “All They Will Call You Will Be Deportees,” appear to be related to social issues, while others, such as “Finally, Scientists Have Found a True Millipede” and “What to Expect from the 2022 Oscars,” are more lighthearted.

Overall, the results indicate that the BoW method can recommend articles that are diverse in topic but still have some degree of relevance to the original article. However, it is worth noting that the method may not always provide the most accurate or relevant recommendations, as it relies solely on the frequency of words in the articles and does not take into account other factors such as context or user preferences.

4.3 Content-Based Recommender System Using Word2Vec Embedding

The Word2Vec embedding model has been used to build a content-based recommender system for recommending articles based on their titles, categories, and authors. The model has been trained on a vast literature corpus to learn word associations and produce word embeddings that convey semantic meaning and relationships. The model provides a vector representation for each word, which is used to calculate the distance between titles and recommend articles that are semantically similar to the query article.

The results of the evaluation show that the model performs well in recommending articles based on the similarity of their titles. The recommended articles listed in Fig. 3 have high similarity scores, indicating that they are semantically similar to the original query title.

The model has also been extended to recommend articles based on their titles and categories. The results in Fig. 4 show that the model performs well in recommending articles that are semantically similar to the query title and belong to the same category as the query article. The weighted similarity scores take into account the significance of the categories, which increases the accuracy of the recommendations.

Overall, the Word2Vec embedding model is an effective approach for building a content-based recommender system that recommends articles based on their semantic meaning and relationships. The model can be further improved by incorporating more features such as authors, keywords, and text summary. However, the performance of the model heavily depends on the quality and size of the corpus used for training, which can affect the accuracy and generalization of the recommendations.

4.4 Collaborative-Based Recommender System Using Click Behavior

The Collaborative-Based Recommender System using click behavior has been evaluated on five splits of the data using the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) metrics. Table 4 presents the mean and standard deviation of both metrics, together with the computation times for fitting and testing the model. Both metrics measure the accuracy of the model in predicting the ratings of the test data, with lower values indicating better performance.

Table 4 Evaluation results of Collaborative-Based Recommender System using click behavior

The RMSE values range from 0.0359 to 0.0369, with a mean of 0.0364 and a standard deviation of 0.0004. The MAE values range from 0.0249 to 0.0254, with a mean of 0.0252 and a standard deviation of 0.0002. For both metrics, lower values indicate better prediction of the ratings of the test data. The low errors and the small spread across the five splits indicate that the Collaborative-Based Recommender System predicts the ratings of the test data accurately and consistently.

The Fit time and Test time columns in Table 4 show the time taken by the model to fit the training data and to predict the ratings for the test data, respectively. The Fit time values range from 8.78 to 10.04 s, with a mean of 9.06 s and a standard deviation of 0.49 s. The Test time values range from 0.29 to 0.33 s, with a mean of 0.31 s and a standard deviation of 0.01 s. These values indicate that the system can make its predictions in a reasonable amount of time.

Overall, the results of the evaluation show that the Collaborative-Based Recommender System using click behavior is an effective approach for recommending news articles based on the browsing history of similar users. However, the performance of the system depends heavily on the quality and quantity of the browsing history dataset, and the approach may not be suitable for systems that do not have access to such data.
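For reference, the two error metrics and their per-split summary can be computed as in the sketch below. The ratings and per-split values are toy numbers, not the data behind Table 4, and the helper names are hypothetical. The sketch uses the population standard deviation (NumPy's default); which variant was used for Table 4 is not stated.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(errors)))


def rmse(y_true, y_pred):
    """Root Mean Square Error: penalizes large errors more than MAE does."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(errors ** 2)))


def summarize(per_split_values):
    """Mean and (population) standard deviation across evaluation splits."""
    values = np.asarray(per_split_values, dtype=float)
    return float(values.mean()), float(values.std())


# Toy ratings for one split (illustrative values only).
y_true = [1.0, 0.0, 1.0, 0.0]
y_pred = [0.9, 0.1, 0.8, 0.0]
print(f"MAE={mae(y_true, y_pred):.4f} RMSE={rmse(y_true, y_pred):.4f}")

# Hypothetical per-split RMSE values, summarized as in a results table.
mean_rmse, std_rmse = summarize([0.10, 0.12, 0.11])
print(f"mean={mean_rmse:.4f} std={std_rmse:.4f}")
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE on the same predictions, which is consistent with the values reported above.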

4.5 Comparison Study and User Feedback

We evaluated the proposed news recommendation algorithms using a combination of standard comparison metrics and user feedback [8]. Together, these allowed us to measure the recommendation performance quantitatively and to assess the real-time user experience. Following the work of [23], we compared our models to two baseline approaches. Specifically, we evaluated four models: the collaborative filtering model based on click behavior and three content-based models (TF-IDF, Bag-of-Words (BoW), and Word2Vec embedding). The two baseline approaches were the Random Recommender and the Recency-Based Recommender. The Random Recommender generates K random recommendations. The Recency-Based Recommender ranks items by recency and returns the K most recent items. These baselines allow us to evaluate the performance of the proposed models relative to simpler methods and can serve as a benchmark for future research in this domain. The Precision and Recall scores for all the recommender systems are presented in Fig. 8.
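The two baselines are simple enough to sketch in a few lines. The article records, the `published` field, and the function names below are hypothetical; the sketch only illustrates the behavior described above and is not the code used in the evaluation.

```python
import random
from datetime import date

# Hypothetical article records; only an id and a publication date are needed.
ARTICLES = [
    {"id": "a1", "published": date(2023, 1, 5)},
    {"id": "a2", "published": date(2023, 3, 1)},
    {"id": "a3", "published": date(2023, 2, 10)},
    {"id": "a4", "published": date(2022, 12, 24)},
]


def random_recommender(articles, k, seed=None):
    """Return k distinct articles chosen uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(articles, min(k, len(articles)))


def recency_recommender(articles, k):
    """Return the k most recently published articles, newest first."""
    return sorted(articles, key=lambda a: a["published"], reverse=True)[:k]


print([a["id"] for a in recency_recommender(ARTICLES, 2)])
print([a["id"] for a in random_recommender(ARTICLES, 2, seed=0)])
```

Neither baseline looks at the user or at article content, which is why they serve as a lower bound for the personalized models.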

Fig. 8 Precision and recall values for evaluated models and baseline approaches in the Context-Aware News Recommender System

As the Random Recommender and Recency-Based Recommender do not use a relevance score, the recall values are not applicable and were not reported. The figure allows for easy comparison of the performance of the different models on both precision and recall metrics.
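For readers unfamiliar with the two metrics, the sketch below shows the standard top-K definitions of precision and recall. The exact evaluation protocol behind Fig. 8 is not detailed here, so this is an assumed formulation; the function name and the example items are hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Standard top-K precision and recall.

    precision = relevant items among the top K / K
    recall    = relevant items among the top K / number of relevant items
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative example: ranked recommendations and the items a user found relevant.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
precision, recall = precision_recall_at_k(recommended, relevant, k=4)
print(f"precision@4={precision:.2f} recall@4={recall:.2f}")
```

Here two of the four recommended items are relevant, giving a precision of 0.50, and two of the three relevant items are retrieved, giving a recall of about 0.67.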

The evaluation results show that the collaborative filtering model based on click behavior outperforms the baseline approaches and the content-based models. The collaborative filtering model uses information about the user’s past interactions with articles, as well as information about the articles themselves, to generate personalized recommendations that are tailored to the user’s interests. It has the highest precision of all the models, indicating that it is the most effective approach for recommending news articles based on the browsing history of similar users. However, the model is limited by the availability and quality of the browsing history dataset and may not be suitable for systems that do not have access to such data.

The content-based recommender systems based on Word2Vec embedding and BoW also outperform the baseline approaches and the TF-IDF model in terms of precision, but they do not match the collaborative filtering model. The BoW model is a simple and effective approach for content-based recommendation systems that prioritize the similarity of the content. However, it may capture the semantic meaning of the text less effectively than the Word2Vec embedding model, which can also provide more context for the recommendations by including the categories and authors of the articles.

Compared to the other models, the TF-IDF model has the lowest precision. This could be because the model only considers the frequency of the words and does not take into account the semantic meaning of the text or the context of the articles.

The Random Recommender and the Recency-Based Recommender have very low precision values, indicating that they are not useful for recommending news articles. The Random Recommender generates completely random recommendations while the Recency-Based Recommender only considers the recency of the articles and ignores other factors such as the relevance or the quality of the articles. The Recency-Based Recommender performs slightly better than the Random Recommender, but the precision values are still very low, indicating that the recommendations are not very useful.

Traditional offline measures such as MAE and RMSE can evaluate the accuracy of the predicted ratings, but they do not capture the user’s subjective experience. User feedback, on the other hand, provides valuable insights into the effectiveness of the recommended news articles in meeting the user’s needs, preferences, and interests [8]. Therefore, we incorporated user feedback into the evaluation process to assess the effectiveness of the proposed news recommendation algorithm in improving user engagement and satisfaction.

We gathered input from 55 individuals who used the system and read the recommended news articles. After reading the articles, users were asked to submit feedback on their experience, which was divided into three categories: positive, negative, and neutral. Table 5 shows the user feedback for the proposed news recommendation algorithm.

Table 5 User feedback for the proposed news recommendation algorithm

The results show that 42 users gave positive feedback, 6 gave negative feedback, and 7 provided neutral feedback. The significant number of favorable responses suggests that the algorithm was effective in recommending relevant news articles to users. The negative feedback highlights areas for enhancement, while the neutral feedback suggests that some users found the recommended content uninteresting, implying that the recommendation process can be further improved.
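Expressed as proportions of the 55 respondents, the counts above can be checked with a short calculation (the variable names are illustrative):

```python
# Feedback counts as reported for the 55 respondents.
feedback_counts = {"positive": 42, "negative": 6, "neutral": 7}

total = sum(feedback_counts.values())
shares = {label: count / total for label, count in feedback_counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
# positive: 76.4%
# negative: 10.9%
# neutral: 12.7%
```

Roughly three quarters of the respondents therefore gave positive feedback.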

We examined the feedback comments to gain a deeper understanding of the positive and negative feedback and identified several emerging themes. The positive feedback themes were relevance (many users found the recommended articles to be relevant to their interests and preferences), diversity (users appreciated the variety of news sources and topics covered by the recommended articles), and user interface (users found the system easy to use and navigate). The negative feedback themes were article quality (some users found the quality of the recommended articles lacking), personalization (some users felt the articles were not tailored to their individual interests and preferences), and technical issues (a few users encountered technical problems while using the system).

These results suggest that the algorithm performed well overall, but there are areas for improvement. Negative feedback regarding article quality might be related to the data sources used for generating recommendations. In addition, using actual user click data instead of simulated data could help address personalization concerns. Enhancements to the system’s infrastructure and user interface design may also be helpful in resolving technical issues and enhancing the user experience.

In summary, these findings offer valuable insights into the proposed news recommendation algorithm’s strengths and weaknesses and suggest future research directions to improve news recommendation effectiveness.

5 Discussion

The context-aware personalized news recommendation system developed in this paper exhibits a high level of adaptiveness and robustness. By incorporating contextual information such as users’ click behavior, the system is able to dynamically adjust its recommendations to match the evolving preferences and interests of individual users. Moreover, the use of multiple recommendation techniques, including content-based and collaborative filtering methods, enhances the robustness of the system, ensuring that it can provide accurate and relevant recommendations even in the face of changing trends and user behavior. Overall, the system’s adaptiveness and robustness make it a valuable tool for helping users to discover relevant and interesting news articles in a rapidly changing media landscape.

All four models have their advantages and limitations. The content-based recommender systems can recommend articles based on the similarity of their content, but they do not take into account other factors such as user preferences or browsing history. The collaborative filtering model based on click behavior can recommend articles based on the preferences and browsing history of similar users, but it is limited by the availability and quality of the user-item matrix. The BoW model is a simple and effective approach for content-based recommendation systems, but it does not consider the order of the words or the context in which they appear in the text. Compared to the other models, the content-based recommender system based on Word2Vec embedding may be more effective at capturing the semantic meaning of the text. By including categories and authors, the model can also provide more context and improve the accuracy of the recommendation system. The collaborative filtering model based on click behavior is the only one of the four models that considers user behavior. It achieves low errors, with a mean MAE of 0.0252 and a mean RMSE of 0.0364, and it can make predictions in a reasonable amount of time.

Overall, the results suggest that a collaborative filtering approach based on click behavior is the most effective approach for the Context-Aware News Recommender System. This approach is able to generate personalized recommendations that are tailored to the user’s interests and to recommend a high proportion of relevant articles. The content-based models, including the Word2Vec embedding model, can also be useful in combination with the collaborative filtering approach to generate a wider range of recommendations. The Random and Recency-Based Recommenders are not effective and should not be used in the system. An effective recommender system is crucial for recommending news articles that are relevant to the user’s interests and can improve user engagement and satisfaction.

Although our proposed approach shows promise in improving the personalization of news recommendations, there are several limitations that should be addressed in future research. One limitation is that our approach relies on simulated data, since obtaining real-world user click behavior data was challenging due to privacy concerns. Future work should explore alternative methods for obtaining real-world user data, such as through user studies or collaborations with news websites that are willing to share their clickstream data. In addition, user privacy concerns could be addressed with privacy-preserving techniques, such as differential privacy or federated learning, that allow for the analysis of user behavior data without compromising individual privacy.

Another limitation is that our approach does not account for the diversity of news articles that users may be interested in. Future research could explore the use of diversity-aware recommendation approaches that explicitly consider the diversity of recommended items.

Furthermore, future research could explore the use of additional contextual information to further improve the personalization of news recommendations. For example, incorporating users' social media activity or search history could provide additional insights into their interests and preferences. Similarly, incorporating more detailed user profiles, such as demographic information or reading habits, could enable more fine-grained personalization of news recommendations. Hybrid recommendation systems that combine different approaches, such as content-based and collaborative filtering methods, could also potentially improve the performance of personalized news recommendation systems by leveraging the strengths of different approaches.
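As an illustration of the hybrid direction mentioned above, one simple option is a weighted blend of normalized scores from the two families of models. The sketch below is not part of the system evaluated in this paper; the min-max normalization, the mixing weight `alpha`, the function names, and the example scores are all assumptions.

```python
def min_max(scores):
    """Rescale a {item: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.0 for item in scores}
    return {item: (value - lo) / (hi - lo) for item, value in scores.items()}


def hybrid_scores(cf_scores, content_scores, alpha=0.5):
    """Blend collaborative and content-based scores.

    alpha is a hypothetical mixing weight: 1.0 uses only the collaborative
    scores, 0.0 only the content-based ones. Items missing from one source
    contribute 0.0 for that source.
    """
    cf, cb = min_max(cf_scores), min_max(content_scores)
    items = set(cf) | set(cb)
    return {i: alpha * cf.get(i, 0.0) + (1 - alpha) * cb.get(i, 0.0) for i in items}


# Illustrative scores only; article "c" has no click data yet.
cf_scores = {"a": 0.9, "b": 0.1}
content_scores = {"a": 0.2, "b": 0.8, "c": 0.5}

blended = hybrid_scores(cf_scores, content_scores, alpha=0.7)
for item, score in sorted(blended.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item}: {score:.2f}")
```

A blend of this kind lets a newly published article with no click history still receive a score from the content-based side, which is one of the strengths a hybrid system could draw on.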

The study findings contribute to the growing body of research on personalized recommendation systems, which have been increasingly studied in various domains. By studying the effectiveness of personalized news recommendation systems, our study provides insights that may be applicable to other domains, including academic recommendation systems. In particular, academic recommendation systems, which aim to provide personalized recommendations of scholarly articles, have received much attention in recent years. Similar to news recommendation systems, academic recommendation systems face the challenge of recommending relevant and high-quality articles to users based on their interests and preferences [21, 22]. While there are some differences in the types of data and features used in academic recommendation systems compared to news recommendation systems, many of the same techniques and approaches, such as content-based and collaborative filtering methods, have been explored in both domains [23].

Finally, it is worth noting that GPT-based models, such as GPT-3, are gaining popularity in the field of natural language processing and could be applied to news recommendation systems to improve the quality and personalization of recommendations. GPT-based models could potentially be used to summarize news articles or extract key topics and concepts, which could then be used to inform content-based or collaborative filtering recommendation approaches. In addition, GPT-based models could potentially be used to generate personalized news summaries or even generate news articles tailored to users' interests and preferences. While GPT-based models are not without their limitations, they represent a promising direction for future research in the field of personalized news recommendation systems.

6 Conclusion

This paper has presented a context-aware personalized news recommendation system that incorporates contextual information to enhance the personalization of news recommendations. Our approach involved processing a large dataset of 22,657 news articles from 19 popular online news sources and developing four different recommender systems based on content-based methods (TF-IDF, Bag-of-Words, and Word2Vec) and collaborative filtering based on click behavior. The evaluation results demonstrated that incorporating contextual information and collaborative filtering can significantly improve the personalization of news recommendations, with the collaborative filtering model based on click behavior achieving the best performance.

The study contributes to the body of knowledge on news recommender systems by emphasizing the importance of contextual information and collaboration in personalizing news recommendations.