Exploring demographic information in social media for product recommendation

Zhao, Wayne Xin; Li, Sui; He, Yulan; Wang, Liwei; Wen, Ji-Rong; Li, Xiaoming

doi:10.1007/s10115-015-0897-5

Regular Paper
Published: 23 October 2015

Volume 49, pages 61–89, (2016)

Knowledge and Information Systems

Wayne Xin Zhao^1,2, Sui Li^4, Yulan He^3, Liwei Wang^4, Ji-Rong Wen^1,2 & Xiaoming Li^4


Abstract

On many e-commerce Web sites, product recommendation is essential for improving user experience and boosting sales. Most existing product recommender systems rely on consumers’ historical transaction records or Web-site-browsing history to predict online users’ preferences. As such, they are constrained by the limited information available on a specific e-commerce Web site. With the widespread use of social media platforms, it is now possible to extract product demographics from online product reviews and from social networks built from microblogs. Moreover, users’ public profiles on social media often reveal demographic attributes such as age, gender, and education. In this paper, we propose to leverage the demographic information of both products and users, extracted from social media, for product recommendation. Specifically, we frame recommendation as a learning-to-rank problem that takes as input features derived from both product and user demographics. An ensemble method based on gradient-boosting regression trees is extended to suit our recommendation task. We have conducted extensive experiments to obtain both quantitative and qualitative evaluation results, and we have also conducted a user study to gauge the performance of the proposed recommender system in a real-world deployment. All the results show that our system generates recommendations that match users’ preferences better than those of competitive baselines.

Notes

http://www.brandwatch.com/wp-content/uploads/2013/02/Twitter-Landscape-2013-Extended-Version.
http://weibo.com.
http://jd.com.
http://sewm.pku.edu.cn/metis.
http://162.105.205.253:8667/metisrecommendation/special_en/.
http://www.ehow.com/info_10015346_product-demographic.html.
Given an attribute, we collect all the unique values filled in by users in our data collection and keep only the values shared by many users. We then manually group similar values. Furthermore, we discretize attribute values based on the customer segmentation [11] (chapter five) used in marketing and ensure balanced distribution probabilities across the discretization intervals.
This will make \(\phi ^{(u,a)}\) no longer a valid probability distribution. But as will be shown later, it does not affect the construction of demographic feature vectors.
For example, we can sum the corresponding demographic-based probabilities for each attribute: User 1 will be assigned to a value of 2.52 by having \(1\times 1 + 0.9 \times 1 + 0.7\times 0.8 + 0.3\times 0.2\), while similarly user 2 will be assigned to a value of 1.44.
We distinguish normal users from spam users using the following three conditions: (1) a normal user should have a balanced number of tweets and retweets; (2) a normal user should not include any keywords relating to products or brands in her nickname or profile description; (3) a normal user should not publish many tweets containing keywords relating to products or brands.
More specifically, the values of y must be provided during training, whereas at test time we obtain them from the predicted output of the learnt ranking function f. An item with a larger value of y is ranked in a higher position, i.e., it is treated as more important for recommendation.
On Sina Weibo, all the tweets from a user can be publicly seen by other registered users. The judges log into their own Weibo accounts and check the validity of each candidate query–product pair online. Each user’s public profile is also checked, and spam users are removed. The workload for each judge is about 5–7 times the number of qd pairs in Table 2, i.e., only 1/7–1/5 of the originally detected qd pairs are finally kept as training data.
https://sourceforge.net/p/lemur/wiki/RankLib/. RankLib might assign equal scores to items during ranking. In this case, we further sort the items of equal scores by their sales volume.
For the listwise approach, each training instance is an ordered list. However, the relative order between non-relevant products cannot be obtained from our training data.
The balanced interleaving method reflects the intuition that the results of the two rankings A and B should be interleaved into a single ranking I in a balanced way, which ensures that the top k results in I always contain the top \(k_a\) results from A and the top \(k_b\) results from B, where \(k_a\) and \(k_b\) differ by at most 1.
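
The balanced interleaving idea in the last note can be illustrated with a minimal sketch. It follows the commonly described procedure in which two pointers \(k_a\) and \(k_b\) advance through A and B and never differ by more than 1. The function name `balanced_interleave` and the `a_first` flag are hypothetical and are not taken from the paper; in practice the starting ranking is usually chosen by a coin flip, and this sketch stops as soon as either ranking is exhausted.

```python
from typing import Hashable, List, Sequence


def balanced_interleave(a: Sequence[Hashable],
                        b: Sequence[Hashable],
                        a_first: bool) -> List[Hashable]:
    """Interleave rankings a and b so that every prefix of the result
    covers the top k_a items of a and the top k_b items of b,
    with |k_a - k_b| <= 1 (illustrative sketch, not the authors' code)."""
    interleaved: List[Hashable] = []
    seen = set()          # items already placed in the combined ranking
    ka = kb = 0           # how many items of a and b have been consumed

    # Stop when either ranking has been fully consumed.
    while ka < len(a) and kb < len(b):
        # Take from the ranking that is behind; break ties with a_first.
        take_from_a = ka < kb or (ka == kb and a_first)
        if take_from_a:
            item = a[ka]
            ka += 1
        else:
            item = b[kb]
            kb += 1
        # Skip duplicates that the other ranking already contributed.
        if item not in seen:
            seen.add(item)
            interleaved.append(item)
    return interleaved


ranking_a = [1, 2, 3]
ranking_b = [2, 4, 1]
print(balanced_interleave(ranking_a, ranking_b, a_first=True))
```

Because duplicates are skipped rather than repeated, the combined list can be shorter than the two inputs together; clicks on the combined list can then be attributed back to A or B to compare the two rankers.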

References

Wang J, Zhang Y (2013) Opportunity model for e-commerce recommendation: right product; right time. In: Ser. SIGIR ’13
von Reischach F, Michahelles F, Schmidt A (2009) The design space of ubiquitous product recommendation systems. In: Ser. MUM ’09
Giering M (2008) Retail sales prediction and item recommendations using customer demographics at store level. SIGKDD Explor Newsl 10(2):84–89
Xiao B, Benbasat I (2007) E-commerce product recommendation agents: use, characteristics, and impact. MIS Q 31:137–209
Linden G, Smith B, York J (2003) Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Comput 7(1):76–80
Hollerit B, Kröll M, Strohmaier M (2013) Towards linking buyers and sellers: detecting commercial intent on twitter. In: Ser. WWW ’13 companion
Zhao X-W, Guo Y, He Y, Jiang H, Wu Y, Li X (2014) We know what you want to buy: a demographic-based system for product recommendation on microblogs. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ser. KDD ’14, 2014, pp 1935–1944
Baker M, Hart S (2007) The marketing book, 6th edn. Routledge, London
Sridhar G (2007) Consumer involvement in product choice–a demographic analysis. XIMB J Manag 3:131–148
Zeithaml VA (1985) The new demographics and market fragmentation. J Mark 49:64–75
Tsiptsis K, Chorianopoulos A (2010) Data mining techniques in CRM: inside customer segmentation. Wiley, London
Dong Y, Yang Y, Tang J, Yang Y, Chawla N-V (2014) Inferring user demographics and social strategies in mobile social networks. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ser. KDD ’14, 2014, pp 15–24
Mislove A, Viswanath B, Gummadi K-P, Druschel P (2010) You are who you know: inferring user profiles in online social networks. In: Ser. WSDM ’10
Bi B, Shokouhi M, Kosinski M, Graepel T (2013) Inferring the demographics of search users: social data meets search queries. In: Ser. WWW ’13
Zou B, Zhou G, Zhu Q (2014) Negation focus identification with contextual discourse information. In: Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (vol 1: long papers). Association for Computational Linguistics, Baltimore, Maryland, pp 522–530
(2012) US demographic and business summary data. Product guide
Zhai C, Lafferty JD (2004) A study of smoothing methods for language models applied to information retrieval. ACM Trans Inf Syst 22(2):179–214
Pang B, Lee L (2004) A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Ser. ACL ’04
Liu T-Y (2009) Learning to rank for information retrieval. Found Trends Inf Retr 3(3):225–331
Turney P-D (2002) Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th annual meeting on Association for Computational Linguistics, ser. ACL ’02, 2002, pp 417–424
Ganjisaffar Y, Caruana R, Lopes C-V (2011) Bagging gradient-boosted trees for high precision, low variance ranking models. In: Proceedings of the 34th international ACM SIGIR conference on research and development in information retrieval, ser. SIGIR ’11, 2011, pp 85–94
Zhang H, Riedl E, Petrushin V-A, Pal S, Spoelstra J (2012) Committee based prediction system for recommendation: KDD cup 2011, track2. In: Proceedings of KDD cup 2011 competition, San Diego, CA, USA, 2011, pp 215–229
Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth and Brooks, Monterey
Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38(4):367–378
Friedman JH (2000) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Ho TK, Hull JJ, Srihari SN (1994) Decision combination in multiple classifier systems. IEEE Trans Pattern Anal Mach Intell 16(1):66–75
Joachims T (2006) Training linear SVMs in linear time. In: Ser. KDD ’06
Freund Y, Iyer R, Schapire R-E, Singer Y (2003) An efficient boosting algorithm for combining preferences. J Mach Learn Res 4:933–969
Cao Z, Qin T, Liu T-Y, Tsai M-F, Li H (2007) Learning to rank: from pairwise approach to listwise approach. In: Ser. ICML ’07
Xu J, Li H (2007) Adarank: a boosting algorithm for information retrieval. In: Ser. SIGIR ’07
Weng J, Lim E-P, Jiang J, He Q (2010) Twitterrank: finding topic-sensitive influential twitterers. In: WSDM
Chapelle O, Joachims T, Radlinski F, Yue Y (2012) Large-scale validation and analysis of interleaved search evaluation. ACM Trans Inf Syst 30(1):6:1–6:41
Sarwar B, Karypis G, Konstan J, Riedl J (2001) Item-based collaborative filtering recommendation algorithms. In: Ser. WWW ’01
Adomavicius G, Tuzhilin A (2005) Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans Knowl Data Eng 17(6):734–749
Symeonidis P, Tiakas E, Manolopoulos Y (2011) Product recommendation and rating prediction based on multi-modal social networks. In: Ser. RecSys ’11
Korfiatis N, Poulos M (2013) Using online consumer reviews as a source for demographic recommendations: a case study using online travel reviews. Expert Syst Appl 40(14):5507–5515
Qiu L, Benbasat I (2010) A study of demographic embodiments of product recommendation agents in electronic commerce. Int J Hum Comput Stud 68(10):669–688
Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1–2):1–135
Liu Y, Huang J, An A, Yu X (2007) ARSA: a sentiment-aware model for predicting sales performance using blogs. In: SIGIR
McGlohon M, Glance NS, Reiter Z (2010) Star quality: aggregating reviews to rank products and merchants. In: ICWSM
Ganu G, Kakodkar Y, Marian A (2013) Improving the quality of predictions using textual information in online user reviews. Inf Syst 38(1):1–15
Zhang Y, Lai G, Zhang M, Zhang Y, Liu Y, Ma S (2014) Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In: SIGIR
Zhang Y, Zhang H, Zhang M, Liu Y, Ma S (2014) Do users rate or review? Boost phrase-level sentiment labeling with review-level sentiment classification. In: SIGIR
Pazzani M-J (1999) A framework for collaborative, content-based and demographic filtering. Artif Intell Rev 13(5–6):393–408
Seroussi Y, Bohnert F, Zukerman I (2011) Personalised rating prediction for new users using latent factor models. In: ACM HH
Dai HK, Zhao L, Nie Z, Wen J-R, Wang L, Li Y (2006) Detecting online commercial intention (oci). In: WWW ’06

Acknowledgments

The authors thank the anonymous reviewers for their valuable and constructive comments. The work was partially supported by the National Natural Science Foundation of China under Grant Nos. 61502502 and 61573026, the pilot project under the Baidu open cloud service platform under Grant No. 4333150064, and the National Key Basic Research Program (973 Program) of China under Grant No. 2014CB340403. Xin Zhao was also partially supported by the 2015 HTC Young Scholar Program.

Author information

Authors and Affiliations

School of Information, Renmin University of China, Beijing, China
Wayne Xin Zhao & Ji-Rong Wen
Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China
Wayne Xin Zhao & Ji-Rong Wen
School of Engineering and Applied Science, Aston University, Birmingham, UK
Yulan He
School of Electronics Engineering and Computer Sciences, Peking University, Beijing, China
Sui Li, Liwei Wang & Xiaoming Li

Corresponding author

Correspondence to Wayne Xin Zhao.

About this article

Cite this article

Zhao, W.X., Li, S., He, Y. et al. Exploring demographic information in social media for product recommendation. Knowl Inf Syst 49, 61–89 (2016). https://doi.org/10.1007/s10115-015-0897-5

Received: 24 March 2015
Revised: 19 September 2015
Accepted: 10 October 2015
Published: 23 October 2015
Issue Date: October 2016
DOI: https://doi.org/10.1007/s10115-015-0897-5

