Knowledge and Information Systems, Volume 49, Issue 1, pp 61–89

Exploring demographic information in social media for product recommendation

  • Wayne Xin Zhao
  • Sui Li
  • Yulan He
  • Liwei Wang
  • Ji-Rong Wen
  • Xiaoming Li
Regular Paper


On many e-commerce Web sites, product recommendation is essential for improving user experience and boosting sales. Most existing product recommender systems rely on consumers’ historical transaction records or Web-site browsing history to predict online users’ preferences. As such, they are constrained by the limited information available on specific e-commerce Web sites. With the prolific use of social media platforms, it has become possible to extract product demographics from online product reviews and from social networks built from microblogs. Moreover, users’ public profiles on social media often reveal demographic attributes such as age, gender, and education. In this paper, we propose to leverage the demographic information of both products and users extracted from social media for product recommendation. Specifically, we frame recommendation as a learning-to-rank problem that takes as input features derived from both product and user demographics. An ensemble method based on gradient-boosting regression trees is extended to suit our recommendation task. We have conducted extensive experiments to obtain both quantitative and qualitative evaluation results, as well as a user study to gauge the performance of the proposed recommender system in a real-world deployment. All the results show that our system generates recommendations that match users’ preferences better than the competitive baselines.
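To make the learning-to-rank framing concrete, the following is a minimal sketch, not the extended ensemble method of the paper. It assumes a pointwise formulation with squared loss and one-split regression stumps as the boosted trees. The two features (how much of a product's demographic distribution falls on the user's gender and on the user's age group), the training rows, and all function names are hypothetical and invented for illustration.

```python
from typing import Dict, List, Tuple

# (feature index, threshold, value if x <= threshold, value if x > threshold)
Stump = Tuple[int, float, float, float]
# (base score, learning rate, boosted stumps)
Model = Tuple[float, float, List[Stump]]


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Pick the single split that minimizes squared error on the residuals."""
    mean = sum(residuals) / len(residuals)
    best: Stump = (0, float("inf"), mean, mean)  # fallback: no usable split
    best_err = float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best_err:
                best_err, best = err, (j, t, lm, rm)
    return best


def stump_predict(stump: Stump, row: List[float]) -> float:
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_gbrt(X: List[List[float]], y: List[float],
             n_rounds: int = 20, lr: float = 0.5) -> Model:
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + lr * stump_predict(stump, row) for p, row in zip(preds, X)]
    return base, lr, stumps


def score(model: Model, row: List[float]) -> float:
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)


def rank(model: Model, candidates: Dict[str, List[float]]) -> List[str]:
    """Order candidate products for one user by descending predicted relevance."""
    return sorted(candidates, key=lambda name: score(model, candidates[name]), reverse=True)


# Hypothetical (user, product) pairs:
# [share of product demographic matching user's gender, ... user's age group]
train_X = [[0.9, 0.8], [0.8, 0.3], [0.2, 0.7], [0.1, 0.1], [0.7, 0.9], [0.3, 0.2]]
train_y = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]  # 1.0 = user adopted the product

model = fit_gbrt(train_X, train_y)
candidates = {"product_A": [0.85, 0.60], "product_B": [0.15, 0.90]}
ranked = rank(model, candidates)
print(ranked)
```

A pointwise loss is the simplest choice for a sketch; pairwise or listwise objectives are the usual alternatives when only the relative order of products matters.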


E-commerce, Product recommendation, Product demographic, Social media



The authors thank the anonymous reviewers for their valuable and constructive comments. The work was partially supported by National Natural Science Foundation of China under Grant Nos. 61502502 and 61573026, the pilot project under Baidu open cloud service platform under Grant No. 4333150064, and the National Key Basic Research Program (973 Program) of China under Grant No. 2014CB340403. Xin Zhao was also partially supported by 2015 HTC Young Scholar Program.



Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  • Wayne Xin Zhao (1, 2)
  • Sui Li (4)
  • Yulan He (3)
  • Liwei Wang (4)
  • Ji-Rong Wen (1, 2)
  • Xiaoming Li (4)
  1. School of Information, Renmin University of China, Beijing, China
  2. Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China
  3. School of Engineering and Applied Science, Aston University, Birmingham, UK
  4. School of Electronics Engineering and Computer Sciences, Peking University, Beijing, China
