Knowledge and Information Systems, Volume 53, Issue 3, pp 723–747

Mining collective knowledge: inferring functional labels from online review for business

  • Feifan Fan
  • Wayne Xin Zhao
  • Ji-Rong Wen
  • Ge Xu
  • Edward Y. Chang
Regular Paper

Abstract

With the increasing popularity of online e-commerce services, users constantly generate a large volume of online reviews. In this paper, we study the problem of inferring functional labels from online review text. Functional labels summarize and highlight the main characteristics of a business and can serve as a bridge between consumption needs and service functions. We consider two kinds of semantic similarity, lexical similarity and embedding similarity, which characterize relatedness from two different perspectives. To measure lexical similarity, we use the classic probabilistic ranking formula BM25; to measure embedding similarity, we propose an extended embedding model that incorporates weakly supervised information derived from review text. The two kinds of similarity complement each other and capture semantic relatedness more comprehensively. We construct a test collection covering four domains based on a Yelp dataset and compare against multiple baseline methods. Extensive experiments show that the proposed methods are effective.
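
As a rough illustration of the lexical side only, the following is a minimal sketch of BM25 scoring, assuming each business is represented by its tokenized review text and a candidate label is treated as a short query. The function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the smoothed IDF variant are assumptions chosen for illustration; the abstract does not state which BM25 variant or parameter values the authors use.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with BM25.

    query_terms: list of tokens (here, the words of a candidate label).
    docs: list of token lists (here, the review text of each business).
    k1, b: usual BM25 free parameters (illustrative defaults, not from the paper).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            # Smoothed IDF (always non-negative); an assumed variant.
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / denom
        scores.append(score)
    return scores


# Toy data, invented for illustration only.
reviews = [
    ["good", "for", "kids", "playground"],
    ["late", "night", "bar"],
    ["kids", "menu", "kids", "friendly"],
]
scores = bm25_scores(["kids"], reviews)
```

In this toy run the second business never mentions the label term and scores zero, while the third, which mentions it twice in a document of the same length as the first, ranks highest. The embedding similarity described in the abstract would be computed separately and is not sketched here.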

Keywords

Functional label; Pseudo-label; Embedding model; Online review

Notes

Acknowledgements

The authors thank the anonymous reviewers for their valuable and constructive comments. The work was partially supported by the National Natural Science Foundation of China under Grant Number 61502502, the Beijing Natural Science Foundation under Grant Number 4162032, and the Open Fund Project of the Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (No. MJUKF201703).

Copyright information

© Springer-Verlag London 2017

Authors and Affiliations

  • Feifan Fan (1, 4)
  • Wayne Xin Zhao (2, 3)
  • Ji-Rong Wen (2, 3)
  • Ge Xu (4, 5)
  • Edward Y. Chang (6)

  1. Institute of Computer Science and Technology, Peking University, Beijing, China
  2. School of Information, Renmin University of China, Beijing, China
  3. Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China
  4. Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou, China
  5. Department of Computer Science, Minjiang University, Fuzhou, China
  6. HTC Research & Healthcare, San Francisco, USA