A Skip-gram-based Framework to Extract Knowledge from Chinese Reviews in Cloud Environment

Mobile Networks and Applications

Abstract

With the development of cloud computing technologies, eBusiness systems and applications pay more attention to customer reviews, which cover topics such as commodities and customers' emotions. These review data contain a vast amount of valuable information. Extracting knowledge from these reviews in a cloud environment is challenging because they are massive, usually distributed, and constantly changing. In this paper, a novel framework for extracting knowledge from Chinese review data is proposed. It consists mainly of three steps: building the knowledge space, retrieving knowledge, and optimizing the results. For Chinese reviews, a skip-gram-based model is trained on the review data to generate the knowledge space. To build the knowledge space quickly, an algorithm based on hierarchical softmax is proposed, which requires no feature extraction or modeling. This algorithm is applicable to massive data and can be conveniently extended in a cloud environment. When retrieving knowledge and optimizing results, our framework uses Euclidean distance to find the knowledge most closely linked to the query and uses a 2-gram algorithm to optimize the results. Experimental results show that our framework is practical and efficient.
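The two ideas named in the abstract that are easiest to illustrate are the skip-gram training pairs and the Euclidean-distance retrieval step. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the window size, the toy two-dimensional vectors, and the English placeholder tokens (standing in for segmented Chinese words) are all hypothetical, and the sketch neither trains embeddings nor implements hierarchical softmax or the 2-gram optimization.

```python
import math


def skipgram_pairs(tokens, window=2):
    """Return the (center, context) pairs a skip-gram model is trained on.

    `tokens` is assumed to be an already segmented review; `window` is a
    hypothetical context size, not a value taken from the paper.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest(query, space, k=2):
    """Return the k words whose vectors lie closest to the query word."""
    q = space[query]
    candidates = [w for w in space if w != query]
    candidates.sort(key=lambda w: euclidean(q, space[w]))
    return candidates[:k]


# Toy "knowledge space": hand-written vectors, purely illustrative.
space = {
    "phone": [0.9, 0.1],
    "screen": [0.8, 0.2],
    "battery": [0.7, 0.3],
    "happy": [0.1, 0.9],
}

pairs = skipgram_pairs(["phone", "screen", "battery"], window=1)
neighbours = nearest("phone", space, k=2)
```

In this sketch a smaller distance means a closer link to the query, so `nearest` simply sorts the candidate words by distance and keeps the first `k`.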

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant No. 61272411.

Author information

Corresponding author

Correspondence to Feng Zhao.

Cite this article

Zhao, F., Zhu, H., Jin, H. et al. A Skip-gram-based Framework to Extract Knowledge from Chinese Reviews in Cloud Environment. Mobile Netw Appl 20, 363–369 (2015). https://doi.org/10.1007/s11036-015-0612-5

  • DOI: https://doi.org/10.1007/s11036-015-0612-5
