Spam analysis of big reviews dataset using Fuzzy Ranking Evaluation Algorithm and Hadoop

Dhingra, Komal; Yadav, Sumit Kr

doi:10.1007/s13042-017-0768-3

Spam analysis of big reviews dataset using Fuzzy Ranking Evaluation Algorithm and Hadoop

Original Article
Published: 16 December 2017

Volume 10, pages 2143–2162, (2019)
Cite this article

International Journal of Machine Learning and Cybernetics Aims and scope Submit manuscript

701 Accesses
19 Citations
Explore all metrics

Abstract

Online reviews are the most easily available free information sources used by both organizations and customers to make decisions. Establishments are utilizing significance of opinions to earn undue profit by hiring professionals known as spammers, giving positive comments on their products and negative opinions on their competitor’s product. This activity is known as opinion spamming and should be identified to give genuine results containing sentiments towards a product. So far, opinion spam detection has been considered as a discrete classification problem, generally as spam and non-spam. However, it involves uncertainty as suspicious behavior of a user might be due to coincidence. As, fuzzy logic handles real world uncertainty very well, we propose a novel fuzzy modeling based solution to the problem. We have proposed four fuzzy input linguistic variable and considered suspicious level of a spammer group to be one of—Ultra, Mega, Immense, Highly, Moderate, Slightly and Feebly. We have defined novel FSL Deduction Algorithm generating 81 fuzzy rules and Fuzzy Ranking Evaluation Algorithm (FREA) to determine the extent to which a group is suspicious. As reviews dataset satisfy the three V’s of big data (Volume, Velocity and Variety), we have considered this problem as a big data problem and used Hadoop for storage and analyzation. We have further demonstrated our proposed algorithm using a sample reviews dataset and Amazon reviews dataset achieving an accuracy of 80.77% which unlike other approaches remains steady for large number of groups and deals well with uncertainty involved in opinion spam detection.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Multi-view Ensemble Learning Using Rough Set Based Feature Ranking for Opinion Spam Detection

Opinion spam detection framework using hybrid classification scheme

Article 11 June 2019

An Automatic Rating System Based on Review Sentiments and Intuitionistic Fuzzy Sets

Notes

References

Abouelenien M, Perez-Rosas V, Zhao B, Mihalcea R, Burzo M (2017) Gender-based multimodal deception detection. In: Symposium On Applied Computing (SAC) 2017. ACM, Morocco. https://doi.org/10.1145/3019612.3019644
Chapter Google Scholar
Adike MR, Reddy V (2016) Detection of fake review and brand spam using data mining. Int J Recent Trends Eng Res 2(7):251–256
Google Scholar
Agarwal A, Sharma V, Sikka G, Dhir R (2016) Opinion mining of news headlines using SentiWordNet. Symposium on Colossal Data Analysis and Networking (CDAN). IEEE, pp 1–5. https://doi.org/10.1109/CDAN.2016.7570949
Ahuja Y, Yadav SK (2012) Multiclass classification and support vector machine. Global J Comput Sci Technol Interdiscip 12(11):14–20
Google Scholar
Akoglu L, Chandy R, Faloutsos C (2013) Opinion fraud detection in online reviews by network effects. In: Seventh international AAAI conference on weblogs and social media vol 13. AAAI Publications, pp 2–11
Al-Anzi FS, Yadav SK, Soni J (2014) Cloud computing: security model comprising governance, risk management and compliance. In: International conference on data mining and intelligent computing (ICDMIC). IEEE, pp. 1–6
Andrea E, Sebastiani F (2006) Sentiwordnet: A publicly available lexical resource for opinion mining. In: Proceedings of the 5th conference on language resources and evaluation (LREC 2006), vol. 6, pp. 417–422
Ashfaq RAR, Wang XZ, Huang JZ, Abbas H, He YL (2017) Fuzziness based semi-supervised learning approach for intrusion detection system. Inf Sci 378:484–497. https://doi.org/10.1016/j.ins.2016.04.019
Article Google Scholar
Baccianella S, Esuli A, Sebastiani F (2010) SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. In: LREC vol 10. European Language Resources Association, pp 2200–2204
Balazs JA, Velasquez JD (2016) Opinion mining and information fusion: a survey. Inf Fusion 27:95–110. https://doi.org/10.1016/j.inffus.2015.06.002
Article Google Scholar
Benevenuto F, Araujo M, Ribeiro F (2015) Sentiment analysis methods for social media. In: Proceedings of the 21st Brazilian symposium on multimedia and the web. ACM, pp. 11–11. https://doi.org/10.1145/2820426.2820642
Bhushan M, Banerjea S, Yadav SK (2014) Bloom filter based optimization on HBase with MapReduce. In: 2014 International conference on data mining and intelligent computing (ICDMIC). IEEE, pp. 1–5
Bhuta S, Doshi U (2014) A review of techniques for sentiment analysis of twitter data. In: 2014 International conference on issues and challenges in intelligent computing techniques (ICICT). IEEE, pp 583–591. https://doi.org/10.1109/ICICICT.2014.6781346
Cambria E, Schuller B, Xia Y, Havasi C (2013) New avenues in opinion mining and sentiment analysis. IEEE Intell Syst 28(2):15–21. https://doi.org/10.1109/MIS.2013.30
Article Google Scholar
Chavan A, Darekar O, Kulkarni O, Jain Y (2017) Spam reviews detection using Hadoop. Int J Eng Comput Sci 6(2):20320–20323. https://doi.org/10.18535/ijecs/v6i2.30
Article Google Scholar
Choo E, Yu T, Chi M (2015) Detecting opinion spammer groups through community discovery and sentiment analysis. In: Samarati P (ed) Data and applications security and privacy XXIX. DBSec 2015. Lecture Notes Computer Science vol 9149. Springer, Cham, pp 170–187. https://doi.org/10.1007/978-3-319-20810-7_11
Cormack GV (2008) Email spam filtering: a systematic review. Found Trends® Inf Retr 1(4):335–455. https://doi.org/10.1561/1500000006
Article Google Scholar
Crawford M et al (2015) Survey of review spam detection using machine learning techniques. J Big Data 2(1):23. https://doi.org/10.1186/s40537-015-0029-9
Article Google Scholar
DeRoos D, Zikopoulos P, Brown B, Coss R, Melnyk RB (2014) Hadoop for dummies. Wiley, Hoboken
Google Scholar
Dixit S, Agrawal AJ (2013) Survey on review spam detection. Int J Comput Commun Technol 4(2):68–72
Google Scholar
Emmanuel I, Stanier C (2016) Defining big data. In: Proceedings of the international conference on big data and advanced wireless technologies. ACM, p. 5. https://doi.org/10.1145/3010089.3010090
Feldman R (2013) Techniques and applications for sentiment analysis. Commun ACM 56 (4): pp 82–89. https://doi.org/10.1145/2436256.2436274
Article Google Scholar
Fusilier DH, Montes-y-Gomez M, Rosso P, Cabrera RG (2015) Detection of opinion spam with character n-grams. In: International conference on intelligent text processing and computational linguistics. Springer, pp. 285–294. https://doi.org/10.1007/978-3-319-18117-2_21
Gimenes G, Cordeiro RL, Rodrigues-Jr JF (2017) ORFEL: efficient detection of defamation or illegitimate promotion in online recommendation. Inf Sci 379:274–287. https://doi.org/10.1016/j.ins.2016.09.006
Article Google Scholar
Gu B, Sheng VS (2016) A robust regularization path algorithm for ν -support vector classification. IEEE Transac Neural Netw Learn Syst 28(5):1241–1248. https://doi.org/10.1109/TNNLS.2016.2527796
Article Google Scholar
Gu B, Sun X, Sheng VS (2016) Structural minimax probability machine. IEEE Transac Neural Netw Learn Syst 28(7):1646–1656. https://doi.org/10.1109/TNNLS.2016.2544779
Article MathSciNet Google Scholar
Heydari A, Tavakoli M, Salim N (2016) Detection of fake opinions using time series. Expert Syst Appl 58(C):83–92. https://doi.org/10.1016/j.eswa.2016.03.020
Article Google Scholar
Hu X, Tang J, Zhang Y, Liu H (2013) Social spammer detection in microblogging. In: Proceedings of the twenty-third international joint conference on artificial intelligence (IJCAI), vol. 13, pp. 2633–2639
Hyun Y, Kim N (2016) Detecting blog spam hashtags using topic modeling. In: Proceedings of the 18th annual international conference on electronic commerce: e-commerce in smart connected world. ACM, p. 43. https://doi.org/10.1145/2971603.2971646
Jindal N, Liu B (2008) Opinion spam and analysis. In: Proceedings of the 2008 international conference on web search and data mining. ACM, pp. 219–230. https://doi.org/10.1145/1341531.1341560
Kaur A, Gupta V (2013) A survey on sentiment analysis and opinion mining techniques. J Emerging Technol Web Intell 5(4):367–371. https://doi.org/10.4304/jetwi.5.4.367-371
Article Google Scholar
Kim S, Chang H, Lee S, Yu M, Kang J (2015) Deep semantic frame-based deceptive opinion spam analysis. In: Proceedings of the 24th ACM international on conference on information and knowledge management. ACM, pp. 1131–1140. https://doi.org/10.1145/2806416.2806551
Kumar S, Gao X, Welch I (2016) Novel features for web spam detection. In: 28th international conference on tools with artificial intelligence (ICTAI). IEEE, pp. 593–597. https://doi.org/10.1109/ICTAI.2016.0096
Li H, Chen Z, Mukherjee A, Liu B, Shao J (2015) Analyzing and detecting opinion spam on a large-scale dataset via temporal and spatial patterns. In: International AAAI conference on web and social media. AAAI Press, California pp 634–637
Google Scholar
Li F, Huang M, Yang Y, Zhu X (2011) Learning to identify review spam. In: IJCAI Proceedings-International Joint Conference on Artificial Intelligence, vol. 22, no. 3, pp. 2488–2493. https://doi.org/10.5591/978-1-57735-516-8/IJCAI11-414
Li J, Ott M, Cardie C, Hovy EH (2014) Towards a general rule for identifying deceptive opinion spam. In: Proceedings of the 52nd annual meeting of the association for computational linguistics. ACL, Baltimore, pp. 1566–1576
Li L, Ren W, Qin B, Liu T (2015) Learning document representation for deceptive opinion spam detection. In: Sun M, Liu Z, Zhang M, Liu Y (eds) Chinese computational linguistics and natural language processing based on naturally annotated big data. Lecture NotesComputer Science vol 9427. Springer, Cham, pp 393–403. https://doi.org/10.1007/978-3-319-25816-4_32
Chapter Google Scholar
Li X, Xie H, Chen L, Wang J, Deng X (2014) News impact on stock price return via sentiment analysis. Knowl-Based Syst 69:14–23. https://doi.org/10.1016/j.knosys.2014.04.022
Article Google Scholar
Lim EP, Nguyen VA, Jindal N, Liu B, Lauw HW (2010) Detecting product review spammers using rating behaviors. In: Proceedings of the 19th ACM international conference on Information and knowledge management. ACM, pp. 939–948. https://doi.org/10.1145/1871437.1871557
Lin JCW, Gan W, Fournier-Viger P, Hong TP, Tseng VS (2016) Efficient algorithms for mining high-utility itemsets in uncertain databases. Knowl-Based Syst 96:171–187. https://doi.org/10.1016/j.knosys.2015.12.019
Article Google Scholar
Liu B (2012) Sentiment analysis and opinion mining. Synthesis lectures on human language technologies 5(1):1–167. https://doi.org/10.2200/S00416ED1V01Y201204HLT016
McAuley J, Pandey R, Leskovec J (2015) Inferring networks of substitutable and complementary products. In: Proceedings of the 21th ACM SIGKDD International conference on knowledge discovery and data mining. ACM, pp. 785–794
McAuley J, Targett C, Shi Q, Hengel AVD (2015) Image-based recommendations on styles and substitutes. In: Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval. ACM, pp. 43–52
Mukherjee A, Liu B, Glance N (2012) Spotting fake reviewer groups in consumer reviews. In: Proceedings of the 21st international conference on World Wide Web. ACM, pp. 191–200. https://doi.org/10.1145/2187836.2187863
Nadaf SB, Gujar AD (2016) A survey paper on spam mail detection using RFD. Int J Adv Res Comput Sci Manag Stud 4(1):46–48
Google Scholar
Nandimath JN, Katkar BS, Ghadge VU, Garad AN (2017) Efficiently detecting and analyzing spam reviews using live data feed. Int Res J Eng Technol (IRJET) 4(2):1421–1424
Google Scholar
Nasukawa T, Yi J (2003) Sentiment analysis: capturing favorability using natural language processing. In: Proceedings of the 2nd international conference on knowledge capture, pp 70–77. https://doi.org/10.1145/945645.945658
Neviarouskaya A, Prendinger H, Ishizuka M (2011) SentiFul: A lexicon for sentiment analysis. IEEE Transac Affect Comput 2(1):22–36. https://doi.org/10.1109/T-AFFC.2011.1
Article Google Scholar
Ntoulas A, Najork M, Manasse M, Fetterly D (2006) Detecting spam web pages through content analysis. In: Proceedings of the 15th international conference on World Wide Web. ACM, pp. 83–92. https://doi.org/10.1145/1135777.1135794
Ohana B, Tierney B (2009) Sentiment classification of reviews using SentiWordNet. In: 9th it and t conference, Dublin Institute of Technology, Dublin, p 13. https://doi.org/10.21427/D77S56
Ott M, Cardie C, Hancock JT (2013) Negative deceptive opinion spam. In: Proceedings of North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) 2013. Association for Computational Linguistics, pp. 497–501
Ott M, Choi Y, Cardie C, Hancock JT (2011) Finding deceptive opinion spam by any stretch of the imagination. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, vol.1. Association for Computational Linguistics, pp. 309–319
Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends® Inf Retr 2(1–2):1–135. https://doi.org/10.1561/150000001
Article Google Scholar
Pang B, Lee L, Vaithyanathan S (2002) Thumbs up? sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 conference on Empirical methods in natural language processing, vol. 10. Association for Computational Linguistics, pp. 79–86. https://doi.org/10.3115/1118693.1118704
Peng J, Choo KK, Ashman H (2016) Bit-level n-gram based forensic authorship analysis on social media: Identifying individuals from linguistic profiles. J Netw Comput Appl 70:171–182. https://doi.org/10.1016/j.jnca.2016.04.001
Article Google Scholar
Poria S, Cambria E, Gelbukh A (2016) Aspect extraction for opinion mining with a deep convolutional neural network. Knowl-Based Syst 108(C):42–49. https://doi.org/10.1016/j.knosys.2016.06.009
Article Google Scholar
Qian T, Liu B (2013) Identifying multiple userids of the same author. In: Proceedings of conference on empirical methods in natural language processing (EMNLP-2013), pp. 1124–1135
Qiu J, Wu Q, Ding G, Xu Y, Feng S (2016) A survey of machine learning for big data processing. EURASIP J Adv Signal Process 2016(1):1–16. https://doi.org/10.1186/s13634-016-0355-x
Article Google Scholar
Rao Y, Xie H, Li J, Jin F, Wang FL, Li Q (2016) Social emotion classification of short text via topic-level maximum entropy model. Inf Manag 53(8):978–986
Article Google Scholar
Ren Y, Ji D (2017) Neural networks for deceptive opinion spam detection: an empirical study. Inf Sci 385(C):213–224. https://doi.org/10.1016/j.ins.2017.01.015
Article Google Scholar
Roul RK, Asthana SR, Kumar G (2016) Spam web page detection using combined content and link features. Int J Data Mining Model Manag 8(3):209–222
Google Scholar
Rout J, Dalmia A, Choo KK, Bakshi S, Jena S (2017) Revisiting semi-supervised learning for online deceptive review detection. IEEE Access 5:1319–1327. https://doi.org/10.1109/ACCESS.2017.2655032
Article Google Scholar
Rubin VL (2017) Deception detection and rumor debunking for social media. In: Sloan L, Quan-Haase(eds) A handbook of social media research methods. Sage, London, pp 1–25
Google Scholar
Schuckert M, Liu X, Law R (2016) Insights into suspicious online ratings: direct evidence from TripAdvisor. Asia Pacific J Tourism Res 21(3):259–272. https://doi.org/10.1080/10941605.2015.1029954
Article Google Scholar
Sheela LJ (2016) A review of sentiment analysis in twitter data using Hadoop. Int J Database Theory Appl 9(1):77–86. https://doi.org/10.14257/ijdta.2016.9.1.07
Article Google Scholar
Taddy M (2013) Measuring political sentiment on twitter: factor optimal design for multinomial inverse regression. Technometrics 55(4):415–425. https://doi.org/10.1080/00401706.2013.778791
Article MathSciNet Google Scholar
Tavakoli M, Heydari A, Ismail Z, Salim N (2015) A framework for review spam detection research. World Academy of Science, Engineering and Technology. Int J Comput Electrical Autom Control Inf Eng 10(1):67–71
Google Scholar
Tayal DK, Yadav SK (2015) Word level sentiment analysis using fuzzy sets. Int J Adv Sci Technol. 54: 73–78
Google Scholar
Tayal DK, Yadav SK (2016) Fast retrieval approach of sentimental analysis with implementation of bloom filter on Hadoop. In: 2016 International conference on computational techniques in information and communication technologies (ICCTICT). IEEE, pp. 14–18. https://doi.org/10.1109/ICCTICT.2016.7514544
Tayal DK, Yadav SK (2016) Sentiment analysis on social campaign “Swachh Bharat Abhiyan” using unigram method. AI & SOCIETY, pp 1–13. https://doi.org/10.1007/s00146-016-0672-5
Tayal DK, Yadav S, Gupta K, Rajput B, Kumari K (2014) Polarity detection of sarcastic political tweets. In: 2014 International conference on computing for sustainable global development (INDIACom). IEEE, pp. 625–628. https://doi.org/10.1109/IndiaCom.2014.6828037
Tsang ECC, Chen D, Yeung DS, Wang XZ, Lee JWT (2008) Attributes reduction using fuzzy rough sets. IEEE Transac Fuzzy Syst 16(5):1130–1141. https://doi.org/10.1109/TFUZZ.2006.889960
Article Google Scholar
Tumasjan A, Sprenger TO, Sandner PG, Welpe IM (2010) Predicting elections with twitter: What 140 characters reveal about political sentiment. In: Proceedings of the fourth international aaai conference on weblogs and social media(ICWSM), vol. 10, no. 1, pp. 178–185
Tuteja SK (2016) A Survey on classification algorithms for email spam filtering. Int J Eng Sci 6(5):5937–5940. https://doi.org/10.4010/2016.1440
Article Google Scholar
Vashisht P, Gupta V (2015) Big data analytics techniques: a survey. In: Green Computing and Internet of Things (ICGCIoT), 2015 International Conference. IEEE, pp. 264–269. https://doi.org/10.1109/ICGCIoT.2015.7380470
Viviani M, Pasi G (2017) Quantifier guided aggregation for the veracity assessment of online reviews. Int J Intell Syst 32(5):481–501. https://doi.org/10.1002/int.21844
Article Google Scholar
Wang XZ (2015) Learning from big data with uncertainty–editorial. J Intell Fuzzy Syst 28(5):2329–2330
Article MathSciNet Google Scholar
Wang XZ, Xing HJ, Li Y, Hua Q, Dong CR, Pedrycz W (2015) A study on relationship between generalization abilities and fuzziness of base classifiers in ensemble learning. IEEE Trans Fuzzy Syst 23(5):1638–1654. https://doi.org/10.1109/TFUZZ.2014.2371479
Article Google Scholar
Yadav SK, Bhushan M, Gupta S (2015) Multimodal sentiment analysis: Sentiment analysis using audiovisual format. In: 2015 2nd international conference on computing for Sustainable Global Development (INDIACom). IEEE, pp. 1415–1419
Yadav S, Dhingra K, Kaushik D (2016) Opinion mining using SentiFul. In: 3rd International Conference on Computing for Sustainable Global Development (INDIACom). IEEE, pp. 2406–2411
Ye J, Kumar S, Akoglu L (2016) Temporal opinion spam detection by multivariate indicative signals. In: Proceedings of the tenth international AAAI conference on web and social media. Association for the Advancement of Artificial Intelligence, pp. 743–746
Yen J, Langari R (1998) Fuzzy logic: intelligence, control, and information. Prentice-Hall, Inc., Upper Saddle River
Google Scholar
Zheng X, Lin Z, Wang X, Lin KJ, Song M (2014) Incorporating appraisal expression patterns into topic modeling for aspect and sentiment word identification. Knowl-Based Syst 61(1):29–47. https://doi.org/10.1016/j.knosys.2014.02.003
Article Google Scholar

Download references

Author information

Authors and Affiliations

Indira Gandhi Delhi Technical University for Women, Delhi, India
Komal Dhingra & Sumit Kr Yadav

Authors

Komal Dhingra
View author publications
You can also search for this author in PubMed Google Scholar
Sumit Kr Yadav
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Komal Dhingra.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Dhingra, K., Yadav, S.K. Spam analysis of big reviews dataset using Fuzzy Ranking Evaluation Algorithm and Hadoop. Int. J. Mach. Learn. & Cyber. 10, 2143–2162 (2019). https://doi.org/10.1007/s13042-017-0768-3

Download citation

Received: 22 March 2017
Accepted: 07 December 2017
Published: 16 December 2017
Issue Date: 01 August 2019
DOI: https://doi.org/10.1007/s13042-017-0768-3

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Spam analysis of big reviews dataset using Fuzzy Ranking Evaluation Algorithm and Hadoop

Abstract

Access this article

Similar content being viewed by others

Multi-view Ensemble Learning Using Rough Set Based Feature Ranking for Opinion Spam Detection

Opinion spam detection framework using hybrid classification scheme

An Automatic Rating System Based on Review Sentiments and Intuitionistic Fuzzy Sets

Notes

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Spam analysis of big reviews dataset using Fuzzy Ranking Evaluation Algorithm and Hadoop

Abstract

Access this article

Similar content being viewed by others

Multi-view Ensemble Learning Using Rough Set Based Feature Ranking for Opinion Spam Detection

Opinion spam detection framework using hybrid classification scheme

An Automatic Rating System Based on Review Sentiments and Intuitionistic Fuzzy Sets

Notes

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation