Abstract
Online reviews are the most easily available free source of information used by both organizations and customers to make decisions. Some establishments exploit the influence of opinions to earn undue profit by hiring professionals, known as spammers, to post positive comments on their own products and negative opinions on their competitors' products. This activity, known as opinion spamming, should be identified so that results reflect genuine sentiment towards a product. So far, opinion spam detection has been treated as a discrete classification problem, generally spam versus non-spam. However, it involves uncertainty, since a user's suspicious behavior might be due to coincidence. Because fuzzy logic handles real-world uncertainty well, we propose a novel fuzzy-modeling-based solution to the problem. We propose four fuzzy input linguistic variables and take the suspicion level of a spammer group to be one of: Ultra, Mega, Immense, Highly, Moderate, Slightly and Feebly. We define a novel FSL Deduction Algorithm that generates 81 fuzzy rules and a Fuzzy Ranking Evaluation Algorithm (FREA) that determines the extent to which a group is suspicious. Because review datasets satisfy the three V's of big data (Volume, Velocity and Variety), we treat this as a big data problem and use Hadoop for storage and analysis. We demonstrate the proposed algorithm on a sample reviews dataset and on an Amazon reviews dataset, achieving an accuracy of 80.77%, which, unlike other approaches, remains steady for a large number of groups and deals well with the uncertainty involved in opinion spam detection.
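As a rough illustration of the rule count mentioned in the abstract, the sketch below enumerates every combination of terms over four input variables. The abstract does not name the variables or their linguistic terms, so the names `var_1` to `var_4` and the terms Low/Medium/High are hypothetical; three terms per variable is an assumption chosen only because 3^4 = 81 matches the stated number of rules. The ordering of the seven output levels and the scoring in `illustrative_level` are likewise assumptions for demonstration and are not the FSL Deduction Algorithm or FREA.

```python
from itertools import product

# Hypothetical placeholders: the abstract does not name the four input
# variables or their linguistic terms.
INPUT_VARIABLES = ["var_1", "var_2", "var_3", "var_4"]
TERMS = ["Low", "Medium", "High"]  # assumption: three terms each (3**4 == 81)

# Seven output levels from the abstract; least-to-most ordering is assumed.
LEVELS = ["Feebly", "Slightly", "Moderate", "Highly", "Immense", "Mega", "Ultra"]


def enumerate_antecedents():
    """Return every combination of one term per input variable."""
    return list(product(TERMS, repeat=len(INPUT_VARIABLES)))


def illustrative_level(antecedent):
    """Toy consequent (not the paper's method): sum the term ranks and
    scale the total onto the seven output levels with integer rounding."""
    score = sum(TERMS.index(term) for term in antecedent)
    max_score = (len(TERMS) - 1) * len(antecedent)
    index = (score * (len(LEVELS) - 1) + max_score // 2) // max_score
    return LEVELS[index]


if __name__ == "__main__":
    rules = enumerate_antecedents()
    print(len(rules), "rule antecedents")
    for antecedent in rules[:3]:
        conditions = " AND ".join(
            f"{name} is {term}" for name, term in zip(INPUT_VARIABLES, antecedent)
        )
        print(f"IF {conditions} THEN group is {illustrative_level(antecedent)}")
```

Any real rule base would assign each consequent from domain knowledge rather than from a summed rank; the sketch only shows how a full rule grid over four three-valued variables reaches 81 entries.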
About this article
Cite this article
Dhingra, K., Yadav, S.K. Spam analysis of big reviews dataset using Fuzzy Ranking Evaluation Algorithm and Hadoop. Int. J. Mach. Learn. & Cyber. 10, 2143–2162 (2019). https://doi.org/10.1007/s13042-017-0768-3
DOI: https://doi.org/10.1007/s13042-017-0768-3