Dark Web, pp 171–201

Sentiment Analysis

Part of the book series: Integrated Series in Information Systems (ISIS, volume 30)

Abstract

The Internet is frequently used as a medium for exchange of information and opinions, as well as propaganda dissemination. In this study, the use of sentiment analysis methodologies is proposed for classification of Web forum opinions in multiple languages. The utility of stylistic and syntactic features is evaluated for sentiment classification of English and Arabic content. Specific feature extraction components are integrated to account for the linguistic characteristics of Arabic. The entropy weighted genetic algorithm (EWGA) is also developed, which is a hybridized genetic algorithm that incorporates the information gain heuristic for feature selection. EWGA is designed to improve performance and get a better assessment of the key features. The proposed features and techniques are evaluated on US and Middle Eastern Web forum postings. The experimental results using EWGA with SVM indicate high performance levels, with accuracy over 95% on the benchmark dataset and over 93% for both the US and Middle Eastern forums. Stylistic features significantly enhanced performance across all test beds while EWGA also outperformed other feature selection methods, indicating the utility of these features and techniques for document-level classification of sentiments.
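
The abstract names information gain as the heuristic that EWGA folds into a genetic algorithm, but it does not spell out the mechanics. The following is only a minimal sketch of the two ingredients, under stated assumptions: `entropy` and `information_gain` follow the standard Shannon definitions, while `seed_population` is a hypothetical illustration of how gain scores could bias a genetic algorithm's starting feature subsets. It is not the chapter's actual EWGA procedure, and the toy features and labels are invented for the example.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(feature_column, labels):
    """IG = H(labels) - H(labels | feature) for one discrete feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_column):
        subset = [y for x, y in zip(feature_column, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


def seed_population(gains, size, rng):
    """Hypothetical seeding step (an assumption, not taken from the chapter):
    each feature is switched on with probability gain / max(gain)."""
    top = max(gains) or 1.0  # avoid division by zero when every gain is 0
    return [[1 if rng.random() < g / top else 0 for g in gains]
            for _ in range(size)]


# Toy data: one row per posting, values mark presence of a feature.
labels = ["pos", "pos", "neg", "neg"]
has_great = [1, 1, 0, 0]   # perfectly separates the two classes
has_the = [1, 0, 1, 0]     # carries no class information

gains = [information_gain(has_great, labels),
         information_gain(has_the, labels)]
population = seed_population(gains, size=5, rng=random.Random(0))
```

In this sketch a feature that fully separates the classes is always selected in the initial population and an uninformative one never is; a real hybrid would still rely on crossover, mutation, and a classifier-based fitness function (for example SVM accuracy) to refine the subsets.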

Author information

Corresponding author

Correspondence to Hsinchun Chen.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Chen, H. (2012). Sentiment Analysis. In: Dark Web. Integrated Series in Information Systems, vol 30. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1557-2_10

  • DOI: https://doi.org/10.1007/978-1-4614-1557-2_10

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-1556-5

  • Online ISBN: 978-1-4614-1557-2

  • eBook Packages: Computer Science (R0)