NgramPOS: a bigram-based linguistic and statistical feature process model for unstructured text classification

Foroozan Yazdani, Sepideh; Tan, Zhiyuan; Kakavand, Mohsen; Mustapha, Aida

doi:10.1007/s11276-018-01909-0

Published: 11 December 2018

Wireless Networks, volume 28, pages 1251–1261 (2022)

Abstract

Research in the financial domain has shown that the sentiment of stock news has a profound impact on trading volume, volatility, stock prices and firm earnings. In-depth analysis of stock news now draws on financial reviews from various social networking and marketing sites to help improve decision making. Such reviews, however, take the form of unstructured text, which requires natural language processing (NLP) to extract the sentiments. Accordingly, in this study we investigate the use of NLP tasks to improve the performance of sentiment classification in evaluating the information content of financial news as an instrument in an investment decision support system. At present, feature extraction approaches are mainly based on the occurrence frequency of words. As a result, low-frequency linguistic features that could be critical in sentiment classification are typically ignored. In this research, we attempt to improve current sentiment analysis approaches for financial news classification by focusing on low-frequency but informative linguistic expressions. Our proposed combination of low- and high-frequency linguistic expressions contributes a novel set of features for sentiment classification. The experimental results show that an optimal Ngram feature selection (a combination of optimal unigram and bigram features) enhances sentiment classification accuracy compared with other types of feature sets.
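
As a rough illustration of what a combined unigram and bigram feature set looks like, the following minimal Python sketch counts both kinds of n-gram in a single headline and separates low-frequency from high-frequency features. It is not the authors' NgramPOS pipeline: the whitespace tokenizer, the frequency threshold of 1, the example headline and the function names `ngram_counts` and `extract_features` are all assumptions made for illustration, and no part-of-speech tagging or feature selection is performed.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def extract_features(text):
    """Return combined unigram and bigram counts for one text.

    Assumption: lowercasing plus whitespace splitting stands in for
    whatever tokenization a real system would use.
    """
    tokens = text.lower().split()
    features = Counter()
    features.update(ngram_counts(tokens, 1))  # unigram features
    features.update(ngram_counts(tokens, 2))  # bigram features
    return features


# Hypothetical headline, invented for this example.
headline = "profit warning hits shares as profit falls"
features = extract_features(headline)

# A purely frequency-based extractor would tend to keep only high_freq;
# the abstract argues that informative items in low_freq matter too.
low_freq = {gram for gram, count in features.items() if count == 1}
high_freq = {gram for gram, count in features.items() if count > 1}
```

In this toy input the bigram `("profit", "warning")` occurs only once, so a frequency cutoff would discard it even though it plausibly carries sentiment, which is the kind of low-frequency expression the abstract refers to.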

Acknowledgements

This work is supported in part by Universiti Tun Hussein Onn Malaysia.

Author information

Authors and Affiliations

Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Seri Kembangan, Malaysia
Sepideh Foroozan Yazdani
School of Computing, Edinburgh Napier University, Edinburgh, UK
Zhiyuan Tan
School of Science and Technology, Sunway University, Subang Jaya, Malaysia
Mohsen Kakavand
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Malaysia
Aida Mustapha

Corresponding author

Correspondence to Sepideh Foroozan Yazdani.

About this article

Cite this article

Foroozan Yazdani, S., Tan, Z., Kakavand, M. et al. NgramPOS: a bigram-based linguistic and statistical feature process model for unstructured text classification. Wireless Netw 28, 1251–1261 (2022). https://doi.org/10.1007/s11276-018-01909-0

Issue Date: April 2022
DOI: https://doi.org/10.1007/s11276-018-01909-0
