COVID-19 Article Classification Using Word-Embedding and Extreme Learning Machine with Various Kernels

Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 451)

Abstract

The impact of the COVID-19 pandemic on the socially networked world cannot be overstated. Entire industries need the latest information from across the globe as early as possible. The business world must cope with a highly volatile market caused by the pandemic: businesses need to sense potential profit opportunities quickly and stay up to date on changing consumer demands. Technological advances and medical procedures that successfully deal with COVID-19 in one place can help save lives on the other side of the world. This seamless passage of crucial information is, now more than ever, possible only through the networked world. On average, 821 articles on COVID-19 are published online each day. Reading around 800 articles a day manually is too time-consuming to be feasible, which can prevent industries and businesses from reaching the relevant information in time. Machine learning techniques can automate this task. In this work, six different word embedding techniques are applied to the title and content of each article to obtain an n-dimensional vector. These vectors are the inputs to article classification models trained using the Extreme Learning Machine (ELM) with linear, sigmoid, polynomial, and radial basis function kernels. Feature selection techniques such as the Analysis of Variance (ANOVA) test and Principal Component Analysis (PCA) are also used to optimize the models. These models help pick out the relevant articles and speed up the delivery of crucial information, so that businesses can stay ahead of the competition and be the first to exploit new market opportunities. The experimental results indicate that word embedding techniques, feature selection techniques, and different ELM kernels help improve the accuracy of article classification.
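To make the classification step more concrete, the following is a minimal sketch of a kernel ELM classifier with the four kernels named above. It is an illustration under stated assumptions, not the authors' implementation: it uses the commonly cited closed-form kernel ELM solution, beta = (K + I/C)^(-1) T, and the names `kernel_matrix` and `KernelELM`, the hyperparameters `C`, `gamma`, `coef0`, and `degree`, and their default values are all hypothetical choices. The article vectors are assumed to be already computed; here random vectors stand in for them.

```python
import numpy as np


def kernel_matrix(A, B, kind="rbf", gamma=0.1, coef0=1.0, degree=2):
    """Kernel values between every row of A and every row of B."""
    dot = A @ B.T
    if kind == "linear":
        return dot
    if kind == "polynomial":
        return (gamma * dot + coef0) ** degree
    if kind == "sigmoid":
        return np.tanh(gamma * dot + coef0)
    if kind == "rbf":
        # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
        sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * dot
        return np.exp(-gamma * np.maximum(sq, 0.0))
    raise ValueError(f"unknown kernel: {kind!r}")


class KernelELM:
    """Kernel ELM sketch: regularized least squares on one-hot targets."""

    def __init__(self, kind="rbf", C=10.0, **kernel_params):
        self.kind = kind
        self.C = C  # assumed regularization strength
        self.kernel_params = kernel_params

    def fit(self, X, y):
        self.X_ = np.asarray(X, dtype=float)
        self.classes_, idx = np.unique(np.asarray(y), return_inverse=True)
        T = np.eye(len(self.classes_))[idx]  # one-hot target matrix
        K = kernel_matrix(self.X_, self.X_, self.kind, **self.kernel_params)
        # Output weights: solve (K + I/C) beta = T
        self.beta_ = np.linalg.solve(K + np.eye(len(K)) / self.C, T)
        return self

    def predict(self, X):
        K = kernel_matrix(np.asarray(X, dtype=float), self.X_,
                          self.kind, **self.kernel_params)
        return self.classes_[np.argmax(K @ self.beta_, axis=1)]


# Toy usage: random vectors stand in for n-dimensional article embeddings.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 1.0, (20, 5)), rng.normal(-2.0, 1.0, (20, 5))])
y = np.array(["relevant"] * 20 + ["other"] * 20)
for kind in ("linear", "sigmoid", "polynomial", "rbf"):
    model = KernelELM(kind=kind).fit(X, y)
    print(kind, (model.predict(X) == y).mean())
```

In a fuller pipeline, the rows of `X` would be the embedding vectors after ANOVA or PCA has reduced them, and accuracy would be measured on held-out articles rather than on the training set as in this toy loop.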

Keywords

  • COVID-19
  • Data imbalance
  • Feature selection
  • Word embedding

Chapter DOI: 10.1007/978-3-030-99619-2_7 · Chapter length: 13 pages · eBook ISBN: 978-3-030-99619-2
Notes

  1. https://nlp.stanford.edu/projects/glove/

  2. https://towardsdatascience.com/nlp-101-word2vec-skip-gram-and-cbow-93512ee2431

  3. https://towardsdatascience.com/nlp-101-word2vec-skip-gram-and-cbow-93512ee2431

  4. https://code.google.com/archive/p/word2vec/

  5. https://FastText.cc/

  6. https://www.kaggle.com/jannalipenkova/covid19-public-media-dataset

Author information

Corresponding author

Correspondence to Sanidhya Vijayvargiya.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Vijayvargiya, S., Kumar, L., Malapati, A., Murthy, L.B., Krishna, A. (2022). COVID-19 Article Classification Using Word-Embedding and Extreme Learning Machine with Various Kernels. In: Barolli, L., Hussain, F., Enokido, T. (eds) Advanced Information Networking and Applications. AINA 2022. Lecture Notes in Networks and Systems, vol 451. Springer, Cham. https://doi.org/10.1007/978-3-030-99619-2_7