Topic Modelling for Automatically Identification of Relevant Concepts Discussed in Academic Documents

  • Conference paper
Information Technology and Systems (ICITS 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 692)

Abstract

A combination of natural language processing and topic modeling identifies the topic terms in a collection of documents. Latent Dirichlet Allocation (LDA) is an algorithm widely used to infer the topics a document belongs to on the basis of the words it contains. This research applied LDA to identify topics automatically from academic documents as a way of validating the relevant concepts discussed in the curriculum literature. The experiment involved academic documents about Knowledge Building. We applied several techniques to prepare the data, followed by model training and validation. Topic modeling was then used to identify the topic terms that can be used in the knowledge-building dialogue. These concepts are meaningful to both the teacher and the students because they provide a visualization of the coherence of the content.
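
The chapter's actual preprocessing and training code is not reproduced on this page. As a rough illustration of the pipeline the abstract describes (data preparation followed by LDA training with gensim, footnote 1), the sketch below runs on a tiny invented corpus; the documents, stop-word handling and number of topics are illustrative assumptions, not the authors' data or settings.

```python
# Minimal LDA sketch with gensim (footnote 1). The corpus, stop-word filtering
# and num_topics below are illustrative assumptions, not the authors' settings.
from gensim import corpora
from gensim.models import LdaModel
from gensim.parsing.preprocessing import STOPWORDS
from gensim.utils import simple_preprocess

# Hypothetical excerpts standing in for the academic documents on Knowledge Building.
raw_documents = [
    "Knowledge building is a collaborative process of idea improvement.",
    "Students advance community knowledge through sustained dialogue.",
    "Topic models infer latent themes from word co-occurrence patterns.",
]

# Data preparation: lowercase, tokenise, and drop stop words and very short tokens.
tokenised = [
    [tok for tok in simple_preprocess(doc) if tok not in STOPWORDS]
    for doc in raw_documents
]

# Map tokens to integer ids and build the bag-of-words corpus that LDA expects.
dictionary = corpora.Dictionary(tokenised)
bow_corpus = [dictionary.doc2bow(doc) for doc in tokenised]

# Train LDA; the number of topics is a free parameter, set to 3 here purely for illustration.
lda = LdaModel(corpus=bow_corpus, id2word=dictionary, num_topics=3,
               passes=10, random_state=42)

# Inspect the top terms of each inferred topic.
for topic_id, terms in lda.print_topics(num_words=5):
    print(topic_id, terms)
```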


Notes

  1. https://pypi.org/project/gensim/

  2. https://pypi.org/project/pyLDAvis/ (see the usage sketch after these notes)
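
Footnote 2 points to pyLDAvis, which appears to be the tool behind the content-coherence visualization mentioned in the abstract. As a hedged sketch only, assuming the hypothetical lda, dictionary and bow_corpus objects from the sketch following the abstract, the visualization step could look like this:

```python
# Hypothetical visualization step with pyLDAvis (footnote 2); reuses the assumed
# lda, bow_corpus and dictionary objects from the earlier gensim sketch.
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis  # older pyLDAvis releases used pyLDAvis.gensim

# Build the interactive inter-topic distance map with per-topic term bar charts.
panel = gensimvis.prepare(lda, bow_corpus, dictionary)

# Save a standalone HTML page that a teacher or student can open in a browser.
pyLDAvis.save_html(panel, "topics.html")
```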


Acknowledgements

The research team would like to thank Prof. Frank de Jong from the University of Applied Sciences in the Netherlands for contributing the data. In addition, thanks to UTPL, especially to the Tecnologías Avanzadas de la Web y SBC Group.

Author information

Corresponding author

Correspondence to Segarra-Faggioni Veronica.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Veronica, SF., Sylvie, R., De Frank, J. (2023). Topic Modelling for Automatically Identification of Relevant Concepts Discussed in Academic Documents. In: Rocha, Á., Ferrás, C., Ibarra, W. (eds) Information Technology and Systems. ICITS 2023. Lecture Notes in Networks and Systems, vol 692. Springer, Cham. https://doi.org/10.1007/978-3-031-33261-6_8
