Topic Modelling for Automatically Identification of Relevant Concepts Discussed in Academic Documents

  • Conference paper
Information Technology and Systems (ICITS 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 692)

Abstract

A combination of natural language processing and topic modeling identifies the topic terms in a collection of documents. Latent Dirichlet Allocation (LDA) is an algorithm widely used to infer the topics a document belongs to on the basis of the words it contains. This research applied LDA to identify topics automatically from academic documents as a way of validating the relevant concepts discussed in the curriculum literature. The experiment involved academic documents about Knowledge Building. We applied several techniques to prepare the data, followed by model training and validation. Topic modeling was then used to identify the topic terms that can be used in the knowledge-building dialogue. These concepts are meaningful to both the teacher and the students because they provide a visualization of the coherence of the content.
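
The chapter's actual preprocessing and training code is not reproduced on this page. As a rough illustration of the pipeline the abstract describes (data preparation followed by LDA training with gensim, footnote 1), the sketch below runs on a tiny invented corpus; the documents, stop-word handling and number of topics are illustrative assumptions, not the authors' data or settings.

```python
# Minimal LDA sketch with gensim (footnote 1). The corpus, stop-word filtering
# and num_topics below are illustrative assumptions, not the authors' settings.
from gensim import corpora
from gensim.models import LdaModel
from gensim.parsing.preprocessing import STOPWORDS
from gensim.utils import simple_preprocess

# Hypothetical excerpts standing in for the academic documents on Knowledge Building.
raw_documents = [
    "Knowledge building is a collaborative process of idea improvement.",
    "Students advance community knowledge through sustained dialogue.",
    "Topic models infer latent themes from word co-occurrence patterns.",
]

# Data preparation: lowercase, tokenise, and drop stop words and very short tokens.
tokenised = [
    [tok for tok in simple_preprocess(doc) if tok not in STOPWORDS]
    for doc in raw_documents
]

# Map tokens to integer ids and build the bag-of-words corpus that LDA expects.
dictionary = corpora.Dictionary(tokenised)
bow_corpus = [dictionary.doc2bow(doc) for doc in tokenised]

# Train LDA; the number of topics is a free parameter, set to 3 here purely for illustration.
lda = LdaModel(corpus=bow_corpus, id2word=dictionary, num_topics=3,
               passes=10, random_state=42)

# Inspect the top terms of each inferred topic.
for topic_id, terms in lda.print_topics(num_words=5):
    print(topic_id, terms)
```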


Notes

  1. https://pypi.org/project/gensim/

  2. https://pypi.org/project/pyLDAvis/ (see the usage sketch after these notes)
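
Footnote 2 points to pyLDAvis, which appears to be the tool behind the content-coherence visualization mentioned in the abstract. As a hedged sketch only, assuming the hypothetical lda, dictionary and bow_corpus objects from the sketch following the abstract, the visualization step could look like this:

```python
# Hypothetical visualization step with pyLDAvis (footnote 2); reuses the assumed
# lda, bow_corpus and dictionary objects from the earlier gensim sketch.
import pyLDAvis
import pyLDAvis.gensim_models as gensimvis  # older pyLDAvis releases used pyLDAvis.gensim

# Build the interactive inter-topic distance map with per-topic term bar charts.
panel = gensimvis.prepare(lda, bow_corpus, dictionary)

# Save a standalone HTML page that a teacher or student can open in a browser.
pyLDAvis.save_html(panel, "topics.html")
```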


Acknowledgements

The research team would like to thank Prof. Frank de Jong from the University of Applied Sciences in the Netherlands for contributing the data. In addition, thanks to UTPL, especially to the Tecnologías Avanzadas de la Web y SBC Group.

Author information

Corresponding author

Correspondence to Segarra-Faggioni Veronica.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Veronica, SF., Sylvie, R., De Frank, J. (2023). Topic Modelling for Automatically Identification of Relevant Concepts Discussed in Academic Documents. In: Rocha, Á., Ferrás, C., Ibarra, W. (eds) Information Technology and Systems. ICITS 2023. Lecture Notes in Networks and Systems, vol 692. Springer, Cham. https://doi.org/10.1007/978-3-031-33261-6_8
