Abstract
A combination of natural language processing and topic modeling identifies topic terms in a collection of documents. Latent Dirichlet Allocation (LDA) is a widely used algorithm for inferring the topics a document belongs to based on the words it contains. This research applied LDA to identify topics automatically from academic documents as a way of validating the relevant concepts discussed in the curriculum literature. The experiment involved academic documents about Knowledge Building. We applied several techniques to prepare the data, followed by model training and validation. Topic modeling was used to identify the topic terms that can be used in the knowledge-building dialogue. These concepts are meaningful to both teachers and students because they provide a visualization of the content's coherence.
Acknowledgements
The research team would like to thank Prof. Frank de Jong from the University of Applied Sciences in the Netherlands for contributing the data. In addition, thanks to UTPL, especially the Tecnologías Avanzadas de la Web y SBC Group.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Veronica, SF., Sylvie, R., De Frank, J. (2023). Topic Modelling for Automatically Identification of Relevant Concepts Discussed in Academic Documents. In: Rocha, Á., Ferrás, C., Ibarra, W. (eds) Information Technology and Systems. ICITS 2023. Lecture Notes in Networks and Systems, vol 692. Springer, Cham. https://doi.org/10.1007/978-3-031-33261-6_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33260-9
Online ISBN: 978-3-031-33261-6
eBook Packages: Intelligent Technologies and Robotics (R0)