Abstract
The growth of data on the Internet has given people access to a great deal of information, and it has also raised important problems in information retrieval. In response, search engines have been developed to serve users, who can retrieve information by content, by keyword, or by whatever else they need. However, the volume of data on the Internet is so large that a single query often returns millions or hundreds of millions of results. As a result, in a narrow field it is difficult to find relevant information, especially technical information that contains formulas. In this paper, we present a method for building a retrieval system for Vietnamese technical texts that uses topic modeling and MathML for indexing. The system was built and tested on more than 500 Vietnamese technical texts, and the results show that it satisfies users' requirements for accuracy and speed.
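To make the idea of indexing formulas alongside text concrete, the following is a minimal sketch and not the authors' implementation. It assumes each document is a pair of plain text and a list of presentation MathML strings, and it builds a plain inverted index over words and over MathML leaf tokens (`mi`, `mn`, `mo`). The topic-modeling step is left out entirely. The function names `formula_tokens`, `text_tokens`, `build_index`, and `search`, and the sample documents, are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace prefix that ElementTree attaches to namespaced MathML tags.
MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"


def formula_tokens(mathml):
    """Return leaf tokens (identifiers, numbers, operators) of a MathML string."""
    root = ET.fromstring(mathml)
    tokens = []
    for element in root.iter():
        tag = element.tag.replace(MATHML_NS, "")  # drop the namespace if present
        if tag in ("mi", "mn", "mo") and element.text:
            tokens.append(tag + ":" + element.text.strip())
    return tokens


def text_tokens(text):
    """Lowercase word tokens; a real system would need Vietnamese word segmentation."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each text or formula token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, (text, formulas) in documents.items():
        for token in text_tokens(text):
            index[token].add(doc_id)
        for mathml in formulas:
            for token in formula_tokens(mathml):
                index[token].add(doc_id)
    return index


def search(index, terms):
    """Return the ids of documents that contain every query term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical sample documents: (text, list of MathML formulas).
documents = {
    "doc1": (
        "Định lý Pythagoras",
        ["<math xmlns='http://www.w3.org/1998/Math/MathML'><mrow>"
         "<msup><mi>a</mi><mn>2</mn></msup><mo>+</mo>"
         "<msup><mi>b</mi><mn>2</mn></msup><mo>=</mo>"
         "<msup><mi>c</mi><mn>2</mn></msup></mrow></math>"],
    ),
    "doc2": (
        "Diện tích hình vuông",
        ["<math><mrow><mi>S</mi><mo>=</mo>"
         "<msup><mi>a</mi><mn>2</mn></msup></mrow></math>"],
    ),
}

index = build_index(documents)
```

With this sketch, `search(index, ["mi:c", "mn:2"])` matches only the document whose formula contains the identifier `c`, while `search(index, ["mn:2"])` matches both. Matching on leaf tokens ignores formula structure, so a production index would likely also encode subexpressions.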
Copyright information
© 2016 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Xuan, T.C., Khanh, L.B., Trung, H.V., Thu, H.N.T., Thanh, T.D. (2016). Indexing Based on Topic Modeling and MATHML for Building Vietnamese Technical Document Retrieval Effectively. In: Vinh, P., Alagar, V. (eds) Context-Aware Systems and Applications. ICCASA 2015. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 165. Springer, Cham. https://doi.org/10.1007/978-3-319-29236-6_31
DOI: https://doi.org/10.1007/978-3-319-29236-6_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29235-9
Online ISBN: 978-3-319-29236-6
eBook Packages: Computer Science (R0)