Indexing Based on Topic Modeling and MATHML for Building Vietnamese Technical Document Retrieval Effectively

  • Conference paper
Context-Aware Systems and Applications (ICCASA 2015)

Abstract

The growth of data on the Internet has given people access to a great deal of information, and it has also raised some important problems in information retrieval… Along with this growth, search engines have been developed to serve users' needs. Users can retrieve information by content, by keyword, or by whatever else they need. However, the volume of data on the Internet is so large that a single query often returns millions or hundreds of millions of results. In a narrow field, it is therefore difficult to find relevant information, especially technical information that contains formulas. In this paper, we present a method for indexing Vietnamese technical text based on topic modeling and MathML. The system was built and tested on more than 500 Vietnamese technical texts, and the tests show that it satisfies users' requirements for accuracy and speed.
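As a rough illustration of indexing text and formulas together, the sketch below builds a small inverted index over prose words and MathML tokens. It is not the method of the paper: the function names (`formula_tokens`, `build_index`, `search`), the `tag:text` token scheme, and the `w:` prefix are assumptions made for this example. The topic modeling step is not shown, and plain lowercased words stand in for topic terms. Splitting on whitespace is also a simplification for Vietnamese, where one word may span several syllables.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical helper pattern: finds inline <math>...</math> fragments.
MATHML_RE = re.compile(r"<math\b.*?</math>", re.DOTALL)


def formula_tokens(mathml: str) -> list[str]:
    """Flatten one MathML fragment into tokens such as 'mi:x' or 'mo:+'."""
    root = ET.fromstring(mathml)
    tokens = []
    for node in root.iter():
        tag = node.tag.split("}")[-1]  # drop an XML namespace, if present
        text = (node.text or "").strip()
        if text:
            tokens.append(f"{tag}:{text}")
    return tokens


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each prose word and each formula token to the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, body in docs.items():
        for fragment in MATHML_RE.findall(body):
            for token in formula_tokens(fragment):
                index[token].add(doc_id)
        # Assumption: plain words stand in for topic terms in this sketch.
        prose = MATHML_RE.sub(" ", body)
        for word in re.findall(r"\w+", prose.lower()):
            index[f"w:{word}"].add(doc_id)
    return index


def search(index: dict[str, set[str]], terms: list[str]) -> set[str]:
    """Return the documents that contain every query term."""
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()
```

A query such as `search(index, ["w:matrix", "mi:a"])` would then return only the documents that mention the word and contain the identifier `a` inside a formula.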

Author information

Corresponding author

Correspondence to Ha Nguyen Thi Thu.

Copyright information

© 2016 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Xuan, T.C., Khanh, L.B., Trung, H.V., Thu, H.N.T., Thanh, T.D. (2016). Indexing Based on Topic Modeling and MATHML for Building Vietnamese Technical Document Retrieval Effectively. In: Vinh, P., Alagar, V. (eds) Context-Aware Systems and Applications. ICCASA 2015. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 165. Springer, Cham. https://doi.org/10.1007/978-3-319-29236-6_31

  • DOI: https://doi.org/10.1007/978-3-319-29236-6_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-29235-9

  • Online ISBN: 978-3-319-29236-6

  • eBook Packages: Computer Science, Computer Science (R0)
