Text Categorization for Generation of a Historical Shipbuilding Ontology

  • Conference paper

Part of the Communications in Computer and Information Science book series (CCIS, volume 468)

Abstract

This paper addresses the task of building a text corpus for the automatic generation of a domain ontology of historical shipbuilding. Standard methods of analysis produce unsatisfactory results because of the limited nomenclature of the available texts and the lexical evolution of the language. In this work, a parser developed by the authors is used for lemmatization and word-sense disambiguation. The parser relies on an external classifier and provides an unambiguous mapping from each lexeme to a class. Documents are represented as vectors in the topic space. The experiments show that the proposed categorization method produces results very close to expert opinion while remaining sufficiently resistant to the historical dynamics of the vocabulary.
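The representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' parser or classifier: the lexeme-to-class table, the class names, and the functions `topic_vector` and `cosine` are hypothetical stand-ins, and cosine similarity is assumed here only as one common way to compare documents in a topic space.

```python
from collections import Counter
from math import sqrt

# Hypothetical lexeme-to-class table standing in for the external classifier.
# The entries are invented for illustration and do not come from the paper.
LEXEME_CLASS = {
    "keel": "hull",
    "frame": "hull",
    "plank": "hull",
    "mast": "rigging",
    "yard": "rigging",
    "sail": "rigging",
    "cannon": "armament",
}

# Fixed axis order of the (assumed) topic space.
TOPICS = sorted(set(LEXEME_CLASS.values()))


def topic_vector(lemmas):
    """Return the relative frequency of each class among the known lemmas."""
    counts = Counter(LEXEME_CLASS[lemma] for lemma in lemmas if lemma in LEXEME_CLASS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(TOPICS)
    return [counts[topic] / total for topic in TOPICS]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    hull_text = ["keel", "frame", "mast", "the"]  # "the" has no class and is ignored
    rigging_text = ["mast", "sail", "yard"]
    print(TOPICS)
    print(topic_vector(hull_text))
    print(cosine(topic_vector(hull_text), topic_vector(rigging_text)))
```

Because every lexeme contributes only through its class, an older term and its modern replacement that share a class add to the same coordinate. This is one plausible reading of why a class-based representation would tolerate vocabulary change.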

Keywords

  • Text categorization
  • historical shipbuilding domain
  • ontology
  • parsing
  • space of topics



Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Artemova, G. et al. (2014). Text Categorization for Generation of a Historical Shipbuilding Ontology. In: Klinov, P., Mouromtsev, D. (eds) Knowledge Engineering and the Semantic Web. KESW 2014. Communications in Computer and Information Science, vol 468. Springer, Cham. https://doi.org/10.1007/978-3-319-11716-4_1

  • DOI: https://doi.org/10.1007/978-3-319-11716-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11715-7

  • Online ISBN: 978-3-319-11716-4

  • eBook Packages: Computer Science, Computer Science (R0)