Abstract
In this paper we present a novel system that automatically marks up text documents in XML. The system uses the Self-Organizing Map (SOM) algorithm to arrange marked-up documents on a map so that similar documents are placed at nearby locations. Then, using the inductive learning algorithm C5, it automatically generates markup rules from the nearest SOM neighbours of an unmarked document and applies them to that document. The system is adaptive: it learns from errors in the automatically marked-up document to improve accuracy. The automatically marked-up documents are in turn arranged on the SOM.
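The neighbour-retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The three-dimensional feature vectors, the document names, the 3x3 map size, and the helper names `train_som`, `best_matching_unit`, and `nearest_marked` are all assumptions made for illustration. The C5 rule-induction step is omitted, since C5.0 is a separate tool.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matching_unit(grid, v):
    """Grid coordinate of the node whose weight vector is closest to v."""
    return min(grid, key=lambda node: dist2(grid[node], v))

def train_som(vectors, rows=3, cols=3, epochs=50, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr = 0.5 * frac                              # decaying learning rate
        radius = max(rows, cols) / 2 * frac + 0.5    # decaying neighbourhood radius
        for v in vectors:
            bmu = best_matching_unit(grid, v)
            for node, w in grid.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])     # pull node towards v
    return grid

def nearest_marked(grid, marked, query, k=2):
    """Names of the k marked documents closest to the query on the map.

    Documents are ranked by grid distance between their best-matching unit
    and the query's; ties are broken by distance in feature space.
    """
    q = best_matching_unit(grid, query)

    def key(name):
        b = best_matching_unit(grid, marked[name])
        map_d = (b[0] - q[0]) ** 2 + (b[1] - q[1]) ** 2
        return (map_d, dist2(marked[name], query))

    return sorted(marked, key=key)[:k]

# Hypothetical feature vectors for four already marked-up documents.
marked = {
    "letter_a": [0.9, 0.1, 0.0],
    "letter_b": [0.8, 0.2, 0.1],
    "poem_a": [0.1, 0.1, 0.9],
    "poem_b": [0.0, 0.2, 0.8],
}
grid = train_som(list(marked.values()))

# An unmarked document; its neighbours would supply the training examples
# for rule induction in a system of the kind the abstract describes.
neighbours = nearest_marked(grid, marked, [0.85, 0.15, 0.05])
```

In a system of the kind the abstract describes, the markup of the returned neighbours would then serve as training data for the rule learner, and the newly marked-up document would be added to `marked` so that it is placed on the map as well.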
References
Kohonen, T. (1997). Self-Organizing Maps. Springer Series in Information Sciences, Berlin, Heidelberg, New York.
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, Calif.
Quinlan, J. R. (2000). Data Mining Tools See5 and C5.0. [http://www.rulequest.com/see5-info.html]
Schreibman, S. (1998). The MacGreevy Archive. [http://www.ucd.ie/~cosei/archive.htm]
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Akhtar, S., Reilly, R.G., Dunnion, J. (2002). AutoMarkup: A Tool for Automatically Marking up Text Documents. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2002. Lecture Notes in Computer Science, vol 2276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45715-1_46
DOI: https://doi.org/10.1007/3-540-45715-1_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43219-7
Online ISBN: 978-3-540-45715-2
eBook Packages: Springer Book Archive