UAHCI 2007: Universal Access in Human-Computer Interaction. Applications and Services, pp. 299-308
A Bayesian Network Approach to Semantic Labelling of Text Formatting in XML Corpora of Documents
Abstract
The widespread application of document digitization has led to the use of structured digital representations such as XML. Methods that extract formatting metadata from such structured documents can enhance their accessibility, including augmented audio representation of documents. To the best of our knowledge, no effort has yet been made to build a system that automatically extracts the semantics of document formatting solely from document layout, without the use of natural language processing. In this study, a corpus of XML representations of several issues of a Greek newspaper is used to create and evaluate a semantic classifier of text formatting based on Bayesian networks.
Keywords
document accessibility, document analysis, semantic labeling