Abstract
Large volumes of Web content need to be annotated with ontologies, a task called semantic annotation. Our empirical study shows that strong dependencies exist across different types of information: identifying one kind of information can help identify another. Conditional Random Fields (CRFs) are the state-of-the-art approach for modeling such dependencies to improve annotation. However, because information on a Web page is not necessarily laid out linearly, previous linear-chain CRFs have limitations in semantic annotation. This paper is concerned with semantic annotation of hierarchically dependent data (hierarchical semantic annotation). We propose a Tree-structured Conditional Random Field (TCRF) model to better incorporate dependencies across hierarchically laid-out information. Methods for model-parameter estimation and annotation in TCRFs are also proposed. Experimental results indicate that the proposed TCRFs can significantly outperform the existing linear-chain CRF model on hierarchical semantic annotation.
Supported by the National Natural Science Foundation of China under Grant No. 90604025.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tang, J., Hong, M., Li, J., Liang, B. (2006). Tree-Structured Conditional Random Fields for Semantic Annotation. In: Cruz, I., et al. The Semantic Web - ISWC 2006. ISWC 2006. Lecture Notes in Computer Science, vol 4273. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11926078_46
DOI: https://doi.org/10.1007/11926078_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49029-6
Online ISBN: 978-3-540-49055-5
eBook Packages: Computer Science, Computer Science (R0)