Tree-Structured Conditional Random Fields for Semantic Annotation

Tang, Jie; Hong, Mingcai; Li, Juanzi; Liang, Bangyong

doi:10.1007/11926078_46

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4273)

Included in the following conference series:

International Semantic Web Conference

Abstract

The large volume of web content needs to be annotated with ontologies (a task called semantic annotation). Our empirical study shows that strong dependencies exist across different types of information: identifying one kind of information can help identify another. Conditional Random Fields (CRFs) are a state-of-the-art approach for modeling such dependencies to improve annotation. However, because information on a web page is not necessarily laid out linearly, the linear-chain CRFs used previously are limited for semantic annotation. This paper is concerned with semantic annotation of hierarchically dependent data (hierarchical semantic annotation). We propose a Tree-structured Conditional Random Field (TCRF) model that better incorporates dependencies across hierarchically laid-out information, and we present methods for model-parameter estimation and annotation in TCRFs. Experimental results indicate that the proposed TCRFs for hierarchical semantic annotation can significantly outperform the existing linear-chain CRF model.
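
The abstract names two tasks, model-parameter estimation and annotation, without giving the model's equations. The Python below is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a pure tree in which each element depends only on its parent (so exact inference applies), and it uses hypothetical log-space scores `unary[i][y]` (node i taking label y) and one shared table `pairwise[y_parent][y_child]` in place of the weighted feature sums a real TCRF would compute from the page. All function names are illustrative. Annotation is max-product decoding; parameter estimation would maximize the conditional log-likelihood `score(y) - log Z(x)`, whose gradient also needs marginals from a downward pass that is omitted here.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def tree_order(parent):
    """Return (root, child lists, node order with every child before its parent)."""
    children = [[] for _ in parent]
    root = None
    for i, p in enumerate(parent):
        if p < 0:
            root = i
        else:
            children[p].append(i)
    preorder, stack = [], [root]
    while stack:  # iterative DFS: a parent is emitted before its descendants
        node = stack.pop()
        preorder.append(node)
        stack.extend(children[node])
    return root, children, preorder[::-1]


def score(labels, parent, unary, pairwise):
    """Unnormalized log-score: node terms plus one parent-child term per tree edge."""
    s = sum(unary[i][y] for i, y in enumerate(labels))
    return s + sum(pairwise[labels[p]][labels[i]] for i, p in enumerate(parent) if p >= 0)


def log_partition(parent, unary, pairwise):
    """Upward sum-product pass in log space; returns log Z(x)."""
    k = len(pairwise)
    root, children, order = tree_order(parent)
    beta = [None] * len(parent)
    for i in order:  # children are finished before their parent
        beta[i] = [unary[i][y] + sum(
            logsumexp([pairwise[y][yc] + beta[c][yc] for yc in range(k)])
            for c in children[i]) for y in range(k)]
    return logsumexp(beta[root])


def log_likelihood(labels, parent, unary, pairwise):
    """Conditional log-likelihood log p(y | x) = score(y) - log Z(x)."""
    return score(labels, parent, unary, pairwise) - log_partition(parent, unary, pairwise)


def decode(parent, unary, pairwise):
    """Upward max-product pass with back-pointers, then a top-down readout."""
    k = len(pairwise)
    root, children, order = tree_order(parent)
    delta = [None] * len(parent)
    back = [None] * len(parent)  # back[c][y]: best label of child c if its parent has label y
    for i in order:
        delta[i] = list(unary[i])
        for c in children[i]:
            back[c] = []
            for y in range(k):
                yc = max(range(k), key=lambda v: pairwise[y][v] + delta[c][v])
                back[c].append(yc)
                delta[i][y] += pairwise[y][yc] + delta[c][yc]
    labels = [None] * len(parent)
    labels[root] = max(range(k), key=lambda y: delta[root][y])
    for i in reversed(order):  # root first, so every parent is labeled before its children
        for c in children[i]:
            labels[c] = back[c][labels[i]]
    return labels


if __name__ == "__main__":
    # Hypothetical toy tree: node 0 is the root, nodes 1 and 2 its children, node 3 a child of 1.
    parent = [-1, 0, 0, 1]
    unary = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2], [0.0, 0.7]]
    pairwise = [[0.8, -0.5], [-0.3, 0.6]]
    # Exact tree inference should match brute-force enumeration of all 2**4 labelings.
    brute = logsumexp([score(ys, parent, unary, pairwise)
                       for ys in product(range(2), repeat=len(parent))])
    print(log_partition(parent, unary, pairwise), brute)
    print(decode(parent, unary, pairwise))
```

On a tree this dynamic programming is exact and costs O(n K²) for n nodes and K labels; a dependency structure that introduces cycles would instead call for approximate inference.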

Supported by the National Natural Science Foundation of China under Grant No. 90604025.

Similar content being viewed by others

Semi-structured Document Annotation Using Entity and Relation Types

Exploiting Structural Consistencies with Stacked Conditional Random Fields

Assigning Semantic Labels to Data Sources

Author information

Authors and Affiliations

Department of Computer Science, Tsinghua University, 12#109, Tsinghua University, Beijing, 100084, China
Jie Tang, Mingcai Hong & Juanzi Li
NEC Labs China, 11th Floor, Innovation Plaza, Tsinghua Science Park, Beijing, 100084, China
Bangyong Liang

Editor information

Editors and Affiliations

Department of Computer Science, University of Illinois at Chicago, 851 South Morgan Street (M/C 152), 60607, Chicago, IL, USA
Isabel Cruz
Digital Enterprise Research Institute, National University of Ireland, Galway, IDA Business Park, Lower Dangan, Galway, Ireland
Stefan Decker
TopQuadrant, 22314, VA, USA
Dean Allemang
HP Laboratories, Bristol, UK
Chris Preist
Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio), Caixa Postal 38.097, 22.453-900, Rio de Janeiro, RJ, Brazil
Daniel Schwabe
Yahoo! Research, Barcelona, Spain
Peter Mika
Boeing, Phantom Works, P.O. Box 3707, m/s 7L-40, 98124-2207, Seattle, WA, USA
Mike Uschold
Technische Universiteit Eindhoven, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
Lora M. Aroyo

About this paper

Cite this paper

Tang, J., Hong, M., Li, J., Liang, B. (2006). Tree-Structured Conditional Random Fields for Semantic Annotation. In: Cruz, I., et al. The Semantic Web - ISWC 2006. ISWC 2006. Lecture Notes in Computer Science, vol 4273. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11926078_46

DOI: https://doi.org/10.1007/11926078_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49029-6
Online ISBN: 978-3-540-49055-5
eBook Packages: Computer Science, Computer Science (R0)
