iASA: Learning to Annotate the Semantic Web

  • Conference paper
Journal on Data Semantics IV

Part of the book series: Lecture Notes in Computer Science (JODS, volume 3730)

Abstract

With the advent of the Semantic Web, there is a great need to upgrade existing web content to semantic web content. This can be accomplished through semantic annotations. Unfortunately, manual annotation is tedious, time-consuming, and error-prone. In this paper, we propose a tool, called iASA, that learns to automatically annotate web documents according to an ontology. iASA is based on the combination of information extraction (specifically, the Similarity-based Rule Learner, SRL) and machine learning techniques. Using linguistic knowledge and optimal dynamic window size, SRL produces annotation rules of better quality than comparable semantic annotation systems. Similarity-based learning efficiently reduces the search space by avoiding pseudo rule generalization. In the annotation phase, iASA exploits ontology knowledge to refine the annotation it proposes. Moreover, our annotation algorithm exploits machine learning methods to correctly select instances and to predict missing instances. Finally, iASA provides an explanation component that explains the nature of the learner and annotator to the user. Explanations can greatly help users understand the rule induction and annotation process, so that they can focus on correcting rules and annotations quickly. Experimental results show that iASA can reach high accuracy quickly.

Supported by the National Natural Science Foundation of China under Grant No. 60443002.




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tang, J., Li, J., Lu, H., Liang, B., Huang, X., Wang, K. (2005). iASA: Learning to Annotate the Semantic Web. In: Spaccapietra, S. (eds) Journal on Data Semantics IV. Lecture Notes in Computer Science, vol 3730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11603412_4

  • DOI: https://doi.org/10.1007/11603412_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31001-3

  • Online ISBN: 978-3-540-31447-9

  • eBook Packages: Computer Science (R0)
