New Generation Computing, Volume 25, Issue 3, pp. 277–292

VN-KIM IE: Automatic Extraction of Vietnamese Named-Entities on the Web

Abstract

The most compelling promise of the Semantic Web is its ability to understand and process the contents of web pages automatically. Realizing the Semantic Web involves two main tasks: (1) representing and managing large amounts of data and metadata about web content; and (2) extracting information from web pages and annotating them with it. Named-entity recognition is a basic and important problem that must be solved before the deeper semantics of a web page can be extracted. At the same time, information extraction for the Semantic Web is language-dependent and requires language-specific natural language processing techniques. This paper introduces VN-KIM IE, the information extraction module of VN-KIM, a Semantic Web system we have developed. VN-KIM IE automatically recognizes named entities in Vietnamese web pages by identifying their classes and, where they exist, their addresses in the knowledge base of discourse. This information is then attached to the web pages as annotations, providing a basis for named-entity-based search in place of the current keyword-based search. The design, implementation, and performance of VN-KIM IE are presented and discussed.
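The abstract describes recognition as finding an entity's class and, where one exists, its address in the knowledge base, then attaching that information to the page. As a rough illustration of that output shape only (not VN-KIM IE's actual method, which the abstract does not detail), the Python sketch below performs a greedy longest-match lookup against a tiny hypothetical gazetteer. `GAZETTEER`, `Annotation`, `annotate`, and the `kb:` identifiers are invented for this example. Matching runs over space-separated syllables, since Vietnamese writes multi-syllable words and names with spaces between syllables, and text is NFC-normalized so that precomposed and decomposed diacritics compare equal.

```python
import unicodedata
from dataclasses import dataclass

# Hypothetical gazetteer (illustration only): surface form -> (class, KB identifier).
GAZETTEER: dict[str, tuple[str, str]] = {
    "Hồ Chí Minh": ("Person", "kb:Person_HoChiMinh"),
    "Thành phố Hồ Chí Minh": ("City", "kb:City_HCMC"),
    "Hà Nội": ("City", "kb:City_HaNoi"),
}


@dataclass
class Annotation:
    start: int          # index of the first syllable token
    end: int            # index one past the last syllable token
    text: str           # matched surface form (NFC-normalized)
    entity_class: str   # class of the entity, e.g. "City"
    uri: str            # hypothetical knowledge-base identifier


def annotate(text: str, gazetteer: dict[str, tuple[str, str]] = GAZETTEER) -> list[Annotation]:
    """Greedy longest-match lookup of gazetteer entries over syllable tokens."""
    # Normalize keys and input so composed/decomposed diacritics match.
    norm = {unicodedata.normalize("NFC", k): v for k, v in gazetteer.items()}
    tokens = unicodedata.normalize("NFC", text).split()
    max_len = max((len(k.split()) for k in norm), default=1)

    annotations: list[Annotation] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest span first, so "Thành phố Hồ Chí Minh" (a city)
        # wins over the shorter, differently classified "Hồ Chí Minh".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in norm:
                cls, uri = norm[candidate]
                annotations.append(Annotation(i, i + n, candidate, cls, uri))
                i += n
                matched = True
                break
        if not matched:
            i += 1  # no entity starts at this syllable
    return annotations
```

For example, `annotate("Tôi sống ở Thành phố Hồ Chí Minh và Hà Nội")` returns two `City` annotations, for "Thành phố Hồ Chí Minh" (tokens 3 to 8) and "Hà Nội". A plain lookup like this cannot handle unseen names, ambiguity, or attached punctuation, which is where the language-specific techniques the abstract mentions would come in.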

Keywords

Semantic Web, Information Extraction, Named-Entity, Semantic Annotation

Copyright information

© 2007

Authors and Affiliations

  1. Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam