Automatically Generated DAML Markup for Semistructured Documents
The semantic web is becoming a realizable technology due to the efforts of researchers to develop semantic markup languages such as the DARPA Agent Markup Language (DAML). A major problem that faces the semantic web community is that most information sources on the web today lack semantic markup. To fully realize the potential of the semantic web, we must find a way to automatically upgrade information sources with semantic markup. We have developed a system based on the STALKER algorithm that automatically generates DAML markup for a set of documents based on previously seen labeled training documents. Our rule-learning approach to semantic markup is highly effective when dealing with semistructured documents.
Keywords: Full System, Average Recall, Training Document, Forward Rule, Ontology Element
- 1. Ciravegna, F.: (LP)², an Adaptive Algorithm for Information Extraction from Web-related Texts. In: Proceedings of the IJCAI-2001 Workshop on Adaptive Text Extraction and Mining, held in conjunction with the 17th International Joint Conference on Artificial Intelligence, IJCAI (2001)
- 5. Knoblock, C.A., Lerman, K., Minton, S., Muslea, I.: Accurately and reliably extracting data from the web: A machine learning approach. Data Engineering Bulletin
- 6. Muslea, I., Minton, S., Knoblock, C.: Hierarchical wrapper induction for semistructured information sources. Journal of Autonomous Agents and Multi-Agent Systems (2001)