Using Support Vector Machines for Terrorism Information Extraction

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2665)

Abstract

Information extraction (IE) is of great importance in many applications, including web intelligence, search engines, and text understanding. To extract information from text documents, most IE systems rely on a set of extraction patterns. Each extraction pattern is defined by syntactic and/or semantic constraints on the positions of the desired entities within natural language sentences. These IE systems also provide a set of pattern templates that determine the kinds of syntactic and semantic constraints to be considered. In this paper, we argue that such pattern templates restrict the kinds of extraction patterns that IE systems can learn. To allow a wider range of context information to be considered when learning extraction patterns, we first propose to model the content and context information of a candidate entity as a set of features. A classification model is then built for each category of entities using Support Vector Machines (SVM). We have conducted IE experiments to evaluate the proposed method on a text collection in the terrorism domain. From the preliminary experimental results, we conclude that the proposed method can deliver reasonable accuracy.
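
As a rough illustration of what modelling the content and context of a candidate entity as a set of features could look like, the sketch below builds a sparse binary feature dictionary from a tokenized sentence. The function name `candidate_features`, the feature naming scheme, the capitalization cue, and the window size are all assumptions made for this example; the abstract does not specify the actual feature set used in the paper. The resulting dictionaries could then be vectorized and passed to one SVM classifier per entity category, which is not shown here.

```python
from typing import Dict, List


def candidate_features(tokens: List[str], start: int, end: int,
                       window: int = 2) -> Dict[str, int]:
    """Return binary features for the candidate entity tokens[start:end].

    Hypothetical feature design: this is an illustrative sketch, not the
    feature set described in the paper.
    """
    if not 0 <= start < end <= len(tokens):
        raise ValueError("candidate span must be non-empty and inside the sentence")

    features: Dict[str, int] = {}
    candidate = tokens[start:end]

    # Content features: the words of the candidate itself.
    for tok in candidate:
        features[f"content={tok.lower()}"] = 1
    # Content feature: a simple surface cue (every word starts with a capital).
    if all(tok[:1].isupper() for tok in candidate):
        features["content_all_capitalized"] = 1

    # Context features: neighbouring words, keyed by side and distance.
    for offset in range(1, window + 1):
        left = start - offset
        right = end - 1 + offset
        if left >= 0:
            features[f"left{offset}={tokens[left].lower()}"] = 1
        if right < len(tokens):
            features[f"right{offset}={tokens[right].lower()}"] = 1
    return features


# Example usage on an invented sentence; the candidate is the single token "guerrillas".
sentence = "The guerrillas attacked the embassy in Bogota".split()
example = candidate_features(sentence, 1, 2)
```

Because the context words are ordinary features rather than slots in a fixed pattern template, widening the window or adding new cues only changes the feature dictionary, not the learning procedure.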

Keywords

Information extraction, terrorism-related knowledge discovery

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  1. Centre for Advanced Information Systems, School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
  2. Department of Systems Engineering and Engineering Management, Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR
